Philosophical Transactions of the Royal Society B: Biological Sciences. 2012 Apr 5;367(1591):896–905. doi: 10.1098/rstb.2011.0254

Multistability in perception: binding sensory modalities, an overview

Jean-Luc Schwartz 1, Nicolas Grimault 2, Jean-Michel Hupé 3, Brian C J Moore 4,*, Daniel Pressnitzer 5,6
PMCID: PMC3282306  PMID: 22371612

Abstract

This special issue presents research concerning multistable perception in different sensory modalities. Multistability occurs when a single physical stimulus produces alternations between different subjective percepts. Multistability was first described for vision, where it occurs, for example, when different stimuli are presented to the two eyes or for certain ambiguous figures. It has since been described for other sensory modalities, including audition, touch and olfaction. The key features of multistability are: (i) stimuli have more than one plausible perceptual organization; (ii) these organizations are not compatible with each other. We argue here that most if not all cases of multistability are based on competition in selecting and binding stimulus information. Binding refers to the process whereby the different attributes of objects in the environment, as represented in the sensory array, are bound together within our perceptual systems, to provide a coherent interpretation of the world around us. We argue that multistability can be used as a method for studying binding processes within and across sensory modalities. We emphasize this theme while presenting an outline of the papers in this issue. We end with some thoughts about open directions and avenues for further research.

Keywords: multistability, multisensory, binding, perceptual organization

1. Introduction

Multistability occurs when a single physical stimulus produces alternations between different subjective percepts. For more than two centuries, it has been a major conceptual and experimental tool for investigating perceptual awareness in vision. This special issue of the Philosophical Transactions of the Royal Society B presents recent advances in the study of multistability not only for vision but also for audition and speech, with a combination of psychophysical, physiological and modelling approaches.

This introduction is not intended as a review of multistability, as many excellent reviews are already available [1–7], with further reviews in the present issue [8–10]. Rather, the next section presents the motivation for extending the study of visual multistability to other modalities. The third section describes how the papers presented in the issue contribute to this relatively recent field of research. Finally, some open questions arising from the current state of the field are listed.

Multistability provides a window into the mind, since it gives a natural and unique dissociation between objective properties of the stimulus and subjective sensations: the stimulus properties are constant, whereas sensations change in a dynamic fashion. The study of multistability in several perceptual modalities has the potential to provide a powerful framework for understanding how the different attributes of objects in the environment are bound together, within our perceptual systems, to provide a coherent interpretation of the world around us. This process is known as binding, and it occurs both within and across sensory modalities. As demonstrated by the rich collection of papers in this issue, the expected benefits of the approach are broad, from fundamental theories of the psychology of perception to their underlying neural mechanisms, and from computational neuroscience to neuro-genetics and the role of spontaneous brain activity in perceptual decision-making.

2. Multistability for different sensory modalities

(a). Extending the study of multistability and binding from vision to other senses

Historically, multistability was considered as a visual phenomenon. O'Shea [11] ascribes the first published report of visual multistability to Dutour [12]. This report describes what is now termed ‘binocular rivalry’ (figure 1). Dutour observed that, when presenting a disc of blue taffeta to one eye and a disc of yellow taffeta to the other eye, he did not see a mixture of the blue and yellow colours. Rather, he was ‘unable to detect even the least tint of green’. His conscious experience alternated between blue and yellow. The percept seemed to be dominated by the signal from one of the two eyes at any one time and the eye that was dominant alternated in apparently random fashion. This illustrates the basic characteristic of multistable perception: a static physical stimulus may induce the subjective experience of a percept that is stable over short times, but changes from time to time.

Figure 1.

Illustration of binocular rivalry. Different images are presented to the left and right eyes (‘Stimulus’). The subject experiences switches from perception of one image (face) to the other (house) (‘Percept’). Note that ‘mixed percepts’ (composed of parts of both images) are also experienced (‘piecemeal rivalry’). The phenomenology of binocular rivalry can be experienced with monocular rivalry (see demonstration under the Wikipedia entry).

Multistability was also described for ambiguous figures involving depth interpretation (like the two-dimensional outline of a cube, first described by Necker [13]; figure 2a), figure/ground organization (like Rubin's vase; figure 2b) or motion perception (as in ambiguous motion displays; figure 3). Multistability in binocular rivalry involves perceptual competition between two images, while the multistable perception of ambiguous figures involves competition between interpretations of a single image. Accordingly, the two phenomena have been studied independently during the past two centuries. According to Leopold [14], Walker [15] was the first to suggest that ‘a parallel may exist between binocular rivalry and the perceptual reversal of ambiguous figures’. Such a parallel was popularized by Leopold & Logothetis [2]. Indeed, these apparently disparate stimuli all have some crucial features in common: (i) they have more than one plausible perceptual organization; and (ii) these organizations are not compatible with each other. The perception of such stimuli also shares many similarities: (i) only one interpretation at a time is experienced by observers (and not an ‘average’ interpretation); (ii) ‘flips’ in perceived organization occur with prolonged viewing; and (iii) the statistical properties of the multistable alternations are similar across different types of stimuli; they show similar distributions of dominance phases (the periods during which one percept is dominant), and the distributions are unimodal and asymmetric [2].
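These statistical signatures can be made concrete with a small simulation. Dominance durations in multistable perception are commonly fitted by unimodal, positively skewed distributions such as the gamma distribution; the Python sketch below (the gamma form and all parameter values are illustrative assumptions on our part, not fits to any dataset in this issue) draws a series of dominance phases and verifies the asymmetry.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Illustrative assumption: dominance durations follow a gamma distribution
# with shape ~3.5 and a mean of ~3 s, in the range typically reported for
# binocular rivalry and auditory streaming.
shape, mean_duration = 3.5, 3.0   # gamma shape; mean duration in seconds
scale = mean_duration / shape     # gamma mean = shape * scale
durations = rng.gamma(shape, scale, size=10_000)

# Asymmetry: a long right tail gives positive skewness and median < mean.
z = (durations - durations.mean()) / durations.std()
skewness = np.mean(z ** 3)

print(f"mean   = {durations.mean():.2f} s")
print(f"median = {np.median(durations):.2f} s  (< mean: right tail)")
print(f"skew   = {skewness:.2f}             (> 0: asymmetric)")
```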

Figure 2.

Illustration of ambiguous images. The figure in (a) may be perceived as a cube with either the lower-left face or the upper-right face in front. The figure in (b) may be perceived as either a vase or two faces in silhouette. The subject experiences switches from perception of one interpretation to perception of the other.

Figure 3.

Illustration of moving plaids. Two series of oblique lines (gratings) with orthogonal directions of movement are superimposed (‘Stimulus’). The subject may perceive the image (‘Percept’) either as two gratings, each moving in its own direction, or as a single cross-hatched object moving upwards (indicated by the arrow). The percept alternates between the two interpretations.

Until recently, studies of other perceptual modalities did not capitalize on the large body of accumulated knowledge on visual multistability. However, ambiguous stimuli that gave rise to perceptual alternations had been described for other modalities. For example, when a word or short phrase is presented repeatedly (e.g. ‘life life life’), the words or phonemes that are perceived change over time (the ‘verbal transformation effect’ [16]); for this example, ‘fly fly fly’ might be perceived after a while (figure 4). Also, when a rapid sequence of tones with different frequencies is presented, the tones may be perceived as coming from a single source, as if played by one instrument (called coherence or fusion), or as multiple sources, as if played by more than one instrument (called stream segregation or fission), and the percept may ‘flip’ between the two (figure 5, [17]; see Moore & Gockel [8] for a review). It had been noted that the verbal transformation effect provided ‘an auditory analogue of the visual reversible figure’ and that auditory stream segregation presented a ‘striking parallel’ with visual apparent motion ([18], p. 21; see also [19]), but the theoretical and experimental tools used to investigate visual multistability were not applied to those stimuli until recently.

Figure 4.

Illustration of the verbal transformation effect. A word is presented repeatedly (‘Stimulus’, here ‘life life life’). After some time, the percept may change (‘Percept’), reflecting a different perceptual organization of the sound segments (e.g. ‘fly fly fly … ’), and then may alternate between the two organizations (or other organizations may occur).

Figure 5.

Schematic spectrogram of stimuli used to study auditory streaming. A succession of tones with two different frequencies, A and B, is presented (‘Stimulus’). The subject may perceive the sequence either as a single stream with a ‘gallop’ rhythm (ABA–ABA–ABA … , illustrated by the green lines connecting A and B in ‘Percept’) or as two regular streams (A–A–A and B–B–B, illustrated by the blue line connecting the A tones and the red line connecting the B tones). The percept can alternate between the two interpretations.
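Readers who wish to hear this bistability for themselves can synthesize the stimulus with a few lines of code. The sketch below generates a repeating ABA– triplet sequence as a WAV file; the frequencies, tone durations and overall length are illustrative assumptions (chosen to lie near the ambiguous region of classic streaming demonstrations), not the parameters of any study in this issue.

```python
import wave
import numpy as np

SR = 44_100                     # sample rate (Hz)
F_A, F_B = 500.0, 630.0         # tone frequencies (assumed ~4 semitones apart)
TONE_DUR = 0.12                 # 120 ms tones; the '-' slot is a silent tone slot
N_TRIPLETS = 60                 # roughly 30 s of stimulation

def tone(freq, dur):
    """Pure tone with 10 ms raised-cosine ramps to avoid clicks."""
    t = np.arange(int(SR * dur)) / SR
    y = np.sin(2 * np.pi * freq * t)
    n_ramp = int(0.01 * SR)
    ramp = 0.5 * (1 - np.cos(np.pi * np.arange(n_ramp) / n_ramp))
    y[:n_ramp] *= ramp
    y[-n_ramp:] *= ramp[::-1]
    return y

silence = np.zeros(int(SR * TONE_DUR))
triplet = np.concatenate([tone(F_A, TONE_DUR), tone(F_B, TONE_DUR),
                          tone(F_A, TONE_DUR), silence])
signal = np.tile(triplet, N_TRIPLETS)

with wave.open("aba_streaming.wav", "wb") as f:
    f.setnchannels(1)
    f.setsampwidth(2)           # 16-bit PCM
    f.setframerate(SR)
    f.writeframes((signal * 0.5 * 32767).astype(np.int16).tobytes())
```

Listening for a minute or so typically produces spontaneous flips between the ‘gallop’ and the two-stream organization; increasing the A–B frequency separation biases perception towards two streams.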

Evidence for multistability has been presented for modalities other than vision and audition. Carter et al. [20] extended the dynamic dot displays previously used to study visual multistability to touch (figure 6). Zhou & Chen [21] extended binocular rivalry to olfaction and reported alternating olfactory percepts when different odorants were presented to the two nostrils (binaral rivalry; figure 7), as well as when presented to the same nostril (mononaral rivalry, supposedly analogous to monocular rivalry). Illusory motion reversals were also reported in proprioception; biceps vibration induces illusory forearm extension, and it was proposed that this phenomenon could be an instance of multistability [22]. To our knowledge, no instance of multistability has been reported for taste, but the paradigms developed for olfaction could probably be adapted to taste. In the motor domain, experiments on bimanual rhythmic coordination patterns in response to visual input revealed the presence of a few stable or preferred coordination patterns, suggesting that multistability is also a property of the organization of motor commands [23,24], an argument developed by Kelso [25].

Figure 6.

Illustration of motion quartets (visual and tactile). When a subject is presented with two successive visual images (‘Visual loop’), with two black dots moving from one configuration (on one diagonal) to another (on the other diagonal), the subject may perceive either a horizontal or a vertical displacement of the two black dots, and switch from one percept to the other (‘Percept’). The same bistable illusion may be obtained with tactile stimuli (‘Tactile loop’), using touch zones on the thumb.
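The geometry of the quartet is simple enough to generate programmatically. The sketch below (the frame rate and dot coordinates are arbitrary illustrative choices of ours) yields the alternating dot configurations that a visual display or tactile loop would cycle through.

```python
import itertools

# Illustrative geometry (assumed): dots at corners of a unit square,
# alternating between the two diagonals every 250 ms.
FRAME_MS = 250
DIAG_1 = [(0.0, 0.0), (1.0, 1.0)]   # lower-left / upper-right
DIAG_2 = [(0.0, 1.0), (1.0, 0.0)]   # upper-left / lower-right

def quartet_frames():
    """Yield (time_ms, dot_positions) for an endless motion-quartet loop."""
    for i in itertools.count():
        yield i * FRAME_MS, (DIAG_1 if i % 2 == 0 else DIAG_2)

# First four frames of the loop:
for t, dots in itertools.islice(quartet_frames(), 4):
    print(f"{t:4d} ms: {dots}")
```

Whether the dots are seen (or felt) to move horizontally or vertically is underdetermined by these frames, which is precisely what makes the display bistable.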

Figure 7.

Illustration of binaral rivalry (olfactory). When a subject is presented with two different odours (‘Stimulus’), one in each nostril, perception may switch from one odour to the other (‘Percept’).

The main rationale for this special issue is that detailed comparisons of the phenomenology of multistability across modalities are now emerging, mostly between vision and audition. Auditory streaming was studied independently by several groups for this purpose [26–28]. In all of these studies, the distributions of the random durations between switches in perceptual organization were very similar to those observed for visual multistability. In fact, when measured using the same observers, the dynamics of auditory and visual switching revealed almost identical patterns [26]. Interestingly, bistability for streaming seemed to be the rule rather than the exception, as it could be observed over a surprisingly broad range of stimulus parameters [27,28]. Similar dynamic properties have been observed for the verbal transformation effect [29,30]. Auditory multistability has also been reported with very different stimuli, using rhythmic cues [31].

Leopold & Logothetis [2] proposed that similar mechanisms underlie binocular rivalry and ambiguous figures (a proposal that is still debated; see, for example, Kleinschmidt et al. [32]). We propose an extension of this idea and suggest that some common principles might be at work in perceptual organization for different sensory modalities. ‘Common principles’ could be understood in two ways. Leopold & Logothetis [2] proposed that perceptual decision-making in multistable perception is triggered by some central, supramodal mechanism. An alternative model is based on the idea that there is more distributed competition [6]. When considering auditory bistability, Pressnitzer & Hupé [26] suggested that functionally similar mechanisms were implemented independently across sensory modalities (see also Hupé et al. [33]). According to this view, the specific mechanisms and implementations are likely to differ from one modality to the other, depending on the nature of the physical information and the structure of the sensory inputs. But whatever the modality, the perceptual system must organize the sensory data into a coherent interpretation of the outside world that can be used to guide behaviour. Importantly, when there is more than one plausible interpretation of the sensory evidence, the same phenomenology is observed for all modalities: multistable perception arises, or, in other words, a kind of ‘stable instability’ seems to be the rule, as Zeki [34] put it for vision.
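One concrete way to picture ‘functionally similar mechanisms implemented independently’ in each modality is the family of competition models combining mutual inhibition, slow adaptation and noise. The sketch below is a generic toy model of this kind; the equations and every parameter value are our illustrative assumptions, not the model of any paper in this issue.

```python
import numpy as np

rng = np.random.default_rng(seed=1)

# Toy competition model: two units (one per percept) receive equal input,
# inhibit each other, and slowly adapt; noise makes phase durations random.
DT, T = 0.01, 120.0                # time step and simulated duration (s)
TAU_R, TAU_A = 0.02, 2.0           # fast activity / slow adaptation time constants
INPUT, BETA, PHI = 1.0, 2.2, 1.5   # drive, cross-inhibition, adaptation strength
SIGMA = 0.05                       # noise amplitude

def f(x):
    return np.maximum(x, 0.0)      # threshold-linear gain

n = int(T / DT)
r = np.zeros((n, 2))               # activities of the two percepts
a = np.zeros(2)                    # adaptation states
for i in range(1, n):
    noise = SIGMA * rng.standard_normal(2) / np.sqrt(DT)
    drive = INPUT - BETA * r[i - 1, ::-1] - PHI * a + noise
    r[i] = r[i - 1] + DT / TAU_R * (-r[i - 1] + f(drive))
    a += DT / TAU_A * (-a + r[i])

dominant = (r[:, 0] > r[:, 1]).astype(int)
switch_times = np.flatnonzero(np.diff(dominant)) * DT
phases = np.diff(switch_times)
phases = phases[phases > 0.1]      # ignore flickers at the moment of switching
print(f"{len(phases)} dominance phases; mean phase = {phases.mean():.2f} s")
```

The point of the sketch is architectural: nothing in it is specific to vision or audition, so the same scheme could be instantiated separately in each modality, operating on modality-specific representations.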

Extending the study of multistability to sensory modalities other than vision is of interest for at least three reasons. Firstly, it provides a method for studying the neural bases of perceptual organization in those modalities (for hearing, see [30,35–40]). Secondly, the intrinsic characteristics of each sensory modality may extend the scope of the original visual multistability paradigm in important ways. For instance, in audition, the stimuli are by nature time-varying. Competition between perceptual organizations is thus not limited to space or motion direction, but must also involve the time dimension [41]. The interactions between perceptual and motor processes are also quite different between vision (eye movements, [42,43]), touch [20] and speech (e.g. perceptuo-motor theory of speech, [44]). The influence of motor processes can also be more easily controlled in audition, as eye movements are less likely to produce confounding effects. Thirdly, and perhaps most importantly, the extension from vision to other modalities strengthens the hypothesis that multistability is a general property of perceptual systems. Therefore, current research using multistability to probe cognitive processes such as attention, decision-making and consciousness in the visual modality gains further relevance for other modalities.

(b). Binding stimulus information within modalities

Among the great variety of stimuli that evoke multistable perception, it is striking that most are based on competition in selecting and binding stimulus information. An obvious case is provided by binocular rivalry [45]. In this case, multistability involves a perceptual selection between subsets of information from two incompatible sources, one for each eye. The resulting perceived image is sometimes made out of local patches from each of the two eyes, a phenomenon known as piecemeal rivalry. However, most of the time, binding occurs within the image from one eye, and the predominant percept is based on the entire image from that eye. The role of binding is most dramatically illustrated by ‘interocular grouping’ rivalry [46,47] for which scrambled images are presented to each eye (figure 8). Patches of images belonging to a face and a house are presented to one eye, while the complementary patches are presented to the other eye. If multistable competition were purely eye-based, observers would experience alternations of scrambled images. This is not what occurs: instead, binding of elements that form a coherent image occurs, and the result of the multistable competition is usually the percept of either a face or a house. Thus, information from the two eyes is selected and combined to form meaningful objects.

Figure 8.

Stimuli producing visual interocular grouping. Different parts of two images are presented to the left and the right eyes (‘Stimulus’). The subject perceptually reconstructs the original images and experiences switches from perception of one figure to perception of the other (‘Percept’).

The case of ambiguous images (figure 2) also involves selection and binding. In classical examples such as Rubin's vase–face image, selection involves deciding which parts of the image are assigned to the foreground and which are assigned to the background; the components making up the foreground should be bound together and segregated from components of the background. In other cases, such as the Necker cube, binding may occur within the foreground to determine which segments form the front face of the cube. Dynamic displays also involve binding, both in space and time. The ‘moving plaid’ stimulus (figure 3) is perceived as one object moving in one direction or as two superimposed objects moving in different directions, depending on whether the grating components of the plaid are bound or segmented. These represent two very different bindings of components within and across the moving images.

Binding also plays an important role in multistability for audition and speech. For verbal transformations, sound segments are bound in different orders or with different segment boundaries to generate new percepts (figure 4). In the case of auditory streaming, successive sounds are either bound into one stream (heard as if coming from one source), or bound into two streams (heard as if coming from two sources) (figure 5).

We argue here that it is revealing that multistability always involves perceptual binding, especially if one considers that the most common situation leading to perceptual ambiguity is not based on binding and selection. Consider ‘boundary’ stimuli, which have features close to a boundary between two perceptual categories along a perceptual continuum. For example, in vision, one boundary stimulus is a colour between blue and green. Such a stimulus would appear to possess the correct properties for being ambiguous: one is not certain whether the colour is green or blue. However, to the best of our knowledge, there are no experimental data showing that multistability can occur for such stimuli. Thus, boundary stimuli that do not involve ambiguous binding do not seem to produce multistability. This issue has seldom been considered (but see the discussion of the possible epistemological distinction between multistability and ambiguity in Egré [48]) and remains to be addressed experimentally.
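To make the contrast concrete, a boundary-stimulus continuum of the kind just described is trivial to construct. The sketch below steps the hue from green to blue through the ambiguous blue-green region; it is purely illustrative (a real experiment would use a calibrated colour space rather than raw HSV values).

```python
import colorsys

# Hypothetical blue-green continuum: step the HSV hue from green (120 deg)
# to blue (240 deg); stimuli near the middle are candidate boundary colours.
N_STEPS = 9
for i in range(N_STEPS):
    hue_deg = 120 + i * (240 - 120) / (N_STEPS - 1)
    r, g, b = colorsys.hsv_to_rgb(hue_deg / 360.0, 1.0, 1.0)
    print(f"step {i}: hue {hue_deg:5.1f} deg -> RGB ({r:.2f}, {g:.2f}, {b:.2f})")
```

The open empirical question raised above is whether prolonged viewing of the middle steps of such a continuum ever produces discrete perceptual flips, as binding-based ambiguous stimuli do.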

Overall, the view that emerges from studies of different modalities is that multistability reflects processes of competition between different perceptual organizations of the same scene, where the binding of sensory information is always involved.

(c). Binding across perceptual modalities

Since multistability involves binding in various modalities, what kind of multistable phenomena might emerge when more than one modality is simultaneously involved? This raises the question of the level at which multisensory interactions happen, relative to binding within modalities, and more generally, leads to the possibility that multistability could be used as a tool for studying multisensory perceptual organization.

Let us begin by considering the possibility that multistable effects in one modality can be modified by stimuli in another modality. It has been shown that binocular rivalry can be influenced by sounds congruent with one or the other image, but only when the visual stimulus is consciously perceived, not when it is suppressed from awareness [49,50]. In the same vein, Munhall et al. [51] have shown a McGurk effect (the identity of a speech sound being influenced by visual information from the face of the talker) with moving lips on an ambiguous face/vase stimulus only when it was perceived as a face. These results suggest that audio-visual integration in these situations happens only after binding is resolved within each modality. Moreover, stimuli in one modality may influence the bistable perception of ambiguous stimuli in a second modality, but only when subjects pay attention to the stimulus in the first modality [52]. Hupé et al. [33] presented multistable stimuli in the auditory and visual modalities, using auditory stimuli that led to streaming and visual stimuli that led to ambiguous motion. They reported large cross-modal influences, whose magnitude depended on audio-visual congruence; perceptual switches in one modality could modulate switches in the other modality. However, the timing of the modulation was quite sluggish, suggesting that it was mediated by contextual processes and hence that perceptual organization initially occurred separately in each modality. However, Takahashi & Watanabe [53] showed that changes in auditory stimuli that the subjects were not aware of had weak (and delayed) effects on the dynamics of visual apparent motion. Also, the (supposedly implicit) semantic content of auditory stimuli influenced the balance of percepts in binocular rivalry by a few per cent [54]. This leaves open the possibility that some cross-modal interactions happen before the completion of perceptual organization within each modality. Klink et al. [9] further discuss cross-modal effects as a form of ‘contextual information’ used to disambiguate the sensory input.

These (partly conflicting) results raise the question of how multisensory perceptual organization occurs. The issue here is to know at what level perceptual objects are best defined: are perceptual objects constructed independently for each modality before interactions occur at a relatively high level of processing, or can a common object representation be formed for different modalities to mediate interactions at an early stage of processing? This question is well exemplified by the contrast between two positions: Kubovy & Van Valkenburg [55] argue that auditory objects differ from visual objects because of the intrinsic structure of auditory and visual processing, whereas the classical assumption in speech perception is that speech objects are multisensory and hence that there is a common representational format for auditory, visual and motor speech at some level of processing [56,57].

Assuming that multistability is a result of the competition between perceptual organizations, the existence of audio-visual objects would be indicated by simultaneous switches of auditory and visual organizations under conditions of multisensory multistability. The finding of Hupé et al. [33] that audio-visual capture for apparent motion did not lead to simultaneous switches in the two modalities led them to argue against the existence of a specific ‘audio-visual apparent motion’ object (see also Kubovy & Yu [58]). However, synchronous lights and sounds are not necessarily perceived as a unified perceptual object, while the speech percept routinely depends on combining information from the auditory and visual modalities [59]. This is what led Sato et al. [60] to claim that ‘multistable speech perception is indeed a multisensory effect’. Their set of experiments involved for the first time audio-visual verbal transformations, that is, multistability in speech with multisensory inputs. They showed that the visual input modifies the perceptual stability of the auditory input and that switches applied to the visual input could largely drive the audio-visual percept by inducing rather synchronous switches in perception. Altogether, their results demonstrate the capacity of visual information to control the multistable perception of speech in its phonetic content and temporal course. Hence, the two modalities seem to be bound together in the multistability phenomenon in this case. This suggests that the multistability paradigm may provide an effective tool for determining if ‘multisensory’ objects exist for speech [61].

3. Contribution of the papers from this issue

This issue covers most of the facets of multistability and binding that we have just described, in audition and vision. For all of the contributions, a useful distinction to keep in mind is between what competes and how competition takes place [33]. What competes is the content of sensory experience, the components of the stimulus that have to be bound into perceptual objects, corresponding to the ‘neural events associated with the representation of a given perceptual state’ [62]. How competition occurs depends on the neural processes ‘that are responsible for switches between alternative perceptual states’ [62]. The contents of perceptual experience, what competes, are obviously different for visual and auditory multistability. The question highlighted in this introduction is whether the mechanisms of switching share some principles and/or neural processes in vision and audition. The question is thus related to how competition takes place. Phrased differently, the question is ‘what determines the change in perceptual organization after the observer has been perceiving the stimulus in a particular way’ [63].

While the what and how questions are independent in principle, empirical evidence does not always provide a basis for distinguishing them unequivocally. For example, the effect of intention on the dynamics of bistable perception may be interpreted as revealing the mechanisms of switching, or as ‘simply’ affecting the content of one or the other representation. The same can be said for the effects of attention, adaptation or even ‘noise’. It is therefore paramount to know precisely what factors influence competing percepts in audition and vision before being able to address the question of the switching mechanisms.

The two papers following this introduction mostly focus on the question of what competes in individual sensory modalities. One deals with auditory streaming, which is a topic of several contributions in this issue. As we have seen, auditory streaming has recently been a key paradigm for building bridges between studies of visual and auditory multistability. Moore & Gockel [8] provide an up-to-date overview of the streaming paradigm. Importantly, they provide a comprehensive survey of the many kinds of acoustic cues and other experimental parameters (like attention, time and sudden changes in the stimulus sequence) that can affect auditory binding in the streaming paradigm. In the next contribution, Klink et al. [9] review the many factors influencing multistability, mostly for vision but also including cross-modal influences. They consider multistability as an optimal paradigm for studying how perceptual systems can produce context-driven inference and decisions. They consider four major kinds of context (temporal, spatial, multisensory, and associated with the subject's internal state).

The next three contributions explore the what question by comparing multistability for audition and vision. Hupé & Pressnitzer [64] present experimental data comparing the initial phase of perceptual organization for visual plaids and auditory streaming. This phase exhibits a peculiar pattern: it is longer than later phases and is biased towards one object for both modalities. In vision, they show that it is the tristable nature of plaid perception that produces the longer percept, whereas the evidence is less clear cut for audition. In fact, tristability for auditory streaming remains to be shown. The bias towards integration is discussed in terms of local versus global organization cues. Kubovy & Yu [58] re-examine the similarities and differences between the requirements of perceptual scene analysis in audition and vision. They suggest that cross-modal causality is a necessary prerequisite for efficient cross-modal binding, but they express doubts that such cross-modal binding could result in multisensory multistability, because of the intrinsic difference in nature between perceptual objects in different modalities (here, audition and vision). However, they consider the hypothesis that speech could be an exception to this general view. Directly related to this conjecture, Basirat et al. [61] demonstrate how the verbal transformation effect can be a valuable tool for studying the perceptual organization of speech, returning to old but key questions in speech perception, such as the role of perceptuo-motor interactions and the nature of the representation units. They claim that since the objects of speech perception are intrinsically multisensory, they may lead to multisensory multistability.

The last four contributions examine the switching mechanisms (how competition takes place) from the perspective of its neural bases in audition [65] and vision [32], from a theoretical perspective in audition [10], or in terms of computational processes and dynamic systems, whatever the sensory modality [25].

Kashino & Kondo [65] compare the neural bases of switching for two multistability paradigms in audition, auditory streaming and the verbal transformation effect. Functional MRI data acquired using similar paradigms for both phenomena allow them to make a direct comparison, and reveal the role of motor-based processes in multistability for both non-speech and speech sounds, in addition to the involvement of sensory regions dedicated to audition (auditory cortex and thalamus). Moreover, activity in the motor structures is shown to be correlated with individual switching rates. This variability may be a result of genetically determined differences in the catecholaminergic system. Kleinschmidt et al. [32] review the variations of neural activity that have been observed in relation to visual multistability, mostly with ambiguous figures. Importantly, they use the association between neural fluctuations and switches in perceptual states to decide ‘where does brain activity reflect perceptual dominance’ (our what question) ‘and where does brain activity reflect perceptual alternations’ (our how question), and they discuss to what extent studies of the relative timing of neural and perceptual events and studies using transcranial magnetic stimulation can resolve this issue. Like Kashino & Kondo, Kleinschmidt et al. discuss individual differences in brain state fluctuations, and relate them to neuroanatomical substrates, suggesting that there could be a genetic basis for differences between subjects in how they behave in multistability paradigms, focusing on a causal role of parietal regions in perceptual inference.

Winkler et al. [10] present a theoretical framework for auditory streaming based on the idea of ‘predictive coding’. In this approach, the goal of perceptual organization is to find regularities in the incoming sensory information, in order to predict the pattern of future sounds. Interestingly, the competition in their framework is not between sensory representations, but rather between abstract rules that bind successive sounds together. They also suggest a new computational approach for understanding the competition between those rules, related to the how question.
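A toy calculation can illustrate what ‘competition between rules’ might mean, though we stress that it is our simplification and not the actual model of Winkler et al. [10]. Two candidate rules try to predict an ABA– sequence: a one-stream rule predicts each tone from the immediately preceding tone, while a two-stream rule predicts each tone from the previous tone of the same frequency at a fixed complexity cost per tone; the organization with the lower total cost wins.

```python
import numpy as np

def cost_one_stream(seq):
    """Prediction error when every tone is predicted from the previous tone."""
    return np.abs(np.diff(seq)).sum()

def cost_two_streams(seq, penalty_per_tone=2.0):
    """Zero within-stream error (each stream is constant here) plus an
    arbitrary complexity penalty for maintaining two predictors; the penalty
    value is an assumption that places the crossover near 3 semitones."""
    return penalty_per_tone * len(seq)

for df in (1, 2, 3, 6, 9):                      # A-B separation in semitones
    seq = np.array([0.0, float(df), 0.0] * 20)  # log-frequency ABA- sequence
    c1, c2 = cost_one_stream(seq), cost_two_streams(seq)
    winner = "one stream" if c1 < c2 else "two streams"
    print(f"df = {df} st: one-stream cost {c1:5.1f}, "
          f"two-stream cost {c2:5.1f} -> {winner}")
```

Adding noise to the two costs would make the preferred rule fluctuate when the costs are close, which is one way of connecting such rule competition to the how question of switching.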

Finally, Kelso [25] describes how multistable perception can be considered as part of a wide range of multistable phenomena in living systems. He relates these to adaptability and the ability to dynamically define self-organizing functional grouping of individual elements to optimize specific behaviours. In this sense, he switches from the how to a possible why question, in which multistability appears as one component of a global process that allows a creative organism to adapt and invent solutions to deal with a highly complex environment.

4. Open questions and directions

(a). Extending the range of multistable phenomena in various modalities

We have seen how the study of multistability has been extended from vision to other modalities. However, the range of multistable phenomena in these modalities is still rather limited compared with the rich set available to visual scientists. This range will probably be extended in the future. For instance, until now, auditory multistability has involved stimuli that unfold over time, such as rapid sound sequences, but there has been no published report of an effect of ‘binaural rivalry’ comparable to binocular rivalry. Deutsch [66] discovered a kind of ‘interaural grouping’ illusion, but it is unclear whether this produces multistable perception. The difference between vision and hearing could arise because totally different images in the two eyes are highly unlikely in natural viewing, and are therefore treated as incompatible, whereas sounds often differ somewhat at the two ears owing to head-shadow effects, and the sounds at the two ears can be very different when the sound sources are very close to the ears. This usually results in the perception of multiple sound sources at different positions in space, rather than rivalry between perceived sources. A possible auditory analogue of interocular grouping rivalry could occur under conditions where perception of a sound depends on combining information across the two ears, for example, when part of a speech sound (e.g. the first and second formants) is presented to one ear and the remainder (e.g. the third formant) to the other ear [67]; see also the further experiments on ‘duplex perception’ conducted by Liberman et al. [68].

Analogues of the verbal transformation effect might also be found for non-speech sounds or modalities other than hearing. An aspect of verbal transformations is that they involve different ways of sorting sounds into segments (the ‘segmentation’ problem, which is crucial for speech perception). Similar transformations might occur for non-speech sounds or visual or even tactile stimuli, provided that there are multiple possible ways of segmentation, each of which gives rise to a plausible perceptual interpretation of the input. We could also ask whether multistability in touch extends beyond motion perception, by looking for a touch analogue of the auditory streaming paradigm.

(b). Is conflicting binding required for multistability?

As discussed earlier, boundary stimuli can have more than one interpretation, but such stimuli do not seem to trigger multistable perception, perhaps because they do not involve conflicting binding cues. This conjecture requires further experimental testing. It seems likely that a given boundary stimulus, such as an image containing a colour between green and blue or a synthetic speech sound between ‘ba’ and ‘da’, could lead to responses that vary over time. But this may reflect the ambiguous nature of what is perceived rather than a flip from one percept to another. This could be assessed by collecting confidence judgements from subjects, and not only categorical decisions. Another assessment method would be to use reaction times: ambiguity in categorical judgements leads to increased response latency [69], whereas bistable perception does not [70]. Comparison of response latencies for stimuli involving conflicting binding cues (e.g. binocular rivalry) and those not involving such cues (e.g. a colour between green and blue) could reveal whether or not the latter involve bistability.

Such methodologies could be used in other modalities, for example olfaction. The results of Zhou & Chen [21] on mononaral rivalry suggested that it is not possible to experience two different odours at the same time. The generality of this finding should be tested for a large variety of pairs of odours. Also, it needs to be determined whether the responses reflect a categorical judgement (see also Gottfried [71]). Further evidence for multistability in mononaral olfaction would clarify the issue of what constitutes a ‘perceptual object’ in olfaction. Following the general framework set up in this introduction, the rivalry between two odours may imply that each odour represents a different perceptual object. Similar methods could be applied to assess the phenomenological level of perceptual organization in touch, proprioception and perhaps even taste.

(c). Subjectivity, individual differences and multistability

Multistability opens a window on the subjective experience of the perceiver, by using stimuli that are physically stable but lead to a rich and a diverse phenomenology. The study of multistability can therefore play a crucial role in understanding the characteristics of the construction of perceptual awareness. It has been known for a long time that individuals differ markedly in the rate of the alternation between alternative perceptual organizations, but not in the general distribution of stability periods. The extent to which these individual differences are consistent across stimuli is still a matter of debate.

Intra-subject consistency has been observed within a modality ([72] for vision; and, in this issue, between an auditory non-speech task and a speech task [65]). Results across modalities are contradictory, with negative results for audition and vision [26], and for touch and vision [20], but significant correlations across all bistable paradigms, in vision and audition [73]. However, the development of measures and adequate paradigms for assessing inter-task and inter-modality correlations is far from trivial. The switching rate between alternative percepts depends on the stimuli used for a given task, with more switches when the two interpretations are equally likely, which leads to equal dominance of the percepts [74]. Hence, the stimulus parameters should be carefully matched across tasks and modalities and calibrated for each subject, in order to achieve equal dominance, which has not always been done and is, in any case, difficult to do. A number of contextual parameters (such as the general level of attention and arousal) and possible artefacts (e.g. the role of eye movements) are likely to introduce variability and biases into the results.
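As a concrete illustration of such calibration, the sketch below runs a simple up-down adjustment of a stimulus parameter (here a nominal frequency separation for streaming) towards the point of equal dominance; the simulated subject, the block length and the step rule are all hypothetical choices made for the sake of the example.

```python
import numpy as np

rng = np.random.default_rng(seed=2)

def simulated_dominance(df):
    """Stand-in for a subject: P('two streams') grows with separation df;
    this simulated subject's 50% point is at df = 5 (an arbitrary choice)."""
    p = 1.0 / (1.0 + np.exp(-(df - 5.0)))
    return rng.binomial(30, p) / 30.0   # proportion over a 30-report block

df, step = 8.0, 1.0
for block in range(12):
    prop_two = simulated_dominance(df)
    print(f"block {block:2d}: df = {df:5.2f}, P(two streams) = {prop_two:.2f}")
    df += step if prop_two < 0.5 else -step   # move towards equal dominance
    step = max(step * 0.8, 0.25)              # shrink steps as we converge
```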

In conclusion, the study of multistability, extending from vision to audition and speech in the present issue, and potentially to many other sensory modalities, provides a window for examining many factors that are associated with or influence perceptual binding, both within and across sensory modalities. These include: attention, decision-making and consciousness; the cognitive state of the perceiving brain; mental disorders, pharmacology and genetics; and even creativity and culture. A striking example of the latter is a type of French slang called ‘verlan’, in which words are created by reversing the order of the syllables. The word verlan itself is a verlan word: the French for ‘reverse’ is ‘l'envers’, and pronouncing ‘l'envers’ with the syllables in reverse order results in ‘verlan’. The word ‘verlan’, and more generally most slang constructions in verlan, may have arisen through repetition of ‘l'envers’ via the verbal transformation effect. Artists such as Salvador Dalí have used ambiguous figures in their work, for example, ‘Slave market with the disappearing bust of Voltaire’, and composers such as Bach have exploited auditory streaming to create the impression of two melodic lines coming from an instrument such as the flute, which produces only one note at a time (see the cover of this issue).

In summary, multistability in vision and audition has long fascinated researchers, artists, composers, and philosophers. The study of multistability is being extended to most sensory and motor modalities and to cross-modal perception. This fascinating landscape is explored in the wide-ranging papers in this volume.

Acknowledgements

This work was supported by the ‘Agence Nationale de la Recherche’ (ANR-08-BLAN-0167-01, project Multistap). The work of BCJM was supported by the MRC (UK). We thank Agnès Léger for creating the figures used in this article.
