Perceptual Constancy With a Novel Sensory Skill

Liam J Norman; Lore Thaler

doi:10.1037/xhp0000888

2020 Dec 3;47(2):269–281. doi: 10.1037/xhp0000888

Editor: Isabel Gauthier

PMCID: PMC7818673 PMID: 33271045

Abstract

Making sense of the world requires perceptual constancy—the stable perception of an object across changes in one’s sensation of it. To investigate whether constancy is intrinsic to perception, we tested whether humans can learn a form of constancy that is unique to a novel sensory skill (here, the perception of objects through click-based echolocation). Participants judged whether two echoes were different either because: (a) the clicks were different, or (b) the objects were different. For differences carried through spectral changes (but not level changes), blind expert echolocators spontaneously showed a high constancy ability (mean d′ = 1.91) compared to sighted and blind people new to echolocation (mean d′ = 0.69). Crucially, sighted controls improved rapidly in this ability through training, suggesting that constancy emerges in a domain with which the perceiver has no prior experience. This provides strong evidence that constancy is intrinsic to human perception.

Keywords: constancy, blindness, echolocation, perceptual learning

Public Significance Statement

This study shows that people who learn a new skill to sense their environment - here: listening to sound echoes - can correctly represent the physical properties of objects. This result has implications for effectively rehabilitating people with sensory loss.

Making sense of the world requires perceptual constancy—the stable perception of an object across changes in one’s sensation of it. A classic example of this is size constancy in vision, which can be described as the accurate judgment that an object has remained the same physical size despite viewing it from different distances (Holway & Boring, 1941). It is at present unclear to what degree perceptual constancy is intrinsic to human sensory processing. Supporting this idea, constancy can be found across all modalities (e.g., in vision, hearing, and touch; Fieandt, 1951; Sperandio & Chouinard, 2015; Yoshioka et al., 2011; Zahorik & Wightman, 2001) and in some limited forms can be present from birth (Slater et al., 1990) or from a very young age (7 months; Yang et al., 2015). Yet, the question of whether constancy is intrinsic to human sensory processing remains unanswered. Irrefutable evidence in support of this would require that people show constancy in an entirely novel sensory modality. Although it is not possible to demonstrate this, we can nonetheless test whether humans show constancy when using their existing senses to perceive objects in an entirely novel way: that is, using a new sensory skill. A new sensory skill allows a person to use a new sensory substitution or augmentation system. To this end, here we tested adults in their ability to achieve constancy using click-based echolocation—a sensory skill with which humans are typically unfamiliar, but which can be acquired through experience.

Echolocation is an acoustic method of sensing the world through sound reflections (Griffin, 1944). Human echolocators typically use mouth clicks to ensonify the world around them (Kolarik et al., 2014; Thaler & Goodale, 2016), and the returning echoes can be used to identify many physical properties of objects (e.g., size, shape, material; Milne et al., 2014, 2015; Teng & Whitney, 2011). Echolocation is mediated through hearing. It is a skill that most people are unfamiliar with, but which can be acquired through training (Dodsworth et al., 2020; Teng & Whitney, 2011; Tonelli et al., 2016). As such, click-based echolocation is an example of a novel sensory skill which allows a person to use a new sensory substitution or augmentation system (similar to devices like “The Voice,” a head-mounted device that translates visual scenes into acoustic signals; Meijer, 1992; and other sensory substitution devices; for a review, see Maidenbaum et al., 2014).

Studying constancy in human echolocation is ideally suited to testing whether humans show constancy in a novel sensory skill because not only is echolocation a sensory skill that most people have no experience with, but it also potentially holds forms of constancy that are entirely native and specific to the processes of echolocation. Constancy in click-based echolocation could be considered as the ability to perceptually represent the physical properties of the reflecting object (i.e., the distal stimulus) and not simply the raw sensory response elicited by the echo (i.e., the proximal stimulus). For expert echolocators (EEs), the level and spectrum of an echo carry information that can be used to recover the physical properties of the reflecting object such as its size, shape, and material (e.g., Milne et al., 2014, 2015; Teng & Whitney, 2011; Yu et al., 2018). This is possible because those properties of the reflecting object determine how much energy of the echolocator’s click is reflected at different wavelengths. The level and spectrum of the echo, however, are also determined by the level and spectrum of the echolocator’s click that is used to ensonify the object (Figure 1). For example, the level of the echo can increase either because (a) the echolocator increases the level of their click, or (b) the reflecting object increases in size. Similarly, the spectrum of the echo is also determined both by the initial spectrum of the click as well as the various reflecting properties of the object (e.g., material, size, shape, etc.). Given that there is click-to-click variability in the level and spectrum of an EE’s click (de Vos & Hornikx, 2017; Thaler et al., 2017, 2018; Zhang et al., 2017), it follows that there are problems of perceptual constancy that must be solved by human echolocators.

*Note.* In Scenario a, the echolocator makes mouth clicks at different loudness levels, loud (top) and soft (bottom), at objects that are the same physical size. Due to the difference in the level of the clicks, there is also a relative difference in the level of the echoes reflected from the object: the echo from the top object is louder than that from the bottom object. Alternatively, in Scenario b, the echolocator makes identical clicks (top and bottom) but the object on the bottom is physically smaller than that on the top. As in Scenario a, this also results in a relative difference in the levels of the echoes. Therefore, in order to achieve constancy for the physical size of the object, an echolocator must resolve the ambiguity presented by these two scenarios. The spectrum of the echo can also vary for similar reasons: either because there is variation in the spectrum of the click or variation in the physical properties of the object (e.g., shape, material, size).

These possible forms of constancy in echolocation are unlike those observed in other forms of novel sensory processing that function through the use of devices that translate information from one modality to another (e.g., see Maidenbaum et al., 2014). Visually impaired people can use such devices to recognize objects whose visual properties are translated into auditory information (Auvray et al., 2007), and can even show constancy by accurately perceiving size and orientation across variations in the angle at which the device captures the visual information (Stiles et al., 2015). With such examples of constancy, however, the relevant sensory relationships that must be disambiguated to achieve constancy are not native to the novel sensory skill—they are defined with respect to their original modality and are translated from their original modality into the modality used for substitution (e.g., from vision to audition in the case of Stiles et al., 2015). Although the stimulus coding space might be entirely different across that translation (e.g., converting a spatial dimension into one of frequency), it remains possible that constancy is solved only through cross-modal imagery in the stimulus’ original coding space (e.g., see Spence & Deroy, 2013, for a discussion of cross-modal mental imagery). It follows that by testing whether people show a form of constancy that is native and specific to echolocation, we can provide the most direct and unambiguous evidence that constancy can be learned by humans using a novel sensory skill.

Here, we define constancy in echolocation as the ability to correctly attribute a change in the echo to a change either in the emission or the reflecting object. This is a performance-based “operational” approach to measuring constancy, which has its origins in studies of color constancy (Craven & Foster, 1992). We chose this approach for two reasons: (a) it does not rely on the subject being able to identify or perceptually match the properties of the reflecting object, and (b) it is a form of constancy that is achieved with high accuracy and little cognitive effort by subjects when compared to alternative measures (Craven & Foster, 1992; Foster et al., 1992). In our constancy tasks, participants listened to two click-echo pairs and judged whether the difference between echoes across the two pairs (either in level or spectrum, separately) was a result of variation in the clicks’ acoustic properties or in the objects’ reflecting properties.

In Experiments 1–3 we tested people’s ability to show constancy across variations in the echo’s spectrum. In Experiments 4 and 5 we tested people’s ability to show constancy across variations in the echo’s level. We also considered the effect of echolocation experience in this context by testing expert echolocators (EEs) as well as blindfolded sighted controls (SCs) and blind controls (BCs) with no prior experience in echolocation. We include both BCs and SCs in order to determine whether any superior ability of EEs is due to visual impairment alone. Given the previous work showing that both spectral composition and sound level are important perceptual features in click-based echolocation in humans (e.g., Norman & Thaler, 2020), and that EEs perform better than both BCs and SCs in tasks that involve passively listening to echolocation sounds (e.g., Norman & Thaler, 2020), we predict that EEs will show constancy across variations in the echo’s spectrum and level, and they will perform better than both BCs and SCs in this ability. We do not expect BCs and SCs to differ in their ability. If the superior constancy in EEs is driven by expertise in echolocation, then SCs should improve in this ability with training for both spectrum and level (Experiments 3 and 5, respectively).

General Materials and Methods

All experiments reported in this study share some common elements, which are described below.

Ethics

All procedures followed the British Psychological Society code of practice and the World Medical Association’s Declaration of Helsinki. The experiments received ethical approval from the Ethics Advisory Sub-Committee in the Department of Psychology at Durham University. All participants gave written informed consent to take part in this study.

Participants

Three participant groups were tested—EEs, BCs, and SCs. BCs and SCs reported having no prior experience with click-based echolocation, except for two of the BCs, who had taken part in a previous study in our lab which had required them to listen to echolocation sounds and to make clicks, but who did not meet our criteria for EEs in terms of regularity and duration of use of echolocation. In Experiments 3 and 5 (the training experiments), only sighted participants were tested. Those who were classed as EEs reported using click-based echolocation on a daily basis for more than 10 years. Participants had normal hearing, with the exceptions of BC7, BC10, and BC17 who had some loss for frequencies beyond 4 kHz consistent with their age. Table 1 shows relevant details of the EEs and BCs who took part (i.e., age, gender, degree and cause of vision loss, age at onset of vision loss). Some participants took part in more than one of the experiments reported here and Table 1 shows which experiment each participant took part in. Participants were compensated either at a rate of £10/hr or with course credit. SCs were recruited through internal advertising within the Durham University Department of Psychology. BCs were recruited through word-of-mouth. All EEs had taken part in studies with us before, and were recruited for this study through direct invitations.

Table 1. Details of 3 EEs and 17 BCs Who Participated in Experiments 1, 2, and 4.

Participant	Gender	Age	Degree of vision loss	Cause and onset of vision loss	Echolocation use	Experiment
EE1	M	50	Total blindness	Enucleation due to retinoblastoma at 13 months	Daily; since early childhood/no exact age remembered	1, 2, 4
EE2	M	35	Total blindness	Gradual sight loss since birth due to glaucoma	Daily; since 12 years old	1, 2, 4
EE3	M	24	Total blindness	Enucleation at Age 19 due to sudden loss of vision (exact cause unknown)	Daily; since 12 years old	1, 2, 4
BC1	M	32	Total blindness	Retinopathy of prematurity. Some vision in right eye from birth; retinal detachment in right eye at Age 12	None	1, 2, 4
BC2	M	67	Residual bright light perception	Leber’s amaurosis; from birth	None	1, 2, 4
BC3	M	48	Total blindness in left eye; residual bright light perception in right eye	Severe childhood glaucoma; 3 months old	None	2
BC4	M	63	Central vision in right eye; residual bright light perception in both eyes	Glaucoma; poor vision since birth with increasing severity, registered blind Age 50	None	1, 4
BC5	F	59	Total blindness in left eye; peripheral vision in right eye	Stickler’s syndrome; retinoschisis; from birth with increasing severity	None	2, 4
BC6	M	53	Residual bright light perception	Retinitis pigmentosa; official diagnosis Age 10. Gradual sight loss from birth	Some experience; very little regular use	1, 2
BC7	M	69	Residual bright light perception; some shape perception	Retinal dystrophy (exact cause unknown); official diagnosis Age 6–7	None	4
BC8	F	64	Total blindness	Undeveloped iris; from birth	None	1, 2, 4
BC9	M	39	Residual bright light perception	Retinitis pigmentosa; from Age 7–8		1, 2
BC10	F	56	Total blindness in left eye; residual bright light perception and some shape perception in right eye	Coloboma; from birth	None	4
BC11	F	62	Residual bright light perception	Retinal development abnormality; from birth	None	1
BC12	M	70	Residual bright light perception	Unknown cause; from birth	None	4
BC13	F	36	Residual bright light perception	Unknown cause; from birth	None	2
BC14	M	45	Total blindness	Ocular albinism. Gradual sight loss from birth	Some experience; very little regular use	1, 2, 4
BC15	M	45	Total blindness	Blood clot damaging optic nerve; Age 15	None	1, 4
BC16	M	37	Tunnel vision in both eyes	Retinitis pigmentosa; gradual from birth; official diagnosis Age 13	None	2
BC17	M	58	Total blindness	Retinoblastoma; enucleation at 2 years	None	1
Note. Some participants took part in more than one experiment. EE = expert echolocator; BC = blind control; M = male; F = female.

Statistical Power

We had practical limitations on our sample sizes for EEs and BCs. In order to demonstrate that we have sufficient power and precision to support our statistical inferences, we calculated the minimum effect size that can be detected with our sample sizes. We did this separately for the four types of critical statistical tests that we use to support our main conclusions. These tests are: (a) testing whether each participant group performs better than chance in the constancy tasks (Experiments 1, 2, and 4), (b) testing whether there is a difference between groups’ performance in the constancy tasks (Experiments 1 and 4), (c) testing whether constancy performance across variations in spectrum is affected by the intensity of the echoes (Experiment 2), and (d) testing whether performance in the constancy tasks improves with training (Experiments 3 and 5). For all of these tests, we used G*Power 3.1.9.7 (Faul et al., 2007) to compute required effect sizes (for two-tailed tests), setting α to 0.05 and power to 0.8. Where G*Power computes effect sizes as Cohen’s f, these values were converted to η² or η_p² values to be consistent with the units of our reported effect sizes. These computed minimum effect sizes are reported throughout this article alongside the observed effect sizes for any critical tests that are statistically significant, with additional details provided for each test where necessary.
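The text does not spell out how Cohen’s f was converted to η² or η_p². A minimal sketch follows, assuming the standard relation f² = η² / (1 − η²), which takes the same form for η_p². The function names are illustrative and are not taken from G*Power.

```python
def cohens_f_to_eta_squared(f):
    """Convert Cohen's f to (partial) eta squared: eta^2 = f^2 / (1 + f^2)."""
    if f < 0:
        raise ValueError("Cohen's f must be non-negative")
    return f ** 2 / (1 + f ** 2)


def eta_squared_to_cohens_f(eta_sq):
    """Inverse conversion: f = sqrt(eta^2 / (1 - eta^2))."""
    if not 0 <= eta_sq < 1:
        raise ValueError("eta squared must lie in [0, 1)")
    return (eta_sq / (1 - eta_sq)) ** 0.5


if __name__ == "__main__":
    # Illustrative value only; not an effect size reported in the article.
    print(round(cohens_f_to_eta_squared(0.25), 4))
```

Under this relation, a conventional “medium” f of 0.25 corresponds to an η² of roughly 0.06.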

Apparatus and Recording Process

Recording Process

The stimuli for these experiments were created from recordings of echolocation sounds that we made for a previous set of experiments (Norman & Thaler, 2020). The recording process is described in detail in that publication, but some important details are summarized below. The setup of the recording apparatus is shown in Figure 2.

*Note.* A manikin was positioned behind a loudspeaker, which emitted a click. A wooden disk was used as a reflecting object and positioned at a distance of either 1, 2, or 3 m from the loudspeaker, or not present at all. Recordings were made using binaural microphones.

Recording Clicks With Varying Spectra

Three variations in the click’s peak spectrum were used: 3.5, 4.0, and 4.5 kHz—hereafter referred to as low, medium, and high frequencies, respectively—and across these variations the level of the emissions was held constant. We chose these peak frequencies as they reflect a range that is found in natural human mouth clicks of EEs (Thaler et al., 2017; Zhang et al., 2017). It should be noted that emissions containing higher spectral frequencies lead to stronger echoes being reflected from the target object because, for an object of fixed proportions, sound composed of shorter wavelengths will be more strongly reflected than one composed of longer wavelengths. Thus, the echoes are more intense as the peak spectrum of the emission is increased. These natural variations are preserved in Experiment 1, and in Experiment 2 we directly assess whether the presence of these level differences is necessary for constancy.

Recording Clicks With Varying Levels

Three variations in the click’s level were acquired by digitally applying gains of 0 dB (i.e., baseline), −3 dB, and −6 dB to the emission sound (using the “Amplify” function in Audacity® 2.1.2; Audacity Team, 2016), hereafter referred to as high, medium, and low levels, respectively. The peak spectrum was held constant at 4.5 kHz.
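To make the three level steps concrete, the sketch below converts a gain in decibels into a linear amplitude factor using the standard definition, factor = 10^(dB/20). It illustrates the arithmetic only and is not a description of how Audacity implements its “Amplify” function. The dictionary name and sample values are hypothetical.

```python
def db_to_amplitude_factor(gain_db):
    """Linear amplitude factor for a gain in decibels (20 * log10 convention)."""
    return 10 ** (gain_db / 20)


def apply_gain(samples, gain_db):
    """Scale every sample of a waveform by the given gain in dB."""
    factor = db_to_amplitude_factor(gain_db)
    return [sample * factor for sample in samples]


# Level labels as used in the text; the mapping name is hypothetical.
LEVELS_DB = {"high": 0.0, "medium": -3.0, "low": -6.0}

if __name__ == "__main__":
    for label, gain_db in LEVELS_DB.items():
        print(label, round(db_to_amplitude_factor(gain_db), 3))
```

On this definition, −3 dB scales amplitude to about 0.71 of baseline and −6 dB to about 0.50.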

Creating the Stimuli for the Constancy Task

In preparing the sounds to be used as stimuli in the constancy task, and also in one of the training tasks described below, it was first necessary to be able to digitally separate the click and echoes at each target distance level. This was needed in order to be able to digitally recombine clicks and echoes from recordings with different emission levels or spectral frequencies—for example, to create a high level click with a low level echo—which allows us to simulate the presence of an object of varying reflecting properties. While this virtual approach might lead to click-echo combinations that are unlikely to arise in everyday situations, it gives us precise control over the acoustic properties of clicks and echoes. The temporal onset of the echo at each target distance was identified by visual inspection of the waveforms, with the point at which the waveform first rose above the noise floor being taken as the temporal onset of the echo. Any sound data recorded after this point were taken as belonging to the echo, and any before this point were taken as belonging to the click emission.
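The separation and recombination step can be sketched as follows. This is an illustration under stated assumptions rather than the authors’ processing code: the echo onset is taken as a given sample index (in the study it was identified by visual inspection), and both source recordings are assumed to come from the same target distance so that one onset index applies to both. All names are hypothetical.

```python
def split_at_echo_onset(recording, echo_onset):
    """Split a list of samples into (click, echo) at the echo onset index.

    Samples before the onset are treated as the click emission and
    samples from the onset onward as the echo.
    """
    if not 0 < echo_onset < len(recording):
        raise ValueError("echo_onset must fall inside the recording")
    return recording[:echo_onset], recording[echo_onset:]


def recombine(click_source, echo_source, echo_onset):
    """Build a stimulus from the click of one recording and the echo of another.

    Assumes both recordings share the same target distance, so the same
    onset index separates click from echo in each.
    """
    click, _ = split_at_echo_onset(click_source, echo_onset)
    _, echo = split_at_echo_onset(echo_source, echo_onset)
    return click + echo


if __name__ == "__main__":
    # Toy waveforms: a "high" recording and a "low" recording.
    high = [0.9, 0.8, 0.3, 0.2]
    low = [0.5, 0.4, 0.1, 0.1]
    # High-level click followed by low-level echo.
    print(recombine(high, low, echo_onset=2))
```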

Behavioral Experiments

Participants were tested in the same sound-insulated and echo-acoustic dampened room in which the sound recordings had been made (described in Norman & Thaler, 2020). Sounds were played to participants through binaural in-ear headphones (Etymotic Research ER4B MicroPro; ETYMOTIC RESEARCH, INC., Elk Grove Village, Illinois) driven by a Dell Latitude E7470 laptop (Intel Core i5-6300U CPU 2.40 GHz, 8 GB RAM, 64-bit Windows 7 Enterprise) through a USB soundcard (Creative Sound Blaster X-Fi HD sound card; Creative Technology, Creative Labs Ireland, Dublin, Ireland). Sounds were played to participants at a level at which the sound file with the highest peak level was presented at 80-dB sound pressure level. Participants sat upright and gave their response using a keyboard. Participants who were not fully blind wore a blindfold.

For participants to successfully show perceptual constancy for an object across variations in the echo, they must first be able to recognize when an echo is present (compared to when it is absent). They must also be able to discriminate the variations in the acoustic properties of the echo and emission that are relevant to the constancy task. Thus, participants completed three echo-acoustic training tasks prior to completing the constancy task. In each of these tasks they either (a) detected the presence of an echo, (b) discriminated differences in the echo’s spectrum or level, or (c) discriminated differences in the emission’s spectrum or level (with no echo present).

In each task, participants pressed a key to begin each trial. Each task was a two-alternative forced choice task, in which two sounds were played consecutively with an interstimulus interval of 1 s. The order of the two sounds was randomized on each trial, and participants then pressed one of two keys on a keyboard to indicate their response. During the three echo-acoustic training tasks, but not during the constancy task itself (except in Experiments 3 and 5), participants received auditory feedback (2500-Hz “correct” tone or 600-Hz “incorrect” 50-ms tone) on each trial to indicate whether they were correct. Before each task, participants were given a practice block that was one third the length of the main block.

Echo-Acoustic Training: Echo Detection

Participants judged whether an echo was present in the first or second of two sounds. On each trial, emissions of the same spectrum/level were used (both either high or low¹). After one of these emissions the echo from an object at either 1, 2, or 3 m was present, and after the other no echo was present. For each emission level (low/high), each of these target distances was tested 15 times, amounting to a total of 90 trials per block. Proportion correct was then calculated and averaged across levels of target distance for each participant.

Echo-Acoustic Training: Echo Discrimination

Participants judged which of two echoes was higher in pitch or level. On each trial, emissions of the same spectrum/level were used (medium) and, after both of these emissions, the echo from an object at a distance of either 1, 2, or 3 m was present. One of these echoes was taken from the low spectrum/level emission recording, and the other taken from the high spectrum/level emission recording (see Footnote 1). Each target distance was tested 15 times in each block, amounting to a total of 45 trials per block. Proportion correct was then calculated and averaged across levels of target distance for each participant.
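The scoring described for the echo detection and echo discrimination tasks (proportion correct per target distance, then averaged across distances) can be sketched as below. The input format, a list of (distance, correct) pairs, is an assumption made for illustration and is not the authors’ data format.

```python
from collections import defaultdict


def mean_proportion_correct(trials):
    """Average proportion correct across target distances.

    trials: iterable of (distance_m, was_correct) pairs for one participant.
    Returns (mean across distances, per-distance proportions).
    """
    counts = defaultdict(lambda: [0, 0])  # distance -> [n_correct, n_trials]
    for distance_m, was_correct in trials:
        counts[distance_m][0] += int(was_correct)
        counts[distance_m][1] += 1
    if not counts:
        raise ValueError("no trials supplied")
    per_distance = {d: correct / n for d, (correct, n) in counts.items()}
    return sum(per_distance.values()) / len(per_distance), per_distance


if __name__ == "__main__":
    # Toy data: two trials at each of the three target distances.
    toy = [(1, True), (1, False), (2, True), (2, True), (3, False), (3, False)]
    print(mean_proportion_correct(toy))
```

With equal trial counts per distance, as in these tasks, this average equals the overall proportion correct.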

Echo-Acoustic Training: Emission Discrimination

Participants judged which of two click emissions was either higher in pitch or level. The low and high spectrum/level emissions were played in a random order on each trial, and there was no echo present in either sound. This was repeated 15 times, amounting to a total of 15 trials per block. Proportion correct was then calculated for each participant.

Constancy Task

Participants judged whether two echoes were different in their spectrum (Experiments 1–3) or level (Experiments 4–5) either because (a) the clicks were different, or (b) the objects were different. On each trial the echoes were always different in their spectrum/level (high/low), and were reflected from an object at the same distance (either 1, 2, or 3 m). On half of the trials, the emissions were different from one another in their spectrum/level (high/low), and these matched the spectrum/level of their respective echoes (e.g., a low level click, followed by a low level echo). In the remaining half of the trials, the two emissions had the same spectrum/level (either low or high, occurring equally often). Thus, in trials in which the echoes varied with the clicks, the correct response was to judge that the echoes were different because the clicks were different. Alternatively, in trials in which the echoes did not vary with the clicks, the correct response was to judge that the echoes were different because the objects were different. Figures 3 and 4 display examples of the stimuli used in the spectrum and level constancy tasks, respectively. There were 60 trials for each target distance in each block, amounting to a total of 180 trials in each block. Before each constancy task, participants were told explicitly whether the differences would be carried by differences in level or spectrum.
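The trial structure described above can be laid out as a short sketch. It assumes that trials from the three target distances are interleaved within a block and that the 50/25/25 split holds within each distance. Neither detail is stated explicitly in the text, and all names are hypothetical.

```python
import random

DISTANCES_M = (1, 2, 3)
TRIALS_PER_DISTANCE = 60


def build_constancy_block(seed=None):
    """Return a shuffled list of 180 trial descriptions for one block."""
    rng = random.Random(seed)
    trials = []
    for distance_m in DISTANCES_M:
        # Half of the trials: clicks differ and match their echoes.
        # Other half: identical clicks, split evenly between low and high.
        click_pairs = (
            [("low", "high")] * (TRIALS_PER_DISTANCE // 2)
            + [("low", "low")] * (TRIALS_PER_DISTANCE // 4)
            + [("high", "high")] * (TRIALS_PER_DISTANCE // 4)
        )
        for first_click, second_click in click_pairs:
            # The two echoes always differ: one low, one high.
            intervals = [(first_click, "low"), (second_click, "high")]
            rng.shuffle(intervals)  # random presentation order of the two sounds
            correct = (
                "clicks different"
                if first_click != second_click
                else "objects different"
            )
            trials.append(
                {"distance_m": distance_m, "intervals": intervals, "correct": correct}
            )
    rng.shuffle(trials)  # assumed: distances interleaved within a block
    return trials


if __name__ == "__main__":
    block = build_constancy_block(seed=0)
    print(len(block), block[0])
```

Each interval is a (click, echo) pair, so a trial with identical clicks but different echoes can only be explained by a change in the reflecting object.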

*Note.* In each trial, subjects heard two sound recordings—both containing a click and echo from a target object. The echoes were always different in their spectrum. On 50% of trials, this spectrum difference was due to the clicks also being different in their spectrum (Column A). On the remaining 50% of trials, the clicks were either the same low spectrum (Column B; 25%) or same high spectrum (Column C; 25%). In these latter two cases, the relative difference in spectrum of the echoes can only be explained by differences in the reflecting properties of the object. Subjects’ task was to judge whether the echoes were different either because the clicks were different or because the reflecting objects were different. Only the echoes from a 3-m target are shown here—echoes from 1- and 2-m targets were also used in the experiment. The y-axis shows amplitude in arbitrary units (a.u.).

*Note.* The design was the same as that described for variations in spectrum (see Figure 3), but here the clicks and echoes vary in level and not spectrum. The y-axis shows amplitude in arbitrary units (a.u.).

Unlike the echo-acoustic training tasks, which were two-interval forced choice tasks, the constancy task required participants to classify each trial in one of two ways (i.e., “objects different” or “clicks different”). Thus, it is possible that response bias affected participants’ performance in the constancy task, and therefore a bias-free measure of performance (d′) was calculated from hit rates and false alarm rates [d′ = z(HR) − z(FAR)]. Hits were classed as trials in which the participant correctly identified that the echoes were different because the objects were different. False alarms were classed as trials in which participants judged that the echoes were different because the objects were different, when in fact the clicks were different. A higher d′ indicates a greater ability to accurately classify the two types of trial (i.e., greater constancy ability), and a d′ of zero indicates no ability to do this (i.e., no constancy).
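A minimal sketch of the d′ calculation follows, using the hit and false alarm definitions given above. The half-trial adjustment for rates of exactly 0 or 1 is an assumption added here so that the z-transform stays finite; the text does not say how such cases were handled. The function name and example counts are illustrative.

```python
from statistics import NormalDist


def d_prime(hits, misses, false_alarms, correct_rejections):
    """Compute d' = z(HR) - z(FAR) from trial counts.

    hits / misses: "objects different" trials answered correctly / incorrectly.
    false_alarms / correct_rejections: "clicks different" trials answered
    "objects different" / "clicks different".
    """
    n_signal = hits + misses
    n_noise = false_alarms + correct_rejections
    if n_signal == 0 or n_noise == 0:
        raise ValueError("both trial types must be present")
    # Assumed correction (not from the source): clamp rates of 0 or 1
    # to half a trial away from the boundary.
    hit_rate = min(max(hits / n_signal, 0.5 / n_signal), 1 - 0.5 / n_signal)
    fa_rate = min(max(false_alarms / n_noise, 0.5 / n_noise), 1 - 0.5 / n_noise)
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    return z(hit_rate) - z(fa_rate)


if __name__ == "__main__":
    # Illustrative counts for a 180-trial block (90 trials of each type).
    print(round(d_prime(70, 20, 25, 65), 2))
```

Equal hit and false alarm rates give a d′ of zero, matching the “no constancy” case described above.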