Abstract
Objective
The purpose of the present study was to examine the effects of age and visual content on cross-modal enhancement of auditory speech detection. Visual content consisted of three clearly distinct types of visual information: an unaltered video clip of a talker’s face, a low-contrast version of the same clip, and a mouth-like Lissajous figure. It was hypothesized that both young and older adults would exhibit reduced enhancement as visual content diverged from the original clip of the talker’s face, but that the decrease would be greater for older participants.
Design
Nineteen young adults and 19 older adults were asked to detect a single spoken syllable (/ba/) in speech-shaped noise, and the level of the signal was adaptively varied to establish the signal-to-noise ratio (SNR) at threshold. There was an auditory-only baseline condition and three audiovisual conditions in which the syllable was accompanied by one of the three visual signals (the unaltered clip of the talker’s face, the low-contrast version of that clip, or the Lissajous figure). For each audiovisual condition, the SNR at threshold was compared with the SNR at threshold for the auditory-only condition to measure the amount of cross-modal enhancement.
Results
Young adults exhibited significant cross-modal enhancement with all three types of visual stimuli, with the greatest amount of enhancement observed for the unaltered clip of the talker’s face. Older adults, in contrast, exhibited significant cross-modal enhancement only with the unaltered face.
Conclusions
Results of the current study suggest that visual signal content affects cross-modal enhancement of speech detection in both young and older adults. They also support a hypothesized age-related deficit in processing low-contrast visual speech stimuli, even in older adults with normal contrast sensitivity.
According to Spence, Senkowski, and Röder (2009), crossmodal enhancement refers to “situations in which the presentation of a stimulus in one sensory modality [affects our] ability to respond to the stimuli presented in another modality.” Crossmodal enhancement is typically assessed by comparing the signal level necessary to detect a unimodal stimulus with the level necessary to detect a multimodal stimulus. For example, people can detect speech at a lower signal-to-noise ratio (SNR) when it is accompanied by the corresponding visual speech signal than when it is not (Grant & Seitz, 2000; Grant, 2001; Bernstein, Auer, & Takayanagi, 2004). Similarly, the simultaneous presentation of an auditory signal can facilitate detection of a visual signal (Frassinetti, Bolognini, & Ladavas, 2002).
Recently, Andersen and Mamassian (2008) showed that the detection of a visual transient (i.e., luminance change) is facilitated by an auditory transient (i.e., loudness change), regardless of whether the direction of the change is the same in both modalities or not. Based on this finding, they concluded that cross-modal enhancement can be produced by a stimulus irrelevant to the primary detection task. Bernstein et al. (2004) reached a somewhat similar conclusion regarding crossmodal enhancement of the detection of auditory speech signals based on their finding that enhanced detection can occur even with generic non-speech visual stimuli. Their results were particularly noteworthy because they were inconsistent with the hypothesis (Grant & Seitz, 2000) that enhancement is mediated by fine-grained correlations between the dynamics of auditory and visual speech signals.
Only a few studies of crossmodal enhancement have directly assessed the role played by differences in visual signal content, operationally defined here as differences in the degree of similarity between a visual stimulus and an unaltered image of the talker. In the study by Bernstein et al. (2004), young adults’ performance in an auditory-only condition of a speech detection task was compared to their performance in audiovisual conditions that differed with respect to the content of the visual signal. Four different visual signals were presented: (1) the face of the talker saying /ba/, (2) a dynamic Lissajous figure (i.e., an undulating horizontal oval) whose vertical extent varied with the amplitude envelope of the token, and which looked somewhat like a mouth opening and closing, (3) a dynamic rectangle whose horizontal extent varied with the speech amplitude envelope of the /ba/ token, and (4) a static rectangle whose onset and offset coincided with those of the acoustic /ba/ token.
Bernstein et al. (2004) reported that the SNR at threshold was higher in the auditory-only condition than in the four audiovisual conditions. Best performance occurred in the audiovisual condition where the face of the talker was the visual stimulus, but there were no significant differences in detection thresholds among the other three audiovisual conditions. Moreover, when a preliminary mouth gesture was deleted, the talker’s face no longer produced superior performance. Based on these results, Bernstein et al. suggested that cross-modal enhancement of speech detection does not require any high-level visual analysis of mouth movements. Furthermore, they suggested that a sub-cortical mechanism, possibly localized in the superior colliculus, may produce the enhancement. At this level of the brainstem, evidence suggests that no feature analysis of the auditory and visual signals occurs beyond simply detecting audiovisual correspondence in time and/or space (Stein & Meredith, 1993). Frassinetti et al.’s (2005) finding that luminance detection in individuals with blindness due to cortical damage is improved by presentation of a simultaneous but irrelevant sound also suggests a subcortical mechanism for crossmodal enhancement.
Although the effects of aging on auditory enhancement (i.e., improvement in recognition of an auditory speech stimulus resulting from simultaneous presentation of the corresponding visual speech stimulus) have been studied extensively (Sommers, Tye-Murray, & Spehar, 2005), to the best of our knowledge no previous studies have addressed the question raised by the research on crossmodal enhancement just discussed. That is, older adults are poorer than young adults at identifying visual speech signals (Shoop & Binnie, 1979; Walden, Busacco, & Montgomery, 1993; Sommers, Tye-Murray, & Spehar, 2005; Feld & Sommers, 2009; Tye-Murray et al., 2010), but does aging modify, either qualitatively or quantitatively, the effect of signal content on crossmodal enhancement?
In the present investigation, we assessed crossmodal enhancement during detection of a spoken /ba/ syllable that was presented either alone or accompanied by one of three visual signals: an unaltered video clip of the talker speaking the syllable /ba/, the same visual signal but with the contrast greatly reduced, and a dynamic Lissajous figure (Bernstein et al., 2004). Our primary objective was to determine whether the pattern of results obtained by Bernstein et al. (2004) with young adults would also be observed in older adults. Because older adults have a decreased ability to identify the content of visual speech signals, their inclusion provides a natural experiment with implications for the role of signal content in crossmodal enhancement. It was hypothesized that both young and older adults would exhibit reduced enhancement as visual content diverged from the original clip of the talker’s face, either quantitatively (as with the low-contrast clip of the talker’s face) or qualitatively (as with the Lissajous figure), but that the decrease would be greater for older participants.
Method
Participants
Nineteen young adults (mean age = 22.2 yrs, SD = 1.0, range = 19–25) and 19 older adults (mean age = 73.6 yrs, SD = 5.5, range = 67–85) participated in the current investigation. They were recruited through databases maintained by the Aging and Development Program at Washington University and the Volunteers for Health at Washington University School of Medicine. All participants, both young and older adults, were community-dwelling residents and spoke English as their first language. The older adults had participated in previous studies in our laboratory. Testing took approximately two hours, and all participants received $10 per hour for their participation.
Table 1 shows the results of pure-tone testing for both groups. All participants had pure-tone average (PTA) thresholds of less than 30 dB (based on thresholds for 500, 1000, and 2000 Hz tones for the better ear). Although the two groups differed in PTA (t(36) = 7.54, p < .001), their thresholds in the auditory-only condition of the speech detection task did not differ significantly (see Results). Both groups had normal or corrected-to-normal visual acuity (20/30 or better, as assessed using a Snellen Eye Chart) and normal contrast sensitivity (better than 1.6, as assessed using a Pelli-Robson Contrast Sensitivity Chart).
Table 1.
Mean pure-tone averages (PTA) and mean frequency thresholds (and standard deviations) for young and older participants for left (L) and right (R) ears.
| Group | Ear | PTA | 500 Hz | 1000 Hz | 2000 Hz | 4000 Hz |
|---|---|---|---|---|---|---|
| Young | L | 2.6 (5.3) | 2.4 (4.8) | 2.6 (4.5) | 2.9 (6.5) | 0.8 (7.5) |
| Young | R | 3.2 (6.3) | 4.2 (6.6) | 3.9 (6.4) | 1.6 (6.0) | −0.5 (6.9) |
| Older | L | 19.1 (12.9) | 20.3 (12.3) | 18.7 (13.4) | 18.4 (12.9) | 40.0 (24.7) |
| Older | R | 18.3 (9.1) | 18.4 (7.6) | 17.1 (7.5) | 19.5 (12.2) | 36.1 (24.9) |
Apparatus
Participants were tested individually in a sound-treated room while seated approximately 0.5 meters from a 17” TouchSystems touchscreen monitor (ELO ETC-170C, TouchSystems Corporation, Hutto, TX). A program written in LabVIEW (National Instruments, Austin, TX) specifically for the purposes of this experiment controlled the presentation of stimuli and recorded participants’ responses. To determine thresholds in each condition, the SNR was adjusted by routing the audio signal through a real-time processor (TDT RP 2.1, Tucker-Davis Technologies, Alachua, FL) under control of the LabVIEW program. Auditory stimuli were presented through a calibrated audiometer over loudspeakers oriented at 45 degrees to each side of the listener. Presentation level was monitored via the audiometer's VU meter.
Stimuli
In the auditory-only (A-only) condition, the computer monitor screen remained a neutral gray while the auditory stimuli were presented. In two of the three audiovisual conditions, the visual stimulus was a 2.2-s video clip showing the head and shoulders of a woman speaking the token /ba/, which was presented either with the video signal unaltered (AV-good condition) or with the contrast reduced by 98% (AV-poor condition), resulting in a ‘ghost image’ of the original video with most of the detail and color removed (see Figure 1). The video clip, as well as the accompanying audio recording, which was used in all four conditions, was taken from the Conversation Made Easy aural rehabilitation training program (Tye-Murray, 2002). The video portion of the original recording was in MPEG format. To prevent possible loss of video quality from codec compression, the file was converted to AVI format before further processing. Digital video and audio processing were accomplished using Adobe Premiere Elements and Adobe Audition (Adobe Systems Inc., San Jose, CA), respectively. The audio portion of the clip was kept in 44.1 kHz, 16-bit format.
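The 98% contrast reduction used to create the AV-poor clip can be sketched as scaling each pixel's deviation from the frame's mean luminance. This is a minimal numpy illustration of one plausible operation; the exact algorithm applied by Adobe Premiere Elements is not specified in the text.

```python
import numpy as np

def reduce_contrast(frame, reduction=0.98):
    """Keep only (1 - reduction) of each pixel's deviation from the mean
    luminance, yielding a low-contrast 'ghost image' of the frame.
    (A sketch; the editing software's exact operation is an assumption.)"""
    frame = np.asarray(frame, dtype=float)
    mean = frame.mean()
    return mean + (1.0 - reduction) * (frame - mean)
```

Applied frame by frame, this compresses all luminance values toward the mean while leaving the mean itself unchanged, which matches the description of a ghost image with most detail removed.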
Figure 1.
Screen shots of the visual stimuli in the three AV conditions (from top to bottom: AV-good, AV-poor, and AV-L), along with a picture of the auditory waveform (very bottom). The contrast settings for the AV-poor condition figures were improved slightly for publication. From left to right, images are presented in the order of their occurrence during the 2.2 second video clip: first neutral, then plosive onset, then vowel, and finally vowel offset.
In the third audiovisual condition (AV-L), the visual stimulus, which was also presented for 2.2 s, was a Lissajous figure: a horizontal ellipse whose vertical dimension was adjusted dynamically with the amplitude of the audio so that participants saw a mouth-like shape that appeared to open and close rapidly with the speech segment. A band-pass filter centered at 25 Hz with a bandwidth of 50 Hz was used to extract the amplitude envelope of the speech signal and dynamically scale the vertical extent of the ellipse.
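The envelope-driven scaling of the Lissajous figure can be sketched as follows. This is a minimal numpy illustration in which full-wave rectification followed by an FFT-domain low-pass at roughly 50 Hz stands in for the band-pass filter described above; the original was implemented in LabVIEW, and the filter design and pixel scale here are our assumptions.

```python
import numpy as np

def amplitude_envelope(signal, fs, cutoff_hz=50.0):
    """Smoothed amplitude envelope: full-wave rectification followed by an
    FFT-domain low-pass (a stand-in for the ~0-50 Hz filtering described
    in the text; the exact filter design is an assumption)."""
    spectrum = np.fft.rfft(np.abs(signal))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    spectrum[freqs > cutoff_hz] = 0.0  # discard components above the cutoff
    return np.fft.irfft(spectrum, n=len(signal))

def lissajous_height(envelope, max_height_px=120.0):
    """Map the envelope onto the vertical extent of the ellipse so the
    mouth-like figure opens wider as the speech gets louder (the pixel
    scale is illustrative, not taken from the paper)."""
    env = np.clip(envelope, 0.0, None)
    peak = env.max() if env.max() > 0 else 1.0
    return max_height_px * env / peak
```

Driving the ellipse's vertical radius with `lissajous_height` frame by frame produces the rapid opening-and-closing motion described above.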
Two methods were used to avoid presenting stimuli that were too loud. First, the noise was spectrally shaped to be similar to the long-term average spectrum of speech in order to avoid the potentially uncomfortable shrill sound associated with loud high-frequency noise. More specifically, the noise was generated by filtering white noise to have equal energy up to 1000 Hz, above which it decreased 12 dB per octave (adapted from ANSI S3.6, 1989). Second, the speech-shaped noise was always presented at a comfortable level of 62 dB SPL-A, and the /ba/ token plus noise was initially presented at an SNR of −5.0. The level of the signal was then adjusted using an adaptive procedure to determine the SNR at threshold.
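A masker with the spectral shape just described (flat to 1000 Hz, then −12 dB per octave) can be approximated by FFT-domain shaping of Gaussian white noise. This is a sketch under that spectral description, not the authors' actual implementation; level calibration to 62 dB SPL would be handled separately on the playback chain.

```python
import numpy as np

def speech_shaped_noise(n_samples, fs, corner_hz=1000.0, rolloff_db_oct=12.0):
    """White noise shaped to have roughly equal energy up to corner_hz and
    a -12 dB/octave rolloff above it, as described in the text.
    (FFT-domain shaping is our choice of method, not the authors'.)"""
    rng = np.random.default_rng(0)  # fixed seed for reproducibility
    spectrum = np.fft.rfft(rng.standard_normal(n_samples))
    freqs = np.fft.rfftfreq(n_samples, d=1.0 / fs)
    gain = np.ones_like(freqs)
    above = freqs > corner_hz
    octaves_above = np.log2(freqs[above] / corner_hz)
    gain[above] = 10.0 ** (-rolloff_db_oct * octaves_above / 20.0)
    return np.fft.irfft(spectrum * gain, n=n_samples)
```

Because the rolloff is applied to amplitude, energy two octaves above the corner (e.g., 4000 Hz) sits about 24 dB below the flat low-frequency region.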
Procedure
Each participant completed four sequences of adaptive-tracking runs: an initial practice sequence, which was not included in the analyses, followed by three experimental sequences. Each sequence comprised four runs, one for each of the four conditions, presented in the following order: AV-good, AV-poor, AV-L, and then A-only.
The participants’ task was always to identify which of two 2.2-s intervals included the auditory /ba/ token. Participants pressed one of the two virtual buttons in the lower portion of the computer touchscreen monitor to indicate which interval they thought contained the /ba/ token. The interval with the /ba/ token was randomly determined. In audiovisual conditions, the visual stimulus was always the same in both intervals. The noise was introduced 0.5 to 1.0 s before presentation of the visual stimulus and turned off 0.5 seconds after the visual stimulus ended. The time between noise onset and the beginning of the visual stimulus was randomly varied to prevent participants from being able to anticipate the onset of the speech token. The timing of auditory stimuli in the auditory-only condition was identical to that in the audiovisual conditions.
As indicated previously, the SNR on the first trial of each run was −5.0. On subsequent trials of that run, the SNR depended on the participant’s previous responses. The audio level of the signal (i.e., the /ba/ token) was decreased after three correct responses and increased after one incorrect response. Each change in the direction of these adjustments constituted a reversal, and the size of the adjustment was determined by the number of previous reversals: SNR changes of 3.0 dB were used for the first three reversals, 1.0 dB for the next three reversals, and 0.5 dB for the final six reversals. This procedure converged on the SNR needed to produce approximately 80% correct performance. For each condition in each of the three experimental sequences, the participant’s threshold SNR was calculated as the mean SNR over the last six reversals of the corresponding run.
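The adaptive track just described can be sketched as a three-down, one-up staircase. This is a minimal Python simulation in which `respond` stands in for the listener; it assumes the three correct responses must be consecutive (the standard interpretation of such staircases), and it omits the two-interval task, timing, and LabVIEW details.

```python
def run_staircase(respond, start_snr=-5.0, max_reversals=12):
    """Three-down/one-up adaptive track as described in the text: the
    signal level drops after three consecutive correct responses and
    rises after one error.  Step size depends on the number of reversals
    so far: 3.0 dB for the first three, 1.0 dB for the next three, and
    0.5 dB for the final six.  The threshold estimate is the mean SNR at
    the last six reversals (this rule converges near 79% correct)."""
    snr = start_snr
    correct_run = 0
    direction = None            # direction of the last level change
    reversal_snrs = []

    def step_size():
        n = len(reversal_snrs)
        return 3.0 if n < 3 else (1.0 if n < 6 else 0.5)

    while len(reversal_snrs) < max_reversals:
        if respond(snr):                 # correct response
            correct_run += 1
            if correct_run == 3:         # three in a row -> make it harder
                correct_run = 0
                if direction == 'up':    # direction change = reversal
                    reversal_snrs.append(snr)
                snr -= step_size()
                direction = 'down'
        else:                            # one error -> make it easier
            correct_run = 0
            if direction == 'down':
                reversal_snrs.append(snr)
            snr += step_size()
            direction = 'up'
    return sum(reversal_snrs[-6:]) / 6.0
```

With a deterministic simulated listener who is correct whenever the SNR is at or above some fixed value, the track oscillates around that value in progressively smaller steps and the mean of the last six reversals lands close to it.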
To determine whether performance in each condition was stable across the three experimental sequences, a separate 3 (Sequence) × 2 (Age) analysis of variance (ANOVA) was conducted for each condition. None of the four analyses revealed a main effect of Sequence or a Sequence × Age interaction, and therefore participants’ thresholds in each condition, averaged across the three experimental sequences, were used in the analyses reported below.
Results
Figure 2 depicts the mean SNR at detection threshold for the young adult and older adult groups in all four conditions; note that higher SNR reflects poorer performance. As may be seen, the mean speech detection threshold for the young adult group was higher in the A-only condition than in any of the three AV conditions, suggesting crossmodal enhancement in each case. It may also be seen, however, that the young adults’ thresholds varied considerably across the three AV conditions, implying that the degree of crossmodal enhancement depended on the specific type of visual signal. The talker’s face appeared to produce the most enhancement, at least when the visual contrast was high (AV-good), and the Lissajous figure (AV-L condition) the least, while the low contrast video of the talker’s face produced an intermediate level of crossmodal enhancement.
Figure 2.
Mean threshold SNR (and standard error) for the young and older adult groups in the AV-good, AV-poor, AV-L, and A-only conditions.
A different pattern of crossmodal enhancement may be seen in the older adults. Although the older adults, like the young adults, appeared to show crossmodal enhancement when the visual stimulus was the high contrast video of the talker’s face (AV-good), the older adults’ detection thresholds in the other two AV conditions (AV-poor and AV-L) were similar to their thresholds in the A-only condition. Thus, unlike the young adults, the older adults appeared to show crossmodal enhancement in only one AV condition. Scatter-plots comparing young and older adults’ speech detection thresholds in each of the three AV conditions with their performance in the A-only baseline condition are shown in Figure 3; note that points below the diagonal line indicate crossmodal enhancement. Inspection of these plots reveals that the general trends observed in the means effectively capture the different patterns of crossmodal enhancement observed in the vast majority of individual young and older participants.
Figure 3.
Scatterplots comparing thresholds in the A-only condition to thresholds in the three crossmodal enhancement conditions; AV-good (top graph), AV-poor (middle graph), and AV-L (bottom graph).
Statistical analyses of the detection thresholds confirmed what is suggested by inspection of Figures 2 and 3. A 2 (Age) × 4 (Condition) ANOVA revealed significant main effects of both Age, F(1, 36) = 12.5, p = .001, and Condition, F(3, 108) = 32.1, p < .001. There was also a significant Age × Condition interaction, F(3, 108) = 2.7, p = .05, reflecting the different patterns of crossmodal enhancement in the two groups. For the young adults, planned contrasts revealed significant crossmodal enhancement of speech detection by all three visual signals; that is, in each of the three AV conditions, the AV threshold was significantly lower than the A-only threshold, all ps < .05. The older adults, on the other hand, showed significant crossmodal enhancement only with the high-contrast video of the talker’s face (p < .001). Planned contrasts also revealed age differences in the three AV conditions (all ts(36) > 3.0, ps < .005) but not in the A-only baseline condition (t(36) = 1.4, p = .165).
To compare the degree of crossmodal enhancement produced by the three visual signals in the young adults, post-hoc tests were conducted using a Bonferroni correction for multiple comparisons. Results of these analyses indicated that young adults’ thresholds in the AV-good condition were significantly lower than those in the AV-poor condition, which in turn were significantly lower than those in the Lissajous condition (AV-L); both ps < .001.
Discussion
The present results demonstrate that the content of a visual signal can affect the crossmodal enhancement of speech detection. The young adults showed crossmodal enhancement for all three types of visual signal (an unaltered video clip of the talker speaking the syllable /ba/, the same clip with the contrast greatly reduced, and a dynamic Lissajous figure), as indicated by speech detection thresholds in the audiovisual conditions that were lower than those in the auditory-only condition. However, their level of performance depended on how similar the visual signal was to the unaltered video of the talker. Audiovisual performance was best when the signal was the talker’s face and contrast was good, next best when contrast was degraded, and worst when the visual stimulus was a Lissajous figure, even though by some measures (e.g., Weber contrast) the Lissajous figure had the highest contrast of the three visual stimuli, and by no measure was it the lowest-contrast stimulus.
The results for the older adults differed markedly from those of the young adults, yet they too suggest that signal content affects crossmodal enhancement. The older adults only showed crossmodal enhancement when the signal was the talker’s face and contrast was good. Notably, the Lissajous figure did not produce crossmodal enhancement in the older adults, despite the fact that it was a high contrast visual signal. Taken together, the results are consistent with the hypothesis that both young and older adults would exhibit reduced enhancement as visual content diverged from the original clip of the talker’s face, but that the decrease would be greater for older participants.
Results for the young adults in the current study were similar to those of Experiment 1 in Bernstein et al. (2004) in some respects. For example, both studies found crossmodal enhancement of speech detection by a video clip of the talker’s face and also by a dynamic Lissajous figure. There were differences, however, in the types of other stimuli that were examined in the two studies: Bernstein et al. examined crossmodal enhancement by other non-face visual stimuli (i.e., static and dynamic rectangles, which were not examined in the current study), whereas the current study examined enhancement by a low-contrast face (which was not studied by Bernstein et al.).
Bernstein et al. (2004) did not find differences in crossmodal enhancement among non-face stimuli in their first experiment or among any of the visual stimuli in their second experiment, and they concluded that a fine-grained analysis of the visual signal is not needed to achieve crossmodal enhancement. However, when we compared enhancement by unaltered and low-contrast faces, we did find differences in the amount of crossmodal enhancement produced by these two visual stimuli. Young adults showed diminished but still statistically significant enhancement when contrast was reduced, whereas the older adults no longer showed any enhancement. The results for both age groups thus suggest that how fine-grained a visual analysis is possible can affect how much enhancement will occur. It is possible, however, that this conclusion applies only to face stimuli. Also of interest for future research is the extent to which the current results and those of Bernstein et al. are peculiar to speech stimuli, and whether fine-grained visual analysis plays similar roles in crossmodal enhancement of the detection of speech and non-speech sounds.
Our primary objective, of course, was to determine whether the pattern of results obtained by Bernstein et al. (2004) with young adults is also observed in older adults. Although both groups showed crossmodal enhancement by an unaltered face stimulus, only the young adults showed enhancement by a Lissajous figure. Thus, while our results for young adults replicate those of Bernstein et al. with similar stimuli, the kinds of stimuli that can produce crossmodal enhancement are clearly different in young and older adults.
This finding complements our previous work showing that older adults are less able to utilize degraded visual speech information than young adults (Tye-Murray, et al., 2010). When young and older adults were asked to recognize spoken sentences in audiovisual conditions with varying degrees of auditory and visual clarity, both groups showed equivalent crossmodal enhancement, relative to unimodal visual conditions, as long as the video of the talker’s face was not degraded. If the clarity of the visual speech signal was poor, however, the older adults showed significantly less enhancement than young adults when the auditory signal was added (Tye-Murray, et al., 2010; see also Gordon & Allen, 2009). Thus, a similar pattern of age differences emerges with respect to both speech detection and speech recognition, even though the task demands are somewhat different.
Taken together, these findings suggest that aging erodes the ability to take advantage of degraded visual speech information. It should be emphasized that the older adults in both the present investigation and in the Tye-Murray et al. (2010) study had normal or corrected-to-normal visual acuity as well as normal contrast sensitivity as assessed with standard eye charts. Thus, older adults with apparently normal vision may still have a deficit or deficits under suboptimal viewing conditions that preclude their benefiting from visual signals that produce crossmodal enhancement in young adults.
Such deficits could have significant practical implications for older adults, although their interpretation may be open to question. This is because testing under standard visual conditions may overestimate older adults’ visual acuity under conditions where both contrast and luminance are reduced (Haegerstrom-Portnoy, Schneck, & Brabyn, 1999). Thus, it is possible that age-related declines in visual function under suboptimal viewing conditions could explain why only young adults showed enhancement when the image in the clip was degraded, despite the fact that an undegraded video clip of the talker’s face produced enhancement in both young and older adults. Nevertheless, age differences in vision under suboptimal conditions would not account for the fact that although young adults showed enhancement by the high-contrast Lissajous figure, the older adults did not. If the source of this failure can be pinpointed, we may gain new insight not just into the effects of aging on speech detection and recognition, but also into the neural and cognitive mechanisms that underlie crossmodal enhancement in general.
Acknowledgments
This work was supported by grant award number RO1 AG 18029-4 from the National Institute on Aging. We thank Julia Feld, Nathan Rose, and Krista Taake for their suggestions and comments.
References
- American National Standards Institute. ANSI S3.6-1989. New York: Author; 1989. Specifications for audiometers. [Google Scholar]
- Andersen TS, Mamassian P. Audiovisual integration of stimulus transients. Vision Research. 2008;48:2537–2544. doi: 10.1016/j.visres.2008.08.018. [DOI] [PubMed] [Google Scholar]
- Bernstein LE, Auer ET, Takayanagi S. Auditory speech detection in noise enhanced by lipreading. Speech Communication. 2004;44:5–18. [Google Scholar]
- Feld JE, Sommers MS. Lipreading, Processing Speed, and Working Memory in Younger and Older Adults. Journal of Speech, Language, and Hearing Research. 2009;52:1555–1565. doi: 10.1044/1092-4388(2009/08-0137). [DOI] [PMC free article] [PubMed] [Google Scholar]
- Frassinetti F, Bolognini N, Bottari D, Bonora A, Ladavas E. Audiovisual integration in patients with visual deficit. Journal of Cognitive Neuroscience. 2005;17:1442–1452. doi: 10.1162/0898929054985446. [DOI] [PubMed] [Google Scholar]
- Frassinetti F, Bolognini N, Ladavas E. Enhancement of visual perception by crossmodal visuo-auditory interaction. Experimental Brain Research. 2002;147:332–343. doi: 10.1007/s00221-002-1262-y. [DOI] [PubMed] [Google Scholar]
- Gordon MS, Allen S. Audiovisual speech in older and younger adults: Integrating a distorted visual signal with speech in noise. Experimental Aging Research. 2009;35:202–209. doi: 10.1080/03610730902720398. [DOI] [PubMed] [Google Scholar]
- Grant KW. The effect of speechreading on masked detection thresholds for filtered speech. Journal of the Acoustical Society of America. 2001;109:2272–2275. doi: 10.1121/1.1362687. [DOI] [PubMed] [Google Scholar]
- Grant KW, Seitz PF. The use of visible speech cues for improving auditory detection of spoken sentences. Journal of the Acoustical Society of America. 2000;108:1197–1208. doi: 10.1121/1.1288668. [DOI] [PubMed] [Google Scholar]
- Haegerstrom-Portnoy G, Schneck ME, Brabyn JA. Seeing into old age: Vision function beyond acuity. Optometry and Vision Science. 1999;76:141–158. doi: 10.1097/00006324-199903000-00014. [DOI] [PubMed] [Google Scholar]
- Shoop C, Binnie CA. The effects of age upon the visual perception of speech. Scandinavian Audiology. 1979;8:3–8. doi: 10.3109/01050397909076295. [DOI] [PubMed] [Google Scholar]
- Sommers MS, Tye-Murray N, Spehar B. Auditory-visual speech perception and auditory-visual enhancement in normal-hearing younger and older adults. Ear and Hearing. 2005;26:263–275. doi: 10.1097/00003446-200506000-00003. [DOI] [PubMed] [Google Scholar]
- Spence C, Senkowski D, Röder B. Crossmodal processing. Experimental Brain Research. 2009;198:107–111. doi: 10.1007/s00221-009-1973-4. [DOI] [PubMed] [Google Scholar]
- Stein B, Meredith M. The Merging of the Senses. Cambridge, MA: MIT Press; 1993. [Google Scholar]
- Tye-Murray N. Conversation Made Easy: Speechreading and Conversation Training for Individuals Who Have Hearing Loss (Adults and Teenagers) [Aural rehabilitation training materials, Three 6-set CD-ROM training programs] St. Louis: Central Institute for the Deaf; 2002. [Google Scholar]
- Tye-Murray N, Sommers MS, Spehar B, Myerson J, Hale S. Aging, integration, and the Principle of Inverse Effectiveness. Ear and Hearing. 2010;31(5):636–644. doi: 10.1097/AUD.0b013e3181ddf7ff. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Walden BE, Busacco DA, Montgomery AA. Benefit from visual cues in auditory-visual speech recognition by middle-aged and elderly persons. Journal of Speech and Hearing Research. 1993;36:431–436. doi: 10.1044/jshr.3602.431. [DOI] [PubMed] [Google Scholar]



