Abstract
Recent models of the neural encoding of speech suggest a core role for amplitude modulation (AM) structure, particularly AM phase alignment. Accordingly, speech tasks that measure linguistic development in children may exhibit systematic properties in their AM structure. Here, the acoustic structure of the spoken items in two child tasks, phoneme deletion (phonological) and plural elicitation (morphological), was investigated. The phase synchronisation index (PSI), which reflects the degree of phase alignment between pairs of AMs, was computed for 3 AM bands (delta, 0.9–2.5 Hz; theta, 2.5–12 Hz; beta/low gamma, 12–40 Hz) in each of five spectral bands covering 100–7250 Hz. For phoneme deletion, data from 94 children with and without dyslexia were used to relate AM structure to behavioural performance. Results revealed that a significant change in the magnitude of the phase synchronisation index (ΔPSI) between the slower AMs (delta-theta) systematically accompanied both phoneme deletion and plural elicitation. Further, children with dyslexia made more linguistic errors as the delta-theta ΔPSI increased. Accordingly, ΔPSI between slower temporal modulations in the speech signal systematically distinguished test items from accurate responses and predicted task performance. This may suggest that sensitivity to slower AM information in speech is a core aspect of phonological and morphological development.
Keywords: phonology, morphology, amplitude modulation, phase synchronisation
I. Introduction
Classically, children’s spoken language acquisition has been investigated using a spectrogram model of the speech signal. The acoustic speech signal is highly complex, and the spectrogram is one way of conceptualising this complexity, depicting the distribution of energy across frequency over time. This depiction highlights the importance of rapidly changing acoustic cues, such as voice onset time and formant structure, in word formation. Such rapidly changing cues have consequently been considered the primary basis for language development (e.g., Eimas, 1970) and for developmental disorders of language learning (e.g., Specific Language Impairment [SLI] and developmental dyslexia; Tallal, 2004). However, the speech signal has also been analysed in terms of slower changes in intensity or energy over time (amplitude modulation, AM), i.e. the speech ‘amplitude envelope’ (Shannon et al., 1995; Loizou, 1999). These slower envelope changes have been found to be necessary for speech perception (Kanedera, 1999) and have been successfully modelled in speech comprehension studies (Elliot & Theunissen, 2009). Sensitivity to slower AM-related cues may thus also play a role in language learning and in developmental language disorders (Goswami, 2011). In the current study, the acoustic structure of the amplitude envelope was analysed for two popular child speech tasks, phoneme deletion and plural elicitation. The analyses suggested a core role for the phase synchronisation of slower AMs in the envelope for successful responding in each task.
The focus here on the amplitude envelope was motivated by recent neural models of speech processing (Giraud & Poeppel, 2012), in which acoustic AM patterns nested in the amplitude envelope (energy fluctuations at different temporal rates), and their phase relations, are a key target of successful neural encoding. Most of the slow energy modulations within the amplitude envelope reflect intensity patterns associated with syllable production (Greenberg, 2006). Nevertheless, the overall amplitude envelope contains the many amplitude envelopes of its constituent frequencies, each changing at different temporal rates, and sensitivity to all of these may in principle be important for language encoding. If neural encoding of such modulations is central to linguistic behaviour, then AM structure might also play a key role in successful responding in tasks that measure language development. Two such tasks were selected for analysis: phoneme deletion and plural elicitation. In phoneme deletion tasks, a target phoneme must be removed from an item; in plural elicitation tasks, the morpheme s must be added to an item. These tasks are widely used for diagnostic purposes in the developmental language literature (e.g., Melby-Lervåg et al., 2012). The goal of the analyses was to investigate, in principle, whether individual differences in sensitivity to the amplitude modulation hierarchy nested in the amplitude envelope might play a role in accurate task performance.
The modelling built on the child-directed speech work of Leong et al. (2014), who articulated an AM-based modelling perspective motivated by adult ‘multi-time resolution’ models of neural speech encoding (e.g., Poeppel, 2003; Greenberg, 2006; Chait et al., 2015). Multi-time resolution models propose that cortical oscillatory networks encode temporal modulation patterns in speech at the speech-relevant rates of delta (~1–3 Hz), theta (~4–8 Hz), beta (~15–30 Hz) and low gamma (~30–50 Hz; rates from Poeppel, 2014), binding the information together to give the final speech percept (Ghitza, 2011; Giraud & Poeppel, 2012). The temporal alignment of neural oscillatory rhythms with speech rhythms is called oscillatory phase alignment, or entrainment. For example, Gross et al. (2013) demonstrated that the adult brain encodes the temporal modulation structure of speech through hierarchical phase-phase and phase-amplitude relationships in the delta, theta and gamma bands, with delta-band responses governing this neural hierarchy. Accurate entrainment by adults depends in part on sensitivity to amplitude ‘rise times’, also known as auditory ‘edges’. These edges in the amplitude modulations of the continuous acoustic signal phase-reset oscillating cell networks, so that their oscillations become phase-aligned with the corresponding amplitude modulations in the speech signal (Gross et al., 2013; Doelling et al., 2014). The importance of oscillatory phase entrainment to the temporal modulation patterns in speech for successful speech encoding and comprehension is now relatively well established in studies with adults (for reviews, see Giraud & Poeppel, 2012; Poeppel, 2014).
Oscillatory phase entrainment to speech by children has to date only been investigated in populations with developmental dyslexia. It has been established that children with developmental dyslexia in a range of languages (English, Spanish, French, Finnish, Hungarian, Dutch and Chinese) are relatively insensitive to amplitude envelope rise times, i.e. edges (see Goswami, 2015, for a recent review). Further, in both English and Spanish, oscillatory phase entrainment in the delta band appears atypical in children with developmental dyslexia (in tasks using syllables, sentences or stories; see Power et al., 2013; 2016; Molinaro et al., 2016). These data are consistent with the hypothesis that the linguistic processing difficulties found in children with developmental dyslexia may be related to atypical neural entrainment to the AM structure of speech (Goswami, 2011). However, the AM structure of the speech tasks used to identify children with dyslexia (or with SLI) has not yet been widely investigated. The exception is rhyme awareness: analyses of the AM structure of rhyming words showed that delta-rate AMs played a key role in phonological similarity judgements (Leong & Goswami, 2016). The aim of the current modelling studies was to investigate whether the phase alignment of amplitude modulation information in the speech signal of items in other child phonological and morphological tasks might carry important phonological or morphological information. If this were the case, then individual differences in acoustic sensitivity to AM information might affect children’s ability to use AM cues in the speech signal to learn phonology and/or morphology.
II. Methods
A. Participants
Ninety-four children provided behavioural data for the correlational analyses of the phoneme deletion task: 41 children with dyslexia (DYS; 22 male, 19 female), 29 chronological age-matched controls (CA; 12 male, 17 female) and 24 reading-level-matched controls (RL; 11 male, 13 female). All children were taking part in a longitudinal study of developmental dyslexia (see Goswami et al., 2013), and the data used here were collected in Year 4 of the study, when the children with dyslexia were aged 11 years on average. The children with dyslexia were recruited via learning support teachers. Only children with no additional learning difficulties (e.g., dyspraxia, ADHD, autistic spectrum disorder, specific language impairment), a nonverbal IQ above 85 and English as the first language spoken at home were included. All children received a short hearing screen using an audiometer. Sounds were presented to the left and right ears at a range of frequencies (250, 500, 1000, 2000, 4000 and 8000 Hz), and all children were sensitive to sounds at 20 dB HL or less in both ears at all frequencies.
B. Modelling
1. Derivation of the S-AMPH representation
The S-AMPH representation was obtained with a two-stage filtering process, following Leong and Goswami (2016). First, the raw acoustic signal was band-pass filtered into 5 spectral bands using a series of adjacent finite impulse response (FIR) filters. These 5 bands were: (1) 100–300 Hz; (2) 300–700 Hz; (3) 700–1750 Hz; (4) 1750–3900 Hz; and (5) 3900–7250 Hz. Next, the Hilbert envelope was extracted from each of the 5 sub-band signals. These Hilbert envelopes were then passed through a second series of band-pass filters to isolate 3 AM rate bands, designated here delta (0.9–2.5 Hz), theta (2.5–12 Hz) and beta/low gamma (12–40 Hz). The result of this two-stage filtering process was a 5 × 3 spectro-temporal representation of the speech envelope, made up of 15 AMs in total.
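A minimal Python sketch of the two-stage filtering described above is given below. It is not the original MATLAB implementation. The filter length (`numtaps`), the zero-phase (forward-backward) filtering, the 200 Hz envelope sampling rate, and the second-order Butterworth filters used for the AM stage are all assumptions; the Butterworth filters stand in for whatever band-pass filters the original used. Only the band edges come from the text. The sketch requires NumPy and SciPy.

```python
import numpy as np
from scipy.signal import butter, filtfilt, firwin, hilbert, resample_poly, sosfiltfilt

# Band edges (Hz) as given in the text.
SPECTRAL_BANDS = [(100, 300), (300, 700), (700, 1750), (1750, 3900), (3900, 7250)]
AM_BANDS = [(0.9, 2.5), (2.5, 12.0), (12.0, 40.0)]  # delta, theta, beta/low gamma


def s_amph(signal, fs=16000, env_fs=200, numtaps=1025):
    """Return an array of shape (5, 3, n_env): spectral band x AM band x time.

    Assumed (not from the source): numtaps, zero-phase filtering, env_fs,
    and 2nd-order Butterworth band-pass filters for the AM stage.
    """
    signal = np.asarray(signal, dtype=float)
    down = fs // env_fs  # integer decimation factor (80 for 16 kHz -> 200 Hz)
    rep = []
    for lo, hi in SPECTRAL_BANDS:
        # Stage 1: windowed-sinc FIR band-pass, applied forwards and backwards.
        b = firwin(numtaps, [lo, hi], pass_zero=False, fs=fs)
        sub_band = filtfilt(b, [1.0], signal)
        # Hilbert envelope of the sub-band, downsampled for the slow AM filters.
        env = resample_poly(np.abs(hilbert(sub_band)), up=1, down=down)
        ams = []
        for lo_am, hi_am in AM_BANDS:
            # Stage 2: isolate one AM rate band from the envelope.
            sos = butter(2, [lo_am, hi_am], btype="bandpass", fs=env_fs, output="sos")
            ams.append(sosfiltfilt(sos, env))
        rep.append(ams)
    return np.array(rep)


# Usage on a synthetic 1 s token: a 1 kHz tone amplitude-modulated at 4 Hz.
fs = 16000
t = np.arange(fs) / fs
x = (1 + 0.8 * np.sin(2 * np.pi * 4 * t)) * np.sin(2 * np.pi * 1000 * t)
rep = s_amph(x, fs)
print(rep.shape)  # (5, 3, 200)
```

In this synthetic example the carrier falls in spectral band 3 (700–1750 Hz) and the 4 Hz modulation falls in the theta range. As a result, the theta AM of that band carries almost all of the modulation energy.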
2. Phase Synchronisation Index (PSI): A Multi-Timescale Synchronisation Measure
To compute the phase synchronisation between AMs in the three different temporal rate bands, a Phase Synchronisation Index (PSI) was used. This measure follows methods originally devised to quantify the phase synchronisation of two oscillators at different frequencies (Tass et al., 1998; Schack & Weiss, 2005). The PSI is calculated as follows:
\mathrm{PSI} = \left| \left\langle e^{\,i\,(n\theta_1 - m\theta_2)} \right\rangle_t \right| \qquad (1)
where n and m are integers related to the frequency ratio of the two oscillations, θ1 and θ2 are their instantaneous phases, and ⟨·⟩t denotes averaging over time. In Equation (1), the PSI is therefore the magnitude of the time-averaged unit vector whose angle is the generalised phase difference nθ1 − mθ2. It ranges from 0 (no synchronisation) to 1 (complete synchronisation).
Figure 1 illustrates idealised pairs of oscillations with an n:m frequency ratio of 1:2. The upper pair has a PSI value of 1, and the lower pair has a PSI value of 0.4.
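Equation (1) can be sketched in Python as follows. This section does not specify how the instantaneous phases are obtained. Two choices here are assumptions of the sketch: phases are taken from the analytic (Hilbert) signal of each band-limited AM, and n is applied to the phase of the first, slower signal. The usage example mirrors the idealised 1:2 frequency relation of Figure 1.

```python
import numpy as np
from scipy.signal import hilbert


def instantaneous_phase(x):
    """Instantaneous phase (radians) from the analytic signal of x."""
    return np.angle(hilbert(np.asarray(x, dtype=float)))


def psi(x1, x2, n, m):
    """n:m phase synchronisation index of Equation (1), in [0, 1]."""
    theta1 = instantaneous_phase(x1)
    theta2 = instantaneous_phase(x2)
    # Average the unit phasors of the generalised phase difference n*theta1 - m*theta2;
    # the modulus is 1 for a constant phase relation and near 0 for no relation.
    return float(np.abs(np.mean(np.exp(1j * (n * theta1 - m * theta2)))))


# Idealised oscillations: 1 Hz and 2 Hz sinusoids over 10 s.
fs = 1000
t = np.arange(10 * fs) / fs
slow = np.sin(2 * np.pi * 1 * t)
fast = np.sin(2 * np.pi * 2 * t)
print(psi(slow, fast, n=2, m=1))  # close to 1: the 1:2 frequency relation is locked
print(psi(slow, fast, n=1, m=1))  # close to 0: no 1:1 locking
```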
The temporal synchronisation between the pairs of speech AM bands derived via the S-AMPH (delta-theta and theta-beta/low gamma) was computed with the n:m PSI, using the same formulae as Leong and Goswami (2014). Following Leong and Goswami (2014) and Leong et al. (2014), the n:m ratio used was 2:1 for the delta-theta AMs and 3:1 for the theta-beta/low gamma AMs.
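Applied to an S-AMPH output array, the two band pairs and their n:m ratios can be evaluated separately for each spectral band, as in the sketch below. The array layout (spectral band × AM band × time, with AM bands ordered delta, theta, beta/low gamma) and the `psi_table` helper are hypothetical. The 2:1 and 3:1 ratios come from the text. With n applied to the slower AM, the quantity 2θdelta − θtheta stays constant when the theta AM runs at twice the delta rate.

```python
import numpy as np
from scipy.signal import hilbert

# name -> (slower AM index, faster AM index, n, m); ratios from the text,
# index order delta=0, theta=1, beta/low gamma=2 is an assumption.
BAND_PAIRS = {"delta-theta": (0, 1, 2, 1), "theta-beta/low gamma": (1, 2, 3, 1)}


def psi(x1, x2, n, m):
    """n:m PSI (Equation 1) with phases taken from the analytic signal."""
    ph1, ph2 = np.angle(hilbert(x1)), np.angle(hilbert(x2))
    return float(np.abs(np.mean(np.exp(1j * (n * ph1 - m * ph2)))))


def psi_table(s_amph):
    """Map each band pair to an array of 5 PSI values, one per spectral band.

    s_amph: array of shape (5, 3, T) = spectral band x AM band x time.
    """
    s_amph = np.asarray(s_amph, dtype=float)
    return {name: np.array([psi(band[i], band[j], n, m) for band in s_amph])
            for name, (i, j, n, m) in BAND_PAIRS.items()}


# Synthetic check: AMs at 1.5, 3 and 9 Hz are exactly 2:1 and 3:1 related.
env_fs = 200
t = np.arange(10 * env_fs) / env_fs
one_band = np.stack([np.sin(2 * np.pi * f * t) for f in (1.5, 3.0, 9.0)])
table = psi_table(np.stack([one_band] * 5))
print({name: values.round(3) for name, values in table.items()})  # all close to 1
```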
C. Speech materials
1. Phoneme deletion task
The task was based on one devised by McDougall et al. (1994), who asked children to listen to a spoken item and delete a target phoneme, for example, “say ‘hift’ without the /f/”. Task items were designed so that the target response was a real word known to children (here, ‘hit’). The phonemes to be deleted were initial, medial or final consonants. The task comprised 15 trials. The items (with the target phoneme for deletion in brackets) were: bloo(t) – blue; toe(b) – toe; (b)eel – eel; (g)lamp – lamp; (b)rock – rock; (s)trail – trail; c(l)art – cart; s(p)low – slow; s(t)ip – sip; star(p) – star; force(k) – force; bir(l)d – bird; hi(f)t – hit; ma(k)t – mat; cro(t)ss – cross. For acoustic analysis, these items were recorded by a female speaker of standard Southern British English both as test items (e.g., hift, starp) and as responses (e.g., hit, star). The words were recorded digitally using a Beyerdynamic cardioid microphone with a Tascam digital tape recorder at a sampling rate of 48 kHz. In preparation for the S-AMPH speech analysis, each sound file was converted to mono and downsampled to 16 kHz in MATLAB©. The sound files were then manually edited using Audacity© software to produce eight example .wav files of each word. On average, the sound files were 1 s long, including 100 ms of silence at the beginning and end. To equalise for differences in loudness, each sound file was normalised and scaled to lie between -1 and +1.
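The preprocessing steps (mono conversion, downsampling from 48 kHz to 16 kHz, scaling to ±1) were carried out in MATLAB. A rough Python analogue is sketched below. Several implementation details are assumptions: averaging the channels to obtain mono, polyphase resampling, and peak normalisation. Reading a recording with `scipy.io.wavfile.read` is shown only as a comment, and the file name there is hypothetical.

```python
from math import gcd

import numpy as np
from scipy.signal import resample_poly


def preprocess(x, fs_in=48000, fs_out=16000):
    """Mono conversion, resampling and peak scaling into [-1, +1] (assumed steps)."""
    x = np.asarray(x, dtype=float)
    if x.ndim == 2:
        # (samples, channels) -> mono by averaging the channels.
        x = x.mean(axis=1)
    g = gcd(fs_in, fs_out)
    # Polyphase resampling with built-in anti-aliasing (48 kHz -> 16 kHz: up=1, down=3).
    x = resample_poly(x, up=fs_out // g, down=fs_in // g)
    peak = np.max(np.abs(x))
    # Peak-normalise so that the largest absolute sample is 1.
    return x / peak if peak > 0 else x


# To process a recording (file name hypothetical):
# from scipy.io import wavfile
# fs_in, data = wavfile.read("hift_01.wav")
# y = preprocess(data, fs_in)
stereo = np.random.default_rng(0).standard_normal((48000, 2))  # 1 s of stereo noise
y = preprocess(stereo)
print(y.shape)  # (16000,)
```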
2. Plural Elicitation task
In English, the change from singular to plural is an example of a single phoneme change that is important for morphology, as the morpheme s is added to a noun. Berko (1958) measured children’s production of inflectional morphemes using a plural elicitation task. Children were asked to produce plural forms for pictured nonword items, for example describing imaginary animals (e.g., wug-wugs; gutch-gutches). The preschool children studied by Berko (1958) were very successful with some plural forms (e.g., 91% were successful with wug-wugs) but not with others (36% were successful with gutch-gutches). This implies that phonology at the rhyme level may have played a role in children’s success in this morphology measure (see Cumming et al., 2015). To investigate whether slower AM phase synchronisation information might also be an acoustic correlate of morphemic changes, the S-AMPH PSI analyses were applied to the items used in Berko’s (1958) original plural elicitation task (wug-wugs, lun-luns, tor-tors, heaf-heafs, era-eras, tass-tasses, gutch-gutches, kazh-kazhes, niz-nizzes, glass-glasses). The words were spoken by the first author (also a female native speaker of British English). They were recorded digitally using an AKG© C1000S cardioid microphone with a Tascam© DR-100 digital recorder at a sampling rate of 48 kHz. For the speech analysis, 8 examples of each item were recorded (singular and then plural), giving 160 items for analysis. Each sound file was then converted to mono and downsampled to 16 kHz in MATLAB©. The sound files were edited using Audacity© software to produce eight example .wav files of each item (e.g., 8 files for wug, 8 files for wugs). On average, the sound files were 1 s long, including 100 ms of silence at the beginning and end. To equalise for differences in loudness, each sound file was normalised and scaled to lie between -1 and +1.
D. Measures of language, non-verbal IQ, and rise time sensitivity
Participating children received standardised measures of reading and spelling (British Ability Scales [BAS], Elliott et al., 1996; Test of Word Reading Efficiency [TOWRE], Torgesen et al., 1999), receptive vocabulary (British Picture Vocabulary Scales [BPVS], Dunn et al., 1982) and a non-verbal subscale of the Wechsler Intelligence Scale for Children (WISC-III, Picture Arrangement; Wechsler, 1992). These tasks were administered at the same time as the phoneme deletion task, and group data are shown in Table I. A psychoacoustic threshold task measuring sensitivity to amplitude envelope rise times was also administered (the 1-Rise AXB task; see Goswami et al., 2013, for details). The children with dyslexia were significantly less sensitive to rise time than both the RL and CA controls. For further detail on the sample, see Goswami et al. (2013).
Table I. Participant characteristics by group: means (standard deviations).

| Measure | DYS (N = 41) | CA (N = 29) | RL (N = 24) | One-way ANOVA F(2, 91) |
|---|---|---|---|---|
| Age in months (a) | 136.9 (13.5) | 136.1 (12.6) | 109.8 (7.1) | 45.3*** |
| Reading age in months (b) | 106.5 (19.3) | 150.9 (23.2) | 113.5 (19.4) | 42.2*** |
| BAS SS (c) | 84.1 (10.1) | 109.8 (10.9) | 103.3 (13.3) | 50.0*** |
| TOWRE Real Words SS (c) | 86.9 (10.7) | 104.8 (9.9) | 106.3 (10.3) | 37.1*** |
| TOWRE Nonwords SS (d) | 86.3 (9.7) | 109.8 (12.4) | 101.9 (12.9) | 38.1*** |
| BAS Spelling SS (c) | 81.7 (10.0) | 105.1 (8.8) | 103.2 (13.0) | 53.7*** |
| BPVS SS | 102.7 (12.1) | 108.3 (8.7) | 106.4 (8.8) | 2.7 |
| WISC NVIQ (e) | 14.2 (4.3) | 13.7 (3.2) | 14.0 (4.7) | 0.09 |
| Phoneme deletion, out of 15 (d) | 8.1 (2.8) | 11.6 (2.3) | 9.4 (3.1) | 13.8*** |
| 1-Rise AXB threshold, ms (f) | 104.9 (73.1) | 41.6 (31.7) | 46.0 (21.0) | 12.4*** |
Note. DYS = participants with dyslexia; CA = chronological age-matched controls; RL = reading-level-matched controls; BAS = British Ability Scales; SS = standard score; TOWRE = Test of Word Reading Efficiency; BPVS = British Picture Vocabulary Scales (receptive vocabulary); NVIQ = non-verbal IQ, Wechsler Intelligence Scale for Children Picture Arrangement scaled score. Standard deviations are shown in parentheses. *** p < .001.
(a) CA = DYS > RL
(b) CA > DYS = RL
(c) RL = CA > DYS
(d) DYS < CA, RL; RL < CA
(e) Scaled score mean = 10, SD = 3
(f) DYS > CA = RL (higher thresholds indicate poorer rise time sensitivity)
E. S-AMPH Analysis 1: Phoneme deletion task
The S-AMPH modelling approach was applied to the single-syllable items from the phoneme deletion task. As the analysis focused on the acoustic cues that describe phonological differences between items, the temporal modulation structure of each item was analysed in terms of the phase synchronisation between pairs of AMs. This allowed the amplitude modulation structure of a given item (e.g., hift) to be described, together with that of the item following deletion of the designated phoneme (e.g., hit). For each spoken item, the S-AMPH model output yields PSI values ranging from 0 (no synchronisation) to 1 (exact synchronisation). These are computed separately for the delta-theta AM band pair and the theta-beta/low gamma AM band pair. An example of the output of these analyses for the item pair hift-hit is provided in Figure 2.
F. S-AMPH Analysis 2: Plural elicitation task
The S-AMPH modelling approach was applied to the single-syllable words from the plural elicitation task, using the same computational method as for the phoneme deletion task. The acoustic structure of a given item (e.g., wug) was described in terms of PSI values for the delta-theta and theta-beta/low gamma AM band pairs. The acoustic structure of the item following pluralisation (e.g., wugs) was described in the same way.
G. Statistical analysis
For each task, a separate three-way repeated-measures ANOVA was conducted. PSI was the dependent variable, and the factors were AM rate (2 levels: delta-theta, theta-beta/low gamma), Deletion/Pluralisation status (2 levels: test item, correct response) and spectral frequency band (5 levels). Suppose that a significant change in phase synchronisation (ΔPSI) for one pair of AM bands, rather than for both, is a consistent acoustic correlate of phoneme deletion or plural elicitation. In that case, a significant interaction between Deletion/Pluralisation status and AM rate would be expected.
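For the theoretically critical AM rate × status interaction, both factors have two levels. The univariate repeated-measures F(1, n - 1) for this interaction therefore equals the square of a one-sample t statistic on a per-unit interaction contrast, computed from means averaged over the spectral bands. The sketch below relies on that equivalence. Treating each word pair as the repeated unit is an inference from the reported degrees of freedom (F(1,14) for 15 deletion items; F(1,9) for 10 plural pairs). The array layout and the `am_by_status_interaction` name are hypothetical. The full three-way analysis, including the spectral-band factor, would need a dedicated routine such as statsmodels' `AnovaRM`.

```python
import numpy as np
from scipy.stats import ttest_1samp


def am_by_status_interaction(psi):
    """F(1, n-1), p and partial eta squared for the AM rate x status interaction.

    psi: array (n_items, 2, 2, 5) = item x AM rate x status x spectral band
    (assumed layout). AM rate 0 = delta-theta, 1 = theta-beta/low gamma;
    status 0 = test item, 1 = correct response / plural form.
    """
    cell = np.asarray(psi, dtype=float).mean(axis=3)  # average over spectral bands
    change = cell[:, :, 1] - cell[:, :, 0]            # signed PSI change per AM rate
    contrast = change[:, 0] - change[:, 1]            # interaction contrast per item
    res = ttest_1samp(contrast, 0.0)
    df_err = contrast.size - 1
    f = float(res.statistic) ** 2                     # F(1, df_err) = t^2
    return f, float(res.pvalue), f / (f + df_err)


# Made-up PSI values for 15 hypothetical items, with an extra increase built in
# for delta-theta PSI in the correct responses.
rng = np.random.default_rng(0)
psi_data = 0.5 + 0.01 * rng.standard_normal((15, 2, 2, 5))
psi_data[:, 0, 1, :] += 0.2
f, p, eta = am_by_status_interaction(psi_data)
print(f"F(1,14) = {f:.1f}, p = {p:.2g}, partial eta^2 = {eta:.3f}")
```

For an effect with one numerator degree of freedom, partial eta squared equals F/(F + df_error). For example, 7.25/(7.25 + 14) ≈ 0.341, which matches the value reported for the AM rate × Deletion status interaction in Section III.A.3.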
For the phoneme deletion task only, the output of the modelling was also related to the behavioural data from the 94 children who had performed the task. Spearman’s rank correlations were computed across item pairs, separately for each group (DYS, CA, RL). Each correlation related the mean proportion of children responding correctly to the change in PSI values (ΔPSI) between test item and accurate response, for both the delta-theta and the theta-beta/low gamma AM bands. The word pair bloot-blue was removed from the analysis, as all groups performed near ceiling (DYS = 93%, CA = 93%, RL = 96% correct). Classically, removing the final phoneme from a single syllable is relatively easy for children, as they can begin saying the item and stop speaking before the end (Yopp, 1988). In addition, the target ‘blue’ is a high-frequency word for children, which could explain their excellent performance with this item. Finally, the errors that children produced in the phoneme deletion task were explored in terms of their phonological similarity to the target response.
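The item-level correlation described above can be sketched as follows. Each data point is one item pair, combining that pair’s ΔPSI with one group’s proportion of correct responses. Near-ceiling items can be excluded before ranking. The dictionaries and item labels in the usage example are made-up illustrative values, not the study’s data, and `item_correlation` is a hypothetical helper.

```python
from scipy.stats import spearmanr


def item_correlation(delta_psi, prop_correct, exclude=()):
    """Spearman's rho and p across item pairs between ΔPSI and proportion correct.

    delta_psi, prop_correct: dicts keyed by item-pair label. Items listed in
    `exclude` (e.g. a near-ceiling pair) are dropped before ranking.
    """
    excluded = set(exclude)
    items = [k for k in delta_psi if k not in excluded]
    rho, p = spearmanr([delta_psi[k] for k in items],
                       [prop_correct[k] for k in items])
    return float(rho), float(p), len(items)


# Made-up values for one group; "item_6" stands in for an excluded ceiling item.
delta = {"item_1": 0.1, "item_2": 0.2, "item_3": 0.3, "item_4": 0.4,
         "item_5": 0.5, "item_6": 0.05}
correct = {"item_1": 0.9, "item_2": 0.7, "item_3": 0.8, "item_4": 0.6,
           "item_5": 0.5, "item_6": 0.95}
rho, p, n = item_correlation(delta, correct, exclude=["item_6"])
print(f"rho = {rho:.2f}, n = {n}")  # rho = -0.90, n = 5
```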
III. Results
The results for the phoneme deletion task are discussed in section III A, and the results for the plural elicitation task are in section III B.
A. Phoneme deletion task
1. S-AMPH analysis
The result of the S-AMPH two-stage filtering process was a 5 × 3 spectro-temporal representation of the speech envelope for each word in the task. Figures 2a and 2b show schematic S-AMPH representations of the speech tokens ‘hift’ and ‘hit’, respectively.
2. PSI analysis
The temporal synchronisation between the pairs of speech AM bands derived via the S-AMPH (delta-theta and theta-beta/low gamma) was computed following Leong and Goswami (2014), as described in Section II.B.2. Figure 3 shows the PSI between the pairs of bands for the speech tokens ‘hift’ and ‘hit’. The delta-theta PSIs are shown in the left-hand panel, and the theta-beta/low gamma PSIs in the right-hand panel.
The magnitude of change in PSI from test item to accurate response (ΔPSI) was then calculated for each word pair by summing the absolute PSI differences across the five component spectral bands. This was done separately for the delta-theta and the theta-beta/low gamma PSI values. The result is two ΔPSI values for each word pair (test item x and response y), ΔPSI(delta-theta) and ΔPSI(theta-beta/low gamma), as given in Equation (2).
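\Delta\mathrm{PSI}_{\mathrm{AM}}(x, y) = \sum_{k=1}^{5} \left| \mathrm{PSI}_{\mathrm{AM},k}(x) - \mathrm{PSI}_{\mathrm{AM},k}(y) \right| \qquad (2)
where k indexes the five spectral bands and AM denotes either the delta-theta or the theta-beta/low gamma band pair.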
As each PSI lies between 0 and 1, ΔPSI has a potential range of 0–5.
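Equation (2) amounts to an L1 distance between the five band-wise PSI values of a test item and those of its response. The sketch below is illustrative only. The input values in the usage example are invented, and the function name is hypothetical.

```python
import numpy as np


def delta_psi(psi_item, psi_response):
    """Equation (2): sum over the 5 spectral bands of |PSI(item) - PSI(response)|."""
    a = np.asarray(psi_item, dtype=float)
    b = np.asarray(psi_response, dtype=float)
    if a.shape != (5,) or b.shape != (5,):
        raise ValueError("expected one PSI value for each of the 5 spectral bands")
    return float(np.sum(np.abs(a - b)))


# Invented delta-theta PSI values for one test item and its correct response.
item = [0.6, 0.5, 0.7, 0.4, 0.6]
response = [0.7, 0.6, 0.5, 0.4, 0.8]
print(round(delta_psi(item, response), 3))  # 0.6
print(delta_psi([0.0] * 5, [1.0] * 5))      # 5.0, the maximum possible value
```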
Figure 4 shows the mean of the ΔPSI values for all word pairs in the phoneme deletion task.
3. Statistical analysis
To determine whether there was a significant difference between the delta-theta and the theta-beta/low gamma ΔPSI scores, a repeated-measures ANOVA was conducted as described above (Section II.G). If a change in phase synchronisation for one pair of AM bands, rather than both, is a consistent acoustic correlate of phoneme deletion, then a significant interaction between AM rate and Deletion status would be expected. The ANOVA showed a significant main effect of AM rate, F(1,14) = 109.0, p < .001, ηp² = 0.886, with a higher delta-theta PSI (mean = 0.595) than theta-beta/low gamma PSI (mean = 0.196). There was also a significant main effect of Deletion status, F(1,14) = 12.98, p = .003, ηp² = 0.481. This arose because the temporal modulation characteristics of the spoken items, as characterised by the S-AMPH modelling, changed consistently with the phoneme deletion. Overall, the mean PSI was larger for the items yielded by phoneme deletion than for the original items. There was no significant effect of spectral frequency band, F(4,11) = 2.064, p > .05, which likely reflected the large range of spectral shapes for individual items and the variation between word pairs. The theoretically important interaction between AM rate and Deletion status was also significant, F(1,14) = 7.25, p = .017, ηp² = 0.341. This reflected a significantly greater ΔPSI between the delta-theta AMs than between the theta-beta/low gamma AMs following phoneme deletion.
4. Correlation analysis
To examine whether children’s performance in the phoneme deletion task was systematically related to the ΔPSI between test items and correct responses, children’s scores were correlated with the delta-theta ΔPSI and with the theta-beta/low gamma ΔPSI. Figure 5a shows the proportion of correct phoneme deletion responses produced by each group as a function of the delta-theta ΔPSI, and Figure 5b shows it as a function of the theta-beta/low gamma ΔPSI. Each point corresponds to the group mean for one item pair. Spearman’s rank correlations were computed separately for each group (DYS, CA, RL). For delta-theta phase alignment, the proportion of correct responses produced by the children with dyslexia decreased significantly as the delta-theta ΔPSI increased, r = -.593, p = .025. For the CA participants, the proportion of correct responses was unrelated to the change in delta-theta PSI. The younger RL-matched participants showed a similar pattern to the children with dyslexia, although this correlation was not significant, r = -.478, p = .08. No correlations were significant for the theta-beta/low gamma ΔPSI values. Figure 6 depicts performance by item as a function of group and delta-theta ΔPSI. The figure shows the decrease in performance of the children with dyslexia and of the younger RL control children as ΔPSI increased.
5. Error analysis
Analysis of the errors that the children produced revealed a variety of wrong answers (between 2 and 10 different errors per item). There were also null responses, in which the child gave no response or said “I don’t know”. For example, for the item toeb (toe), the only errors produced were “too” and “tab”, whereas for the item crots (cross), children produced 10 different errors, including “cots”, “crops”, “crow” and “crot”. For the children with dyslexia, the number of different errors produced was significantly related to the magnitude of the delta-theta PSI difference between the test item and the target response, r = .704, p = .005. For the younger RL-matched children, the relationship was also significant, r = .592, p = .026. For the CA children, the correlation was not significant, r = .158, although this group also made fewer errors overall. For no group was the magnitude of the theta-beta/low gamma PSI difference correlated with the number of different errors produced.
B. Plural Elicitation
1. S-AMPH analysis
Figure 7 shows, for each pair of AM bands (delta-theta; theta-beta/low gamma), the average magnitude of the ΔPSI between the spoken items before and after pluralisation, averaged across all 10 word pairs (such as wug-wugs). As the figure shows, pluralisation is accompanied by a larger change in synchronisation (a larger ΔPSI) between the delta-theta AM bands than between the theta-beta/low gamma AM bands. Hence one acoustic correlate of pluralisation appears to be a change in temporal synchronisation between the delta-theta AMs of the spoken items. The acoustic difference between singular and plural forms thus appears to be focused on the slower temporal modulations.
2. PSI analysis
To determine whether there was a significant difference between the delta-theta and the theta-beta/low gamma ΔPSI scores, a repeated-measures ANOVA was conducted as described above (Section II.G). If a change in phase synchronisation for one pair of AM bands, rather than both, is a consistent acoustic correlate of pluralisation, then a significant interaction between AM rate and Pluralisation status would be expected. The ANOVA showed a significant main effect of AM rate, F(1,9) = 115.6, p < .001, ηp² = 0.92. The mean PSI for delta-theta AM synchronisation (mean = 0.516, SE = 0.03) was significantly larger than that for theta-beta/low gamma AM synchronisation (mean = 0.202, SE = 0.01). There was also a significant main effect of Pluralisation status, F(1,9) = 16.96, p = .003, ηp² = 0.653, with a larger mean PSI for the singular forms (mean = 0.390, SE = 0.02) than for the plural forms (mean = 0.327, SE = 0.02). The main effect of spectral band was not significant, F(4,6) = 0.440, p > .05. Again, this likely reflected the large range of spectral shapes and the variation between word pairs. There was also a significant interaction between AM rate and Pluralisation status, F(1,9) = 18.70, p = .002, ηp² = 0.675. This interaction arose because the change in delta-theta PSI from singular to plural form, summed across the spectral frequency bands, was significantly greater than the corresponding change in theta-beta/low gamma PSI. The theta-beta/low gamma AMs showed no significant ΔPSI accompanying pluralisation.
These analyses suggest that the acoustic differences in the temporal modulation structure of singular versus plural forms in English lie primarily in the phase synchronisation between the slower AM bands of the speech signal. Pluralisation in English often involves the addition of a single phoneme (the morpheme s). The acoustic dominance of the slower modulation bands (delta and theta) therefore converges with the phoneme deletion analysis: slow AM information plays a role in successful responding to what are classically considered changes in phoneme-level information. However, Berko used two forms of plural item, wug-wugs and gutch-gutches. The second form involves the addition of a syllable (a schwa sound followed by the phoneme /s/ or /z/) rather than of a single phoneme. To examine whether the acoustic correlate of the inflectional morpheme (a significant delta-theta ΔPSI) was consistent across these two forms of pluralisation, a second repeated-measures ANOVA was run. It was identical to the first except for the additional factor Type of Plural (‘s’ vs ‘es’), and again took the PSI scores as the dependent variable. The main effect of Type of Plural was not significant, F(1,8) = 0.309, p > .05. As in the earlier analysis, there were significant main effects of AM rate, F(1,8) = 106.8, p < .001, ηp² = 0.93, and of Pluralisation status, F(1,8) = 16.78, p = .003, ηp² = 0.677, but no effect of spectral band, F(4,5) = 1.61, p > .05. There was again a significant interaction between AM rate and Pluralisation status, F(1,8) = 15.97, p = .004, ηp² = 0.667. This arose because the delta-theta PSI changed significantly more between singular and plural forms than the theta-beta/low gamma PSI did. All other interactions were non-significant.
IV. Discussion
Given that recent models of the neural encoding of speech suggest a core role for amplitude modulation (AM) structure, particularly AM phase alignment, we analysed the spoken items in two child speech tasks from an AM perspective. Our aim was to investigate whether speech tasks that measure linguistic development in children exhibit systematic properties in their AM structure. In particular, we investigated whether the phase synchronisation between slower and faster rates of amplitude modulation in the speech signal was systematically related to phonological and morphological changes. In principle, modelling AM phase relations can reveal acoustic parameters likely to be related to phonological and morphological learning by cortical oscillatory networks. The S-AMPH model was applied to two tasks used to index phonological and morphological development, respectively.
The first analysis used the spoken items in a phoneme deletion task, which is frequently used to measure children’s phonological awareness. In terms of classic linguistic theory, this task measures awareness of phoneme-level changes in the speech signal, so its acoustic structure should reflect the phonetic segment or distinctive feature level (Stevens, 1980; Blumstein & Stevens, 1981). In multi-time resolution models of neural speech processing, faster (beta- and gamma-band) information should be most important for detecting and manipulating phonetic segments and distinctive features (e.g., Giraud & Poeppel, 2012). Counter-intuitively from the perspective of these models, the S-AMPH modelling revealed that the consistent acoustic correlate of phoneme deletion was a greater change in the phase synchronisation index (a greater ΔPSI) between the slower delta- and theta-rate AM bands. The ΔPSI between the faster AM bands (theta and beta/low gamma) was not systematically related to phoneme deletion for the items studied. In a second analysis, the S-AMPH model was applied to the spoken items in a plural elicitation task, a measure widely used with children to index the development of inflectional morphology (Berko, 1958). Mirroring the phoneme deletion task, the consistent acoustic correlate of the morphemic change was the degree of change in phase synchronisation (ΔPSI) between the slower rates of amplitude modulation, delta and theta. Additional analysis of the plural elicitation task confirmed that this held whether the morphemic change involved adding a phoneme (wug-wugs) or adding a syllable (gutch-gutches). At the acoustic level, it was thus slow amplitude modulation information that correlated consistently with the inflectional morpheme.
Consequently, the analyses suggest that sensitivity to the magnitude of phase synchronisation between slower AM bands may be important for individual differences in children’s phonological and morphological awareness. This finding is consistent with a recent neural study revealing that low-frequency cortical oscillations (delta and theta) themselves carry phonetic information (Di Liberto et al., 2015). The S-AMPH modelling has implications for the sensory/neural basis of phonological and morphological learning by children, for the acoustic cues that may support the computation of phonology and grammar, and for developmental disorders of language learning such as developmental dyslexia and SLI.
Regarding the sensory/neural basis for phonological development, we have argued that early phonological learning is supported by the acoustic hierarchy of AMs that is found when child-directed speech is highly rhythmic (Goswami & Leong, 2013; Leong & Goswami, 2015). From this AM perspective, sensitivity to the phase alignment between the different AM bands in the speech signal could play an important role in linguistic development. To investigate the AM structure of child-directed speech, the S-AMPH amplitude demodulation approach was originally applied to English nursery rhymes. The modelling showed that the core statistical dependencies in English nursery rhymes were described by three hierarchically nested AM tiers in temporal rate bands corresponding neurally to delta-, theta- and beta/low gamma-rate oscillations, with centre frequencies of ~2 Hz (delta band), ~5 Hz (theta band) and ~20 Hz (beta band; Leong & Goswami, 2015). Leong and Goswami (2015) argued that these AM bands formed a nested relational acoustic structure that in principle could support the extraction by young learners of the phonological hierarchy of stressed syllables, syllables, and onset-rime units in speech (via an automatic process of neural entrainment). If the infant brain does entrain automatically to these acoustic statistical dependencies, and if cortical entrainment is temporally accurate, then the amplitude modulation structure of speech would by itself facilitate the emergence of a rudimentary phonological system. The plural elicitation modelling presented here also supports a role for the acoustic hierarchy of AMs in morphological development. Nevertheless, children’s development of both phonological and morphological knowledge would also be facilitated by additional acoustic cues, including rapidly changing cues, and by a rich set of social learning mechanisms (e.g., Kuhl, 2007).
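The AM tiers described above are rate bands of the amplitude envelope. The sketch below is a deliberately simplified, hypothetical stand-in for S-AMPH demodulation: it takes a single broadband Hilbert envelope (rather than the five spectral bands used in the study) and splits it into delta, theta and beta/low gamma AM bands using the band edges given in the Abstract (0.9-2.5, 2.5-12 and 12-40 Hz), with an idealised FFT brick-wall filter. The filter design and the function names are assumptions, not the published pipeline.

```python
import numpy as np

# AM rate bands (Hz), taken from the band edges stated in the Abstract.
AM_BANDS_HZ = {
    "delta": (0.9, 2.5),
    "theta": (2.5, 12.0),
    "beta_low_gamma": (12.0, 40.0),
}


def hilbert_envelope(x):
    """Magnitude of the analytic signal, computed with an FFT."""
    x = np.asarray(x, dtype=float)
    n = x.size
    weights = np.zeros(n)
    if n % 2 == 0:
        weights[0] = weights[n // 2] = 1.0
        weights[1:n // 2] = 2.0
    else:
        weights[0] = 1.0
        weights[1:(n + 1) // 2] = 2.0
    return np.abs(np.fft.ifft(np.fft.fft(x) * weights))


def brickwall_bandpass(x, fs, lo_hz, hi_hz):
    """Zero every FFT bin outside [lo_hz, hi_hz) (idealised, non-causal filter)."""
    spectrum = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(x.size, d=1.0 / fs)
    spectrum[(freqs < lo_hz) | (freqs >= hi_hz)] = 0.0
    return np.fft.irfft(spectrum, n=x.size)


def am_bands(signal, fs):
    """Split the broadband envelope into delta, theta and beta/low gamma AM bands."""
    envelope = hilbert_envelope(signal)
    return {name: brickwall_bandpass(envelope, fs, lo, hi)
            for name, (lo, hi) in AM_BANDS_HZ.items()}


# Synthetic test signal: a 1 kHz carrier modulated at 2, 5 and 20 Hz.
fs = 8000
t = np.arange(0, 2.0, 1.0 / fs)
modulator = (1.0 + 0.3 * np.cos(2 * np.pi * 2 * t)
             + 0.2 * np.cos(2 * np.pi * 5 * t)
             + 0.1 * np.cos(2 * np.pi * 20 * t))
bands = am_bands(modulator * np.sin(2 * np.pi * 1000 * t), fs)
# Each band recovers one modulator, with peak amplitudes of roughly 0.3, 0.2 and 0.1.
print({name: round(float(np.max(np.abs(b))), 3) for name, b in bands.items()})
```

A brick-wall filter is chosen here only because it makes the band split exact for this synthetic signal; a practical analysis of speech would use smooth filters to avoid ringing.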
Regarding the computation of phonology and grammar, the current studies add important acoustic information concerning the basis for phoneme awareness and plural elicitation to this temporal modulation perspective on language development. According to multi-time resolution models of speech processing, the phoneme-level changes measured by the phoneme deletion and plural elicitation tasks should depend acoustically on rapid temporal modulations (particularly gamma-band AM information; Giraud & Poeppel, 2012). However, the modelling presented here showed that the acoustic changes that consistently accompanied phoneme deletion or pluralisation were related to the magnitude of the change in synchronisation between delta- and theta-rate AMs in the speech signal, that is, to slower temporal modulations. Universal features of linguistic processing, such as automatic neural tracking of the slower temporal modulation patterns in the speech envelope, may hence contribute to both morphological and phonological development across languages in ways not anticipated by models of language that assume that phonemic information relies only on rapid acoustic changes. The S-AMPH modelling data suggest a key role for slower amplitude modulations and their phase alignment in both phonological and morphological development, particularly regarding individual differences between children. In our view, mechanisms such as AM phase synchronisation should be regarded as acoustic factors complementary to those identified by more traditional linguistic analyses (Blumstein & Stevens, 1979). The data reported here are therefore consistent with data showing that the brain uses transient cues during speech processing; the current findings suggest only that, at least for the two developmental speech tasks analysed here, individual differences in early morphological and phonological learning may depend critically on children’s sensitivity to slow AMs and to delta-theta AM phase synchronisation.
Finally, regarding developmental disorders of language learning, children with developmental dyslexia are known from related work to show functionally atypical neuronal entrainment to speech in the delta band (Molinaro et al., 2016; Power et al., 2013, 2016), which would be expected to affect the accuracy of delta-theta phase synchronisation. In the behavioural analyses reported here, children with dyslexia made more errors in the phoneme deletion task as the delta-theta ΔPSI increased. These are the same children who showed atypical delta-band entrainment to speech in the EEG studies reported by Power et al. (2013, 2016). The behavioural findings are consistent with a neural developmental model that accords a primary role to slower temporal modulations in the successful development of a phonological system by the child’s brain. Note further that adult data (Doelling et al., 2014) implicate acoustic sensitivity to AM rise times as critical for automatic neural entrainment to AMs in the speech signal. Children with developmental dyslexia (Goswami, 2015) and children with developmental disorders of spoken language (previously termed Specific Language Impairment, SLI) both have difficulties in processing amplitude envelope rise times (e.g., Corriveau et al., 2007; Beattie & Manis, 2012; Cumming et al., 2015). According to the analyses presented here, these rise time impairments could affect children’s ability to learn both phonological and morphological information from the speech signal. The acoustic structure of the amplitude envelope alone may carry significant information to support phonological and morphological learning by children.
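The item-level relation described above (more errors as the delta-theta ΔPSI increases) could be checked with a rank correlation. The sketch below shows one simple, hypothetical way to do so: a Spearman correlation between per-item ΔPSI values and per-item error counts, with a one-sided permutation p-value. It is not necessarily the statistical analysis used in the study, the function names are illustrative, and the example inputs are synthetic placeholders rather than data from the study.

```python
import numpy as np


def average_ranks(values):
    """Ranks starting at 1, with tied values given their mean rank."""
    values = np.asarray(values, dtype=float)
    ranks = np.empty(values.size)
    ranks[np.argsort(values, kind="mergesort")] = np.arange(1, values.size + 1)
    for v in np.unique(values):
        tied = values == v
        ranks[tied] = ranks[tied].mean()
    return ranks


def spearman_rho(x, y):
    """Spearman rank correlation: Pearson correlation of the two rank vectors."""
    return float(np.corrcoef(average_ranks(x), average_ranks(y))[0, 1])


def permutation_p(x, y, n_perm=2000, seed=0):
    """One-sided p-value for rho > 0 (more errors as the delta-theta ΔPSI grows)."""
    rng = np.random.default_rng(seed)
    observed = spearman_rho(x, y)
    y = np.asarray(y, dtype=float)
    hits = sum(spearman_rho(x, rng.permutation(y)) >= observed for _ in range(n_perm))
    return (hits + 1) / (n_perm + 1)


# Synthetic placeholder values for 12 hypothetical items (NOT the study's data).
rng = np.random.default_rng(1)
delta_theta_dpsi = rng.uniform(0.0, 0.5, size=12)
error_counts = rng.poisson(1.0 + 10.0 * delta_theta_dpsi)
rho = spearman_rho(delta_theta_dpsi, error_counts)
p_value = permutation_p(delta_theta_dpsi, error_counts)
print(f"rho = {rho:.2f}, one-sided permutation p = {p_value:.4f}")
```

A rank-based measure is used because error counts are discrete and often tied; group comparisons (for example, dyslexic versus control children) would need a model that includes group as a factor.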
One limitation of the study is that only two similar voices (both female native speakers of southern British English) were analysed. However, for the phoneme deletion task, the voice analysed was that of the speaker who recorded the speech tokens delivered as stimuli to the children. Hence the behavioural correlations provided a direct comparison between the acoustic structure of the speech stimuli and the children’s performance on the task. Different voices (and accents) would no doubt yield different absolute values of phase synchronisation. However, we predict on theoretical grounds that the significant change in phase synchronisation (the ΔPSI) between the tokens used to measure phonological and morphological learning would still occur predominantly between the delta and theta AM bands, rather than between the theta and beta/low gamma AM bands, as was the case for the two voices analysed here.
V. Summary and Conclusions
In two modelling studies applying an S-AMPH model of the speech signal, the consistent acoustic correlate of the phonological and morphological changes in English speech tasks used with young children was found to be the degree of phase synchronisation change, ΔPSI, between AMs in the delta- and theta-rate bands in the signal. Even though successful performance in both tasks studied (phoneme deletion and inflectional morphology for plurals) apparently required phonemic sensitivity, phase synchronisation between the faster temporal rate bands in speech (theta with beta/low gamma) did not contribute in any systematic way to either the single-phoneme phonological changes or the morphological changes studied. Rather, slower temporal modulation information was acoustically critical for successful task responding in each case. These data suggest that the sound systems of natural languages, which form the basis for phonological and morphological learning by children, may be structured in other or additional ways than by individual segments and features. This possibility is also supported by behavioural data from children. Children with developmental dyslexia made significantly more phoneme deletion errors as the magnitude of the delta-theta ΔPSI between item (hift) and response (hit) increased. When phase synchronisation changes were larger, the dyslexic children made significantly more errors and also produced a significantly greater range of different erroneous responses. A similar pattern was apparent for the younger typically developing children (the RL controls), suggestive of a developmental effect regarding phonological learning. One possible explanation is that, as the delta-theta ΔPSI increases, the similarity space of phonologically similar words also increases, making the correct answer more difficult to identify.
In view of these data, it could be fruitful to apply an amplitude modulation approach to analysing the acoustic structure of different speech tasks used to measure the development of inflectional morphology and phonological awareness in different languages. Such modelling may reveal unanticipated acoustic similarities and differences in the tasks used across languages, for example in terms of AM phase alignment.
Acknowledgements
We thank Natasha Mead, Tim Fosker and Martina Huss for collecting the phoneme deletion data, and Hannah Noble for scoring the item data. This research was supported by the Medical Research Council, grants G0400574 and G0902375 to Usha Goswami. The sponsor played no role in the study design, data interpretation or writing of the report.
References
- Beattie RL, Manis F. Rise time perception in children with reading and combined reading and language difficulties. J Learn Disabil. 2012;46(3):200–209. doi: 10.1177/0022219412449421.
- Berko J. The child’s learning of English morphology. Word. 1958;14:150–177.
- Blumstein SE, Stevens KN. Acoustic invariance in speech production: Evidence from measurements of the spectral characteristics of stop consonants. J Acoust Soc Am. 1979;66(4):1001–1017. doi: 10.1121/1.383319.
- Blumstein SE, Stevens KN. Phonetic features and acoustic invariance in speech. Cognition. 1981;10:25–32. doi: 10.1016/0010-0277(81)90021-4.
- Chait M, Greenberg S, Arai T, Simon JZ, Poeppel D. Multi-time resolution analysis of speech: Evidence from psychophysics. Front Neurosci. 2015;9:214. doi: 10.3389/fnins.2015.00214.
- Corriveau K, Pasquini E, Goswami U. Basic auditory processing skills and specific language impairment: A new look at an old hypothesis. J Speech Lang Hear Res. 2007;50:647–666. doi: 10.1044/1092-4388(2007/046).
- Cumming R, Wilson A, Goswami U. Basic auditory processing and sensitivity to prosodic structure in children with specific language impairments: A new look at a perceptual hypothesis. Front Psychol. 2015;6:972. doi: 10.3389/fpsyg.2015.00972.
- Di Liberto GM, O’Sullivan JA, Lalor EC. Low-frequency cortical entrainment to speech reflects phoneme-level processing. Curr Biol. 2015;25:2457–2465. doi: 10.1016/j.cub.2015.08.030.
- Doelling KB, Arnal LH, Ghitza O, Poeppel D. Acoustic landmarks drive delta–theta oscillations to enable speech comprehension by facilitating perceptual parsing. Neuroimage. 2014;85:761–768. doi: 10.1016/j.neuroimage.2013.06.035.
- Dunn LM, Whetton C, Pintilie D. British Picture Vocabulary Scale. Windsor: NFER-Nelson; 1982.
- Eimas PD, Siqueland ER, Jusczyk P, Vigorito J. Speech perception in infants. Science. 1970;171(3968):303–306. doi: 10.1126/science.171.3968.303.
- Elliott CD, Smith P, McCulloch K. British Ability Scales. 2nd ed. Windsor, UK: NFER-Nelson; 1996.
- Ghitza O. Linking speech perception and neurophysiology: speech decoding guided by cascaded oscillators locked to the input rhythm. Front Psychol. 2011;2:130. doi: 10.3389/fpsyg.2011.00130.
- Giraud AL, Poeppel D. Cortical oscillations and speech processing: Emerging computational principles and operations. Nat Neurosci. 2012;15:511–517. doi: 10.1038/nn.3063.
- Goswami U. A temporal sampling framework for developmental dyslexia. Trends Cogn Sci. 2011;15:3–10. doi: 10.1016/j.tics.2010.10.001.
- Goswami U. Sensory theories of developmental dyslexia: Three challenges for research. Nat Rev Neurosci. 2015;16:43–54. doi: 10.1038/nrn3836.
- Goswami U, Leong V. Speech rhythm and temporal structure: Converging perspectives? Lab Phonol. 2013;4(1):67–92.
- Goswami U, Mead N, Fosker T, Huss M, Barnes L, Leong V. Impaired perception of syllable stress in children with dyslexia: a longitudinal study. J Mem Lang. 2013;69(1):1–17.
- Greenberg S. A multi-tier framework for understanding spoken language. In: Greenberg S, Ainsworth W, editors. Understanding speech: An auditory perspective. Mahwah, NJ: LEA; 2006. pp. 411–434.
- Gross J, Hoogenboom N, Thut G, Schyns P, Panzeri S, Belin P, et al. Speech rhythms and multiplexed oscillatory sensory coding in the human brain. PLoS Biol. 2013;11(12):e1001752. doi: 10.1371/journal.pbio.1001752.
- Kanedera N, Arai T, Hermansky H, Pavel M. On the relative importance of various components of the modulation spectrum for automatic speech recognition. Speech Commun. 1999;28:43–55.
- Kuhl PK. Is speech learning ‘gated’ by the social brain? Dev Sci. 2007;10(1):110–120. doi: 10.1111/j.1467-7687.2007.00572.x.
- Leong V, Goswami U. Assessment of rhythmic entrainment at multiple timescales in dyslexia: Evidence for disruption to syllable timing. Hear Res. 2014;308:141–161. doi: 10.1016/j.heares.2013.07.015.
- Leong V, Goswami U. Acoustic-emergent phonology in the amplitude envelope of child-directed speech. PLoS One. 2015;10(12):e0144411. doi: 10.1371/journal.pone.0144411.
- Leong V, Goswami U. Difficulties in auditory organization as a cause of reading backwardness? An auditory neuroscience perspective. Dev Sci. 2016. doi: 10.1111/desc.12457.
- Loizou PC, Dorman M, Tu Z. On the number of channels needed to understand speech. J Acoust Soc Am. 1999;106:2097–2103. doi: 10.1121/1.427954.
- Melby-Lervåg M, Lyster SA, Hulme C. Phonological skills and their role in learning to read: A meta-analytic review. Psychol Bull. 2012;138(2):322–352. doi: 10.1037/a0026744.
- Molinaro N, Lizarazu M, Lallier M, Bourguignon M, Carreiras M. Out-of-synchrony speech entrainment in developmental dyslexia. Hum Brain Mapp. 2016. doi: 10.1002/hbm.23206.
- Poeppel D. The analysis of speech in different temporal integration windows: cerebral lateralization as ‘asymmetric sampling in time’. Speech Commun. 2003;41:245–255.
- Poeppel D. The neuroanatomic and neurophysiological infrastructure for speech and language. Curr Opin Neurobiol. 2014;28:142–149. doi: 10.1016/j.conb.2014.07.005.
- Power AJ, Mead N, Barnes L, Goswami U. Neural entrainment to rhythmic speech in children with developmental dyslexia. Front Hum Neurosci. 2013;7:777. doi: 10.3389/fnhum.2013.00777.
- Power AJ, Colling LC, Mead N, Barnes L, Goswami U. Neural encoding of the speech envelope by children with developmental dyslexia. Brain Lang. 2016;160:1–10. doi: 10.1016/j.bandl.2016.06.006.
- Shannon RV, Zeng FG, Kamath V, Wygonski J, Ekelid M. Speech recognition with primarily temporal cues. Science. 1995;270(5234):303–304. doi: 10.1126/science.270.5234.303.
- Stevens KN. Acoustic correlates of some phonetic categories. J Acoust Soc Am. 1980;68(3):836–842. doi: 10.1121/1.384823.
- Tallal P. Improving language and literacy is a matter of time. Nat Rev Neurosci. 2004;5(9):721–728. doi: 10.1038/nrn1499.
- Tass P, Rosenblum MG, Weule J, Kurths J, Pikovsky A, Volkmann J, Schnitzler A, Freund HJ. Detection of n:m phase locking from noisy data: application to magnetoencephalography. Phys Rev Lett. 1998;81:3291.
- Torgesen JK, Wagner RK, Rashotte CA. Test of Word Reading Efficiency (TOWRE). Austin, TX: Pro-Ed; 1999.
- Wechsler D. Wechsler Intelligence Scale for Children (WISC-III). 3rd ed. Kent, UK: The Psychological Corporation; 1992.
- Yopp HK. The validity and reliability of phoneme awareness tests. Read Res Q. 1988;23(2):159–177.