Ann N Y Acad Sci. Author manuscript; available in PMC: 2023 Sep 1.
Published in final edited form as: Ann N Y Acad Sci. 2022 Jun 26;1515(1):266–275. doi: 10.1111/nyas.14817

Auditory-motor mapping training: Testing an intonation-based spoken language treatment for minimally verbal children with autism spectrum disorder

Karen V Chenausky 1,2, Andrea C Norton 3, Helen Tager-Flusberg 4, Gottfried Schlaug 2,3,5
PMCID: PMC10264969  NIHMSID: NIHMS1809879  PMID: 35754007

Abstract

We tested an intonation-based speech treatment for minimally verbal children with autism (Auditory-Motor Mapping Training, AMMT) against a non-intonation–based control treatment (Speech Repetition Therapy, SRT). AMMT involves singing, rather than speaking, two-syllable words or phrases. In time with each sung syllable, therapist and child tap together on electronic drums tuned to the same pitches, thus co-activating shared auditory and motor neural representations of manual and vocal actions, and mimicking the “babbling and banging” stage of typical development. Fourteen children (three female), aged 5;0–10;8, with a mean Autism Diagnostic Observation Schedule-2 (ADOS-2) score of 22.9 (SD = 2.5) and a mean Kaufman Speech Praxis Test (KSPT) raw score of 12.9 (SD = 13.0) participated in this trial. The main outcome measure was % Syllables Approximately Correct. Four weeks post-treatment, AMMT resulted in a mean improvement of +12.1 (SE = 3.8) percentage points, compared to +2.8 (SE = 5.7) percentage points for SRT. This between-group difference was associated with a large effect size (Cohen’s d = 0.82). Results suggest that simultaneous intonation and bimanual movements presented in a socially engaging milieu are effective factors in AMMT and can create an individualized, interactive music-making environment for spoken-language learning in minimally verbal children with autism.

Keywords: autism, music therapy, minimally verbal, intonation, speech therapy


INTRODUCTION

Approximately 1 in 44 U.S. children receive a diagnosis of autism spectrum disorder1 (ASD), and roughly one-quarter of them will remain minimally verbal past age five2. These children use a limited vocabulary of single words or fixed phrases to communicate, and they experience high rates of challenging behaviors such as aggression and self-injury3–6.

Until recently, these children were not included in research studies because they are challenging to assess, especially with standardized tests. However, in the past decade or so, assessment techniques have been refined and more studies investigating the effects of communication treatment for these severely affected children have been conducted. While strategies such as reinforcing attempts at verbalization and speech shaping have been used for decades, generally in a discrete-trial context7,8, recent work has employed more naturalistic approaches, sensorimotor supports, and intonation-based treatment for developing spoken language in minimally verbal children with ASD aged 5 years and older.

For example, one study9 compared two versions of a naturalistic developmental behavioral intervention in 61 minimally verbal children aged 5–8 years: one delivered in a play-based milieu alone, and one that added speech-generating devices (SGDs) to that milieu. The play-based milieu included systematic modeling and prompts to promote joint attention, symbolic play skills, and independent functional communication in the context of responsive social interactions. In this sequential multiple assignment randomized trial, 30 children received behavioral therapy alone, while 31 received behavioral therapy plus SGDs. After 12 weeks of therapy, the mean number of social communicative utterances was significantly greater, by 19.4 utterances, for the group using behavioral therapy plus SGDs, showing that forms of therapy other than discrete-trial treatments can improve aspects of spoken language in minimally verbal autistic children.

Another novel treatment, the intonation-based Auditory-Motor Mapping Training (AMMT), has also been shown to be effective for minimally verbal children with ASD10. AMMT involves singing two-syllable target words or phrases rather than speaking them. In time with each syllable, therapist and child tap on electronic drums tuned to the same two pitches that are sung. This co-activates shared auditory and motor neural representations of manual and vocal actions11–13 and mimics the “babbling and banging” stage of typical development14–16. Music has often been used to support spoken language development in minimally verbal children with ASD17–20; its putative neurological basis is the strengthening of white-matter pathways that integrate auditory and motor information, including the arcuate fasciculus, uncinate fasciculus, extreme capsule, and frontal aslant tract21–24. At the same time, AMMT capitalizes on the complementary laterality of language regions in minimally verbal children with ASD25 by emphasizing the prosodic aspects of spoken language, which are processed predominantly by the right hemisphere.

In a pilot study10, 40 sessions of AMMT produced statistically significant improvement in the ability of six minimally verbal children with ASD to pronounce a set of 15 stimuli practiced in therapy, and the children generalized those gains to 15 stimuli that were assessed, but not trained. At baseline, none of the children could correctly pronounce any of the target words; after treatment, their scores on % Syllables Approximately Correct, a measure of word approximation, averaged 29% (range: 8%–71%).

Later, the performance of 23 minimally verbal children with ASD who received 25 sessions of AMMT was compared with that of 7 who received a control treatment, Speech Repetition Therapy (SRT), that did not involve intonation or drumming19. Stimuli for this study were identical to those in the previous study; the AMMT group improved by an average of 19.4 percentage points on % Syllables Approximately Correct, compared to an average increase of only 3.6 percentage points for the SRT group.

A more recent study26 examined the effect of a version of AMMT modified for Mandarin, a tonal language. Twelve minimally verbal children with ASD who were randomly assigned to receive 12 sessions of this modified AMMT improved their word production intelligibility by an average of 17.4 percentage points, compared to an average increase of 11.9 percentage points in 12 children randomly assigned to receive SRT. Both groups also improved tone production on trained items, with the AMMT group showing significantly higher accuracy than the SRT group. On untrained items, only the AMMT group improved.

In short, then, various forms of spoken-language treatment exist for minimally verbal children with ASD. They employ discrete-trial or naturalistic, developmentally informed techniques and address (or compensate for) factors such as joint attention, symbolic communication, speech ability, and sensorimotor skills, all of which affect expressive language in this population. These treatments all produce some improvement in spoken language for most children who receive them.

Here, we report on a randomized controlled trial (RCT) comparing the effects of AMMT and SRT in English-speaking children. While AMMT has been shown in an RCT to improve word production in Mandarin-speaking children, we wished to stringently test whether it would do so for children who do not speak a tonal language. Using an intonation-based treatment may be logical for a tonal language, in which different prosodic contours signal a difference in meaning between words. However, it is less clear whether an intonation-based treatment, when compared to a speech-repetition-based treatment in a randomized assignment, would differentially improve speech production accuracy in a non-tonal language. Our main outcome measure was % Syllables Approximately Correct (%Sac), and our secondary outcome measures were % Consonants Correct (%Cc) and % Vowels Correct (%Vc).

METHODS

Participants

Twenty-seven children between the ages of 5;0 and 11;0 were assessed for eligibility. To be included in the study, participants had to meet the following criteria: diagnosis of ASD, confirmed by assessment with the Autism Diagnostic Interview-Revised27 (ADI-R) and the Autism Diagnostic Observation Schedule-228 (ADOS-2) administered by research-reliable examiners; minimally verbal status, confirmed by an expressive vocabulary of 20 or fewer words produced during the ADOS; the ability to repeat at least two phonemes on the Kaufman Speech Praxis Test29 (KSPT) or a phoneme repetition test; and the ability to follow at least one-step commands such as “clap hands” or to imitate these actions. These latter skills were verified at baseline for each child using direct assessment. Exclusion criteria included comorbidities such as hearing or sight impairment, Down’s syndrome, Fragile X, or seizure disorder.

The power analysis for this study was based on pilot phase data10 suggesting that, with an assumed effect size of 2 (corresponding to a mean improvement of 29 percentage points on our main outcome measure), 7 participants per group were required to show a significant treatment effect with 90% power and α = 0.05. The study was registered at clinicaltrials.gov (NCT identifier 03015272).
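As a rough check on the sample size above, the calculation can be sketched in Python. This assumes, since the text does not say, that the power analysis was for a two-sided, two-sample t-test with equal group sizes; `n_per_group` is a hypothetical helper that uses a normal approximation with a small-sample correction term rather than an exact noncentral-t calculation.

```python
import math
from statistics import NormalDist


def n_per_group(d, alpha=0.05, power=0.90):
    """Approximate per-group n for a two-sided, two-sample t-test with equal groups.

    Normal approximation 2 * (z_{1-alpha/2} + z_{power})**2 / d**2, plus a
    z_{1-alpha/2}**2 / 4 term that roughly corrects for using a t rather than a z test.
    """
    z = NormalDist()  # standard normal
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    n = 2 * (z_alpha + z_beta) ** 2 / d ** 2 + z_alpha ** 2 / 4
    return math.ceil(n)


print(n_per_group(2.0))  # effect size 2, alpha 0.05, power 0.90
```

Under these assumptions, `n_per_group(2.0)` returns 7, matching the stated requirement; the correction term matters here, since the uncorrected normal approximation alone would give 6.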

Fifteen children met initial inclusion criteria for the study. Two were Hispanic; by race, nine were white, four were Asian, one was African-American, and one was of mixed race. Eight children (two female) were randomly assigned to AMMT and seven (one female) to SRT by a researcher who was blind to the group code and not involved in child assessment. After randomization, during baseline assessments, one child in the SRT group spoke in sentences (“I want the black box”, “I want to play”) and was withdrawn from the study for not meeting inclusion criteria. The remaining 14 children completed 3 (n = 11) or 4 (n = 3) baseline assessment sessions, 25 therapy sessions, and four probe assessment sessions (after the 10th, 15th, 20th, and 25th treatment sessions). Thirteen children completed a maintenance probe assessment four weeks after the end of treatment; one child from the AMMT group did not return for the 4-week post-treatment probe. The CONSORT flowchart appears in Figure 1, and children’s baseline characteristics are detailed in Table 1.

Figure 1.


CONSORT flowchart. AMMT: Auditory-motor mapping training; SRT: Speech repetition therapy.

Table 1.

Participant characteristics at Baseline.

Group Age ADOS-2 (a) KSPT (b) PPVT-4 (c) Leiter-3 (d) Phonetic Inventory (e) %Sac (f) %Cc (g) %Vc (h)
Overall (n = 14), μ ± SD [min–max] 6;8 ± 1;7 [5;0–10;8] 22.9 ± 2.5 [19–28] 13.6 ± 10.7 [4–38] 9.0 ± 9.9 [0–30] 69.1 ± 8.5 [56–83] 7.9 ± 7.9 [0–22] 24.4 ± 20.4 [0–71.1] 24.3 ± 14.2 [8.2–57.0] 24.2 ± 18.8 [1.6–70.6]
AMMT, n = 8 (2 F) 7;0 ± 1;11 22.8 ± 1.7 14.8 ± 11.2 8.1 ± 9.3 68.0 ± 8.9 9.9 ± 7.2 16.7 ± 9.6 23.4 ± 7.2 15.4 ± 8.1
SRT, n = 6 (1 F) 6;5 ± 1;0 23.3 ± 3.4 12.2 ± 10.9 10.2 ± 11.5 70.5 ± 8.6 5.2 ± 8.5 34.7 ± 26.9 25.4 ± 21.2 35.8 ± 23.3

(a) ADOS-2: Autism Diagnostic Observation Schedule-2, Module 1. Score ≥ 16 for diagnosis of autism; max score = 28.
(b) KSPT: Kaufman Speech Praxis Test, Sections 1 and 2. Max raw score = 74.
(c) PPVT-4: Peabody Picture Vocabulary Test, raw score.
(d) Leiter-3: Leiter International Performance Scale-3rd Edition, NVIQ composite (standard) score.
(e) Phonetic Inventory: Number of English consonants and vowels child could repeat correctly at baseline. Max = 31.
(f) %Sac: Percent syllables approximately correct (at baseline).
(g) %Cc: Percent consonants correct (at baseline).
(h) %Vc: Percent vowels correct (at baseline).

Baseline measures

Autism diagnosis was confirmed by administration of the ADOS-2 to the participant and by administration of the ADI-R to the parent. The ADOS-2 is an interactive assessment, administered by research-reliable assessors, in which various opportunities to show joint attention, requesting, and commenting skills are provided to the child. Module 1, for children younger than 10 years who are not using phrase speech, was employed for all participants in this study. The maximum score is 28, with a score of at least 16 required to meet criteria for ASD. The ADI-R is a structured interview that elicits detailed information about the child’s development that pertains to the features of autism.

Children’s ability to repeat speech was assessed using two instruments. First was the KSPT. Section 1 of the KSPT (11 items) assesses nonspeech oromotor skills such as the ability to spread and pucker lips, while Section 2 (63 items) includes repetition of stimuli ranging from single phonemes to two-syllable nonreduplicated words such as “happy” and “tuna”. Total raw scores are reported in Table 1. Children were also asked to repeat the consonants and vowels of English (31 phonemes total) to derive a Phonetic Inventory Score. Correct repetition of a minimum of two phonemes, either on the KSPT or the Phonetic Inventory, was an inclusion criterion.

Children’s receptive vocabularies were assessed using the Peabody Picture Vocabulary Test30 (PPVT). Children are presented with a page of four images and asked to point to one of them (e.g., “show me ‘ball’!”). Raw scores on the PPVT are reported in Table 1 rather than percentile scores, as raw scores are more informative in this population.

Finally, nonverbal IQ was assessed using the Leiter International Performance Scale (Leiter)31, which includes sequential ordering tasks, pattern completion tasks, and figure-ground tasks. Standard scores are reported in Table 1.

Interventions

AMMT

In AMMT, the bisyllabic stimuli were sung, one note per syllable, at a pace of approximately one syllable per second, and the therapist and child tapped drums in time with each syllable. The drums were tuned to the same two notes as the singing, middle C and E-flat, with the stressed syllable receiving the higher note. Drums were occasionally removed from the table during an AMMT session if the child found them distracting; in these cases, however, intonation and tapping on the table in time with each syllable were still employed. Each word or phrase was accompanied by a laminated picture illustrating it, as a visual cue, and children had several opportunities to produce each item, with varying degrees of scaffolding from the adult. During treatment sessions, the therapist was allowed to repeat (or return to) each step up to three times, in order to provide multiple opportunities for motor practice of the stimuli. Therapists provided verbal feedback about children’s performance during therapy sessions, but not during probe assessments. During probes, children were positively reinforced for participation only, not for the correctness of their responses.
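To make the timing and pitch assignment concrete, here is a hypothetical sketch (not the study's software) that lays out one two-syllable stimulus as timed events; the helper `stimulus_events` and its parameters are illustrative. It assumes MIDI note numbers, with middle C as C4 (MIDI 60) and the E-flat taken to be the one a minor third above it (MIDI 63); the text names only "middle C and E-flat". Setting `intoned=False` mirrors SRT, described below, which keeps the same timing without pitch.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed pitches: middle C (C4, MIDI 60) and the E-flat above it (E-flat 4, MIDI 63).
LOW_NOTE = 60
HIGH_NOTE = 63


@dataclass
class SyllableEvent:
    onset_s: float            # time at which the syllable is produced (and, in AMMT, tapped)
    syllable: str
    midi_note: Optional[int]  # sung/drum pitch in AMMT; None when spoken (SRT)


def stimulus_events(syllables, stressed_index, intoned=True, seconds_per_syllable=1.0):
    """Lay out one bisyllabic stimulus at roughly one syllable per second.

    With intoned=True (AMMT), the stressed syllable gets the higher note and the
    other syllable the lower one; with intoned=False (SRT), timing is identical
    but no pitch is assigned.
    """
    if len(syllables) != 2:
        raise ValueError("stimuli are bisyllabic")
    if stressed_index not in (0, 1):
        raise ValueError("stressed_index must be 0 or 1")
    events = []
    for i, syllable in enumerate(syllables):
        note = None
        if intoned:
            note = HIGH_NOTE if i == stressed_index else LOW_NOTE
        events.append(SyllableEvent(i * seconds_per_syllable, syllable, note))
    return events


# "baby", stressed on the first syllable
ammt = stimulus_events(["ba", "by"], stressed_index=0)
srt = stimulus_events(["ba", "by"], stressed_index=0, intoned=False)
```

Keeping a single function with an `intoned` switch reflects the matched design: the two arms differ only in pitch (and drumming), not in pacing.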

SRT

The procedure for SRT was matched in content and process to that for AMMT, but the drum-tapping and intonation were eliminated. SRT employed the same stimuli, spoken at a rate of approximately one syllable per second, and the same steps outlined in the description of AMMT. While no acoustic analyses were performed to compare therapists’ stimulus productions in AMMT versus SRT, therapists used a ticking clock in the therapy room as a metronome to pace their productions, helping to ensure that rate and duration were similar in the two arms of the study. Table 2 details the steps in the therapies.

Table 2.

Steps in AMMT/SRT.

Step Description Example
Listening Therapist introduces target in a semantic context, accompanied by a picture. When you were little, you were a baby
Unison Therapist produces target with child. Let’s say it together: ‘baby’
Unison fade Therapist produces initial portion of the target with child, then fades out while child continues on their own. Again: ‘ba…’
Imitation (model) Therapist produces target alone. My turn: ‘baby’
Imitation (response) Therapist remains silent while child imitates target. Your turn:…
Cloze Therapist presents target in same semantic context as above. Child fills in the blank by producing target independently. Last time! When you were little, you were a…

Stimuli

Words and phrases used in the two treatments consisted of two groups of 15 bisyllabic words or phrases (30 total), such as “mommy”, “bye-bye”, or “cookie”, that represent people, actions, or objects relevant to children’s daily lives. One group of words/phrases (Trained) was practiced during therapy sessions and assessed during probe sessions, while the other group (Untrained) was only assessed during probe sessions. The words and phrases contained a range of consonants and vowels, though they were heavily loaded toward early-emerging consonants such as /b, m, d, p/32. Vowels included the corner vowels (/i, a, u, æ/), schwa, and some diphthongs (e.g., /aɪ/). The words and phrases thus provided ample opportunities for children with extremely limited phonetic repertoires to produce the earliest-appearing sounds, while also allowing for children with more sizeable repertoires to practice later-appearing phonemes.

Treatment sessions used the 15 Trained stimuli, presented in random order, and probe sessions used the 15 Trained and the 15 Untrained stimuli together, also in random order. The same hierarchy of steps was used for each stimulus in both treatment and probe sessions. The stimulus set was kept constant across time and participants, so that even if a child mastered a particular stimulus it was retained in the set. This was done in order to maintain a consistent “dose” across participants (i.e., so that all children would have the same number of opportunities to practice each stimulus).
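The presentation scheme described above (a fixed item set, reshuffled for each session) can be sketched as follows. The item labels are hypothetical placeholders, since only a few of the 30 stimuli are named in the text, and the seeded random generator is an assumption made for reproducibility; `session_order` is an illustrative helper.

```python
import random

# Hypothetical placeholder labels for the 15 Trained and 15 Untrained items.
TRAINED = [f"T{i:02d}" for i in range(1, 16)]
UNTRAINED = [f"U{i:02d}" for i in range(1, 16)]


def session_order(kind, rng):
    """Return a fresh random order of items for one session.

    Treatment sessions use only the Trained items; probe sessions use Trained and
    Untrained items together. The item sets themselves never change, so every
    child gets the same number of opportunities per item.
    """
    if kind == "treatment":
        items = list(TRAINED)           # copy, so the constant list is not shuffled in place
    elif kind == "probe":
        items = TRAINED + UNTRAINED     # concatenation also creates a new list
    else:
        raise ValueError("kind must be 'treatment' or 'probe'")
    rng.shuffle(items)
    return items


rng = random.Random(2022)
probe_items = session_order("probe", rng)
```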

Outcome measures

Our primary outcome measure was % Syllables Approximately Correct (%Sac), a measure that does not require perfect accuracy but gives children credit for approximating the targets. Children’s responses to each prompt were broadly transcribed. For each item, the child’s best production, defined as the response whose consonants and vowels most closely matched those of the target, was selected for scoring (see reliability figures below).

A syllable was scored “approximately correct” if the consonant that was produced by the child shared at least two of three phonetic features (place, manner, voicing) with the target and if the child’s vowel shared at least two of four features (height, backness, rounding, tenseness) with the target. Secondary outcome measures were % Consonants Correct (%Cc) and % Vowels Correct (%Vc). For these variables, the child’s production had to exactly match the adult target. For example, if a child produced [kʌkɛ] for “cookie”, they would receive a score of 2 consonants correct and 0 vowels correct. They would also receive a score of 0 syllables approximately correct since, although the child’s consonants were an exact match to the target, [ʌ] does not share height or rounding with [ʊ] and [ɛ] does not share height or tenseness with [i]. A production of [bʌpɛ] for “puppy”, on the other hand, would receive a score of 1 consonant correct (/p/), one vowel correct (/ʌ/), and one syllable approximately correct (because [b] matches /p/ in place and manner and [ʌ] is an exact match to the target).
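The scoring rules above can be sketched in Python for CV.CV productions. Everything here is illustrative: `score_word` is a hypothetical helper, and the feature tables cover only a few phonemes, with feature values chosen as assumptions (for example, /ʌ/ is coded as back and lax). One point of interpretation: under this coding, [ʌ] for /ʊ/ and [ɛ] for /i/ each share two of the four vowel features, so the worked examples come out as stated (0 syllables approximately correct for [kʌkɛ], 1 for [bʌpɛ]) only if a vowel must share at least three features. The vowel threshold is therefore exposed as a parameter rather than fixed.

```python
# Hypothetical feature tables: a handful of English phonemes with illustrative values.
CONSONANTS = {  # symbol: (place, manner, voicing)
    "p": ("bilabial", "stop", "voiceless"),
    "b": ("bilabial", "stop", "voiced"),
    "m": ("bilabial", "nasal", "voiced"),
    "t": ("alveolar", "stop", "voiceless"),
    "d": ("alveolar", "stop", "voiced"),
    "k": ("velar", "stop", "voiceless"),
    "g": ("velar", "stop", "voiced"),
}
VOWELS = {  # symbol: (height, backness, rounding, tenseness)
    "i": ("high", "front", "unrounded", "tense"),
    "ɪ": ("high", "front", "unrounded", "lax"),
    "ɛ": ("mid", "front", "unrounded", "lax"),
    "æ": ("low", "front", "unrounded", "lax"),
    "ʌ": ("mid", "back", "unrounded", "lax"),
    "ʊ": ("high", "back", "rounded", "lax"),
    "u": ("high", "back", "rounded", "tense"),
}


def n_shared(a, b):
    """Number of feature positions on which two feature tuples agree."""
    return sum(x == y for x, y in zip(a, b))


def score_word(child, target, min_cons_shared=2, min_vowel_shared=2):
    """Score one production against its target.

    child and target are lists of (consonant, vowel) syllables.
    Returns (consonants correct, vowels correct, syllables approximately correct).
    """
    if len(child) != len(target):
        raise ValueError("child and target must have the same number of syllables")
    cc = vc = sac = 0
    for (c_cons, c_vow), (t_cons, t_vow) in zip(child, target):
        cc += c_cons == t_cons  # %Cc requires an exact match
        vc += c_vow == t_vow    # %Vc requires an exact match
        cons_ok = n_shared(CONSONANTS[c_cons], CONSONANTS[t_cons]) >= min_cons_shared
        vowel_ok = n_shared(VOWELS[c_vow], VOWELS[t_vow]) >= min_vowel_shared
        sac += cons_ok and vowel_ok
    return cc, vc, sac


cookie = score_word([("k", "ʌ"), ("k", "ɛ")], [("k", "ʊ"), ("k", "i")], min_vowel_shared=3)
puppy = score_word([("b", "ʌ"), ("p", "ɛ")], [("p", "ʌ"), ("p", "i")], min_vowel_shared=3)
```

Session-level %Sac, %Cc, and %Vc would then be the summed counts divided by the number of target syllables, consonants, or vowels, multiplied by 100.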

Children’s productions during baseline and probe sessions were audio- and video-recorded, then transcribed and scored by a coder blind to session date. This coder had previously achieved reliability with other independent coders on a data set from a different group of minimally verbal children with ASD19. Inter-rater reliability on %Sac yielded Cohen’s κ = 0.497 (p < 0.0005) and 68.0% agreement. For %Cc, κ = 0.547 (p < 0.0005) with 70.1% agreement; for %Vc, κ = 0.270 (p < 0.0005) with 54.7% agreement. These figures are comparable to previously published agreement figures for transcription of infant babbling: 76.8% for consonants and 44.8% for vowels33.
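Cohen’s κ corrects raw percent agreement for the agreement expected by chance, which is why κ is lower than percent agreement for each measure above. A minimal sketch, assuming (the text does not specify the unit) that each coder makes one categorical judgment per syllable or phoneme; `cohens_kappa` is a hypothetical helper, not the study’s analysis code.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters' categorical judgments of the same items.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is observed agreement and p_e is the
    agreement expected by chance from each rater's marginal category frequencies.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two equal-length, non-empty lists of judgments")
    n = len(rater_a)
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    categories = counts_a.keys() | counts_b.keys()
    p_e = sum(counts_a[c] * counts_b[c] for c in categories) / n ** 2
    if p_e == 1:
        raise ValueError("kappa is undefined when both raters use one identical category")
    return (p_o - p_e) / (1 - p_e)


# Hypothetical judgments (1 = syllable approximately correct): 75% raw agreement
kappa = cohens_kappa([1, 1, 0, 0], [1, 0, 0, 0])
```

Here chance agreement is 0.5, so 75% raw agreement corresponds to κ = 0.5.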

Each child received three or four baseline assessments to ensure that their performance was consistent before treatment began. These assessments took the same form as the probe sessions: children were asked to repeat the 15 Trained and 15 Untrained words and phrases, in random order, without feedback. Baseline assessments were administered in the modality to which each child had been randomized: in AMMT format (with drums and intoning) for children randomized to AMMT, and in SRT format (without drums or intoning) for children randomized to SRT. The baseline session with the best %Sac score was then selected for comparison with the immediate post-treatment (P25) and post-4-week (P4wk) maintenance probe sessions.
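The baseline-selection rule can be sketched as below. `select_baseline` and `change_scores` are hypothetical helpers, the scores are invented for illustration, and taking %Cc and %Vc from the same session selected on %Sac is an interpretation of “the baseline session with the best %Sac score”.

```python
def select_baseline(sessions):
    """Pick the baseline session with the highest %Sac (ties go to the earliest session)."""
    if not sessions:
        raise ValueError("no baseline sessions")
    return max(sessions, key=lambda s: s["sac"])


def change_scores(baseline, probe):
    """Probe minus baseline, in percentage points, for every measure in the baseline session."""
    return {measure: probe[measure] - baseline[measure] for measure in baseline}


# Hypothetical percent scores for one child (not study data)
baselines = [
    {"sac": 10.0, "cc": 20.0, "vc": 5.0},
    {"sac": 15.0, "cc": 18.0, "vc": 8.0},
    {"sac": 12.5, "cc": 25.0, "vc": 6.0},
]
p4wk = {"sac": 27.5, "cc": 21.0, "vc": 10.0}
best = select_baseline(baselines)
print(change_scores(best, p4wk))  # {'sac': 12.5, 'cc': 3.0, 'vc': 2.0}
```

Choosing the best baseline is a conservative design: gains are measured from each child’s highest pre-treatment %Sac rather than from an average.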

Treatment fidelity

Treatment fidelity was assessed using two measures. First, to ensure that the therapeutic dose was consistent between the two arms, the mean number of prompts per item (i.e., opportunities to produce each one) for one randomly selected session per child was calculated by a coder blind to the study arm and session date. Second, videos of one session per child, also randomly selected, were examined for the use of drums and intonation by a coder blind to session date.

Statistical methods

Treatment fidelity was assessed using two-tailed, independent-samples t-tests for unequal variances. To compare AMMT and SRT participants’ scores at baseline, we conducted a one-way ANOVA on the variables listed in Table 1. To compare groups’ performance from baseline to P25 and P4wk, we used repeated-measures ANOVAs, with time and stimulus type as within-subjects factors and group as a between-subjects factor, with the outcome variables %Sac, %Cc, and %Vc. The last observation carried forward was used to impute the missing P4wk scores for the one participant whose data were missing for this time point. Post-hoc analyses were also performed and effect sizes were calculated using Cohen’s d for significant changes.
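Three of the procedures named above can be sketched with Python’s standard library. These are hypothetical helpers, not the study’s analysis code: `locf` carries the last observed score forward, `welch_t` returns the unequal-variance t statistic with Welch–Satterthwaite degrees of freedom (a p-value would additionally need a t-distribution function, for example `scipy.stats.t`), and `cohens_d` uses the pooled-SD form for two independent groups, which is one of several variants; the text does not say which variant was used.

```python
import math
from statistics import mean, variance


def locf(series):
    """Last observation carried forward: replace each missing value (None) with the last observed one."""
    filled, last = [], None
    for value in series:
        if value is None:
            if last is None:
                raise ValueError("no earlier observation to carry forward")
            value = last
        filled.append(value)
        last = value
    return filled


def welch_t(a, b):
    """Welch's unequal-variance t statistic and Welch-Satterthwaite degrees of freedom."""
    if len(a) < 2 or len(b) < 2:
        raise ValueError("each group needs at least two observations")
    va, vb = variance(a) / len(a), variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


def cohens_d(a, b):
    """Cohen's d for two independent groups, using the pooled standard deviation."""
    na, nb = len(a), len(b)
    pooled_sd = math.sqrt(((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2))
    return (mean(a) - mean(b)) / pooled_sd


# Hypothetical %Sac series (baseline, P25, P4wk) for a child who missed the P4wk probe
print(locf([20.0, 30.0, None]))  # [20.0, 30.0, 30.0]
```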

RESULTS

Treatment fidelity

The mean number of prompts per item was 7.2 (SD = 4.3) for AMMT participants and 7.3 (SD = 2.4) for SRT participants. The t-test was not significant (t(12) = 0.18, p > 0.99), indicating that the two arms of the study provided comparable numbers of opportunities to produce the stimuli. None of the reviewed AMMT sessions lacked drums, tapping, or intonation, and none of the three was used in any reviewed SRT session.

Baseline measures

A multivariate ANOVA on the variables Age, ADOS score, KSPT score, PPVT score, Leiter score, Phonetic Inventory, %Sac, %Cc, and %Vc was not significant (F(4,9) = 2.97, p = 0.15). However, the SRT group did perform significantly better than the AMMT group on %Vc at baseline (p = 0.038).

Outcomes

% Syllables Approximately Correct (%Sac)

A repeated-measures ANOVA (RMANOVA) with three levels of TIME (baseline, P25, P4wk) and two levels of STIMULUS TYPE (Trained, Untrained) as within-subject factors and two levels of GROUP (AMMT, SRT) as a between-subjects factor revealed a significant main effect of STIMULUS TYPE, F(1,12) = 15.829, p = 0.002. Averaged across the three time points, participants scored 31.8% (SD = 3.0) on Trained stimuli versus 23.3% (SD = 3.5) on Untrained stimuli. The mean changes from baseline to P4wk were associated with Cohen’s d values of 0.24 (small) for Trained stimuli and 0.33 (small) for Untrained stimuli.

There was also a significant TIME × STIMULUS TYPE × GROUP interaction, F(2,24) = 5.877, p = 0.008. The AMMT group gained a mean of 12.1 percentage points (SE = 3.8, p = 0.046) from baseline to P4wk on Trained stimuli and a mean of 5.4 percentage points (SE = 4.1, n.s.) on Untrained stimuli. These changes were associated with Cohen’s d values of 0.86 (large) and 0.59 (medium), respectively. The SRT group lost a mean of 2.7 percentage points (SE = 5.7, n.s.) on Trained stimuli and gained a mean of 7.2 percentage points (SE = 4.8, n.s.) on Untrained stimuli over the same interval. Figure 2 illustrates both groups’ change over time and by stimuli. There were no other significant main effects and no significant two-way interactions. Table 3 shows %Sac scores for the overall group and for the AMMT and SRT groups separately at baseline and P4wk. Table S1 details change scores on all three of our outcome measures for all individual participants.

Figure 2.


Change in % Syllables Approximately Correct (%Sac), by group and stimulus type, from baseline to the post-4 week probe assessment. Error bars: ± 1 standard error of the mean.

Table 3.

Overall and group performance on % Syllables Approximately Correct (%Sac).

Baseline P25 (a) P4wk (b)
Trained Untrained Trained Untrained Trained Untrained
Overall, μ ± SD [min–max] 29.5 ± 23.8 [0.0–83.3] 19.3 ± 17.6 [0.0–60.0] 30.7 ± 24.0 [3.3–86.7] 25.1 ± 24.1 [6.7–90.0] 35.2 ± 25.5 [3.3–93.3] 25.55 ± 20.8 [6.7–83.3]
AMMT 19.2 ± 11.4 14.2 ± 9.0 29.6 ± 20.3 20.4 ± 11.6 31.3 ± 18.1 19.6 ± 10.4
SRT 43.3 ± 29.8 26.1 ± 24.3 32.2 ± 30.2 31.4 ± 35.3 40.6 ± 34.3 33.3 ± 29.0

(a) P25: Post 25 therapy sessions.
(b) P4wk: 4 weeks after therapy.

% Consonants Correct (%Cc)

An RMANOVA on %Cc with TIME (baseline, P25, P4wk) and STIMULUS TYPE (Trained, Untrained) as within-subject factors and GROUP (AMMT, SRT) as a between-subjects factor revealed a significant main effect of TIME, F(2,24) = 3.480, p = 0.047. There was also a significant main effect of STIMULUS TYPE, F(1,12) = 7.840, p = 0.016. Participants increased by a mean of 5.1 percentage points (SE = 2.0, n.s.) on Trained stimuli from baseline to P4wk and by a mean of 4.3 percentage points (SE = 2.7, n.s.) on Untrained stimuli. There was a significant increase on Untrained stimuli between P25 and P4wk (mean = 5.1, SE = 1.9, p = 0.047), associated with a Cohen’s d of 0.33 (small).

Finally, there was also a significant TIME × STIMULUS TYPE × GROUP interaction for %Cc, F(2,24) = 4.698, p = 0.019. The AMMT group gained a mean of 6.1 percentage points (SE = 2.6, n.s.) from baseline to P4wk on Trained stimuli and a mean of 1.2 percentage points (SE = 3.5, n.s.) on Untrained stimuli. The SRT group gained a mean of 3.9 percentage points on Trained stimuli (SE = 3.3, n.s.) and a mean of 8.5 percentage points on Untrained stimuli (SE = 4.1, n.s.). Pairwise post-hoc testing also revealed that SRT participants increased by a mean of 6.6 percentage points on Untrained stimuli from P25 to P4wk (SE = 1.6, p = 0.03; Cohen’s d = 0.35, small). No other pairwise post-hoc comparisons were significant.

There were no other significant main effects and no significant two-way interactions for %Cc. Figure 3 illustrates both groups’ performance on %Cc. Table 4 shows %Cc scores for the overall group and for the AMMT and SRT groups separately at baseline and P4wk.

Figure 3.


Change in % Consonants Correct (%Cc), by group and stimulus type, from baseline to the post-4 week probe assessment. Error bars: ± 1 standard error of the mean.

Table 4.

Overall and group performance on % Consonants Correct (%Cc).

Baseline P25 (a) P4wk (b)
Trained Untrained Trained Untrained Trained Untrained
Overall, μ ± SD [min–max] 25.3 ± 14.4 [2.3–53.5] 21.1 ± 10.3 [7.0–39.5] 28.2 ± 15.5 [9.3–55.8] 20.3 ± 14.3 [4.7–58.1] 30.4 ± 14.4 [11.6–62.8] 25.4 ± 13.8 [9.3–60.5]
AMMT 22.1 ± 9.6 21.2 ± 8.6 28.5 ± 11.6 18.3 ± 5.5 28.2 ± 11.7 22.4 ± 8.2
SRT 29.5 ± 19.4 21.0 ± 13.1 27.9 ± 21.0 22.9 ± 21.9 33.4 ± 18.0 29.5 ± 19.1

(a) P25: Post 25 therapy sessions.
(b) P4wk: 4 weeks after therapy.

% Vowels Correct (%Vc)

Finally, an RMANOVA on %Vc from baseline to P4wk revealed no significant main effects or three-way interactions, but there was a significant TIME × GROUP interaction, F(2,24) = 5.646, p = 0.01. AMMT participants increased by a mean of 9.5 percentage points (SE = 2.9, p = 0.04; Cohen’s d = 0.88, large) on Trained stimuli from baseline to P4wk and by a mean of 7.3 percentage points (SE = 3.5, n.s.) on Untrained stimuli over the same interval (Table 5). SRT participants increased by a mean of 1.5 percentage points (SE = 4.1, n.s.) on Trained stimuli and did not increase on Untrained stimuli over the same interval (Table 5). The between-group difference in mean change from baseline to P4wk, +9.1 percentage points for AMMT versus +0.9 percentage points for SRT, was associated with a Cohen’s d of 0.98 (large). Figure 4 shows each group’s performance on %Vc from baseline to P4wk.

Table 5.

Overall and group performance on % Vowels Correct (%Vc).

Baseline P25 (a) P4wk (b)
Trained Untrained Trained Untrained Trained Untrained
Overall, μ ± SD [min–max] 25.3 ± 22.5 [0.0–76.7] 23.1 ± 16.2 [3.2–64.5] 31.0 ± 22.0 [3.3–76.7] 27.0 ± 17.7 [3.2–67.7] 31.4 ± 20.6 [13.3–88.7] 27.2 ± 15.8 [12.9–77.4]
AMMT 15.9 ± 11.4 14.9 ± 8.1 28.7 ± 19.3 25.0 ± 10.2 25.4 ± 11.6 22.2 ± 7.0
SRT 37.8 ± 28.3 33.9 ± 18.6 33.9 ± 26.8 29.6 ± 25.6 39.3 ± 28.0 33.9 ± 22.0

(a) P25: Post 25 therapy sessions.
(b) P4wk: 4 weeks after therapy.

Figure 4.


Change in % Vowels Correct (%Vc), by group, from baseline to the post-4 week probe assessment. Error bars: ± 1 standard error of the mean.

DISCUSSION

This report details the results of an RCT designed to test AMMT, an intonation-based treatment for spoken language in minimally verbal autistic children, against a non-intonation-based treatment, SRT. The main findings were that AMMT was associated with significantly greater gains than SRT on our main outcome measure, %Sac (% Syllables Approximately Correct), for trained stimuli. AMMT was also associated with a significantly larger change than SRT on %Vc (% Vowels Correct). These results show that, in this group of participants, AMMT resulted in better learning for syllables and vowels in trained stimuli than did SRT. Results for %Cc (% Consonants Correct) were mixed, with AMMT outperforming SRT on trained stimuli and SRT outperforming AMMT on untrained stimuli. Below, we discuss each finding in turn.

First of all, both groups showed evidence of generalization by improving on untrained stimuli. This is an important finding, given how difficult it is to elicit large numbers of productions per session from minimally verbal children with ASD, for whom speaking is extremely challenging. The literature on severe speech disorders such as childhood apraxia of speech (CAS) suggests that from 20 to over 100 repetitions per stimulus are required for improvement in children whose language and cognition are within typical limits34, whereas our participants had between five and fifteen opportunities to produce the stimuli that were practiced in therapy. Our results therefore suggest that a smaller number of repetitions can also result in improvement.

It is also helpful to compare these findings to previous work. Previously, we reported on the results of treatment for a matched, non-randomized group of seven AMMT and seven SRT participants19. There, we found effect sizes (Cohen’s d) of 3.0 (very large) for a significant between-group difference in change in %Sac and of 1.0 (large) for change in %Cc. Cohen’s d in the present study was 0.98 (large) for the significant between-group difference in change in %Vc. AMMT can therefore result in change scores that are associated with large to very large effect sizes on our primary and secondary outcome measures. The lack of improvement in the current SRT group on %Vc may reflect the fact that they were more proficient on this measure at baseline than the AMMT group. Alternatively, the steadier pitch of the intoned vowels in AMMT may have facilitated their acquisition by presenting a more consistent, longer-lasting vowel target and by drawing children’s attention to vowels more than SRT does.

It is unsurprising to find smaller effect sizes in an RCT than in a case-control study, just as it is common to find smaller effect sizes in a case-control study than in a proof-of-concept study35. Many factors can give rise to this pattern, such as a bias toward selecting "model participants" in the early stages of developing an intervention, regression to the mean over repeated samples of the same population, or simple within-population variation. In our RCT, the participants appeared more severely affected than those in the case-control stage of the study. In particular, we examined baseline phonetic inventories for both the RCT and pre-RCT groups, because phonetic inventory is a significant predictor of the amount of improvement participants show over the course of treatment36. Although the two sets of participants did not differ significantly in mean phonetic inventory size at baseline, significantly more children in the RCT stage than in the pre-RCT stage could repeat no more than the minimum number of phonemes (2) required to meet inclusion criteria: 6/14 (43%) of RCT participants versus 4/30 (13%) of pre-RCT participants (χ²(1) = 4.738, p = 0.03). Consistent with previous work (see Ref. 26), children whose speech and expressive language are more severely affected appear to benefit less from AMMT and similar therapies.
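
The χ² statistic reported above can be checked directly from the counts 6/14 and 4/30. The Python sketch below computes Pearson's chi-square for a 2×2 table, assuming no continuity correction (the uncorrected statistic is the one that reproduces the reported value). For one degree of freedom, χ² equals Z² for a standard normal Z, so the p-value can be obtained with `math.erfc`. The helper name `pearson_chi2_2x2` is hypothetical.

```python
import math


def pearson_chi2_2x2(a, b, c, d):
    """Pearson chi-square (no Yates continuity correction) for the 2x2 table

        [[a, b],
         [c, d]]

    Returns (statistic, p_value) with 1 degree of freedom.
    """
    n = a + b + c + d
    # Closed form for a 2x2 table: N * (ad - bc)^2 / (product of the margins).
    statistic = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # With 1 df, chi2 = Z**2, so P(X > chi2) = P(|Z| > sqrt(chi2)) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# Rows: RCT (n = 14) and pre-RCT (n = 30) participants.
# Columns: children counted in the reported numerators (6 and 4) vs. the remaining children.
chi2, p = pearson_chi2_2x2(6, 14 - 6, 4, 30 - 4)
print(f"chi2(1) = {chi2:.3f}, p = {p:.2f}")
```

Run as-is, this prints `chi2(1) = 4.738, p = 0.03`. Because the smallest expected cell count is only about 3.2 (14 × 10 / 44), Fisher's exact test would be a common alternative for a table this small.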

Taken together, however, our findings indicate that AMMT can be a useful and effective treatment for minimally verbal children with ASD, and more effective than a speech therapy that does not involve intonation and drumming. The degree of variation in participants' performance within and between the various stages of AMMT's development highlights the importance of understanding predictors of improvement in more detail. Although a larger phonetic inventory at baseline is known to predict a larger magnitude of improvement for both AMMT and SRT36, we do not yet know which factors predict whether a child will perform better in AMMT than in SRT.

Relatedly, the active ingredients of each treatment remain unclear. We hypothesize that intonation and simultaneous bimanual movements are the effective factors in AMMT11,15,34,37. An additional factor may be the use of drumming to communicate knowledge of performance (i.e., whether the child's production was correct). Anecdotally, therapists tended to prevent children from drumming, and thus from finishing a trial, until their responses were sufficiently accurate. For example, if a child tried simply to tap the drums without responding to the verbal prompt, the therapist would place her hands between the child's hands and the drums and wait until the child attempted the target, communicating that drumming alone was not a sufficient response. Thus, in addition to signaling whether a production was correct, drumming may also have provided positive reinforcement for correct productions.

Another potential difference in active ingredients between AMMT and SRT was alluded to above in our discussion of the greater improvement on vowels in AMMT than in SRT. If intonation in AMMT focuses children's attention on vowels, it is possible that SRT conversely allows children to focus more on consonants. This may have interacted with a difference between the stimulus sets: there were more /b/ targets in the trained stimuli and more /p/ targets in the untrained stimuli. Although both /b/ and /p/ are considered early-emerging consonants, /b/ emerges first because it is articulatorily less complex. A greater focus on consonants in SRT may have resulted in better long-term retention of /p/ than in AMMT, but this effect may have appeared only in the untrained stimuli because that set contained more /p/ targets.

Finally, in both AMMT and SRT, children are directed to watch the therapist's face during both unison and unison-fade productions. The therapist actively engages the child's social attention to the interaction by taking turns producing the stimuli and by praising participation and, during therapy sessions, correct productions. This high degree of social engagement may be an additional active ingredient in both treatments.

Limitations and future work

The present study is limited by its small sample size and should be replicated with a larger number of participants to better detect small between-group effects. In addition, future work should focus on better stratifying participants by ability, since the most severely affected children may require simpler stimuli than two-syllable words. Indeed, some of these children may need to begin not by learning to say words per se but by learning to phonate intentionally or to use communicative grunts. Children may also show more improvement if stimulus sets are composed of words that are personally meaningful to them (e.g., words describing preferred snacks, activities, or people) rather than generic sets of words taught to all participants. Single-case research designs may be most helpful for these investigations and can be structured to produce between-group data as well. Furthermore, although AMMT was presented here in a discrete-trial training (DTT)–like format, it could also be implemented in a play-based or Naturalistic Developmental Behavioral Intervention (NDBI) format. Future studies could investigate AMMT's effectiveness in such formats.

In addition, it remains unknown (1) whether the amount of improvement shown by children who did improve was clinically significant, and (2) whether improving speech production also improves expressive language. Severe speech production difficulty may discourage children from attempting verbal communication at all, which may in turn lead to lagging development of expressive language. Future work, especially longitudinal studies, is therefore needed to elucidate the relationship between severe speech impairment and expressive language delay in children who have both.

Supplementary Material

tS1

ACKNOWLEDGEMENTS

We thank the children who participated in the studies leading to this manuscript and their families for allowing them to participate. This work was supported by P50 DC 13027 (to H.T.F.), P50 DC 18006 (to H.T.F.), the Nancy Lurie Marks Family Foundation and Autism Speaks (to G.S.), and R00 DC 017490 (to K.V.C.).

Footnotes

COMPETING INTERESTS

The authors declare no competing interests.

References

1. Maenner M, Shaw K, Bakian A, et al. 2021. Prevalence and characteristics of autism spectrum disorder among children aged 8 years — Autism and Developmental Disabilities Monitoring Network, 11 Sites, United States, 2018. MMWR Surveillance Summaries 70(No. SS-11): 1–16.
2. Tager-Flusberg H & Kasari C. 2013. Minimally verbal school-aged children with autism spectrum disorder: the neglected end of the spectrum. Autism Research 6(6): 468–478.
3. Baghdadli A, Pascal C, Grisi S, et al. 2003. Risk factors for self-injurious behaviors among 222 young children with autistic disorders. Journal of Intellectual Disability Research 47(8): 622–627.
4. Dominick K, Davis N, Lainhart J, et al. 2007. Atypical behaviors in children with autism and children with a history of language impairment. Research in Developmental Disabilities 28: 145–162.
5. Hartley S, Sikora D, & McCoy R. 2008. Prevalence and risk factors of maladaptive behaviour in young children with autistic disorder. Journal of Intellectual Disability Research 52(10): 819–829.
6. Matson J & Rivet T. 2008. The effects of severity of autism and PDD-NOS symptoms on challenging behaviors in adults with intellectual disabilities. Journal of Developmental and Physical Disabilities 20: 41–51.
7. Koegel R, O'Dell M, & Koegel L. 1987. A natural language teaching paradigm for nonverbal autistic children. Journal of Autism and Developmental Disorders 17(2): 187–200.
8. Sautter R & LeBlanc L. 2006. Empirical applications of Skinner's analysis of verbal behavior with humans. The Analysis of Verbal Behavior 22: 35–48.
9. Kasari C, Kaiser A, Goods K, et al. 2014. Communication interventions for minimally verbal children with autism: a sequential multiple assignment randomized trial. Journal of the American Academy of Child and Adolescent Psychiatry 53(6): 635–646.
10. Wan C, Bazen L, Baars R, et al. 2011. Auditory-Motor Mapping Training as an intervention to facilitate speech output in non-verbal children with autism: a proof of concept study. PLoS One 6(9): e2550.
11. Meister I, Boroojerdi B, Foltys H, et al. 2003. Motor cortex hand area and speech: implications for the development of language. Neuropsychologia 41: 401–406.
12. Ozdemir E, Norton A, & Schlaug G. 2006. Shared and distinct neural correlates of singing and speaking. NeuroImage 33: 628–635.
13. Lahav A, Saltzman E, & Schlaug G. 2007. Action representation of sound: audiomotor recognition network while listening to newly acquired actions. Journal of Neuroscience 27: 308–314.
14. Binkofski F & Buccino G. 2004. Motor functions of the Broca's region. Brain and Language 89(2): 362–369.
15. Iverson J & Fagan M. 2004. Infant vocal-motor coordination: precursor to the gesture-speech system? Child Development 75(4): 1053–1066.
16. Gernsbacher M, Sauer E, Geye H, et al. 2008. Infant and toddler oral- and manual-motor skills predict later speech fluency in autism. Journal of Child Psychology and Psychiatry 49(1): 43–50.
17. Miller S & Toca J. 1979. Adapted melodic intonation therapy: case-study of an experimental language program for an autistic child. Journal of Clinical Psychiatry 40: 201–203.
18. Hoelzley P. 1993. Communication potentiating sounds: developing channels of communication with autistic children through psychobiological responses to novel sound stimuli. Canadian Journal of Music Therapy 1: 54–76.
19. Chenausky K, Norton A, Tager-Flusberg H, et al. 2016. Auditory-Motor Mapping Training: comparing the effects of a novel speech treatment to a control treatment for minimally verbal children with autism. PLoS One.
20. Chenausky K, Norton A, & Schlaug G. 2017a. Auditory-motor mapping training in a more verbal child with autism. Frontiers in Human Neuroscience 11: 426.
21. Heaton P, Williams K, Cummins O, et al. 2007. Beyond perception: musical representation and on-line processing in autism. Journal of Autism and Developmental Disorders 37: 1355–1360.
22. Norton A, Zipse L, Marchina S, et al. 2009. Melodic intonation therapy: how it is done and why it might work. Annals of the New York Academy of Sciences 1169: 431–436.
23. Wan C & Schlaug G. 2010. Neural pathways for language in autism: the potential for music-based treatments. Future Neurology 5(6): 797–805.
24. Chenausky K, Kernbach J, Norton A, et al. 2017b. White matter integrity and treatment-based change in speech performance in minimally verbal children with autism spectrum disorder. Frontiers in Human Neuroscience 11: 175.
25. Wan C, Marchina S, Norton A, et al. 2012. Atypical hemispheric asymmetry in the arcuate fasciculus of completely nonverbal children with autism. Annals of the New York Academy of Sciences 1252: 332–337.
26. Yan J, Chen F, Gao X, et al. 2021. Auditory-Motor Mapping Training facilitates speech and word learning in tone language-speaking children with autism: an early efficacy study. Journal of Speech, Language, and Hearing Research.
27. Rutter M, Le Couteur A, & Lord C. 2003. Autism Diagnostic Interview–Revised. Los Angeles, CA: Western Psychological Services.
28. Lord C, Rutter M, DiLavore P, et al. 2012. Autism Diagnostic Observation Schedule (Modules 1–4). 2nd ed. Torrance, CA: Western Psychological Services.
29. Kaufman N. 1995. Kaufman Speech Praxis Test. Detroit: Wayne State University Press.
30. Dunn L & Dunn D. 2007. PPVT-4: Peabody Picture Vocabulary Test. Minneapolis, MN: Pearson Assessments.
31. Roid G & Miller L. 2013. Leiter-3: Leiter International Performance Scale, Third Edition. Torrance, CA: Western Psychological Services.
32. Shriberg L, Gruber F, & Kwiatkowski J. 1994. Developmental phonological disorders III: long-term speech-sound normalization. Journal of Speech and Hearing Research 37: 1151–1177.
33. Davis B & MacNeilage P. 1995. The articulatory basis of babbling. Journal of Speech and Hearing Research 38: 1199–1211.
34. Strand E, Stoeckel R, & Baas B. 2006. Treatment of severe childhood apraxia of speech: a treatment efficacy study. Journal of Medical Speech-Language Pathology 14(4): 297–307.
35. Chenausky K & Schlaug G. 2018a. From intuition to intervention: developing an intonation-based treatment for autism. Annals of the New York Academy of Sciences. Special Issue: The Neurosciences and Music VI.
36. Chenausky K, Norton A, Tager-Flusberg H, et al. 2018b. Behavioral predictors of improved speech output in minimally verbal children with autism. Autism Research 11: 1356–1365.
37. Iverson J & Thelen E. 1999. Hand, mouth, and brain: the dynamic emergence of speech and gesture. Journal of Consciousness Studies 6: 19–40.
