Author manuscript; available in PMC: 2016 Feb 19.
Published in final edited form as: Phonetica. 2015 Feb 19;71(3):183–200. doi: 10.1159/000369630

Accommodation of end-state comfort reveals subphonemic planning in speech

Donald Derrick 1, Bryan Gick 2
PMCID: PMC4464800  NIHMSID: NIHMS693420  PMID: 25790787

Abstract

Applying Rosenbaum’s “end-state comfort” hypothesis (Rosenbaum et al., 1992, 1996) to tongue motion provides evidence of long-distance subphonemic planning in speech. Speakers’ tongue postures may anticipate upcoming speech up to three segments, two syllables, and a morpheme or word boundary later. We used m-mode ultrasound imaging to measure the direction of tongue tip/blade movements for known variants of flap/tap allophones of North American English /t/ and /d/. Results show that speakers produce different flap variants early in words or word sequences so as to facilitate the kinematic needs of flap/tap or other /r/ variants that appear later in the word or word sequence. Similar results were also observed across word boundaries, indicating that this is not a lexical effect.

Introduction

For decades, many scientists have tried to explain low-level speech production, in particular coarticulation, without reference to planning (Joos, 1948; Ohman, 1966, 1967; Fowler, 1980; Saltzman and Munhall, 1989; Boyce, 1990). In this view, speakers may plan larger units of speech such as the phrase (Shattuck-Hufnagel, 2000) or sentence (Butterworth, 1975), but the limited inventory of segments is drawn from stored knowledge in the brain. Other scientists, in contrast, have argued for planning during adjacent anticipatory coarticulation (Whalen, 1990; Roelfs, 1997) or anticipatory coarticulation spanning a vowel-consonant-vowel (VCV) trajectory (Winkler et al., 2011; Barbier, 2013), or have incorporated planning into the structure of speech articulation (Henke, 1966).

Evidence has been found for speech planning at the level of syllables (Levelt, 1994; Hawkins and Nguyen, 2002), with some evidence for planning at lower levels, such as the phoneme (Levelt, 1989; Dell, 1986) or feature (Dell, 1986; Mowrey and MacKay, 1990; Bernhardt and Stemberger, 1998). In addition, Bell-Berti and Harris (1979) argued for a timing-based anticipatory coarticulation, at least when there are no competing constraints, such as with lip-rounding preceded by segments that do not constrain lip position, while Keating (1990) argued for windowed coarticulation of adjacent segments based on subphonemic production variability. Ungrammatical productions in tongue-twisters (Frisch and Wright, 2002) have also been used to argue that spreading activation of more than one competing phoneme or feature can generate subphonemic variation that can only be noticed through careful acoustic (Goldrick and Blumstein, 2006) and articulatory (McMillan and Corley, 2010) analysis. That is, higher level planning can influence the subtlest of speech articulations. Nevertheless, Munhall et al. (2000) have illustrated the difficulty in demonstrating clear cases of planning in speech, as actual speech output may look very similar regardless of whether there is planning or not. As such, a diagnostic allowing identification of low-level planning would be of great use in speech.

One such diagnostic is end-state comfort. Rosenbaum et al. (1992, 1996) and Cohen and Rosenbaum (2004) observe that people grasp objects at the beginning of transport in a way that allows joints to be in a comfortable position at the end of transport. If someone is asked to pick up a glass and put it down the same way, usually the hand is held with the thumb in medial position throughout, but if asked to put the glass down upside-down, usually the arm begins twisted so that the thumb is in lateral position when the glass is picked up, and twists back to the more comfortable thumb-medial position when the glass is put down upside-down. Observations of the end-state comfort effect have been used as diagnostics of motor planning in humans, lemurs (Chapman et al., 2010), and cotton-top tamarins (Weiss et al., 2007).

We argue that an analysis relying on the end-state comfort hypothesis also applies to speech motion, provided it is possible to identify a beginning and end-point in a sequence where motion transitions are constrained enough to measure categorical differences. This is not without challenges, as many speech articulators move faster and have more degrees of freedom than skeletal structures. For instance, the human tongue and lips are muscular hydrostats, much like an elephant’s trunk or the tentacles of an octopus (Kier and Smith, 1985). The tongue is free to move faster than a skeletal structure, in many more directions and patterns, and with partially independent control of different parts of the tongue (Stone, et al., 2004). However, this freedom of motion is in practice constrained in some animals and some situations, as when an octopus mimics the bend-points of an arm when it is trying to grab something (Sumbre, et al. 2001, 2005). We argue that in North American English, the interaction of flap/tap variants and surrounding non-rhotic and rhotic vowels provides constraints on tongue motion that work well with testing of the end-state comfort hypothesis in speech planning.

To explain, we must describe subphonemic variability in flaps/taps, rhotic vowels, and non-rhotic vowels:

During English /t, d/ (“flap” or “tap”) production, the back of the tongue is braced against the teeth; in contrast the tongue tip must make rapid contact with the front of the hard palate. In previous research, Derrick and Gick (2011) identified four subphonemic categorical kinematic variations for these flap/taps. The first is an alveolar tap ([ɾ]), in which the tongue tip moves from below the alveolar ridge upwards, makes contact, then moves back down into position for the following vowel. The second is a down-flap ([ɾ]), in which the tongue tip moves from above the alveolar ridge, makes contact, and continues downwards below the alveolar ridge. The third is an up-flap ([ɾ]), in which the tongue tip moves from below the alveolar ridge, makes contact, and continues upward into a position above the alveolar ridge. The fourth is a post-alveolar tap ([ɾ]), in which the tongue tip moves anteriorly from behind and above the alveolar ridge, makes contact at a point at or above the ridge, then retracts back to a position above and behind the alveolar ridge.

These flap/taps are always produced within a vocalic context, and all begin and end with the tongue tip in either a high position (above the alveolar ridge), or a low position (below the alveolar ridge) based on interaction with their surrounding vowels. These vowels come in two types, rhotic and non-rhotic – and these two types impose significantly different constraints on tongue tip position.
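The four variants described above differ only in where the tongue tip starts and ends relative to the alveolar ridge. A minimal sketch of that mapping follows; the names `Tip`, `VARIANTS` and `classify_flap` are hypothetical labels introduced here for illustration and are not part of the original analysis.

```python
from enum import Enum


class Tip(Enum):
    LOW = "low"    # tongue tip below the alveolar ridge
    HIGH = "high"  # tongue tip above the alveolar ridge


# (start position, end position) -> flap/tap variant, following the
# four kinematic descriptions given in the text.
VARIANTS = {
    (Tip.LOW, Tip.LOW): "alveolar tap",        # up to contact, back down
    (Tip.HIGH, Tip.LOW): "down-flap",          # from above, continues down
    (Tip.LOW, Tip.HIGH): "up-flap",            # from below, continues up
    (Tip.HIGH, Tip.HIGH): "postalveolar tap",  # from above, returns above
}


def classify_flap(start: Tip, end: Tip) -> str:
    """Return the variant name for a given start/end tongue tip position."""
    return VARIANTS[(start, end)]


print(classify_flap(Tip.LOW, Tip.HIGH))  # up-flap
```

Because every variant is fixed by its two endpoints, the tongue tip positions of the surrounding vowels are enough to say which variant a given transition calls for.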

Previous research (Delattre and Freeman, 1968) demonstrated that rhotic vowels (hereafter R in transcription shorthand) could be produced in at least eight categorically different ways. Simplifying these patterns by focusing on anterior tongue constriction, Hagiwara (1995) identified two broad categories, tongue tip-down or “bunched” r ([ɹ̩]), and tongue tip-up, including but not limited to “retroflex” r ([ɻ̩]). That is, tip-up productions have the tongue tip and the front of the tongue blade above, and therefore also behind, the alveolar ridge. The tip-down productions have the tongue tip below the alveolar ridge - regardless of the position of the back of the tongue or the angle of the blade. In contrast, non-rhotic vowels (hereafter V in transcription shorthand) are typically produced tip-down only (see Chiba and Kajuyama, 1958; Perkell, 1969). This means that a tip-up instance of a non-rhotic vowel would constitute a non-ideal variant of what would normally be a tip-down vowel. Because of this, nonrhotic vowels can be used as a diagnostic of “comfort” when the tongue tip is down, or of “non-comfort” when the tongue tip is up. That is, they can serve as a context in which it is possible to identify end-state comfort effects in speech movements.

Our previous research (Derrick, 2011) suggests that such end-state comfort may play an important role in sequences of these articulations, as evidenced by the observation that in words with one flap/tap, the flap/tap variants are better predicted from the following rhotic vowel variant than from the preceding rhotic vowel variant. That is, for the word “Berta”, speakers largely ignored the R variant, producing a down-flap [ɾ] even following a [ɹ̩]. In these cases, the [ɹ̩] rapidly transitions to a [ɻ̩] position prior to the down-flap [ɾ]. The researchers observed few similar transitions into the final rhotic vowel for “otter”. That is, the most common production showed anticipatory coarticulation: an up-flap [ɾ] followed by a word-final [ɻ̩]. However, this evidence of coarticulation focuses on immediate context, extending no further than the transition from a preceding rhotic vowel into a flap/tap variant in relation to the tongue tip position of the following non-rhotic vowel. This result could potentially be explained via within-word memorized sequences or local assimilation. In contrast, the present study was designed to test whether speakers take into account tongue movement information from upcoming morphemes and words when producing longer speech sequences, and over non-adjacent spans.

Hypotheses

Our hypotheses for end-state comfort accommodation across morpheme and word boundaries are presented below.

Morpheme boundary

To test for subphonemic planning across morpheme boundaries, we compared two words that end in non-rhotic vowels, “edify” and “audify”, to two words ending in rhotic vowels, “editor” and “auditor”. Because all of the flap/tap contacts we observed occur at or above the alveolar ridge, in order to produce any of these four words, the tongue tip typically starts low and moves up to produce the initial flap/tap. For “edify/audify”, the rest of the sequence involves non-rhotic vowels that are produced with the tongue tip-down, so the tongue tip may be expected to move back down to facilitate their production. In comparison, for “editor/auditor”, the sequence ends with a rhotic vowel, which can be either tip up or down. If the words are produced exclusively with word-final [ɹ̩], we expect a “double-tap” [Vɾɹ̩] sequence regardless of whether the sequence shows anticipatory subphonemic planning or not. However, if those rhotic vowels are tip-up [ɻ̩], it is possible to identify the sequences with anticipatory planning. If the speaker plans out the movement sequence in advance, in those cases where a tip-up word-final [ɻ̩] is planned, we expect that the sequence is more likely to begin with an initial up-flap [ɾ], anticipating the need to raise the tongue tip later in the word, and overriding the preference for a tip-down vowel in the middle of the sequence. Such a sequence is then more likely to end with a postalveolar tap [ɾ] followed by the [ɻ̩]. If the speaker does not plan ahead, but rather engages in purely sequential (left-to-right) production, the tongue tip is more likely to move back down after contacting the alveolar ridge, producing an initial alveolar tap [ɾ] leading into the second vowel, followed by an up-flap [ɾ] leading into the final [ɻ̩]. Our hypothesis generates two predictions:

Prediction 1

If subphonemic planning influences tongue motion sequences, we expect a greater incidence of up-flap [ɾ] for the first flap/tap in “editor/auditor” in comparison to “edify/audify”, which should exhibit a greater incidence of alveolar tap [ɾ].

Figure 1 schematically illustrates prediction 1, as well as how the middle (non-rhotic) vowel would be realized with the tongue tip-up during a planned sequence, illustrating the end-state comfort effect. The black lines illustrate the predicted position of tongue tip height during the given production. The blue circles represent the flap/tap under examination. The green short-dashed circles represent ideal (“comfortable”) states. The red long-dashed circle represents a tongue tip up non-rhotic vowel, which is a non-ideal state as described above. The grey arrow highlights the relationship between the beginning and end-states. The grey dashed line represents the morpheme boundary.

Figure 1.

Schematic of predicted tongue tip height during planned productions of “editor/auditor” vs. “edify/audify”, focusing on initial flap/tap. Black line = tongue tip height. Blue circle = flap/tap of interest. Green short-dashed circle = ideal state. Red long-dashed circle = non-ideal state. Grey arrow = Predicted influence of ideal end-state on initial flap/tap variant. Grey dashed line = morpheme boundary. (Color online)

Prediction 1a

We also note that the choice of comparing “editor/auditor” vs. “edify/audify” has a potential confound in that “editor/auditor” has two flap/taps in adjacent syllables, whereas “edify/audify” has one flap/tap. We therefore expect that “editor/auditor” would have a greater incidence of initial up-flap [ɾ] than any sequence with two flap/taps in adjacent syllables that ends with a non-rhotic vowel, such as “edit/audit a”.

Prediction 2

If prediction 1 is true, we expect the number of instances of postalveolar tap [ɾ] for the final flap/tap in “editor/auditor” to be similar to the number of up-flap [ɾ] tokens for the initial flap/tap in “editor/auditor”. Figure 2 schematically illustrates prediction 2.

Figure 2.

Schematic of predicted tongue tip height during production “editor/auditor” for planned vs. unplanned productions, focusing on final flap/tap. Black line = tongue tip height. Blue circle = flap/tap of interest. Green short-dashed circle = ideal state. Red long-dashed circle = non-ideal state. Grey dashed line = morpheme boundary. (Color online)
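The contrast between planned and purely sequential production of “editor/auditor” can be sketched as a small model. This is only an illustration of the reasoning in this section, under the simplifying assumption that each vowel has a single tongue tip position ("low" or "high") and that each flap/tap is determined by the positions on either side of it. The function name `predict_flaps` and the string labels are hypothetical.

```python
from typing import List

# (tip position before flap, tip position after flap) -> variant name
VARIANT = {
    ("low", "low"): "alveolar tap",
    ("high", "low"): "down-flap",
    ("low", "high"): "up-flap",
    ("high", "high"): "postalveolar tap",
}


def predict_flaps(final_tip: str, planned: bool) -> List[str]:
    """Predict both flap/tap variants in a V-T-V-T-R word such as "editor".

    final_tip: "high" for a tip-up final rhotic vowel, "low" for tip-down.
    planned:   True if the medial vowel accommodates the final tip position,
               False if it simply takes its preferred tip-down position.
    """
    initial_vowel = "low"  # non-rhotic vowels are typically tip-down
    medial_vowel = final_tip if planned else "low"
    states = [initial_vowel, medial_vowel, final_tip]
    # Each flap/tap sits between two consecutive vowel states.
    return [VARIANT[(a, b)] for a, b in zip(states, states[1:])]


print(predict_flaps("high", planned=True))   # ['up-flap', 'postalveolar tap']
print(predict_flaps("high", planned=False))  # ['alveolar tap', 'up-flap']
print(predict_flaps("low", planned=True))    # ['alveolar tap', 'alveolar tap']
```

In this sketch a tip-down final vowel yields the same double alveolar tap whether or not the sequence is planned, which is why only tokens with a tip-up final vowel can separate the two accounts.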

Word boundary

To test for subphonemic planning across word boundaries, we compared phrases that contain either one or two flap/taps. The words “edit” and “audit” contain only one flap/tap (the one between vowels). However, the onset of the following word can affect the number of flaps/taps in the sequence. For example, the phrases “edit the” and “audit the” contain only one flap/tap, followed by a rapid motion of the tongue tip toward the teeth for the interdental stop + fricative sequence [tθ] instead (see Browman and Goldstein, 1989, regarding gestural occlusion). In all cases, the tongue tip would move higher than would be preferred for a non-rhotic vowel, but lower than the typical height of contact for any of the four flap/tap variants. In contrast, the phrases “edit a” and “audit a” contain two flap/taps, where the second flap/tap in the sequence is conditioned by the vowel following the word boundary. Producing any of these four phrases normally requires starting with the tongue tip low and moving it up to produce the initial flap/tap, again because all flap/tap contact is at or above the alveolar ridge. Thus, a speaker who plans ahead for the double-flap/tap sequence should be more likely to produce an up-flap [ɾ], down-flap [ɾ] sequence for “edit/audit a”, but an initial alveolar tap [ɾ] for the single flap/tap in “edit/audit the”.

Prediction 3

If subphonemic planning influences tongue motion sequences, we expect a greater incidence of alveolar tap [ɾ] for the first flap/tap in “edit/audit the” in comparison to “edit/audit a”, which would exhibit a greater incidence of up-flap [ɾ].

Figure 3 schematically illustrates prediction 3, as well as how the middle (non-rhotic) vowel would then be retroflexed during the planned sequence, illustrating the end-state comfort effect.

Figure 3.

Schematic of predicted tongue tip height during planned productions of “edit/audit a” vs. “edit/audit the”, focusing on initial flap/tap. Black line = tongue tip height. Blue circle = flap/tap of interest. Green short-dashed circle = ideal state. Red long-dashed circle = non-ideal state. Grey arrow = Predicted influence of ideal end-state on initial flap/tap variant. Grey dashed line = morpheme boundary. (Color online)

Prediction 4

In cases where the speaker does not produce a [ɾ] for “edit/audit the”, we might expect them to produce a greater incidence of [ɾ] in comparison to “edit/audit a”.

Avoiding confounds

All of these sequences begin with a non-rhotic vowel at the beginning of a stressed word/phrase, followed by an initial flap/tap and a medial non-rhotic vowel. For each hypothesis, there are tokens with contrasting end-states for analysis.

Since the sequence begins with a non-rhotic vowel, and our analysis begins with the middle of the initial flap/tap motion that follows, initial prosodic strengthening (Fougeron and Keating, 1997) does not interfere with our measurements. In addition, the medial state of the sequence – the vowel in the middle of the sequence - is the one that is expected to show the most deviation from the expected norm. Therefore, end-state comfort analyses do not interfere with observations of initial strengthening or of medial/final reduction or declination (Krakow, et al., 1995), and we expect our results to fit within the confines of all of these analyses at the same time.

Methods

Eighteen (18) native speakers of North American English between the ages of 18 and 40 participated in the study. All participants had normal speech and hearing. Participants were seated in a customized American Optical Co. model 507-a (1953) ophthalmic chair with a 2-cup rear headrest adjusted to contact the base of the skull just above the neck. This technique is the same one used in Gick et al. (2005); it allows for careful alignment of the ultrasound probe along the midsagittal plane, and reduces head motion to a standard deviation of about 1 mm and angular rotation to about 0.5 degrees.

A UST-9118 EV 180 electronic curved array ultrasound probe was placed under the chin using a mechanical arm. The probe has a variable frequency range of 3–9.0 MHz (set to 6 MHz), and the average slice thickness of the tissue viewed with this probe is approximately 3 mm (Medicines and Healthcare products Regulatory Agency, 2004). Therefore head motion in relation to the position of the ultrasound probe is expected to be less than the thickness of the ultrasound slice. The ultrasound probe was held upright as close as possible to the thyroid notch to allow for effective imaging of most or all of the tongue tip. This is because the anterior portion of the probe is not near the jaw as it would be with a long probe, mitigating occlusion of the tongue tip by the jaw. In addition, the angle between the probe and the tongue tip is maximally well suited to avoiding possible air boundaries under the tongue tip. As a result, for most of our data, the tongue tip was clearly visible, and for the rest, it was similar to results from point-tracking techniques such as x-ray microbeam or EMA, where the tongue tip sensor is placed 1 cm back from the tongue tip. In addition, the use of three m-mode lines to access tongue tip and blade motion maximized the likelihood of obtaining the most anterior motion information possible.

This probe was attached to an Aloka ProSound SSD-5000 ultrasound machine connected via s-video cable (marked video IN) to a Canopus ADVC-110 advanced digital video recorder. A Sennheiser MKH-416 short shotgun microphone was mounted on a microphone stand and aimed 30 cm away from the participant’s mouth. The microphone was plugged into an M-Audio DMP3 pre-amplifier via XLR balanced cable and out with an RCA cable to the Canopus card to ensure time synchronization between the ultrasound and audio output. The Canopus card was connected via FireWire to a MacPro Quad Core 2.8 GHz computer.

An LCD monitor was mounted on the ophthalmic chair’s monitor mount in front of the participant. A computer containing the experiment stimuli presentation software was connected to the LCD monitor so that the participant could easily read the stimuli from the screen.

The ultrasound machine was set up in simultaneous B/M mode and aligned to the acoustic signal. The B-mode ultrasound was used to capture 2-dimensional images of the midsagittal plane of the tongue at 30 fps. The M-mode (motion mode) ultrasound provided a progressive scan of three selected one-dimensional lines accessible from an ultrasound probe.

These three one-dimensional M-mode lines were set visually to about 45 degrees left-up, and adjusted to capture anterior (tip and blade) tongue motion during flap/tap production. Because M-mode ultrasound is a progressive scan, it presents the motion data at the full capture rate of the ultrasound probe, ranging from 60–100 Hz depending on the depth of the scan. This arrangement of M-mode lines allows capture of the general direction of motion of the tongue tip and blade, which is ideal for identifying the flap/tap variants described above. At the same time, the B-mode ultrasound allows examination of the midsagittal plane of the tongue surface at 30 fps (NTSC), which along with the M-mode data allowed identification of the rhotic vowel variants described above. The three M-mode intersect lines were placed parallel to each other in order to provide redundancy in case one or two of the intersects fell outside of the range of tongue motion for the participants. If a line is too close to the palate, part of the tongue motion could be missing; if it is too far, the motion would be less pronounced, as the tongue body does not move much during flap production.

Tokens were selected to contain single flap/taps or sequences of flap/taps in consecutive syllables. Data were collected on 17 control sentences, 9 sentences with 1 flap/tap, 10 sentences with double flap/tap sequences, and 2 sentences with triple flap/tap sequences, for a total of 38 unique sequences. The sentences were randomized within each of 12 blocks, giving a total of 456 stimulus sentences. The stimuli were presented using PXlabRT (Irtel, 2007) such that each sentence was displayed on an LCD screen for 2.2 seconds. The software automatically paused the experiment after the first 6 blocks to allow participants to swallow some water or take a short break if needed. Each set of 6 blocks took 9 minutes, for a total of 18 minutes of recording time. The present study is based on a subset of the phrases collected, as shown in Table 1.

Table 1.

Subset of phrases used for analysis of subphonemic planning.

Token | Word | Carrier phrase | Flap/tap count
1 | editor | We have editor books | 2
2 | auditor | We have auditor books | 2
3 | edify | We have him edify a book | 1
4 | audify | We have him audify a book | 1
5 | audit the | We have him audit the books | 1
6 | edit the | We have him edit the books | 1
7 | edit a | We have him edit a book | 2
8 | audit a | We have him audit a book | 2
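The stimulus counts reported in the Methods can be checked with a few lines of arithmetic. The sketch below uses only the figures stated in the text; the variable names are hypothetical. It also shows that display time alone accounts for a little under 8.4 minutes of each 9-minute half-session, so the remainder is presumably overhead between stimuli (an assumption, as the text does not say).

```python
# Sentence counts by type, as reported in the Methods.
sentence_counts = {
    "control": 17,
    "single flap/tap": 9,
    "double flap/tap": 10,
    "triple flap/tap": 2,
}

unique_sentences = sum(sentence_counts.values())  # 38
blocks = 12
total_stimuli = unique_sentences * blocks         # 456

display_seconds = 2.2                             # per sentence
blocks_per_half = 6
# Minutes of pure display time in each half-session of 6 blocks.
display_minutes_per_half = (
    blocks_per_half * unique_sentences * display_seconds / 60
)

print(unique_sentences, total_stimuli, round(display_minutes_per_half, 2))
# 38 456 8.36
```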

The acoustic signal was labeled and transcribed in PRAAT (Boersma, 2001). Vowels were identified using the standard techniques of listening and aligning to waveform and spectrogram. Vowels were labeled as rhotic [R] or non-rhotic [V]. Flap contact duration could not be measured precisely using ultrasound, and doing so was not required for our analysis. However, flap/tap contacts could be and were identified temporally as the point of lowest amplitude in the acoustic signal, as defined by Zue and Laferriere (1979). See Figure 4 for an example from a token of “editor”. Flap/tap boundaries were identified by their attenuation of vocalic energy in the waveform and spectrograms. Flap/taps were labeled as [T], and were expected to be from 10–40 ms in length (see Zue and Laferriere, 1979; Fukaya and Bird, 2005). However, the duration from the middle of the preceding vowel to the middle of the following vowel contains the articulatory transition information used to identify flap/tap variants, and is expected to be much longer, at over 100 ms.

Figure 5.

Schematic of B/M mode ultrasound with visualization of the technique for identifying T variants through M-Mode. (Image from Derrick & Gick, Under Review).

PRAAT transcriptions were then imported into ELAN (Sloetjes & Wittenburg, 2008) and the flap/tap variants were identified from the B/M mode ultrasound image. The B-mode data allowed categorical identification of the rhotic vowels as either tip-down [ɹ̩] or tip-up [ɻ̩] based on tongue shape and position.

The M-mode data allowed tracking of tongue tip/blade motion for identifying which of the four flap/tap variants was uttered. The top of Figure 5 illustrates the position of M-mode intersect lines. As the surface of the tongue moves through the intersect lines, the white surface appears in the progressive scans produced from the data captured along the intercept lines. When the tongue tip/blade moves high and back the white lines move higher, and when the tongue tip/blade is low and front the white lines are lower. The three parallel intersect lines typically move in unison, though one or two may be absent due to motion of the tongue outside the intersect lines and/or occlusion of that part of the tongue. The three together ensure that the most anterior portion of the tongue possible was always tracked, allowing for accurate observation of the movement patterns of the tongue tip.

The flap/tap variants were identified by first examining the B-mode video just before, during, and after flap/tap contact. The identification was confirmed by examining the three M-mode progressive scans, starting from 2–3 frames ahead of the flap contact, and focusing on the M-mode data adjacent to the leading edge, as identified by the thick black lines, and highlighted as the area of interest in Figure 5. Within the M-mode data, there are four patterns of interest illustrated in Figure 5: Alveolar taps ([ɾ]) are identified by a white up-down loop centered on the acoustically identified time of contact. Down-flaps ([ɾ]) are identified by a downward motion of the white air boundary. Up-flaps ([ɾ]) are identified by an upward motion of the white air boundary. Lastly, postalveolar taps ([ɾ]) are identified by a flat or slightly wavy horizontal white air boundary, higher than the typical up-down loop of an alveolar tap [ɾ] (see Derrick, Stavness and Gick, Under review). In Figure 5, the three m-mode intersect scans each show the pattern of interest highlighted.

Data were analyzed using Wilcoxon Signed-Rank and Rank-Sum tests (Wilcoxon, 1945) in R (R Core Team, 2013). The Wilcoxon Signed-Rank test is a nonparametric alternative to the paired t-test based on an ordered ranking of the measured observations. For example, for prediction 1, to test whether “edify/audify” are produced with more alveolar taps [ɾ] than “editor/auditor”, the percentage of flap/taps produced as alveolar taps [ɾ] was computed for each subject and for each group. The results were paired by participant, and the statistical test completed. Wilcoxon Signed-Rank tests were selected because they are the most conservative of the statistical tests and make no assumptions about normality. They are therefore highly appropriate for data that vary by participant as much as these data do, though they are prone to type II errors, so even when results are not statistically significant, the descriptive statistics may be informative as to behavioral trends.
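As a rough illustration of the paired comparison described above, the following sketch computes the signed-rank statistic V, the sum of the ranks of the positive paired differences, which is the statistic that R's `wilcox.test` reports for paired data. The percentages are invented for illustration and are not the study's data; the helper names are hypothetical, and no p-value is computed.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def signed_rank_v(x, y):
    """Sum of ranks of |x - y| over pairs with x > y; zero differences dropped."""
    diffs = [a - b for a, b in zip(x, y) if a != b]
    ranks = average_ranks([abs(d) for d in diffs])
    return sum(r for d, r in zip(diffs, ranks) if d > 0)


# Hypothetical per-participant percentages of one variant in two contexts.
context_a = [90, 80, 100, 50]
context_b = [10, 30, 20, 60]
print(signed_rank_v(context_a, context_b))  # 9.0
```

With n non-zero differences, V ranges from 0 to n(n+1)/2, so with four pairs the value 9.0 is close to the maximum of 10.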

In contrast, the Wilcoxon Rank-Sum test, or Mann-Whitney U test, is a nonparametric alternative to the 2-sample t-test. This was used as a test of significance for the behavior of individual participants for data where there was particularly high between-subject variability, so as to avoid type II errors that might be introduced by using the Signed-Rank test. Because the Rank-Sum test does not require paired samples, but simply two groups, it can be used to examine differences in contextual behavior for each participant individually.
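The unpaired comparison can be illustrated in the same spirit. The sketch below computes the Mann-Whitney U statistic by direct pairwise counting for a single hypothetical participant, with each token coded 1 if it was realized as a given variant and 0 otherwise. This coding scheme and the data are assumptions made for illustration; the text does not specify how tokens were coded for the per-participant tests.

```python
def mann_whitney_u(x, y):
    """U for sample x: count of (a, b) pairs with a > b, ties counting 0.5."""
    u = 0.0
    for a in x:
        for b in y:
            if a > b:
                u += 1.0
            elif a == b:
                u += 0.5
    return u


# Hypothetical tokens from one participant: 1 = variant of interest produced.
tokens_context_a = [1, 1, 1, 0, 1, 1]
tokens_context_b = [0, 0, 1, 0, 0, 0]

u_a = mann_whitney_u(tokens_context_a, tokens_context_b)
u_b = mann_whitney_u(tokens_context_b, tokens_context_a)
print(u_a, u_b)  # 30.0 6.0
```

The two U values always sum to the product of the sample sizes (here 6 × 6 = 36), so a value near either extreme indicates that one context consistently outranks the other.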

Results

The results of the experiment are presented below, including descriptive statistics and logistic regression tests for each of the predictions listed above.

Descriptive acoustic analysis

Flap/tap durations, as measured by the boundaries of attenuation of vocalic energy seen in the waveform and spectrograms, had a mean of 32.7 ms (10.9 ms standard deviation [sd]) for initial flap/taps, and a mean of 56.8 ms (19.4 ms sd) for the second flap/taps in “editor/auditor” and “edit/audit a”. The transition times from the middle of the preceding vowel to the middle of the following vowel had a mean of 106.1 ms (19.8 ms sd) for initial flap/taps, and a mean of 123.2 ms (24.3 ms sd) for second flap/taps in “editor/auditor” and “edit/audit a”. These transition durations provide lines of progressive m-mode data based upon the frame rate of the ultrasound machine (between 60 and 100 Hz) and the duration of the transitions. That is, an average of 6.4 (at 60 Hz) to 10.6 (at 100 Hz) ultrasound cycles for initial flap/taps, and an average of 7.4 (at 60 Hz) to 12.3 (at 100 Hz) ultrasound cycles for second flap/taps. These correspond to the m-mode “area of interest” windows shown in Figure 5, and are quite sufficient for flap/tap identification.
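The cycle counts quoted above follow directly from the mean transition durations and the range of m-mode scan rates. A minimal worked check, with a hypothetical helper name:

```python
def ultrasound_cycles(duration_ms: float, rate_hz: float) -> float:
    """Number of m-mode scan cycles that fit in a transition of given length."""
    return duration_ms / 1000.0 * rate_hz


for label, mean_ms in [("initial flap/tap", 106.1), ("second flap/tap", 123.2)]:
    low = ultrasound_cycles(mean_ms, 60)    # slowest reported scan rate
    high = ultrasound_cycles(mean_ms, 100)  # fastest reported scan rate
    print(f"{label}: {low:.1f} to {high:.1f} cycles")
# initial flap/tap: 6.4 to 10.6 cycles
# second flap/tap: 7.4 to 12.3 cycles
```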

Figure 4.

PRAAT transcription of a token of the word “editor”. V = non-rhotic vowel. T = flap/tap. R = rhotic vowel.

Hypothesis 1: Subphonemic planning across Morpheme Boundaries

We begin with the tests for subphonemic planning across morpheme boundaries, as described in predictions 1 and 2.

Prediction 1

Recall that we predicted more alveolar taps [ɾ] for “edify/audify”, and more initial up-flaps [ɾ] for “editor/auditor”, depending upon differences in tongue tip position for the final vowel. Of the 399 measurable rhotic vowels in our dataset for “editor/auditor”, 314 (78.7%) were tip-up [ɻ̩], allowing a meaningful comparison with tip-down non-rhotic vowels at the end of “edify/audify”. All instances of flap/tap variants produced during the word “edify/audify” vs. the initial flap/taps from “editor/auditor” are plotted in Figure 6. Overall, the participants produced 238 up-flaps [ɾ] out of 399 successfully measured tokens for “editor/auditor”, compared to only 73 out of 414 tokens for “edify/audify”.
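For readers who prefer proportions to raw counts, the figures in this paragraph can be converted as follows. The counts are those reported in the text; the dictionary keys are descriptive labels added here.

```python
# (count, total) pairs as reported in the text.
reported = {
    "tip-up final vowels, editor/auditor": (314, 399),
    "initial variant tokens, editor/auditor": (238, 399),
    "same variant tokens, edify/audify": (73, 414),
}

percentages = {
    label: round(100 * count / total, 1)
    for label, (count, total) in reported.items()
}

for label, pct in percentages.items():
    print(f"{label}: {pct}%")
# tip-up final vowels, editor/auditor: 78.7%
# initial variant tokens, editor/auditor: 59.6%
# same variant tokens, edify/audify: 17.6%
```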

Figure 6.

Comparison of the flap/tap variants in “edify/audify” vs. the first flap/tap in “editor/auditor”. Left side = by-subject comparison. Right-side = All subject combined comparison. Each box contains four quadrants sized proportionally to the counts of the relevant flap/tap variant. Upper-left = alveolar tap [ɾ]. Upper-right = down-flap [ɾ]. Lower-left = up-flap [ɾ]. Lower-right = postalveolar tap [ɾ]. (Color online)

Three participants (8, 15 and 23) almost always produced [ɾ] in all four phrases, violating the assumption that would make them testable for hypothesis one. However, the rest did not. Five participants (2, 4, 5, 6 and 26) produced exclusively or almost exclusively [ɾ] during the production of the flap/tap in “edify/audify”, but exclusively or almost exclusively [ɾ] during the production of the first flap/tap in “editor/auditor”. Seven more participants followed the same pattern, but not as consistently (9, 10, 12, 14, 16, 17 and 18). One participant (13) followed the opposite pattern, producing mostly [ɾ] during “editor/auditor”, and [ɾ] during “edify/audify”. One participant (3) produced mostly [ɾ] in all four phrases. The last participant (21) produced a mixed bag of [ɾ], [ɾ] and [ɾ].

Wilcoxon Signed-Rank tests were performed on the data summarized in Figure 6. For each of the four flap/tap variants, the percentages of productions matching that variant were compared based on “edify/audify” vs. “editor/auditor”. As expected from the descriptive statistics in Figure 6, the results show that there is a significantly higher ratio of alveolar tap [ɾ] in “edify/audify” than in the first flap/tap of “editor/auditor” (V = 127, p = 0.003), and a significantly lower ratio of up-flap [ɾ] in “edify/audify” than in the first flap/tap of “editor/auditor” (V = 8, p = 0.002).

Prediction 1a

Wilcoxon Signed-Rank tests were performed on the data for “editor/auditor” summarized in Figure 6, and “edit/audit a” summarized in Figure 7. For each of the four flap/tap variants, the percentages of productions matching that variant were compared based on “edit/audit a” vs. “editor/auditor”. As expected from the descriptive statistics in Figures 6 and 7, the results show that there is a significantly higher ratio of alveolar tap [ɾ] in “edit/audit a” than in the first flap/tap of “editor/auditor” (V = 148, p < 0.001), and a significantly lower ratio of up-flap [ɾ] in “edit/audit a” than in the first flap/tap of “editor/auditor” (V = 0, p < 0.001).

Figure 7.


Flap/tap sequences for “editor/auditor”. Y-axis = initial flap/tap, X-axis = final flap/tap. (Color online)

Prediction 2

We predicted predominantly [ɾ] final sequences for “editor/auditor”. The full sequence of flap/tap variants produced for “editor/auditor” is shown in Figure 7. Note from Figure 7 that of the 238 [ɾ] tokens produced, 234 of them were followed by a [ɾ].
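The sequence tallies described above (for example, 234 of 238 tokens, or about 98%, followed by a particular variant) amount to counting pairs of initial and final variants. The sketch below shows one way such a tally could be computed. The labels, the token list, and the function names are hypothetical and chosen only for illustration; they do not reproduce the study's data or indicate which variant the counts above refer to.

```python
from collections import Counter

# Hypothetical (initial, final) variant labels for a few tokens (illustration only).
tokens = [
    ("up-flap", "down-flap"),
    ("up-flap", "down-flap"),
    ("alveolar tap", "down-flap"),
    ("up-flap", "alveolar tap"),
    ("down-flap", "up-flap"),
]


def sequence_counts(pairs):
    """Count how often each (initial, final) variant sequence occurs."""
    return Counter(pairs)


def share_followed_by(pairs, initial, final):
    """Proportion of tokens with the given initial variant that end in `final`.

    Returns None when the initial variant never occurs.
    """
    with_initial = [p for p in pairs if p[0] == initial]
    if not with_initial:
        return None
    return sum(1 for p in with_initial if p[1] == final) / len(with_initial)


counts = sequence_counts(tokens)
share = share_followed_by(tokens, "up-flap", "down-flap")
print(counts[("up-flap", "down-flap")], share)
```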

Hypothesis 2: Subphonemic planning across Word Boundary

The following section tests for subphonemic planning across word boundaries, as described in prediction 3.

Prediction 3

We predicted more [ɾ] for phrases “edit/audit the”, and more initial [ɾ] for phrases “edit/audit a”.

Figure 8 plots all instances of flap/tap variants produced during “We have him edit/audit the books” vs. “We have him edit/audit a book”. Note that several participants, particularly participant 5, sometimes did not produce a second flap/tap in “edit/audit a” because they slowed down and produced the flap/tap as a stop. These individual tokens were excluded from statistical analysis. Figure 9 plots the flap/tap sequences in “We have him edit/audit a book” in order to show sequence context not otherwise visible in Figure 8.

Figure 8.


Comparison of first flap/tap variants in “edit/audit a” vs. the flap/tap variants in “edit/audit the”. Left side = by-subject comparison. Right side = all subjects combined. Each box contains four quadrants sized proportionally to the counts of the relevant flap/tap variant. Upper-left = alveolar tap [ɾ]. Upper-right = down-flap [ɾ]. Lower-left = up-flap [ɾ]. Lower-right = postalveolar tap [ɾ]. (Color online)

Figure 9.


Count by flap/tap sequences in “edit/audit a”. Y-axis = initial flap/tap, X-axis = final flap/tap. (Color online)

The descriptive statistics seen in Figure 8 show that one participant (2) consistently used [ɾ] for “edit/audit a”, and [ɾ] for “edit/audit the”, whereas 6 participants (10, 12, 15, 16, 17 and 18) used [ɾ] in “edit/audit a” sequences more often than in “edit/audit the” sequences. Two other participants (3 and 21) have a mixture of [ɾ], [ɾ] and [ɾ]. Four participants (8, 9, 14 and 23) have predominantly [ɾ] in all conditions. One participant (21) produced only [ɾ] and [ɾ] tokens. Three participants (13, 15 and 26) have mostly [ɾ] in “edit/audit a” and [ɾ] in “edit/audit the”, and one (3), had more [ɾ] for “edit/audit the”, which is opposite the predicted pattern.

Because many individuals appeared to produce different flap/tap variants for “edit/audit a” vs. “edit/audit the”, Wilcoxon rank-sum tests were performed on each of the participants separately. These tests show significantly more tokens with [ɾ] for the first T of “edit/audit a” and more tokens with [ɾ] for “edit/audit the” for 5 of 18 participants (2, 10, 12, 17 and 18), and marginal significance for 1 more (participant 16), as seen in Table 2. Note, however, that for one further participant (15), the test showed a significant result in the opposite direction.

Table 2.

Wilcoxon Rank Sum tests comparing prevalence of final flap/tap variants in “edit/audit a” vs. “edit/audit the”.

Flap variant   [ɾ]                     [ɾ]                     [ɾ]
Participant    W       p               W       p               W       p
2              25.5    *p < 0.001      144.5   NA              263.5   *p < 0.001
3              90      p = 0.601       94.5    p = 0.353       58.5    p = 0.184
4              126     p = 0.194       112     NA              98      p = 0.194
5              NA      NA              NA      NA              NA      NA
6              168     p = 0.541       153     NA              138     p = 0.541
8              162     NA              162     NA              162     NA
9              36      NA              36      NA              36      NA
10             60      *p < 0.001      153     NA              246     *p < 0.001
12             76.5    *p = 0.007      144.5   NA              212.5   *p = 0.007
13             198     p = 0.119       162     NA              126     p = 0.119
14             136     p = 0.347       144.5   NA              153     p = 0.347
15             198     *p = 0.015      153     NA              108     *p = 0.015
16             87.5    +p = 0.069      117     NA              146.5   +p = 0.068
17             96.5    *p = 0.047      136     NA              175.5   *p = 0.047
18             108     *p = 0.015      153     NA              198     *p = 0.015
21             153     p = 0.412       117     p = 0.412       135     NA
23             20      p = 0.716       18      NA              16      p = 0.716
26             129     p = 0.211       96      p = 0.260       99      p = 0.541

* = significant (α = 0.05)

+ = marginally significant (α = 0.1).

All comparisons are against [ɾ].

In addition, Wilcoxon signed-rank tests were performed for between-subject analysis; the results were not statistically significant.
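To make the W values in Table 2 easier to interpret, the sketch below computes a rank-sum statistic in its Mann-Whitney form: the sum of the first sample's ranks in the pooled data, minus n(n + 1)/2 for a first sample of size n. This is the form in which the statistic is commonly reported, and it is assumed here rather than confirmed by the text. The binary coding of tokens (1 = token realized as the variant of interest, 0 = otherwise), the function names, and the sample data are hypothetical.

```python
def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # ranks are 1-based
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def rank_sum_w(x, y):
    """Rank sum of x in the pooled sample, minus its minimum n_x(n_x + 1)/2."""
    ranks = average_ranks(list(x) + list(y))
    n_x = len(x)
    return sum(ranks[:n_x]) - n_x * (n_x + 1) / 2


# Hypothetical token codings for one participant (illustration only):
# 1 = token realized as the variant of interest, 0 = any other variant.
a_phrase = [1, 1, 1, 0, 1, 1]
the_phrase = [0, 0, 1, 0, 0, 0]
print(rank_sum_w(a_phrase, the_phrase))
```

Under this definition, the statistics for the two orderings of the samples sum to the product of the sample sizes, and a comparison in which every token ties yields exactly half that product. For example, 17 tied tokens per phrase would give W = 144.5; the token counts are an assumption here, since they are not stated in this section.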

Prediction 4

While all four participants who produced [ɾ] (3, 21, 23 and 26) produced more [ɾ] in “edit/audit the” than in “edit/audit a”, as predicted, the results were not statistically significant using either Wilcoxon test.

Discussion

The results show that, as per prediction 1, participants produce more instances of [ɾ] for “edify/audify” and more instances of [ɾ] for the first flap/tap in “editor/auditor”. Similarly, as per prediction 2, the results show that participants produce a number of [ɾ] for the second flap/tap in “editor/auditor” consistent with the number of [ɾ] for the first flap/tap. These results support the hypothesis that speakers can plan tongue tip motion sequences across morpheme boundaries, and that the selection of the initial flap/tap variant in “editor/auditor” is related to the final posture of the rhotic/vowel segment in accord with the end-state comfort effect, as described in Figures 1 and 2. In addition, supporting prediction 1a, speakers produce more instances of [ɾ] for the first flap/tap in “editor/auditor” than for that in “edit/audit a”, supporting the hypothesis that the rhotic vowel at the end of the sequence had more of an influence than the absence of a second flap/tap for “edify/audify”.

For prediction 3, and as seen in Figure 9, most of the participants produced [ɾ] [ɾ] sequences for “edit/audit a”. While this result is consistent with an end-state-comfort analysis of the influence of sequence-final rhotic vowels on preceding flap/taps, it was prevalent enough to make prediction 3 difficult to test. This behavior was impossible to exclude from between-subject analysis, and attenuated the possibility of identifying an end-state comfort effect through less-than-ideal production of initial or subsequent non-rhotic vowels when comparing “edit/audit a” with “edit/audit the”.

However, the descriptive statistics show that, overall, participants were still more likely to produce [ɾ] for the “edit/audit the” phrases, and [ɾ] for “edit/audit a” phrases, and that these results were significant for 6 participants, and marginally significant (α = 0.1) for 1 other. This provides some support for prediction 3.

For prediction 4, there were simply too few examples of [ɾ] for any of the statistical tests to function. For the four relevant participants, behavior trended in the hypothesized direction, but the results were not statistically significant.

Our conclusion is that there was a tendency to strategize tongue tip motion to accommodate end-states across word boundaries, but there were too many differences between the speakers, and possibly too small an effect of flap sequences in comparison to the influence of surrounding vowels, to demonstrate that the effect was statistically significant for the group as a whole.

Goldrick and Blumstein (2006) and McMillan and Corley (2010) argue that overlapping activations of word and phoneme plans generate gradient effects on articulation in speech production. They therefore do not argue for planning at the subphonemic level. However, the present findings support the idea that desired articulations at the end of a complex word or phrase sequence can influence tongue tip trajectories from the very beginning of the sequence. Instead of competition between overlapping activations in a context-free system, our data suggest there is an interaction between an upcoming speech plan and earlier tongue tip motion trajectories. Within an overlapping activation system, this is evidence not just for subphonemic planning, but for look-ahead planning in the form of anticipatory accommodation of end-state comfort.

Alternative analyses of these observations, wherein a wide range of possible outcomes are stored rather than planned, could also help account for these results. For instance, usage-based grammars (e.g., Bybee, 1995; Tomasello, 2005) in which commonly used chunks such as “edit/audit a” vs. “edit/audit the” are stored in memory could explain differences in flap/tap selection. Similarly, exemplar theories (e.g., Pierrehumbert, 2001) in which multiple variants of the whole word “edit” and “audit” are stored based on individual experience, with different contextual variants used before vowels and consonants, could explain different flap/tap selection between phrase contexts. Lastly, models that allow complex representation of each word with stored choices at specific points in the word (e.g., Hudson, 1980) could also account for the observed trends. Note that all three of these memory-based analyses would need to encode instances of groups of words spanning a phrase boundary (as in “edit/audit a/the”), and not just single words or even phrases. Further, these theories must still involve subphonemic planning at some stage, as the production output is optimized to anticipate a comfortable end-state. Memory-based explanations based on phonological context must account for the realization that the observed behaviors are themselves behavioral trends, not behavioral certainties. Lastly, they do not account for the remarkable degree of variation we have already seen within speaker and context (see Derrick & Gick, 2011). Our results thus demonstrate look-ahead planning across both morpheme and word boundaries, necessitating a theory of subphonemic planning in models of the motor control of speech production.

Nevertheless, the high degree of variability suggests that individuals speaking the same language likely generate substantially different subphonemic plans from each other, and may additionally be influenced by intrinsic speaker and speech act conditions (e.g., motor ability, speech rate) combined with ongoing optimization in speech production. In addition, our previous research has shown that extrinsic effects such as gravity and elasticity can influence interactions between flap/taps and rhotics during the word “Saturday”, leading to a strong preponderance of up-flap, down-flap sequences (i.e., [sæɾɻ̩ɾei]; see Derrick, Stavness and Gick, Under review).

Future work

In order to understand the relationship between speech rate and subphonemic planning, we are currently conducting follow-up research in which speakers produce similar word/phrase sequences at different speech rates. This follow-up study combines ultrasound and electromagnetic articulometry (EMA) to look for both categorical and gradient evidence for end-state comfort effects on flap/tap sequences across morpheme/word boundaries. We believe that the results will uncover a relationship between speech rate and optimization of subphonemic planning that can lead to a speaker employing distinct but stable speech plans under different speaking conditions, hearkening back to multiple stabilities observed in other motor subsystems, such as locomotion (e.g., Dominici et al., 2011).

Acknowledgements

This research was funded by a Discovery Grant from the Natural Sciences and Engineering Research Council of Canada (NSERC) to the second author, and by National Institutes of Health (NIH) Grant DC-02717 to Haskins Laboratories. Thanks to Cathi Best and Jason Shaw for help with argumentation. Special thanks to Aislin Stott for labeling and segmenting the acoustic data.

Contributor Information

Donald Derrick, University of Canterbury, New Zealand Institute of Language, Brain and Behaviour, Private Bag 4800, Christchurch, 8140, New Zealand, +64 3364 2987 x 8321, donald.derrick@gmail.com.

Bryan Gick, University of British Columbia, Department of Linguistics, Totem Field Studios, 2613 West Mall, Vancouver, BC, Canada, V6T 1Z4, +1 604 822 4817, gick@mail.ubc.ca.

References

  1. Barbier G, Perrier P, Ménard L, Payan Y, Tiede MK, Perkell JS. Speech planning as an index of speech motor control maturity. Proceedings of the 14th Annual Conference of the International Speech Communication Association (INTERSPEECH 2013); Lyon. 2013. pp. 1278–1282.
  2. Bell-Berti F, Krakow RA, Gelfer CE, Boyce SE. Anticipatory and Carryover Effects: Implications for Models of Speech Production. In: Bell-Berti F, Raphael LJ, editors. Producing Speech: A Festschrift for Katherine Safford Harris. Vol. 6. NY: AIP; 1995. pp. 77–93.
  3. Bell-Berti F, Harris KS. Anticipatory coarticulation: Some implication from a study of lip rounding. Journal of the Acoustical Society of America. 1979;65(5):1268–1270. doi: 10.1121/1.382794.
  4. Bernhardt BH, Stemberger JP. Handbook of phonological development from a nonlinear constraints-based perspective. Academic Press; 1998.
  5. Boersma P. Praat, a system for doing phonetics by computer. Glot International. 2001;5(9/10):341–345.
  6. Boyce SE. Coarticulatory organization for lip rounding in Turkish and English. Journal of the Acoustical Society of America. 1990;88:2584–2595. doi: 10.1121/1.400349.
  7. Browman CP, Goldstein L. Articulatory gestures as phonological units. Phonology. 1989;6:201–251.
  8. Butterworth B. Hesitation and semantic planning in speech. Journal of Psycholinguistic Research. 1975;4(1):75–87.
  9. Bybee J. Regular morphology and the lexicon. Language and Cognitive Processes. 1995;10(5):425–455.
  10. Chapman KM, Weiss DJ, Rosenbaum DA. Evolutionary roots of motor planning: The end-state comfort effect in lemurs. Journal of Comparative Psychology. 2010;124(2):229–232. doi: 10.1037/a0018025.
  11. Chiba T, Kajiyama M. The vowel: Its nature and structure. Phonetic Society of Japan. 1958
  12. Cohen RG, Rosenbaum DA. Where grasps are made reveals how grasps are planned: generation and recall of motor plans. Experimental Brain Research. 2004;157(4):487–495. doi: 10.1007/s00221-004-1862-9.
  13. Delattre P, Freeman D. A dialect study of American Rs by x-ray motion picture. Linguistics. 1968;44:29–68.
  14. Dell GS. A spreading-activation theory of retrieval in sentence production. Psychological Review. 1986;93(3):283–321.
  15. Derrick D. PhD thesis. University of British Columbia; 2011. Kinematic patterning of flaps, taps and rhotics in English.
  16. Derrick D, Gick B. Individual variation in English flaps and taps: A case of categorical phonetics. Canadian Journal of Linguistics. 2011;56(3):307–319.
  17. Derrick D, Stavness I, Gick B. Three speech sounds, one motor action: Evidence for speech-motor disparity from English flap production. Journal of the Acoustical Society of America. doi: 10.1121/1.4906831. (Under Review)
  18. Dominici N, Ivanenko YP, Cappellini G, D’Avella A, Mondì V, Cicchese M, Fabiano A, Silei T, Di Paolo A, Giannini C, Poppele RE, Lacquaniti F. Locomotor Primitives in Newborn Babies and Their Development. Science. 2011;334:997–999. doi: 10.1126/science.1210617.
  19. Fowler CA. Coarticulation and theories of extrinsic timing. Journal of Phonetics. 1980;8:113–133.
  20. Frisch SA, Wright R. The phonetics of phonological speech errors: An acoustic analysis of slips of the tongue. Journal of Phonetics. 2002;30:139–162.
  21. Fukaya T, Bird D. An articulatory examination of word-final flapping at phrase edges and interiors. Journal of the International Phonetic Association. 2005;35(1):45–58.
  22. Gick B. The use of ultrasound for linguistic phonetic fieldwork. Journal of the International Phonetic Association. 2002;32(2):113–121.
  23. Gick B, Bird S, Wilson I. Techniques for field application of lingual ultrasound imaging. Clinical Linguistics & Phonetics. 2005;19(6/7):503–514. doi: 10.1080/02699200500113590.
  24. Goldrick M, Blumstein SE. Cascading activation from phonological planning to articulatory processes: Evidence from tongue twisters. Language and Cognitive Processes. 2006;21(6):649–683.
  25. Hawkins S, Nguyen N. Influence of syllable-coda voicing on the acoustic properties of syllable-onset /l/ in English. Journal of Phonetics. 2004;32(2):199–231.
  26. Hagiwara R. PhD thesis. LA: University of California; 1995. Acoustic Realizations of American /r/ as Produced by Women and Men.
  27. Henke W. PhD thesis. Massachusetts Institute of Technology; 1966. Dynamic articulatory model of speech production using computer simulation.
  28. Hudson G. Automatic alternations in non-transformational phonology. Language. 1980;51(1):94–125.
  29. Irtel H. Version 2.1.11. Mannheim, Germany: University of Mannheim; 2007. PXLab: The Psychological Experiments Laboratory [online]
  30. Joos M. Acoustic phonetics. Language. 1948;24:5–136.
  31. Kier WM, Smith KK. Tongues, tentacles and trunks: The biomechanics of movement in muscular-hydrostats. Zoological Journal of the Linnean Society. 1985;83(4):307–324.
  32. Keating PA. The window model of coarticulation: articulatory evidence. In: Kingston J, Beckman M, editors. Papers in Laboratory Phonology I. Cambridge University Press; 1990. pp. 451–470.
  33. Krakow RA, Bell-Berti F, Wang QE. Supralaryngeal declination: Evidence from the velum. In: Bell-Berti F, Raphael LJ, editors. Producing Speech: A Festschrift for Katherine Safford Harris. Vol. 23. NY: AIP; 1995. pp. 333–353.
  34. Levelt WJM. Speaking: From Intention to Articulation. MIT Press; 1989.
  35. Levelt WJM. Do speakers have access to a mental syllabary? Cognition. 1994;50:239–269. doi: 10.1016/0010-0277(94)90030-2.
  36. McMillan CT, Corley M. Cascading influences on the production of speech: Evidence from articulation. Cognition. 2010;117:243–260. doi: 10.1016/j.cognition.2010.08.019.
  37. Medicines and Healthcare products Regulatory Agency Evaluation report MHRA 03107. 2004
  38. Mowrey RA, MacKay IRA. Phonological primitives: Electromyographic speech error evidence. Journal of the Acoustical Society of America. 1990;88(3):1299–1312. doi: 10.1121/1.399706.
  39. Munhall KG, Kawato M, Vatikiotis-Bateson E. Coarticulation and physical models of speech production. In: Broe MB, Pierrehumbert JB, editors. Papers in Laboratory Phonology V: Acquisition and the Lexicon. chapter 1. Cambridge University Press; 2000. pp. 9–28.
  40. Ohman S. Coarticulation in VCV utterances: Spectrographic measurements. Journal of the Acoustical Society of America. 1966;39(1):151–168. doi: 10.1121/1.1909864.
  41. Ohman S. Numerical model of coarticulation. Journal of the Acoustical Society of America. 1967;41(2):310–320. doi: 10.1121/1.1910340.
  42. Perkell JS. Research Monograph No. 53. MIT Press; 1969. Physiology of Speech Production: Results and Implications of a Quantitative Cineradiographic Study.
  43. Pierrehumbert J. Exemplar dynamics: Word frequency, lenition and contrast. In: Bybee J, Hopper P, editors. Frequency effects and emergent grammar. John Benjamins; 2001. pp. 137–157.
  44. R Core Team. R: A language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing; 2013.
  45. Roelofs A. The WEAVER model of word-form encoding in speech production. Cognition. 1997;64(3):249–284. doi: 10.1016/s0010-0277(97)00027-9.
  46. Rosenbaum DA, van Heugten CM, Caldwell GE. From cognition to biomechanics and back: the end-state comfort effect and the middle-is-faster effect. Acta Psychologica (Amsterdam) 1996;94(1):59–85. doi: 10.1016/0001-6918(95)00062-3.
  47. Rosenbaum DA, Vaughan J, Barnes HJ, Jorgensen MJ. Time course of movement planning: selection of handgrips for object manipulation. Journal of Experimental Psychology: Learning, Memory and Cognition. 1992;18(5):1058–1073. doi: 10.1037//0278-7393.18.5.1058.
  48. Saltzman E, Munhall G. A dynamical approach to gestural patterning in speech production. Ecological Psychology. 1989;1:333–382.
  49. Sloetjes H, Wittenburg P. Annotation by category – ELAN and ISO DCR; Proceedings of the 6th International Conference on Language Resources and Evaluation (LREC); 2008.
  50. Shattuck-Hufnagel S. Prosody: Theory and Experiment. Springer; 2000. Phrase-level phonology in speech production planning: Evidence for the role of prosodic structure.
  51. Stone M, Epstein MA, Iskarous K. Functional segments in tongue movement. Clinical Linguistics & Phonetics. 2004;18(6):507–521. doi: 10.1080/02699200410003583.
  52. Sumbre G, Graziano F, Flash T, Binyamin H. Neurobiology: Motor control of flexible octopus arms. Nature. 2005;433:595–596. doi: 10.1038/433595a.
  53. Sumbre G, Gutfreund Y, Fiorito G, Flash T, Hochner B. Control of octopus arm extension by a peripheral motor program. Science. 2001;293(5536):1845–1848. doi: 10.1126/science.1060976.
  54. Tomasello M. Constructing a language: A usage-based theory of language acquisition. Harvard University Press; 2005.
  55. Weiss DJ, Wark JD, Rosenbaum DA. Monkey See, monkey plan, monkey do: The end-state comfort effect in cotton-top tamarins (Saguinus oedipus) Psychological Science. 2007;18(12):1063–1068. doi: 10.1111/j.1467-9280.2007.02026.x.
  56. Winkler R, Ma L, Perrier P. A model of optimal speech production planning integrating dynamical constraints to achieve appropriate articulatory timing. Proceedings of the 9th ISSP, Montreal. 2011:235–236.
  57. Whalen DH. Coarticulation is largely planned. Journal of Phonetics. 1990;18:3–35.
  58. Wilcoxon F. Individual comparisons by ranking methods. Biometrics Bulletin. 1945;1(6):80–83.
  59. Zue VW, Laferriere M. Acoustic study of medial /t,d/ in American English. Journal of the Acoustical Society of America. 1979;66(4):1039–1050.
