Author manuscript; available in PMC: 2008 Dec 9.

Published in final edited form as: J Speech Lang Hear Res. 2007 Dec;50(6):1510–1545. doi: 10.1044/1092-4388(2007/104)

Appendix 1.

Summary table of the 21 studies on vowel development reviewed, listed chronologically. Study abbreviations (Abbr.) marked with an asterisk in the first column indicate the 14 studies whose formant values were used in the summary average F1-F2 plots (Figures 2, 3 & 4) and F1-F3 plots (Figures 5, 6 & 7). For the three groups (children, male, & female; C-M-F), the last two columns specify the exact ages whose formant values were included in the averages for Figures 2 to 7.

Abbr.	Study	Subject Detail	Age	Vowels in plots	Methods of obtaining the vowel	Additional Details	F1-F2 Plot C-M-F	F1-F3 Plot C-M-F
PB *	Peterson and Barney (1952)	n=76; Children: n1=15; Adults: n2=33M, n3=28F	Children 9 yrs and Adults	/i/, /u/, /ae/, /a/	Lists of ten monosyllabic words: heed, hid, head, had, hod, hawed, hood, who'd, hud, heard. General American English.	Two randomized word lists per speaker, yielding 1520 recorded words. Analysis via sound spectrograph. Formant frequencies estimated from a weighted average of the frequencies of the principal components in the formant.	C at: 9 yrs; M & F: Adults	C at: 9 yrs; M & F: Adults
EH *	Eguchi and Hirsh (1969)	n=84; n per age/sex group = 5/6	Children 3-13 yrs and Adults	/i/, /u/, /ae/, /a/	Two sentences: “He has a blue pen.” and “I am tall.” Vowels in American English.	Each sentence produced/read on five different occasions. Children younger than 7 years repeated the sentences after a native speaker. Used wideband and narrowband spectrograms. F1 and F2 estimated from spectrum envelopes drawn on expanded narrowband sections (0-4000 Hz).	C at: 3, 4, 5, 6, 7, 8, 9, 10 & 11 yrs; M & F at: 11, 12 & 13 yrs, Adults	No F3 values reported.
B	Buhr (1980)	n=1 (1M)	Child 16-64 weeks	/i/, /u/, /ae/, /a/	Selected sounds classified by a phonetician as a particular English vowel.	Biweekly recordings; 944 vowels analyzed. The first three formants of each sound were measured from spectrograms. Formant values for the different vowels can be extracted from the plots.	Not plotted	Not plotted
KM *	Kent and Murray (1982)	n=21; n1=7 (3F; 4M), n2=7 (3F; 4M), n3=7 (4F; 3M)	Children n1: 3 mos; n2: 6 mos; n3: 9 mos	Extreme F1-F2 values of the four corner vowels were used to define the acoustic space.	Infant vocalizations (comfort state) coded by vocalic, phonation, and noise segments (Coding Table I, p. 355).	Vocalic utterances: f₀ determined from narrowband (45 Hz) spectrograms and formant frequencies from wideband (450 Hz) spectrograms. F1-F2 values per age group were extracted from Figures 8-10, p. 357.	C at: 9 mos (ages 3 & 6 mos not plotted)	Average F3 = 5 kHz. No mean F3 values reported for the individual corner vowels.
PG *	Pentz and Gilbert (1983)	n=not available	Children 7, 8 and 9 yrs	/i/, /u/, /ae/, /a/	Not available.	Not available. Poster presentation at the 1983 annual convention of the American Speech-Language-Hearing Association. Values reported in Kent (1994), p. 73.	C at: 7, 8 & 9 yrs	C at: 7, 8 & 9 yrs
H *	Hodge (1989)	n=115; n1=15 (12M; 3F), n2=20 (19M; 1F), n3=20 (20M)	Children n1: infants, mean = 8.5 mos; n2: 1 yr, mean = 18 mos; n3: 3, 5, 9 yrs & Adults	<9 mos: /i/, /u/, /ae/; 1 yr to adult: /i/, /u/, /ae/, /a/	Isolated productions of the point and central vowels preceded by aspirate /h/; reduplicated triple stop consonant-vowel (CV) chains; the words ‘baby’ and ‘bye bye’; sustained /s/ sounds; the words ‘two’ and ‘tea’; and the two-word combinations ‘a stee’ and ‘a stew’. Vowels in Canadian English.	Used spectrographic displays with the filter bandwidth and dynamic range settings that provided the clearest formant pattern. F1, F2 & F3 values were estimated from the vocalic segments by tracing the midpoint of the strong energy region corresponding to each formant.	C at: 9 mos; 1, 3, 5 & 9 yrs; M at: 3, 5, 9 yrs & Adults	C at: 9 mos; 1, 3, 5 & 9 yrs; M at: 3, 5, 9 yrs & Adults
CW *	Childers and Wu (1991)	n=52 (27M; 25F)	Adults 20-80 yrs	/i/, /u/, /ae/, /a/	Each vowel sustained as it would be pronounced in the following words: beet, bit, bet, bat, Bob, bought, book, boot, but, Bert. American English vowels.	Formant information extracted by a closed-phase weighted recursive least-squares method with a variable forgetting factor (WRLS-VFF).	M & F: Adults	M & F: Adults
ZJ *	Zahorian and Jagharghi (1993)	n=30; Children: n1=10 (5M; 5F); Adults: n2=20 (10M; 10F)	Children 11 yrs and Adults	/i/, /u/, /ae/, /a/	99 CVC syllables produced in isolation. The CVC syllable list contained 9 instances of each of 11 vowels /iy, ih, eh, ae, ah, aa, ao, ow, uh, uw, er/; the list is in Appendix A of Zahorian & Jagharghi (1991). Vowels in American English, various dialects.	2922 vowels/formant frequency analyzed. A 50 ms Hanning window and a 10th-order LP model were computed. A dynamic programming approach selected the actual formant values, using formant seed values from Peterson & Barney. Formant tracking verified visually and against the global spectral shape.	C at: 11 yrs; M & F: Adults	C at: 11 yrs; M & F: Adults
BP *	Busby and Plant (1995)	n=40; n=10 per age group (5M; 5F)	Children 5, 7, 9 & 11 yrs	/i/, /u/, /ae/, /a/	Words: sheep, ship, bed, cat, cart, cut, four, dog, shoe, book, bird. Carrier phrase: “I can see a ...” Test words presented orthographically and pictorially. Vowels in Australian English.	440 values per formant frequency. f₀ estimated from spectrographic traces of the first two harmonics. F1-F3 values determined from the steady-state portion of the vowel using spectrogram and average power spectrum displays. Formant values extracted from Figs. 2-3, p. 2605.	C at: 5, 7, 9 & 11 yrs; M & F at: 5, 7, 9 & 11 yrs	No mean F3 values per vowel. Mean F3 across all vowels per age group in Fig. 1, p. 2604.
HW *	Hagiwara (1997)	n=15 (6M; 9F)	Adults 18-26 yrs	/i/, /u/, /ae/, /a/	33-word database with 11 vowels: beat, bit, bate, bet, bat, boot, put, boat, bought, but, Bert. Each word presented in the frame ‘Cite … twice’. Vowels in American English.	30 ms window; digitized at 10 kHz. Formants determined from wideband spectrograms, narrowband FFT spectra, and LPC analysis.	M & F: Adults	M & F: Adults
HG *	Hillenbrand, Getty, Clark, and Wheeler (1995)	n=139; Children: n1=46 (27M; 19F); Adults: n2=93 (45M; 48F)	Children 10-12 yrs and Adults	/i/, /u/, /ae/, /a/	Read randomized lists with the words: heed, hid, hayed, had, hod, hawed, hoed, hood, who'd, hud, heard, hoyed, hide, hewed, how'd. Vowels in American English, upper Midwest dialect.	Digitized at 16 kHz with 12-bit resolution. F1-F4 measured from LPC spectra over 16 ms Hamming windows. Measurements made while viewing a spectral peak display and a spectrogram.	C at: 11 yrs; M & F: Adults	C at: 11 yrs; M & F: Adults
Y *	Yang (1996)	n=20 (10M; 10F). Note: The n reflects only the number of American English speakers.	Adults 18-27 yrs	/i/, /u/, /ae/, /a/	Read a list with the following 13 vowels/words: had, hard, hawed, hayed, head, heed, herd, hid, hod, hoed, who'd, Hudd, and hood. Each word was presented and produced 5 times. American English vowels; Korean vowels are not included in this summary.	F1-F3 are average values of 3 repetitions per vowel per speaker. Input samples were low-pass filtered at 4 kHz and digitized at a 10 kHz sampling rate. Spectrograms used 256-point DFT analysis with a 6.4 ms Hamming window computed once every ms. Formants measured one-third of the way into the vowel (as defined by vowel onset and offset).	M & F: Adults	M & F: Adults
KMe	Kuhl and Meltzoff (1996)	n=72; n=24 per age group	Children 3, 4 & 5 months	/i/-, /u/-, /a/-like vowels	Infants listened to 8 different productions representative of /i/ as in “heap”, /u/ as in “hoop”, and /a/ as in “hop”. Infants produced vocalizations resembling the vowel they heard. Vowel-like vocalizations analyzed after perceptual coding as /i/-, /u/-, or /a/-like.	Formant values were based on the agreement of 3 analyses: narrowband spectrogram (114 Hz), fast Fourier transform (256 points, pre-emphasis, Blackman window weighting), and LPC frequency response (10 ms frame length, filter order = 12).	Not plotted	Not plotted
GRC	Gilbert, Robb and Chang (1997)	n=4 (4M)	Children 15-36 mos; sampling sessions at 15, 18, 21, 24 & 36 mos	Vowels not specified	Spontaneous vocalizations, phonetically classified by tongue height (high, middle, low) and tongue advancement (front, central, back).	1334 vowels analyzed using a wideband (450 Hz) spectrogram. The center frequencies of the first two steady-state formants were taken as F1 and F2.	Not plotted	Not plotted
RG	Robb, Chen and Gilbert (1997)	n=20 (9M; 11F)	Children 4-25 months	Average formant frequency	Comfort-state non-cry vocalizations were collected from each child: a total of 1743 vowel-like sounds, an average of 88 vowel-like sounds per child. Every distinguishable vocalization was phonetically transcribed.	Digitized at 10 kHz with 16-bit quantization. Vowel onset and offset were determined from the amplitude-by-time waveform, and each vowel was transformed to a narrowband (24 Hz) spectrogram. F1 and F2 were determined from LPC spectra (6 coefficients across a 20 ms Hamming window).	Not plotted	Not plotted
LPN *	Lee, Potamianos and Narayanan (1999)	n=492; Children: n1=436 (229M; 207F); Adults: n2=56	Children 5-18 yrs; Adults 25-50 yrs	/i/, /u/, /ae/, /a/	15 target words and 5 sentences presented on a computer monitor; utterances produced twice in random order. Words were: bead, bit, bet, bat, pot, ball, but, put, boot, bird, produced in the carrier sentence “I say uh ___ again”; 5- and 6-year-olds produced the target words in isolation. Sentences were: “He has a blue pen.” “I am tall.” “She needs strawberry jam on her toast.” “Chuck seems thirsty after the race.” “Did you like the zoo this spring?” Vowels in American English.	3265 pairs analyzed. Digitized at a 20 kHz sampling rate and 16-bit resolution. Utterances segmented using the AT&T hidden Markov model recognition engine. f₀ and formant values were obtained using an automatic f₀ and formant-tracking program in the ESPS signal processing package; automatic estimates were compared with manual estimates for a sample of tokens. Waveforms were downsampled to 12 kHz and processed using a 12 ms Hamming window.	C at: 5, 6, 7, 8, 9, 10 & 11 yrs; M & F at: 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 18 yrs & Adults	C at: 5, 6, 7, 8, 9, 10 & 11 yrs; M & F at: 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 18 yrs & Adults
WH	Whiteside and Hodgson (2000)	n=29; Children: n1=20 (10M; 10F); Adults: n=9 (4M; 5F)	Children: ages 6 & 9 yrs (3M; 3F per age group) and age 10 (4M; 4F); Adults: mean age 37.4 yrs	/a/	Picture naming task. Target phrases: The red/blue/green bar/jar/car. Four distracters: The red boat, The green balloon. Vowels in UK English.	9 phrase-final vowels per subject analyzed. Digitized at a 10 kHz sampling rate. f₀ calculated using an autocorrelation method (20 ms frame length). F1-F3 calculated at the midpoint of the final vowel using automatic LPC analysis.	Not plotted	Not plotted
AK *	Assmann and Katz (2000)	n=50; Children: n1=30 (10 per age group); Adults: n=20 (10M; 10F)	Children 3, 5 & 7 yrs; Adults	/i/, /u/, /ae/, /a/	Words: heed, hid, hayed, head, had, hud, hod, hawed, herd, hoed, hood, who'd. 6 random repetitions per word/vowel. Vowels in American English.	180 vowel tokens/formant frequency analyzed. Digitized at 48 kHz and 16-bit resolution; waveforms were low-pass filtered and resampled at 12 kHz. f₀ estimates made using the Meddis and Hewitt (1991) pitch model. Formant center frequencies were tracked using a custom MATLAB program.	C at: 3, 5 & 7 yrs; M & F at: Adults	C at: 3, 5 & 7 yrs; M & F at: Adults
POA *	Perry, Ohde, and Ashmead (2001)	n=80; n=20 per age group (10M; 10F)	Children 4, 8, 12, & 16 yrs	/i/, /u/, /ae/, /a/	Seven vowels in the neutral /hVd/ context (had, head, heed, hid, hod, hud, who'd) were embedded in the carrier phrase “Say /hVd/ again”. Each /hVd/ word was produced 7 times, and 5 of the 7 productions were analyzed. Vowels in American English.	2800 vowels analyzed. Digitized at a 20 kHz sampling rate. f₀ determined with the CSL pitch extraction program. F1, F2, and F3 measured from the spectrogram at the vowel midpoint; also LPC analysis with a 10 ms triangular window (14-20 coefficients), with the cursor placed at the midpoint of F2 stability.	C at: 4 & 8 yrs; M & F at: 4, 8, 12 & 16 yrs	C at: 4 & 8 yrs; M & F at: 4, 8, 12 & 16 yrs
NM	Nijland, Maassen, Meulen, Breels, Kraaimaat and Schreuder (2002)	n=25 controls; Children: n1=19 (14M; 5F); Adults: n2=6 (0M; 6F). Note: The n reflects only the number of controls for the participants with DAS.	n1: 4;11 to 6;10 yrs; Adults: 6 females	/i/, /u/, /a/	Disyllabic nonsense utterances of the type [ǝCV], with V one of the extreme vowels /a, i, u/ of the Dutch vowel space and C either a fricative (alveolar /s/ or velar /x/) or a stop (/b/ or /d/). All utterances produced 6 times in the carrier phrase ‘he de … weer’ (‘hey the … again’). Vowels from native speakers of Dutch.	72 utterances per subject (3 vowels × 4 consonants × 6 repetitions), digitized at a 25 kHz sampling rate. The F2 trajectory was used to determine utterance types. CSL was used to mark amplitude peaks. Formant values were obtained using pitch-synchronous LPC analyses (triangular window; 20-component autocorrelation with pre-emphasis of .950).	Not plotted in summary plots. Dutch vowel formant values fall within the age-specific average acoustic space for English vowels.	Not available
CD	Casal, Dominguez, Fernandez, Sarget, Celdran, Vilalta and Escoda (2002)	n=22 controls for the 22 children with cleft. Note: The n reflects only the number of control participants.	Controls: n=9 at 22 mos and n=13 at 33 mos	/i/, /u/, /a/	Spontaneous speech in Spanish elicited to obtain the needed speech sounds: 5 vowels /a, i, u, e, o/ and 4 consonants /p, t, k, m/. Vowels from Spanish speakers.	220 vowels/formant frequency. F1 and F2 values were measured at the midpoint of the vowel steady state, as determined from the steady-state duration. Note: Formant values from the children with cleft lip and/or cleft palate are not considered for comparison; only formant values from their controls were considered.	Not plotted in summary plots. The Spanish acoustic space is similar to the average acoustic space for English vowels.	Not available
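Several studies in the table (e.g., Hagiwara, 1997; Hillenbrand et al., 1995; Robb et al., 1997; Perry et al., 2001) report LPC-based formant estimates. As a rough illustration only, and not the procedure of any listed study, the sketch below fits an all-pole model by the autocorrelation method (Levinson-Durbin recursion), converts the complex roots of the prediction polynomial into frequencies and bandwidths, and keeps sharp resonances as formant candidates. It assumes NumPy is available; the function names, the 90 Hz floor, and the 400 Hz bandwidth ceiling are illustrative assumptions. The optional pre-emphasis helper uses the .95 coefficient listed for Nijland et al. (2002).

```python
import numpy as np


def pre_emphasize(x: np.ndarray, coeff: float = 0.95) -> np.ndarray:
    """First-difference pre-emphasis: y[n] = x[n] - coeff * x[n-1]."""
    return np.append(x[0], x[1:] - coeff * x[:-1])


def lpc_autocorrelation(frame: np.ndarray, order: int) -> np.ndarray:
    """LPC polynomial [1, a1, ..., a_order] via the Levinson-Durbin recursion."""
    n = len(frame)
    # Autocorrelation at lags 0..order (biased estimate, as in the autocorrelation method)
    r = np.array([np.dot(frame[: n - k], frame[k:]) for k in range(order + 1)])
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        # Reflection coefficient for stage i
        k = -(r[i] + np.dot(a[1:i], r[i - 1:0:-1])) / err
        prev = a.copy()
        # Update lower-order coefficients, then set the new highest-order one
        a[1:i] = prev[1:i] + k * prev[i - 1:0:-1]
        a[i] = k
        err *= 1.0 - k * k
    return a


def formants_from_lpc(a: np.ndarray, fs: float,
                      min_hz: float = 90.0, max_bw_hz: float = 400.0) -> np.ndarray:
    """Formant candidates (Hz) from the roots of the LPC polynomial."""
    roots = np.roots(a)
    roots = roots[np.imag(roots) > 0]  # keep one root per complex-conjugate pair
    freqs = np.angle(roots) * fs / (2.0 * np.pi)
    bandwidths = -np.log(np.abs(roots)) * fs / np.pi
    keep = (freqs > min_hz) & (bandwidths < max_bw_hz)
    return np.sort(freqs[keep])
```

For a frame taken at the vowel midpoint, a typical call might be `formants_from_lpc(lpc_autocorrelation(pre_emphasize(frame) * np.hamming(len(frame)), order), fs)`; a common rule of thumb sets `order` near 2 plus the sampling rate in kHz, though the studies above report their own settings (e.g., 14-20 coefficients in Perry et al., 2001).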
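Whiteside and Hodgson (2000) are listed as calculating f₀ with an autocorrelation method over 20 ms frames. A generic version of that idea is sketched below under stated assumptions: the frame is mean-removed, the search is limited to lags between fs/fmax and fs/fmin, and the largest autocorrelation value in that range is taken as the pitch period. The 75-600 Hz search range and the function name are illustrative assumptions rather than parameters reported in the study, and the sketch omits voicing decisions and octave-error correction.

```python
import numpy as np


def autocorr_f0(frame: np.ndarray, fs: float,
                fmin: float = 75.0, fmax: float = 600.0) -> float:
    """Estimate f0 (Hz) as fs divided by the lag of the largest autocorrelation value."""
    x = frame - np.mean(frame)
    n = len(x)
    # Non-negative lags 0..n-1 of the unnormalized autocorrelation
    r = np.correlate(x, x, mode="full")[n - 1:]
    lag_min = max(1, int(fs / fmax))
    lag_max = min(int(fs / fmin), n - 1)
    lag = lag_min + int(np.argmax(r[lag_min:lag_max + 1]))
    return fs / lag


# One 20 ms frame (200 samples at 10 kHz) of a 200 Hz tone plus its second harmonic
fs = 10_000
k = np.arange(round(0.020 * fs))
frame = np.sin(2 * np.pi * 200 * k / fs) + 0.5 * np.sin(2 * np.pi * 400 * k / fs)
print(autocorr_f0(frame, fs))  # 200.0
```

Leaving the autocorrelation unnormalized favors shorter lags slightly, which in this sketch helps avoid picking a multiple of the true period; a production pitch tracker would add more safeguards.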
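The analysis settings in the table are given in milliseconds and kilohertz; converting them to sample counts makes them easier to compare across studies. The hypothetical helpers below illustrate that arithmetic for two rows: Yang's (1996) 6.4 ms Hamming window, 1 ms step, and 256-point DFT at 10 kHz, and Lee et al.'s (1999) downsampling from 20 kHz to 12 kHz followed by a 12 ms window. The helper names are assumptions. The integer pair returned by `resample_factors` matches the `up` and `down` arguments of a polyphase resampler such as SciPy's `scipy.signal.resample_poly`, although the table does not state which resampler either study used.

```python
from fractions import Fraction


def ms_to_samples(duration_ms: float, fs_hz: int) -> int:
    """Number of samples spanned by a duration at a given sampling rate."""
    return round(duration_ms * fs_hz / 1000)


def dft_bin_spacing(fs_hz: int, n_fft: int) -> float:
    """Frequency spacing (Hz) between adjacent DFT bins."""
    return fs_hz / n_fft


def resample_factors(fs_in: int, fs_out: int) -> tuple[int, int]:
    """Smallest integer (up, down) pair with fs_out = fs_in * up / down."""
    ratio = Fraction(fs_out, fs_in)
    return ratio.numerator, ratio.denominator


# Yang (1996): 6.4 ms window, 1 ms step, 256-point DFT at 10 kHz
print(ms_to_samples(6.4, 10_000))    # 64 samples, so a 256-point DFT implies zero-padding
print(ms_to_samples(1.0, 10_000))    # 10
print(dft_bin_spacing(10_000, 256))  # 39.0625
# Lee et al. (1999): 20 kHz -> 12 kHz, then a 12 ms window
print(resample_factors(20_000, 12_000))  # (3, 5)
print(ms_to_samples(12, 12_000))         # 144
```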