Author manuscript; available in PMC: 2016 Mar 2.
Published in final edited form as: Clin Linguist Phon. 2015;29(0):686–700. doi: 10.3109/02699206.2015.1041609

Using PhonBank and Phon in Studies of Phonological Development and Disorders

Yvan Rose, Carol Stoel-Gammon
PMCID: PMC4774542  NIHMSID: NIHMS760410  PMID: 26035223

Abstract

The goal of this paper is to present an overview of new tools that can be used to further our understanding of phonological development and disorders. We begin with a summary of the field of child phonology with a focus on databases and methods of analysis and then move to a description of PhonBank, a shared database for the study of phonology, and Phon, a specialised software system capable of performing various types of phonological analyses based on both phonetic transcriptions and acoustic analyses of speech productions. We provide a detailed example of using PhonBank and Phon to examine the use of velar fronting using longitudinal data from one child with typical development and three children with phonological disorder. We conclude with an emphasis on data sharing and its central relevance to further advances in our field.

Keywords: Phon, PhonBank, database, phonological development, phonological disorders, data sharing

Introduction

Until recently, large-scale investigations of phonological development and/or disorders have been hampered by limited access to corpus data and to specialised tools required for the analysis of these data (Rose & MacWhinney, 2014). While various technologies were developed throughout the last few decades to address the latter problem, access to properly formatted corpora of phonological data has remained a central problem throughout this period. PhonBank, a recent offshoot of CHILDES (the Child Language Data Exchange System; MacWhinney, 2000), and Phon, a software program designed for research in phonetics, phonology and acquisition, now provide a framework for both data sharing and analysis. In this paper, we illustrate how this framework can be used to address current research questions. After a brief summary background, we describe this database and software infrastructure. We then demonstrate how it can be used to study phonological production, taking patterns of velar fronting observed in the speech of both typical and atypical English learners as an illustrative example. Building on this description, we emphasise the communal nature of this project and, in particular, the importance of open standards and of data sharing for research and the related development of clinical applications.

Background

In the past, data on phonological development and disorders came from two relatively disparate fields of study with different goals and different methods of collecting and analysing data (Stoel-Gammon & Bernhardt, 2013). One field, comprised primarily of educators and psychologists, sought to establish norms for phonological acquisition in children with typical development with the goal of identifying those children with atypical development. In the US, norms for children were based on elicited productions of a list of single words that included all (or nearly all) the consonants of English. The first major studies of speech sound development appeared in the 1930s, based on cross-sectional data from large groups of children (Wellman, Case, Mengert & Bradbury, 1931; Poole, 1934; see also Templin, 1957). These norming studies had a very narrow focus, looking only at the acquisition of phonemes (sometimes only consonants) of English in a limited set of words. Wellman and colleagues wished “to determine the development of the ability of children 2-6 years to correctly produce the sounds of English”; the goal of Templin’s study was to “describe the growth of articulation of speech sounds from 3 to 8 years”. The tradition of using large-scale cross-sectional studies as the basis for norms for speech sound development has continued, with a study of productions of 997 children, aged 3-9 years, by Smit and colleagues (Smit, Hand, Freilinger, Bernthal & Bird, 1990; see also Smit, 1993). Findings have provided teachers and clinicians with norms for the ‘age of acquisition’ or ‘age of mastery’ of each phoneme of English and often serve as a guide for the design and implementation of intervention programs for children with atypical speech sound development (e.g. McLeod 2007; Williams, McLeod, & McCauley, 2010).

The second source of data on phonological development and disorders was based primarily on work by linguists and speech scientists and focused on how children acquired phonology rather than what they acquired. Within this approach, attention was given to a wide range of topics including individual differences and relationships among various aspects of the phonological system (e.g., phones and syllable positions; placement of stress; omission patterns). There was no attempt to determine norms or establish orders of acquisition for phonemes. This approach usually involved data from relatively few children, ranging from diary studies of a single child to small groups. Data were gathered in an effort to observe and document patterns of early speech and language development, often in relation to the development of the child’s lexicon, and to explore the notion of universal patterns of acquisition using cross-linguistic data (e.g. Stampe, 1969; Smith, 1973; Ferguson & Farwell, 1975; Macken, 1979; Stoel-Gammon & Cooper 1984; Fikkert, 1994; Levelt, 1994; see Bernhardt & Stemberger, 1998 for a summary of this early literature).

The early diary studies were based on parents’ observations of their children’s speech development. The data for these studies were longitudinal, based on spontaneous speech gathered in naturalistic settings, and typically transcribed and analysed by a single individual. Among the published accounts, many studies were carried out by individuals trained in linguistics, and included sophisticated and detailed transcriptions of children’s speech. Well-known examples of this approach include the work of Velten (1943), Leopold (1949), Smith (1973), and Compton & Streeter (1977), among others.1 We also have access to a few studies of children with atypical phonological acquisition; see, for example, Hinkley (1915). Data from diary studies continue to be an important source of material, as is evident in the work of Inkelas (2003) and Inkelas & Rose (2003, 2007).

Small group studies of phonological development emerged in the 1960s when course offerings in child language acquisition became part of the curriculum in linguistics departments (Stoel-Gammon & Bernhardt, 2013). A classic study in this domain is that of Ferguson and Farwell (1975), who examined the phonetic and phonological features of the first 50 words of three children. Subsequent small-group studies have focused on a variety of issues, including acquisition of a sound class or a syllable shape (e.g. Lieberman, 1980); acquisition of tone in Mandarin (e.g. Li & Thompson, 1977); and comparisons of voicing of stops in children acquiring Spanish and English (e.g. Eilers, Oller, & Benito-Garcia, 1984). Most of the work of this type was based on phonetic transcriptions, although acoustic analysis was also introduced in the 1980s (e.g. Macken & Barton, 1980; Eilers et al., 1984). Nowadays, in line with the more general field of research on phonology, more and more scholars combine acoustic and transcription data as part of their analyses (e.g. Kehoe, Stoel-Gammon, & Buder 1995; Buder 1996; Edwards, Fourakis, Beckman, & Fox 1999).

Small group studies have also involved children with atypical speech and language, allowing researchers and clinicians to compare the phonological systems of children with typical and atypical development. In many cases, the research indicates that differences between typical and atypical phonologies are better captured in quantitative rather than qualitative terms; that is, children with disordered phonologies often display developmental patterns of speech production for longer periods, and may combine such patterns in ways that reveal particular dimensions of the developing system. Thus, phonological acquisition in this population may be referred to as ‘protracted’ (Bernhardt, Romonath, & Stemberger, 2014), or ‘delayed’ (Dinnsen, Green, Gierut, & Morrisette, 2011), while the systems of other children may be described as ‘disordered’ or ‘deviant’ (Dodd, 1995). For many clinicians, the distinction between ‘delayed’ and ‘disordered’ is important, as it has implications for the design and implementation of intervention programs.

Researchers and clinicians can use the data from past work as a springboard for their own studies, allowing them to check the status of particular phenomena and generate new questions. Regardless of the approach to data collection and analysis, research on phonological development and disorders requires an enormous amount of time and patience, particularly in the context of longitudinal studies. When a researcher uses an extant dataset, a diary study for example, there is no need to collect or phonetically transcribe the productions, but the need for sorting and cross-tabulating the data remains.

Building on digital means of textual database building and multimedia data processing, we are now in a position to tackle some of these methodological problems. In the next section, we review a number of early solutions, and then shift our emphasis to the PhonBank database project and the Phon software program, both of which supplement the CHILDES database system in the area of phonological development and disorders.

Recent advances

In the past three decades, research in the area of child language development and disorders has benefited immensely from tools and corpora now available online. The CHILDES database has, since its inception in 1983, allowed us to compare language samples across large numbers of children and test a wide variety of hypotheses and models (http://childes.talkbank.org). The MacArthur-Bates Communicative Developmental Inventories (MB-CDI) provides data on lexical development across thousands of children and many languages (http://www.cdi-clex.org). By comparison, until recently, fewer resources for shared databases and data analysis were available to the field of phonological acquisition. Early systems, each of which was capable of both independent and relational analyses in some fashion, include two small-scale programs coincidentally (although not surprisingly) called ChildPhon (Fikkert, 1994; Levelt, 1994; Freitas, 1997; Rose 2000) as well as more specialised applications such as LIPP (Logical International Phonetics Program; Oller & Delgado 1990), PEPPER (Program to Examine Phonetic and Phonological Evaluation Records; Shriberg, 1986), PROPH+ (Long, Fey, & Channell, 2006), and CAPES (Computerised Articulation and Phonology Evaluation System; Masterson & Bernhardt, 2001), the latter two developed more specifically for clinical purposes and related research. These early programs offered basic templates for textual (orthographic and phonetic transcriptions) annotations as well as functions for data query and reporting. However, these systems also suffered from various types of technical limitations. For example, except for CAPES, none of these applications offered time-alignment between the data transcript and the recorded media, a key feature for the quick retrieval of a media segment within a long audio or video recording. 
Also, because most of these systems were based on ASCII fonts (as opposed to Unicode) as well as proprietary file formats, data transcriptions performed within these systems were often incompatible with other programs or operating systems. This situation compromised our ability to compare existing corpora, and also imposed serious limitations to data sharing initiatives. The development of many of these systems has now been abandoned (see Rose 2003 for an earlier discussion).

Over the past decade, teams of researchers and programmers involved in the building of PhonBank have provided a number of solutions to the problems identified above. Spearheaded by Brian MacWhinney (Carnegie Mellon University) and Yvan Rose (Memorial University of Newfoundland), this research consortium pursues two inter-related goals: (1) to develop PhonBank, a shared database for the study of phonology, phonological development, and speech disorders; and (2) to develop Phon, a specialised software program for the building and analysis of phonological corpora (Rose & MacWhinney, 2014).

PhonBank and Phon: A brief description

At the present time, PhonBank includes 29 different corpora, each consisting of one or more longitudinal or cross-sectional datasets. Taken together, these corpora document the speech productions of 130 learners of 11 different languages. All data are transcribed phonetically, using the symbols and diacritics of the International Phonetic Alphabet (including the Extended IPA symbols for disordered speech), and follow the conventions developed within CHILDES to annotate other aspects of the recorded speech forms, for example pauses, repetitions, or instances of retracing. Many corpora also include digital audio and/or video recordings. The availability of media recordings depends on technical constraints at the time of data collection (especially for the earlier datasets) and on ethical considerations; in many cases, recordings cannot be shared because no informed consent toward scientific data sharing was obtained from the participants or their caregivers.

These shared corpora allow us to examine a broad array of phenomena relevant to phonological development and disorders. Work is also underway for the inclusion of more corpora, which will supplement the existing database with new languages and/or language dialects (e.g. Arabic, Berber, Catalan, German, Norwegian, Spanish) as well as new and/or expanded populations of speakers (e.g. bilingual, second-language, and disordered learners). For example, while the set of clinical studies available through PhonBank currently consists of only two corpora of English, contributed by Shula Chiat and Tara McAllister Byun, respectively, additional documentation of protracted phonological development is currently underway for languages as varied as Icelandic, Canadian French, English, European Portuguese, Japanese, Slovenian, Swedish, as well as various dialects of Spanish both from the Iberian Peninsula and the Americas, among others. This upcoming series of corpora will offer maximally comparable datasets based on a uniform methodology developed within the Cross-linguistic Project in Protracted Phonological Development directed by May Bernhardt and Joseph Stemberger at the University of British Columbia.

The tasks involved in corpus development and in the phonological analysis of transcribed corpora such as those described above are aided by Phon, which we describe in more detail below. In sum, PhonBank provides a modern framework through which the research community can address current needs in data specification and availability. It offers flexible functions to collate production patterns over large sets of comparable data, all compatible with CHILDES standards. Researchers adopting this framework can thus accomplish more work, in a more reliable way, and in far less time than with previous methods.

In the sections that follow, we focus on how the functions assembled in Phon can support different types of studies of development (e.g. longitudinal, cross-sectional, or based on score sheets, whose data can be adapted into sets of attempted/produced phones or word forms). We provide an example of the use of Phon to analyse patterns of velar fronting. We then dedicate a section to the importance of data sharing, which we consider one of the most crucial conditions to move the field ahead. We conclude with a special invitation to the clinical community to engage actively with this important initiative.

The Phon software program

As mentioned above, Phon is designed to meet the needs of individuals performing research in phonology, phonological development and disorders. Phon is an open-source program built on modern solutions for multimedia support. It also uses Unicode font encoding, which ensures compatibility across all computer platforms. Descriptions of Phon can be found within the recent literature on corpus phonology and phonological development (e.g. Rose, 2010, 2012, 2014; Rose et al., 2006; Rose & MacWhinney, 2014). We are now reaching out to the clinical research community, and wish to invite research-oriented members of this community to contribute to this project. Below we focus on how one can take advantage of this program to perform research on speech production patterns. We then return to the central issue of engagement.

Phon works on Mac OS X, Windows, and Linux operating systems. Available to the community as free software (https://www.phon.ca), it supports the most important steps involved in the phonetic transcription, annotation, and compilation of phonological data. Functions available in Phon include:

  • Systems for multimedia data linkage and word/utterance time alignment

  • Facilities for IPA transcription (e.g. media playback; IPA character map and input functions; phonetic dictionaries for different languages and dialects)

  • Interface for multiple-blind transcriptions and consensus-based transcript validation

  • Systems for automatic labelling of data (e.g. phonetic features; syllabification)

  • Data query and reporting functions, including support for phonological units (e.g. descriptive features; syllable positions; stress)

All of the functions supported by Phon are accessible through a friendly graphical user interface; queries can be based on orthographic or IPA strings of characters, on phonetic features associated with IPA symbols (e.g. features such as ‘labial’, ‘voiced’, ‘fricative’, …), and information about positions within syllables (e.g. onsets, codas), words, or word groups (e.g. phrases). Queries can also incorporate information about participants, for example age or age ranges. In sum, Phon provides a modern framework for the building of multimedia databases compliant with established technological and linguistic standards as well as versatile methods for data annotation, mining, and reporting.

Of particular use to researchers in phonological development and disorders, Phon provides both independent analyses (consonant and vowel inventories; syllable and word shapes; stress patterns) and relational analyses through which the user can identify substitutions and deletions based on individual phones or phone classes (e.g. voiced vs. voiceless obstruents). Phon can also compute Percentage of Consonants (or Vowels) Correct (PCC/PVC) measures at a click, as well as whole-word accuracy measures, which assess children’s productions in terms of phone accuracy or through evaluations of production rates for individual phones or syllable shapes, for example.
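
To make these relational measures concrete, the following minimal Python sketch shows how substitutions, deletions, and a PCC score could be derived from pairs of aligned target and actual phones. It is not Phon's implementation: the function name `relational_summary`, the small `CONSONANTS` set, and the second sample word are illustrative assumptions, and PCC is taken in its usual sense (correct consonants divided by target consonants, times 100).

```python
# Hypothetical consonant set; a real analysis would rely on a full IPA feature table.
CONSONANTS = set("pbtdkgmnŋfvszʃʒhlrwj")

def relational_summary(aligned_pairs):
    """Classify aligned (target, actual) phone pairs and compute PCC.

    `actual` is None when the target phone was deleted.
    Only target consonants are counted; vowels are skipped.
    """
    counts = {"correct": 0, "substitution": 0, "deletion": 0}
    for target, actual in aligned_pairs:
        if target not in CONSONANTS:
            continue  # vowels do not enter the PCC score
        if actual is None:
            counts["deletion"] += 1
        elif actual == target:
            counts["correct"] += 1
        else:
            counts["substitution"] += 1
    total = sum(counts.values())
    pcc = 100.0 * counts["correct"] / total if total else 0.0
    return counts, pcc

# 'go' produced as [do] (velar fronting), plus an invented second word
# with target |kæt| produced as [tæt].
pairs = [("g", "d"), ("o", "o"), ("k", "t"), ("æ", "æ"), ("t", "t")]
counts, pcc = relational_summary(pairs)
print(counts, round(pcc, 1))
```

In this toy sample, two of the three target consonants are substituted and one is correct, so the score is one third of 100. The same tally, restricted to a phone class such as velars, yields the production rates discussed in the illustration that follows.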

We illustrate some of the key functions of Phon below. We use the pattern of velar fronting as our main example, and emphasise methodological considerations central to the study of this phenomenon.

Working with Phon: An illustration

In this section, we show how a researcher or clinician can quickly and systematically characterise production patterns using Phon. We illustrate our discussion through a comparison of two different data sets documenting the productions of English-learning individuals currently available through PhonBank. The first is the Inkelas corpus, a diary study of a child code-named E, a typically developing American boy recorded longitudinally between the ages of 0;6 and 3;10 (Inkelas & Rose, 2003; 2007). The second is the Chiat corpus, which documents three British English boys with atypical phonological development aged between 5;5 and 5;8 at the time of data recording; they are code-named DI, SB, and SR (Chiat 1983, 1994). We refer the interested reader to these original studies for more information about the participants as well as theoretical and clinical implications of their production patterns.

Together, these original studies raise the significant question as to whether the speech patterns of children with atypical development are ‘delayed’ (in which case their productions would resemble those of younger children with typical development), or are ‘disordered’ (in that they qualitatively differ from those of children with typical development). As we will see, while the data from the two datasets are similar in many respects (e.g. segmental, prosodic), issues about variability may point to a differentiation between the two populations of learners.

Data preparation

After downloading the data, the first step consists of labelling the IPA transcriptions for syllable-level positions and for pairwise phone alignments between model (e.g. adult-like) word forms attempted by the child and his/her actual productions of these forms. Phon automatically generates these annotations through specialised algorithms (with different syllabification rules to accommodate language-specific phonotactics). However, these annotations must be systematically verified by the user, who can also modify them according to his/her specific research needs. The interface developed within Phon is designed to make these tasks easy and time-effective. For example, in Figure 1, the alignment data readily reveal a case of |r|-initial deletion as well as a case of velar fronting, whereby |k| is produced as [t].2

Figure 1. Phon data: Orthography and IPA forms; syllabification and alignment.

We can also see in this interface how Phon visually represents syllabification data through colour coding: Blue for onsets, red for nuclei and green for codas.3

Data query

After these annotations are verified, the corpus is ready for analysis. As mentioned above, Phon can perform a number of different queries. For example, the program can extract inventories of units (e.g. words; syllables; phones; features) based on orthographic, IPA target and/or their corresponding IPA actual forms. Phon can also perform comparisons between pairs of aligned phones, which facilitates research on both segmental and syllable-level production patterns.

As reported by Stoel-Gammon (1996), velar fronting, the process whereby a target velar |k, g| is produced as coronal [t, d] (e.g. ‘go’ produced as [do]), may occur across all target velars or in a positionally determined fashion, typically affecting onsets of prosodically strong syllables (word-initial and/or stressed). Given these observations, a broad assessment of the pronunciations of velar consonants, irrespective of their position within the word or syllable, is likely to yield results that are difficult to interpret, as consonants across positions will display different behaviours. Phon supports the formulation of queries limiting the scope to each relevant context, using a combination of text strings and check boxes within a dedicated interface. For example, the screen shot in Figure 2a illustrates a search based on IPA target forms for velar stops in onset position, while the syllable filter in Figure 2b limits this query to word-medial and -final unstressed syllables.

Figure 2. (a) Search expression; (b) positional criteria.

In addition to this context, we queried our data for the following contexts: word-initial onsets (irrespective of syllable stress) as well as word-medial onsets of stressed versus unstressed syllables.
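
The positional contexts used in these queries can be summarised as a small decision rule. The Python sketch below is only an illustration of that rule, not a description of how Phon stores or filters its data: the `VelarToken` record layout and the `prosodic_context` function are hypothetical names, and the weak/strong grouping follows the contexts reported for this study (codas and unstressed non-initial onsets as weak; word-initial onsets and stressed word-medial onsets as strong).

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class VelarToken:
    """One target velar with its prosodic position (hypothetical record layout)."""
    target: str        # "k" or "g"
    constituent: str   # "onset" or "coda"
    word_initial: bool
    stressed: bool

def prosodic_context(tok):
    """Return (context label, prosodic strength) for a target velar."""
    if tok.constituent == "coda":
        return "coda", "weak"
    if tok.word_initial:
        # Word-initial onsets are grouped irrespective of syllable stress.
        return "word-initial onset", "strong"
    if tok.stressed:
        return "stressed word-medial onset", "strong"
    return "unstressed word-medial onset", "weak"

# Illustrative tokens: the |g| of 'go' and an unstressed non-initial |k|.
print(prosodic_context(VelarToken("g", "onset", word_initial=True, stressed=True)))
print(prosodic_context(VelarToken("k", "onset", word_initial=False, stressed=False)))
```

Labelling each token once in this way keeps the four contexts mutually exclusive, so counts from separate queries can be compared without double counting.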

The results of each query can be visualised within Phon, and can be saved for further reference. The results can then be exported for post-processing, to which we proceed next.

Data reporting and post-processing

Phon provides an interface to export query results into CSV (comma-separated value) text files which can be opened in spreadsheet applications (e.g. OpenOffice/LibreOffice Calc or Microsoft Excel) or in statistical analysis programs (e.g. SPSS or R). For the sake of illustration, we first focus on E’s diary data, which we processed within OpenOffice Calc version 4.1.
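
For readers who prefer scripting to spreadsheets, an exported CSV file can also be read with a few lines of Python. The sketch below assumes a layout with one row per target ↔ actual pair and one column per recording session, similar to the aggregated inventory in Table 1; the header name `Result` and the function `load_inventory` are hypothetical, and the actual column names in a Phon export may differ.

```python
import csv
import io

# Assumed export layout; the counts are copied from two rows of Table 1.
SAMPLE_CSV = """Result,2;2.24,2;2.28,2;3.4,2;3.5
k ↔ k,2,7,29,17
k ↔ t,39,14,7,4
"""

def load_inventory(text):
    """Parse an aggregated-inventory CSV into {(target, actual): {session: count}}."""
    inventory = {}
    for row in csv.DictReader(io.StringIO(text)):
        # Split the "target ↔ actual" label into its two phones.
        target, actual = (part.strip() for part in row.pop("Result").split("↔"))
        inventory[(target, actual)] = {age: int(n) for age, n in row.items()}
    return inventory

inventory = load_inventory(SAMPLE_CSV)
print(inventory[("k", "t")]["2;2.24"])
```

With a file on disk, the same function can be fed the result of `open(path, encoding="utf-8", newline="").read()`, which preserves the IPA characters in the labels.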

Given the longitudinal nature of the Inkelas corpus, we tracked the child’s productions of velars across the time period covered by the diary. We then generated an aggregated inventory, which consists of every realization of a given target phone across all sessions returned by the query. The screen shot in Table 1 illustrates a (sample) aggregated inventory as generated by Phon. Table 2 represents the same inventory after manual sorting of the data using spreadsheet functions and the addition of a code for the place of articulation of the consonant produced by the child. The more concise representation in Table 3 collapses these data according to age (on a monthly basis) and the place of articulation of the child’s realization of the target consonant (if any) and can be quickly formatted into a bar graph such as that in Figure 3 for data comparison or presentation purposes.

Table 1. Aggregated inventory (as generated by Phon; age labels formatted).

Target ↔ Actual    2;2.24    2;2.28    2;3.4    2;3.5
g ↔ ∅              0         3         1        2
g ↔ d              10        12        4        3
g ↔ n              2         0         1        0
k ↔ ∅              2         0         0        1
k ↔ k              2         7         29       17
k ↔ t              39        14        7        4

Table 2. Column transform and labelling (using spreadsheet functions).

Target    Actual    Place    2;2.24    2;2.28    2;3.4    2;3.5
g         d         COR      10        12        4        3
g         n         COR      2         0         1        0
k         t         COR      39        14        7        4
g         ∅         DEL      0         3         1        2
k         ∅         DEL      2         0         0        1
k         k         VEL      2         7         29       17

Table 3. Data merge (age/labels).

Realization    2;2    2;3
Deletion       5      4
Velar          9      46
Coronal        77     19
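
The spreadsheet steps leading from Table 1 to Table 3 can also be scripted. The following Python sketch is one possible way to do so, not the procedure used for the paper (which relied on OpenOffice Calc): it labels each realization for place of articulation, as in Table 2, and then sums the counts by month of age, as in Table 3. The names `TABLE_1`, `PLACE`, and `merge_by_month` are illustrative, and the place mapping covers only the phones attested in this sample.

```python
from collections import defaultdict

# Counts copied from Table 1: (target, actual) -> counts per session.
SESSIONS = ["2;2.24", "2;2.28", "2;3.4", "2;3.5"]
TABLE_1 = {
    ("g", "∅"): [0, 3, 1, 2],
    ("g", "d"): [10, 12, 4, 3],
    ("g", "n"): [2, 0, 1, 0],
    ("k", "∅"): [2, 0, 0, 1],
    ("k", "k"): [2, 7, 29, 17],
    ("k", "t"): [39, 14, 7, 4],
}

# Place labels corresponding to the COR / VEL / DEL codes of Table 2.
PLACE = {"d": "Coronal", "n": "Coronal", "t": "Coronal", "k": "Velar", "∅": "Deletion"}

def merge_by_month(table, sessions):
    """Sum counts per (place label, age in years;months)."""
    merged = defaultdict(int)
    for (_target, actual), counts in table.items():
        for session, n in zip(sessions, counts):
            month = session.split(".")[0]  # "2;2.24" -> "2;2"
            merged[(PLACE[actual], month)] += n
    return dict(merged)

table_3 = merge_by_month(TABLE_1, SESSIONS)
for label in ("Deletion", "Velar", "Coronal"):
    print(label, table_3[(label, "2;2")], table_3[(label, "2;3")])
```

Run on the sample above, the script reproduces the six cell values of Table 3, which offers a quick consistency check on the manual sorting and merging.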

Figure 3. Chart generation.

Moving back to the Inkelas corpus, our queries returned data from 199 different recording days, which we grouped into 5 different time periods, monthly between the child’s ages 2;0 and 2;3, with the remainder of the data between 2;4 and 3;9 collapsed together. This decision was based on Inkelas and Rose (2003, 2007), who reported a systematic change in the child’s behaviour between 2;2 and 2;3, also verified in the graphs below. We also grouped the various results according to place of articulation, ignoring differences in voicing or nasality. For example, we grouped all cases of target |k| produced as [t], [tʃ], [d], or [n] into a single ‘Coronal’ category. The two other significant categories are ‘Velar’ and ‘Laryngeal’; all other realizations were put in the ‘Other’ category (e.g. marginal realizations of target velars as labials). Using OpenOffice Calc’s chart generation functions, we then extracted bar graphs representing the child’s production patterns across each age range. (In the examples below, the x-axis represents child/age, and the y-axis the number of productions.)
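
The two grouping decisions described in this paragraph can be written down as a pair of small functions. This Python sketch is offered only as an illustration of the binning logic: the function names are hypothetical, the coronal set lists the realizations named in the text, and the membership of the velar and laryngeal sets is an assumption based on standard IPA place classes.

```python
CORONAL = {"t", "tʃ", "d", "n"}   # realizations named in the text
VELAR = {"k", "g", "ŋ"}           # assumed membership
LARYNGEAL = {"h", "ʔ"}            # assumed membership

def place_category(actual):
    """Group a realization by place, ignoring voicing and nasality."""
    if actual in CORONAL:
        return "Coronal"
    if actual in VELAR:
        return "Velar"
    if actual in LARYNGEAL:
        return "Laryngeal"
    return "Other"

def age_period(age):
    """Map an age label 'years;months.days' to one of the five time periods."""
    years, rest = age.split(";")
    months = int(years) * 12 + int(rest.split(".")[0])
    if 24 <= months <= 27:          # 2;0 to 2;3: one bin per month
        return f"{months // 12};{months % 12}"
    if 28 <= months <= 45:          # 2;4 to 3;9: collapsed into a single bin
        return "2;4-3;9"
    raise ValueError(f"age outside the periods analysed: {age}")

print(age_period("2;2.24"), age_period("3;1.5"), place_category("tʃ"))
```

Raising an error for ages outside the 2;0 to 3;9 range is a design choice of this sketch; it makes unexpected sessions visible instead of silently assigning them to a bin.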

As mentioned above, this classification of the raw data clearly represents the patterns previously documented by Inkelas and Rose (2003, 2007): E abruptly revised his productions of target velars in strong onsets between the ages of 2;2 and 2;3.

We then processed Chiat’s data on three children with atypical phonological development through the same workflow. In the speech of children with typical development, velar consonants appear relatively early, usually by age 3;0 (Stoel-Gammon & Dunn, 1985). Children who have no velar consonants by this age may be labelled as phonologically disordered, especially if other aspects of their phonological development are delayed or deviant. The data from the three children with atypical development were gathered between the ages 5;5 and 5;8, well beyond the age at which most children produce velars across all prosodic positions.

Our analysis of the Chiat data differs slightly from our analysis of the Inkelas corpus, because of methodological differences between the original studies. For each child, the subset of the Chiat corpus we utilise here documents productions recorded during two consecutive sessions, which we tabulated according to the same prosodic contexts as above. The results are displayed in Figure 5: The first two bars represent DI’s production patterns, SB’s are in the middle two bars, with SR’s in the final two.

Figure 5. DI’s, SB’s, and SR’s productions of target velars across contexts. Prosodically weak environments ((a) codas; (b) unstressed word-medial onsets). (c) Prosodically strong environments (initial onsets; stressed word-medial onsets).

In comparison with the Inkelas corpus, the picture appears to be more mixed. The three children generally follow the same positional pattern of velar fronting as the younger child; however, we also note more variability, most notably concerning SB’s productions in both codas and weak onsets, and in DI’s onsets during the second recording session.

Discussion

The summaries shown above illustrate how the methodological advances achieved within the framework of PhonBank allow us to efficiently compare productions of velar targets across phonological contexts, both within and across corpora. An interesting observation emerging from the comparison above is the larger degree of variability within Chiat’s clinical data when compared to the typical productions documented in the Inkelas corpus. There are a number of possible explanations for this observation, many of which could be explored through additional queries. One factor to consider is the presence (or absence) of velars in the child’s phonetic inventory at a younger age, as it may be the case that the atypically-developing children were transitioning from productions with no velars in any position to velars in prosodically weak positions. In this case, however, we would expect a wider variety of realizations for velar targets. We must also consider the possible effects of intervention; we do not know what types of treatment the children were receiving, which would likely influence their production patterns. Finally, it is important to examine other potential phonetic influences (e.g. McAllister Byun 2012 on the relevance of voicing as a determining factor on the realization of target velars in the speech of a phonologically-disordered child).4

All such working hypotheses, which transcend the scope of the current illustration, can be investigated quickly through additional queries within Phon. This is particularly relevant in that the original studies of velar fronting mentioned above were conducted through painstaking manual tabulations and cross-verifications of complex spreadsheets over several weeks. Using Phon, we were able to obtain our results from the two corpora within a matter of hours; the entire process described above, from data download to chart generation, was performed in less than a regular day’s work. These facts, while simple at face value, hold tremendous implications in terms of research output, and highlight the central importance of data access, combined with the availability of specialised, computer-assisted methods for corpus mining. Success in these areas can greatly facilitate our investigation of theoretical questions such as those formulated above. These methodological advances also strengthen our basis for the refinement of assessment methods, central to both academic research and clinical applications.

Outlook, and the importance of data sharing

As shown in the example above, Phon now readily facilitates studies of phonological development. Many more assessment methods are now available within the second generation of Phon, released in October 2014, which incorporates support for acoustic measurements extracted through Praat (http://www.fon.hum.uva.nl/praat/). Among other actions, researchers using this recent upgrade are now able to:

  • Generate TextGrids from Phon records and/or import TextGrids previously generated within Praat

  • Set and automate acoustic analyses directly within Phon

  • Visualise acoustic data within Phon and/or export these data for further processing

Through these newly introduced functions, Phon now facilitates a number of tasks related to the study of speech acoustics. It also supplements Praat’s analytic functions in all tasks that relate to corpus data management.

This new generation of Phon also incorporates assessment measures particularly relevant to clinical research (e.g. PMLU, Ingram 2002; ePMLU, Arias & Lleó 2014). In addition, we have engaged in collaborative work to incorporate the analyses available within PROPH+ into Phon. This is a central element of our current development plan, as PROPH+ now works only on out-of-production 32-bit DOS systems.
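As an illustration of what a measure such as PMLU involves, the sketch below follows our reading of the scoring in Ingram (2002): one point for each segment the child produces, plus one additional point for each correctly produced consonant, averaged over words. It is not Phon's implementation. The aligned-pair input format, the single-character vowel set, the capping of segment points at the target length, and the sample words are all simplifying assumptions made for this example.

```python
# Simplified vowel inventory (assumption): one IPA character per phone.
VOWELS = set("aeiouɪɛæʌəɑɔʊ")


def is_consonant(phone):
    """Treat any phone outside the simplified vowel set as a consonant."""
    return phone not in VOWELS


def word_pmlu(aligned):
    """Score one word given (target, actual) phone pairs.

    None marks a missing phone: (target, None) is a deletion and
    (None, actual) is an insertion. The alignment itself is assumed
    to have been done beforehand.
    """
    target_len = sum(1 for target, _ in aligned if target is not None)
    produced = sum(1 for _, actual in aligned if actual is not None)
    # Cap segment points at the target length so that inserted segments
    # earn no extra credit (a simplification adopted for this sketch).
    segment_points = min(produced, target_len)
    correct_consonants = sum(
        1
        for target, actual in aligned
        if target is not None and actual == target and is_consonant(target)
    )
    return segment_points + correct_consonants


def pmlu(words):
    """Mean word score across a sample of aligned words."""
    return sum(word_pmlu(word) for word in words) / len(words)


# Invented sample: "cat" produced with a fronted velar, "dog" with coda deletion.
SAMPLE = [
    [("k", "t"), ("æ", "æ"), ("t", "t")],
    [("d", "d"), ("ɔ", "ɔ"), ("g", None)],
]

if __name__ == "__main__":
    print(pmlu(SAMPLE))
```

In this toy sample, the fronted production of the first word still earns its segment points but loses the extra point for the initial consonant, which is how the measure separates word shape from segmental accuracy.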

In parallel, PhonBank is set to expand its current dataset on typical phonological development and to incorporate new corpora in areas such as phonological disorders and multilingual development. Data sharing is indeed a central condition for the long-term success of this community-based effort. As discussed in Rose (2010, 2012) and Rose and MacWhinney (2014), the biggest barrier to progress in the study of phonological development is the current, inadequate level of data sharing. Given that instrumental and quantitative analyses are becoming increasingly important in our characterizations of developmental phenomena, these technological and related empirical advances are absolutely essential. For example, as discussed above, a major issue in clinical phonology is the question of ‘delay’ versus ‘disorder’ in phonological development, as it remains unclear whether a disordered phonological system is developing in a typical way, but at a slower pace, or whether it is qualitatively different from typically developing systems. In a similar way, it is unclear how closely the phonological systems of children with atypical phonology resemble one another. Addressing such questions requires comparisons across corpora of the kind that shared datasets make possible.

Data sharing also provides direct benefits to the researcher. For example, the use of CHILDES/PhonBank corpus data must be accompanied by proper references to the original studies published by the corpus contributor(s). In addition to fulfilling basic copyright requirements, this often serves to increase the visibility of a scholar’s work and related scores on citation indexes, and provides further networking opportunities to everyone involved. Note as well that each corpus contributed to the CHILDES/PhonBank database legally counts as a publication, as it is assigned a unique International Standard Book Number (ISBN).

More generally, data sharing should ideally become a condition for the publication of results in refereed venues, as it is across most fields of science. Given that data recorded for a specific purpose can often be used to address additional research questions, significant financial efficiencies can also be obtained through data sharing. Proposals for collecting data on speech and language development are carefully reviewed by institutional review boards and, in some cases, it is difficult to receive approval for data sharing. However, we have observed positive advances in this direction over the last several years, as institutional and granting regulations now advocate in favour of non-sensitive data release. Audio (as opposed to video) recordings from which names of people, places, and dates are removed generally match this definition. In our view, new research should be designed with up-front provisions for data sharing, for example as part of ethics clearance and informed consent documentation and forms. (See http://talkbank.org/share/irb/ for useful resources and additional information on how to address these issues.)

Finally, the open-source nature of the Phon program readily offers a platform for the easy development of additional functions for data coding and analysis. The inclusion of these functions will in turn provide the entire field with additional research opportunities.

Conclusion

Together, the PhonBank database and related Phon software program provide modern tools for the study of phonological development and speech disorders. Taking advantage of the empirical and methodological improvements afforded through these initiatives, our research community now has access to unprecedented support for exploring both traditional and novel questions, and for testing empirical predictions made by current models of phonology and acquisition.

We emphasise that these positive advances will continue to the extent that both established and new researchers actively embrace them, in particular through their involvement in the increasingly important area of data sharing. Data is one of our main currencies, and data sharing our most powerful means to increase the wealth of our knowledge. It is on this positive note that we invite every researcher in phonological development and speech disorders to engage in these compelling and far-reaching initiatives.

Figure 4. E’s productions of target velars across contexts. (a, b) Prosodically weak environments: (a) codas; (b) word-medial unstressed onsets. (c) Prosodically strong environments (initial onsets; stressed word-medial onsets).

Acknowledgments

We wish to thank everyone who has supported the PhonBank database and Phon software program thus far. We are also grateful to several ICPLA attendees for their questions and feedback, many of which are already proving useful to the successful continuation of these communal initiatives. Finally, we wish to thank two anonymous reviewers for their useful comments on previous versions of this article.

Declaration of Interest

The PhonBank and Phon projects are currently supported through the project ‘A Shared Database for the Study of Phonological Development’ funded by the National Institutes of Health (Grant #2 R01 HD051698-06A1).

Footnotes

1

Of these seminal studies, only those by Smith (1973) and Compton & Streeter (1977) are currently available in PhonBank.

2

We use vertical bars to denote the English adult target phone as perceived and represented by the child, as opposed to what could be construed as the underlying representation in the adult language (e.g. Rose & Inkelas 2011 for a similar notation).

3

Other syllable positions are available, to support research needs based on more elaborate theories of syllabification (e.g. Fikkert, 1994; Goad & Rose, 2004 for discussion in the context of acquisition).

4

Other factors potentially important for interpretation include perceptual issues within certain phonological contexts (e.g. Macken 1980) or lexical exceptions (e.g. Menn & Matthei 1992; Vihman 2014).

Contributor Information

Yvan Rose, Memorial University, St. John’s, Newfoundland, Canada.

Carol Stoel-Gammon, University of Washington, Seattle, WA, USA.

References

  1. Arias J, Lleó C. Rethinking assessment measures of phonological development and their application in bilingual acquisition. Clinical Linguistics & Phonetics. 2014;28(3):153–175. doi: 10.3109/02699206.2013.840681.
  2. Bernhardt BM, Romonath R, Stemberger J. Children with protracted phonological development. In: Yavaş M, editor. Unusual productions in phonology: Universals and language-specific considerations. Psychology Press; New York: 2014.
  3. Bernhardt BM, Stemberger JP. Handbook of phonological development from the perspective of constraint-based nonlinear phonology. Academic Press; San Diego: 1998.
  4. Buder EH. Experimental phonology with acoustic phonetic methods: Formant measures from child speech. In: Bernhardt BM, Gilbert J, Ingram D, editors. Proceedings of the UBC international conference on phonological acquisition. Cascadilla Press; Somerville, MA: 1996. pp. 254–265.
  5. Chiat S. Why Mikey’s right and my key’s wrong: The significance of stress and word boundaries in a child’s output system. Cognition. 1983;14:275–300. doi: 10.1016/0010-0277(83)90007-0.
  6. Chiat S. From lexical access to lexical output: What is the problem for children with impaired phonology? In: Yavaş M, editor. First and second language phonology. Singular; San Diego, CA: 1994. pp. 107–133.
  7. Compton AJ, Streeter M. Child phonology: Data collection and preliminary analyses. Papers and Reports on Child Language Development. 1977;13:99–109.
  8. Dinnsen DA, Green CR, Gierut JA, Morrisette ML. On the anatomy of a chain shift. Journal of Linguistics. 2011;47:275–299. doi: 10.1017/S0022226710000368.
  9. Dodd B. Differential diagnosis and treatment of children with speech disorder. Whurr; London: 1995.
  10. Edwards J, Fourakis M, Beckman ME, Fox RA. Characterizing knowledge deficits in phonological disorders. Journal of Speech, Language, and Hearing Research. 1999;42:169–186. doi: 10.1044/jslhr.4201.169.
  11. Eilers RE, Oller DK, Benito-Garcia CR. The acquisition of voicing contrasts in Spanish and English learning infants and children: A longitudinal study. Journal of Child Language. 1984;11:313–336. doi: 10.1017/s0305000900005791.
  12. Ferguson CA, Farwell C. Words and sounds in early language acquisition: Initial consonants in the first fifty words. Language. 1975;51:419–439.
  13. Fikkert P. On the acquisition of prosodic structure. Holland Academic Graphics; The Hague: 1994.
  14. Freitas MJ. Aquisição da estrutura silábica do Português Europeu (Ph.D. Dissertation) University of Lisbon; 1997.
  15. Goad H, Rose Y. Input Elaboration, Head Faithfulness and Evidence for Representation in the Acquisition of Left-edge Clusters in West Germanic. In: Kager R, Pater J, Zonneveld W, editors. Constraints in Phonological Acquisition. Cambridge University Press; Cambridge: 2004. pp. 109–157.
  16. Hinkley A. A case of retarded speech development. Pediatric Seminars. 1915;33:3–48.
  17. Ingram D. The measurement of whole-word productions. Journal of Child Language. 2002;29(4):713–733. doi: 10.1017/s0305000902005275.
  18. Inkelas S. J’s rhymes: A longitudinal case study of language play. Journal of Child Language. 2003;30:557–581.
  19. Inkelas S, Rose Y. Velar fronting revisited. In: Beachley B, Brown A, Conlin F, editors. Proceedings of the 27th Annual Boston University Conference on Language Development. Cascadilla Press; Somerville, MA: 2003. pp. 334–345.
  20. Inkelas S, Rose Y. Positional neutralization: A case study from child language. Language. 2007;83:707–736.
  21. Kehoe M, Stoel-Gammon C, Buder E. Acoustic correlates of stress in young children’s speech. Journal of Speech and Hearing Research. 1995;38:338–350. doi: 10.1044/jshr.3802.338.
  22. Leopold W. Speech development of a bilingual child. Northwestern University Press; Evanston, IL: 1939-1949.
  23. Levelt C. On the acquisition of place. Holland Academic Graphics; The Hague: 1994.
  24. Li C, Thompson S. The acquisition of tone in Mandarin-speaking children. Journal of Child Language. 1977;4:185–199.
  25. Lieberman P, Ferguson CA. On the development of vowel production in young children. In: Yeni-Komshian G, Kavanagh J, editors. Child phonology. Vol. 1. Academic Press; New York: 1980. pp. 113–142.
  26. Long SH, Fey ME, Channell RW. Computerized profiling (PROPH+). 2006.
  27. Macken MA. Developmental reorganization of phonology: A hierarchy of basic units of acquisition. Lingua. 1979;49:11–49.
  28. Macken MA. The child’s lexical representation: The “puzzle-puddle-pickle” evidence. Journal of Linguistics. 1980;16:1–17.
  29. Macken MA, Barton D. The acquisition of the voicing contrast in English: A study of voice onset time in word-initial stop consonants. Journal of Child Language. 1980;7:41–74. doi: 10.1017/s0305000900007029.
  30. MacWhinney B. The CHILDES project: Tools for analyzing talk. 3rd ed. Lawrence Erlbaum Associates; Mahwah, NJ: 2000.
  31. Masterson J, Bernhardt B. Computerized articulation and phonological evaluation system. AGS Publishing/Pearson Assessments; San Antonio, TX: 2001.
  32. McAllister Byun T. Positional velar fronting: An updated articulatory account. Journal of Child Language. 2012;39:1043–1076. doi: 10.1017/S0305000911000468.
  33. McLeod S. The international guide to speech acquisition. Thomson Delmar Learning; Clifton Park, NY: 2007.
  34. Menn L, Matthei E. The “two-lexicon” approach of child phonology: Looking back, looking ahead. In: Ferguson CA, Menn L, Stoel-Gammon C, editors. Phonological development: Models, research, implications. York Press; Timonium, MD: 1992. pp. 211–248.
  35. Oller DK, Delgado R. Logical international phonetic programs. Intelligent Hearing Systems; Miami: 1990.
  36. Poole I. Genetic development of articulation of consonant sounds in speech. Elementary English Review. 1934;11:159–161.
  37. Rose Y. Headedness and prosodic licensing in the L1 acquisition of phonology (Ph.D. Dissertation) McGill University; 2000.
  38. Rose Y. ChildPhon: A database solution for the study of child phonology. In: Beachley B, Brown A, Conlin F, editors. Proceedings of the 27th Annual Boston University conference on language development. Cascadilla Press; Somerville, MA: 2003. pp. 674–685.
  39. Rose Y. The PhonBank initiative and second language phonological development: Innovative tools for research and data sharing. In: Henderson A, editor. English pronunciation: Issues and practices (EPIP) - Proceedings of the first international conference. 2010. pp. 223–241. Chambéry: Université de Savoie.
  40. Rose Y. Multilingual phonological corpus analysis: The tools behind the PhonBank project. In: Schmidt T, Wörner K, editors. Multilingual corpora and multilingual corpus analysis. John Benjamins Publishing Company; Amsterdam: 2012. pp. 365–381.
  41. Rose Y. Corpus-based investigations of child phonological development: Formal and practical considerations. In: Durand J, Gut U, Kristoffersen G, editors. The Oxford handbook of corpus phonology. Oxford University Press; Oxford: 2014. pp. 265–285.
  42. Rose Y, Inkelas S. The interpretation of phonological patterns in first language acquisition. In: Ewen CJ, Hume E, van Oostendorp M, Rice K, editors. The Blackwell Companion to Phonology. Wiley-Blackwell; Malden, MA: 2011. pp. 2414–2438.
  43. Rose Y, MacWhinney B. The PhonBank initiative. In: Durand J, Gut U, Kristoffersen G, editors. The Oxford Handbook of Corpus Phonology. Oxford University Press; Oxford: 2014. pp. 380–401.
  44. Rose Y, MacWhinney B, Byrne R, Hedlund G, Maddocks K, O’Brien P, Wareham T. Introducing Phon: A software solution for the study of phonological acquisition. In: Bamman D, Magnitskaia T, Zaller C, editors. Proceedings of the 30th Annual Boston University Conference on Language Development. Cascadilla Press; Somerville, MA: 2006. pp. 489–500.
  45. Shriberg LD. PEPPER: Programs to examine phonetic and phonologic evaluation records. Lawrence Erlbaum; Hillsdale, NJ: 1986.
  46. Smit A. Phonologic error distribution in the Iowa-Nebraska articulation norms project: Consonant singletons. Journal of Speech and Hearing Research. 1993;36:533–547. doi: 10.1044/jshr.3603.533.
  47. Smit A, Hand L, Freilinger J, Bernthal J, Bird A. The Iowa articulation norms project and its Nebraska replication. Journal of Speech and Hearing Disorders. 1990;55:779–798. doi: 10.1044/jshd.5504.779.
  48. Smith NV. The acquisition of phonology: A case study. Cambridge University Press; Cambridge, U.K.: 1973.
  49. Stampe DL. The acquisition of phonetic representation. In: Binnick RI, editor. Papers from the 5th Regional Meeting of the Chicago Linguistic Society. Chicago Linguistic Society; Chicago: 1969. pp. 433–444.
  50. Stoel-Gammon C. On the acquisition of velars in English. In: Bernhardt BH, Gilbert J, Ingram D, editors. Proceedings of the UBC international conference on phonological acquisition. Cascadilla Press; Somerville, MA: 1996. pp. 201–214.
  51. Stoel-Gammon C, Bernhardt BM. Child phonology, phonological theories, and clinical phonology: Reviewing the history of our field. In: Peter B, MacLeod AN, editors. Comprehensive perspectives on speech sound development and disorders: Pathways from linguistic theory to clinical practice. Nova Publishers; New York: 2013. pp. 3–29.
  52. Stoel-Gammon C, Cooper JA. Patterns of early lexical and phonological development. Journal of Child Language. 1984;11:247–271. doi: 10.1017/s0305000900005766.
  53. Stoel-Gammon C, Dunn C. Normal and disordered phonology in children. University Park Press; Baltimore, MD: 1985.
  54. Templin M. Certain language skills in children: Their development and interrelationships. Institute of Child Welfare Monographs. Vol. 26. University of Minnesota Press; Minneapolis: 1957.
  55. Velten HV. The growth of phonemic and lexical patterns in infant language. Language. 1943;19:231–292.
  56. Vihman MM. Phonological development: The first two years. 2nd ed. Wiley-Blackwell; Hoboken: 2014.
  57. Wellman B, Case I, Mengert I, Bradbury D. Speech sounds of young children. University of Iowa Studies in Child Welfare. 1931;5:1–82.
  58. Williams AL, McLeod S, McCauley RJ. Interventions for speech sound disorders. Brookes Publishing Company; Baltimore, MD: 2010.
