Abstract
In this article, we review the advantages of language sample analysis (LSA) and explain how clinicians can make LSA faster, easier, more accurate, and more insightful than LSA done “by hand” by using freely available software programs such as Computerized Language Analysis (CLAN). We demonstrate the utility of CLAN analysis in studying the expressive language of a very large cohort of 24-month-old toddlers tracked in a recent longitudinal study; toddlers are the group most likely to receive LSA from clinicians, but existing reference “norms” for this population are based on fairly small cohorts of children. Finally, we demonstrate how a CLAN utility such as KidEval can now extract potential normative data from the very large number of corpora available for English and other languages at the Child Language Data Exchange System project site. Most of the LSA measures that we studied do not show an even developmental trajectory from 2 to 7 years of age, suggesting that each may be of greater value for children at certain ages.
Keywords: Language sample analysis, transcription, MLU, IPSyn, DSS
For the assessment of child language abilities, language sample analysis (LSA) provides the very high degree of ecological validity and “authenticity” mandated by current educational policies.1 It supplements standardized assessment by providing a snapshot of the child’s language “in action.” More critically, it provides baseline insights into the child’s strengths and weaknesses across the range of language skills necessary for age-appropriate communication, from vocabulary to syntax to pragmatics; these skills can then be tracked in natural contexts over time.2 LSA provides the clinician with tangible goals for therapy unlikely to emerge from results of standardized testing. These goals can then be prioritized.3 In the absence of norm-referenced assessments for children speaking nonmainstream dialects or English as a second language, LSA also can provide less biased and more informative data about the child’s expressive language skills and needs.4,5
However, there are several practical issues in using LSA for clinical purposes that tend to diminish the frequency (and depth) of its use in actual clinical practice.5 Although the self-reported use of LSA has been steadily climbing in reports from 1993 to 2000,6–8 most speech-language pathologists (SLPs) report compiling relatively short samples in real-time notation and using them primarily to compute mean length of utterance (MLU),2 despite the fact that it is not a good stand-alone measure for identifying language impairment and does not deeply inform the child’s grammatical development, let alone proficiency with vocabulary or other aspects of expressive language.8 Fewer than one-third of respondents in one study computed any additional measures, the most popular being Developmental Sentence Score (DSS).9 Why might this be?
It is well acknowledged that good LSA can be quite time-consuming.1 Some studies have estimated that it takes up to 8 hours of training and 45 minutes to an hour of work after a transcript has been generated to compute DSS.10,11 One study estimated that it takes more than 30 minutes per sample following transcription to compute the Index of Productive Syntax (IPSyn).12,13 It also seems to one of us, after a long career as a university instructor, that most LSA measures, even the time-honored MLU, are quite prone to error. (Quick quiz: how many morphemes are there in the word upstairs? See the end of the article for our recommended answer.) It is obviously difficult to use the same worksheet to compute multiple linguistic measures, and it is a waste of time to transfer handwritten scribbles of what the child said to most scoring protocols. Thus, even by self-report, LSA is not used by some clinicians and is not deeply exploited by most to inform child language assessment. Those who do LSA often use a sample that is much too short to meet the intended sample size for the measures that are computed,14 sometimes 50 to 75% fewer utterances than recommended.
As this issue emphasizes, and as other authors have noted over the years, computer-assisted LSA can solve all of the problems listed previously (time, accuracy, and depth of analysis),2,15–17 but is not very frequently used in practice. A recent study estimated that only 12.5% of SLPs in Australia use computer-assisted transcription and analysis,14 and there is little to suggest that their counterparts in other countries use such procedures at a significantly higher rate.2 As we will suggest, use of computers to aid in sample transcription and analysis, particularly using free utilities such as CLAN that additionally link the transcript to an audio or video recording of the child’s actual speech sample, can greatly improve the speed, accuracy, and informativeness of LSA, and by extension, clinical assessment, therapy planning, and measurement of therapeutic progress.
In this article, we will illustrate the utility of LSA conducted using CLAN and the KidEval utility using two separate data sets. The first is a large cohort of very young children followed as part of a single research study. The second is a review of data obtained from the Child Language Data Exchange System (CHILDES) project archive that we use to evaluate the potential utility of certain LSA measures at particular ages. It is of separate concern that many LSA measures lack robust “normative” or comparison reference values, and the CHILDES project can immeasurably augment what we currently know about typical performance as measured by MLU, DSS, IPSyn, and other measures, to great clinical benefit.
KIDEVAL IN ACTION ON A LARGE DATA SET: HOW CLAN UTILITIES CAN GREATLY SPEED AND IMPROVE YOUR CLINICAL PRACTICE
In this section, we summarize how we and our colleagues used the relatively new KidEval utility in assessing the dyadic interactions of a large cohort of infants and their mothers (n = 125), who were sampled at 7, 10, 11, 18, and 24 months as part of a larger study examining possible predictors of later child language skills.18 The scope of the project was quite daunting: we had ~125 families and five play sessions, with both child and mother verbal interactions a focus of analysis. This produced roughly 1,250 15- to 30-minute transcripts. Given traditional estimates of time required per transcript to compute multiple measures, we projected a total time commitment of 6,250 hours to finish this part of the project, and the granting agency did not, in fact, predict that we would obtain any findings during the actual grant time window. However, they were wrong, because the first benefit of using CLAN to transcribe and analyze language samples is a huge savings in the time required to generate the actual transcript. CLAN media linkage (see MacWhinney, this issue) dramatically cuts down the time required to make an accurate transcript of the child’s sample. When a clinician uses the transcription utilities of CLAN and the Walker Controller playback function, she can cut the time required to generate the first-level transcript by about three-quarters. We additionally have ongoing work to demonstrate that the resulting transcription will also be more accurate.
Next, we used the automated MOR function to assign and disambiguate grammatical descriptions of all the words in these 1,250 transcripts.
A resulting transcript looked like this excerpt:
*CHI: mommy this xxx .
%mor: n|mommy det|this .
*CHI: these shoes on .
%mor: det|these n|shoe-PL adv|on .
*MOT: ok I can get her shoes on .
%mor: ?|ok pro:sub|I mod|can v|get pro:poss:det|her n|shoe-PL adv|on .
*CHI: +< tiger .
%mor: n|tiger .
*MOT: is that a tiger ?
%mor: aux|be&3S rel|that det|a n|tiger ?
*MOT: or is that a zebra ?
%mor: coord|or aux|be&3S rel|that det|a n|zebra ?
*CHI: zebra .
Although most clinicians will run analyses on a single child’s case at a time, we would like to note that the free CLAN utilities are so powerful (and run on virtually any laptop) that this process analyzed more than 1,000 records in less than a minute. Following this simple step, a single command, KidEval, then generated spreadsheet output of each child’s (and parent’s) language features on more than two dozen variables (see MacWhinney, this issue, for a listing of output variables).
Specifically, we used the command:
where Kideval is the program name, +leng specifies that the language that we wanted to analyze was English (see Brundage et al, this issue, for issues in computing other languages and children who are bilingual) and +t*CHI specifies the speaker that we wanted to analyze. Minutes later, we had data for 121 children (a few of our 125 children were missing some information). We will share these data with you now, as we compare the children’s performance with available reference values for LSA measures suitable for use with 2-year-old children.
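To make concrete what a measure such as MLU draws from the %mor tier, the following is a minimal sketch in Python. It is not CLAN's or KidEval's implementation; it retypes the excerpt shown earlier and applies a deliberately simplified counting rule that we assume only for illustration: each word on the %mor tier counts as one morpheme, plus one for each "-" suffix marker (such as -PL). The function names are hypothetical.

```python
# Simplified illustration only: NOT the algorithm used by CLAN or KidEval.
# Assumed rule: one morpheme per %mor word, plus one per "-" suffix marker.

EXCERPT = """\
*CHI: mommy this xxx .
%mor: n|mommy det|this .
*CHI: these shoes on .
%mor: det|these n|shoe-PL adv|on .
*MOT: ok I can get her shoes on .
%mor: ?|ok pro:sub|I mod|can v|get pro:poss:det|her n|shoe-PL adv|on .
*CHI: +< tiger .
%mor: n|tiger .
*MOT: is that a tiger ?
%mor: aux|be&3S rel|that det|a n|tiger ?
*MOT: or is that a zebra ?
%mor: coord|or aux|be&3S rel|that det|a n|zebra ?
*CHI: zebra .
"""


def mor_utterances(transcript, speaker="CHI"):
    """Return the %mor words of each utterance spoken by `speaker`."""
    utterances = []
    current_speaker = None
    for line in transcript.splitlines():
        if line.startswith("*"):
            # Main tier: remember who is speaking, e.g. "*CHI:" -> "CHI".
            current_speaker = line[1:].split(":", 1)[0]
        elif line.startswith("%mor:") and current_speaker == speaker:
            # Keep only tagged words; punctuation such as "." has no "|".
            words = [tok for tok in line[len("%mor:"):].split() if "|" in tok]
            if words:
                utterances.append(words)
    return utterances


def morpheme_count(mor_word):
    """Count the stem plus each '-' suffix marker (simplified rule)."""
    stem_and_affixes = mor_word.split("|", 1)[1]
    return 1 + stem_and_affixes.count("-")


def lemma(mor_word):
    """Return the stem of a %mor word, without part of speech or suffixes."""
    return mor_word.split("|", 1)[1].split("-")[0]


child = mor_utterances(EXCERPT, "CHI")
total_morphemes = sum(morpheme_count(w) for utt in child for w in utt)
mlu = total_morphemes / len(child)            # mean length of utterance
lemmas = [lemma(w) for utt in child for w in utt]
ndw = len(set(lemmas))                        # number of different words
ttr = ndw / len(lemmas)                       # type-token ratio

print(f"utterances={len(child)} morphemes={total_morphemes} MLU={mlu:.2f}")
print(f"NDW={ndw} TTR={ttr:.2f}")
```

In this sketch, the final child utterance is skipped because the excerpt shows no %mor tier for it, and the unintelligible xxx contributes nothing because it does not appear on the %mor tier. A sample this small is far below the size recommended for any LSA measure; the point is only to show where the counts come from.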
LANGUAGE SAMPLE ANALYSIS NORMS—SHOULD WE BE CONCERNED, AND CAN WE IMPROVE OUR CONFIDENCE IN WHAT THEY TELL US?
The literature on clinical use of language samples suggests that LSA is used most often when standardized test data cannot be obtained or are difficult to interpret.2 It appears to be particularly favored for assessment of very young children. However, there are conceptual issues in LSA for children at 24 months of age, which was the outcome measurement period for our study toddlers. (We note that Systematic Analysis of Language Transcripts [SALT] includes its own norms for this age group, which are available if one purchases and uses SALT to perform LSA.19,20) Many of the normative or reference values are based on relatively few cases at the lowest age ranges. For example, for MLU, a recent report included 37 children at 24 months.21 Miller and Chapman,22 the classic reference for MLU in clinical practice, reported on only 16 children in this age bracket, and the largest recent study to report expected values for MLU (as well as number of different words [NDWs]) had only 17 typically developing and six late-talking participants in the age bracket from 2;6 (years; months) to 2;11.23 These are not extremely large samples from which to generalize impressions of a child’s linguistic profile, which is why some researchers have expressed serious concerns about using MLU to identify whether a child is typically developing or impaired (also see Eisenberg and Guo, this issue).8
For type-token ratio (TTR) or NDW, the situation is similar, because most of the studies referenced previously also reported these measures, and few additional studies are available. For DSS and IPSyn, reference cohorts are similarly restricted. DSS reference tables report on only 10 children from 24 to 27 months of age.24 In this age range, IPSyn provides data for only 15 children.12
This article is not meant to contribute normative data on these LSA measures at this time, although we are working with the CHILDES project archive (see discussion that follows) to improve the generalizability of these and other LSA measures. However, we can illustrate how the children in our study performed on these measures (all were typically developing, and as is so often the case in research reports, from families of relatively high socioeconomic status).
In general, data from this sample show values for MLU, DSS, and IPSyn that are consistent with prior, smaller samples. (See Figs. 1, 2, and 3.)
These data suggest that KidEval is a useful clinical tool for the assessment of spontaneous language data in 24-month-old children, a group for which few robust measures of LSA performance exist. Our results, computed quickly and automatically, are comparable to data derived from much more time-intensive manual coding. However, we do note that the unaffected sample of Rice et al did achieve higher MLU values than our and other comparison cohorts.23
We also computed correlations among LSA values and standardized test outcomes at 24 months of age. We obtained significant but weak correlations that probably justify larger studies of the available measures for toddlers, and their construct validity. For instance, we correlated the children’s MLU with IPSyn and DSS values; correlations were significant. This should not be surprising, because both IPSyn and DSS award points for various syntactic elements, and longer utterances have greater opportunity to contain such features. However, it is perhaps surprising that the actual correlations between MLU and DSS are relatively low, even though they reach significance given our large sample size. (See Figs. 4 and 5.) Notably, DSS correlates much more poorly with MLU than does IPSyn, in all likelihood because fewer utterances at 24 months meet DSS eligibility standards, and because very early utterances do not achieve DSS sentence points. Likewise, IPSyn and DSS do not correlate well with one another, probably for the same reasons, indicating that they are not interchangeable assessments of a toddler’s language sample.
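The point that a weak correlation can still reach significance in a large sample can be illustrated with the standard conversion of a Pearson r to a t statistic, t = r·sqrt(n − 2)/sqrt(1 − r²). The sketch below is in Python and uses only the standard library. The value r = 0.25 is a hypothetical number chosen for illustration and is not a result from our study; n = 121 is the number of children with KidEval output, and the critical value 1.980 is taken from a standard t table for 119 degrees of freedom (two-tailed, alpha = .05).

```python
import math


def t_from_r(r, n):
    """Convert a Pearson correlation r from n pairs to a t statistic (df = n - 2)."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


def smallest_significant_r(t_crit, n):
    """Smallest |r| that reaches the critical t value with n pairs."""
    df = n - 2
    return t_crit / math.sqrt(t_crit ** 2 + df)


N_CHILDREN = 121          # children with KidEval output in the study
T_CRIT_DF119 = 1.980      # two-tailed .05 critical t, df = 119 (standard table)

r_example = 0.25          # HYPOTHETICAL correlation, for illustration only
t_value = t_from_r(r_example, N_CHILDREN)
shared_variance = r_example ** 2
r_floor = smallest_significant_r(T_CRIT_DF119, N_CHILDREN)

print(f"t = {t_value:.2f} (critical value {T_CRIT_DF119})")
print(f"shared variance = {shared_variance:.1%}")
print(f"smallest significant |r| at n = {N_CHILDREN}: {r_floor:.3f}")
```

Under these assumptions, a correlation of 0.25 exceeds the critical value even though the two measures would share only about 6% of their variance, which is why statistical significance alone says little about whether two LSA measures are interchangeable.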
IMPROVING NORMS FOR TODDLER (AND OLDER CHILDREN’S) LANGUAGE SAMPLE ANALYSIS
Our recent study suggests that, at young ages in English, some potential LSA measures do not appear to be measuring the same constructs. Clearly, a single LSA measure (especially MLU, which has been critiqued extensively) cannot provide the whole picture,8 and computing multiple LSA measures is much too time-consuming unless more researchers and therapists use computer-assisted analysis, which would also generate more data responsive to these concerns. We are encouraged by the fact that our large sample of toddlers does resemble smaller reference study reports. However, we believe that psychometric evaluation of confidence intervals around mean values will be necessary to improve the robustness of measures such as DSS and IPSyn for distinguishing between typical and atypical performance, even though we do have some data to inform this decision-making process.
MOVING THE CHILD LANGUAGE DATA EXCHANGE SYSTEM ARCHIVE AND CLAN UTILITIES TO THE NEXT LEVEL OF SUPPORT FOR SPEECH-LANGUAGE PATHOLOGISTS
We are currently working to move the CHILDES project archive from a repository and resource for researchers to a dynamic source of reference data that can be used to assess and treat children across the world’s languages. To this end, the TalkBank project is working to do the following things that should greatly enhance clinicians’ abilities to apply LSA to a broader range of children more easily and insightfully:
Increase the number of languages that can be automatically parsed and reported using CLAN utilities. As other contributions to this issue note, the free CLAN utilities now have grammars for a large number of languages; this number is growing yearly. Thus, clinicians working in Spanish, French, German, Dutch, Mandarin, Cantonese, and other frequently used languages now have resources to perform accurate LSA of languages other than English.
Deploy existing corpora in the CHILDES archive to improve “norms” for commonly used LSA outcome measures.
We are currently in the process of completing this second ambitious task. Recently, we completed KidEval analysis of a large set of corpora (n = 630 children), all of whom spoke North American English, and all of whom were engaged in free play with their parents (a similar context). Results have been fairly interesting, and we provide only a brief taste of our findings here. First, we note with some delight that Roger Brown’s long-standing advice, that MLU is most useful when the child is fairly young or until MLU reaches a value of roughly 4.0, appears to be validated by this large sample, in which MLU plateaus past these values and ages.25 (See Fig. 6.)
We are also noticing that IPSyn and DSS appear to be differentially sensitive to changes in age, as do two alternative ways of computing lexical (vocabulary) diversity, TTR and Vocabulary Diversity (VocD),26,27 a computer algorithm less sensitive to variations in sample size. CLAN reports both in the KidEval utility. (See Figs. 7 and 8.) We note that, similar to our findings reported earlier for the children in the Newman et al study, IPSyn and DSS appear to measure different things, particularly across the broader age span covered by the archive data. For example, IPSyn appears more sensitive to growth across very early childhood, whereas DSS appears to be more sensitive at older ages, perhaps as a function of the “sentence point” that provides more credit when a sentence is considered grammatical, an important construct in distinguishing typical from atypical development as children mature (see Eisenberg and Guo, this issue).
TTR and VocD display a somewhat more difficult profile to evaluate. VocD appears to track better with age across this sample than does TTR. Currently, VocD is reported in several research reports,28–31 but has no published norms; we hope to rectify this shortly. TTR has long been known to be vulnerable to several issues, particularly sample size; whether VocD can improve on this to inform clinical assessment remains to be seen. Extending norms and evaluating the utility of various LSA measures is an ongoing initiative of great potential value to SLPs. We also note that there are no robust norms for LSA conducted with bilingual or English language learning children, a major clinical cohort where LSA is used, given the parallel lack of standardized assessment norms for this population.4
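The sensitivity of TTR to sample size can be seen in a small sketch. The Python below computes TTR over progressively longer prefixes of a short token list; the tokens are hypothetical and invented for illustration, not drawn from any corpus. The sketch does not implement VocD, which is computed by a different procedure.

```python
def type_token_ratio(tokens):
    """Number of different words (types) divided by total words (tokens)."""
    return len(set(tokens)) / len(tokens)


# HYPOTHETICAL child word tokens, invented for illustration only.
tokens = (
    "mommy see dog the dog want ball mommy get the "
    "ball dog want the ball"
).split()

# TTR recomputed as the same sample grows from 5 to 10 to 15 tokens.
ttr_by_size = {n: type_token_ratio(tokens[:n]) for n in (5, 10, 15)}

for size, value in ttr_by_size.items():
    print(f"first {size:2d} tokens: TTR = {value:.2f}")
```

Because common words recur as a sample lengthens, new types are added more slowly than new tokens, so TTR tends to fall with sample length even when the speaker's vocabulary has not changed. This is one reason TTR values from samples of different lengths are hard to compare.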
TAKE-AWAY MESSAGES
LSA is an important tool that one can use to appraise and understand child language ability in an ecologically valid way. Having said this, it is underutilized for several reasons, primarily because when done “by hand,” it is very time-consuming. Because it is time-consuming, we know that clinicians do not fully exploit what can be learned from LSA: they transcribe very short samples and derive only a few measures such as MLU, which are not maximally informative for assessment, therapy planning, or outcome measurement. Media-linked transcription, such as is available using the free CLAN utilities available through TalkBank/CHILDES, speeds transcription of a child’s language sample substantially (by about three-quarters in our experience). Once completed, this transcript can be used to generate a plethora of useful, accurately computed measures of child language performance. These in turn can be used both to augment other assessment measures in making diagnostic and eligibility judgments and to prioritize the most functional targets for intervention. Periodic LSA can also be used to judge the child’s progress in language growth, using the original LSA as a baseline measure. As other articles in this issue note, the child’s transcript can be paired with other utilities, such as the free utility Phon available at childes.psy.cmu.edu/phon/ for phonological analysis, with little additional effort. CLAN grammatical parsers can also enable clinicians to evaluate bilingual children speaking a variety of languages, a unique benefit when working with a growing and challenging demographic in our profession.
When asked if they would use computer-assisted programs to analyze language samples more quickly and more informatively, the majority in a recent survey agreed that they would, if they could identify how to accomplish this.14 We were intrigued to read of a successful pilot program to use SLP assistants/aides to generate transcripts and measures using SALT,19,20 another LSA software program. Thus, we are optimistic that issues such as this, Web tutorials, and the continued growth of programs available to SLPs will help clinicians to exploit the potential of LSA more fully. In sum, the CHILDES/TalkBank utilities are an invaluable tool in an SLP’s repertoire of clinical resources—free, time-saving, and computationally powerful. Power up your laptop and take computer-assisted LSA for a spin—we predict that you will become a fast and loyal fan.
POSTSCRIPT
For those wanting an answer to our MLU quiz, we would suggest that the answer is one. Why? The s in upstairs is not a plural form, and the prefix up- is not productive (i.e., there is only upstairs and downstairs, rather than a larger set of options). We point this out only because large classes of students and professionals spend a lot of time considering problems such as this, which are better solved using grammatical parsers such as those found in CLAN. CLAN has an average accuracy rating of 94%, which definitely exceeds the mean for many class projects on MLU. We do not use an abacus to solve major math problems anymore; we use calculators. CLAN is just a very smart calculator. It remains the clinician’s job to interpret the child’s LSA profiles, but we would suggest we have reached the point where LSA grammatical parsing and computation can and should be done using software.
Learning Outcomes:
As a result of this activity, the reader will be able to (1) summarize the advantages and disadvantages of computer-assisted and traditional language sample analysis (LSA); (2) explain LSA measures generated by the free CLAN utilities and how they may be used in assessment, therapy planning, and progress measurement; and (3) discuss the need for larger normative samples in interpreting age-appropriate performance on a range of LSA measures.
Footnotes
Automating Child Speech, Language and Fluency Analysis; Guest Editor, Brian MacWhinney, Ph.D.
REFERENCES
- 1. Overton S, Wren Y. Outcome measurement using naturalistic language samples: a feasibility pilot study using language transcription software and speech and language therapy assistants. Child Lang Teach Ther 2014;30(2):221–229
- 2. Price LH, Hendricks S, Cook C. Incorporating computer-aided language sample analysis into clinical practice. Lang Speech Hear Serv Sch 2010;41(2):206–222
- 3. Overton S, Wren Y. Outcome measurement using naturalistic language samples: a feasibility pilot study using language transcription software and speech and language therapy assistants. Child Lang Teach Ther 2014;30(2):221–229
- 4. Caesar LG, Kohler PD. The state of school-based bilingual assessment: actual practice versus recommended guidelines. Lang Speech Hear Serv Sch 2007;38(3):190–200
- 5. Gorman K, et al. Automated morphological analysis of clinical language samples. NAACL HLT 2015:108
- 6. Hux K, et al. Language sampling practices: a survey of nine states. Lang Speech Hear Serv Sch 1993;24(2):84–91
- 7. Kemp K, Klee T. Clinical language sampling practices: results of a survey of speech-language pathologists in the United States. Child Lang Teach Ther 1997;13(2):161–176
- 8. Eisenberg SL, Fersko TM, Lundgren C. The use of MLU for identifying language impairment in preschool children: a review. Am J Speech Lang Pathol 2001;10(4):323
- 9. Lee LL, Canter SM. Developmental sentence scoring: a clinical procedure for estimating syntactic development in children’s spontaneous speech. J Speech Hear Disord 1971;36(3):315–340
- 10. Long SH, Channell RW. Accuracy of four language analysis procedures performed automatically. Am J Speech Lang Pathol 2001;10(2):180–188
- 11. Cochran PS, Masterson JJ. Not using a computer in language assessment/intervention in defense of the reluctant clinician. Lang Speech Hear Serv Sch 1995;26(3):213–222
- 12. Scarborough HS. Index of productive syntax. Appl Psycholinguist 1990;11(1):1–22
- 13. Hassanali KN, Liu Y, Iglesias A, Solorio T, Dollaghan C. Automatic generation of the index of productive syntax for child language transcripts. Behav Res Methods 2014;46(1):254–262
- 14. Westerveld MF, Claessen M. Clinician survey of language sampling practices in Australia. Int J Speech-Language Pathol 2014;16(3):242–249
- 15. Heilmann JJ. Myths and realities of language sample analysis. SIG 1 Perspect Lang Learn Educ 2010;17(1):4–8
- 16. Evans JL, Miller J. Language sample analysis in the 21st century. Semin Speech Lang 1999;20(2):101–115
- 17. Miller JF. Focus on schools. Having trouble monitoring language intervention? Language sample analysis is the solution. ASHA Leader 2001;6(16):5
- 18. Newman RS, Rowe ML, Bernstein Ratner N. Input and uptake at 7 months predicts toddler vocabulary: the role of child-directed speech and infant processing skills in language development. J Child Lang 2015;24:1–16
- 19. Miller JF, et al. Assessing Language Production Using SALT Software. Middleton, WI: SALT Software LLC; 2011
- 20. Miller JF, Iglesias A, Rojas R. SALT 2010 Bilingual S/E Version: A Tool for Assessing the Language Production of Bilingual (Spanish/English) Children. Baltimore: Brookes Publishing Company; 2010
- 21. Rispoli M, Hadley P, Holt J. Stalls and revisions: a developmental perspective on sentence production. J Speech Lang Hear Res 2008;51(4):953–966
- 22. Miller JF, Chapman RS. The relation between age and mean length of utterance in morphemes. J Speech Hear Res 1981;24(2):154–161
- 23. Rice ML, Smolik F, Perpich D, Thompson T, Rytting N, Blossom M. Mean length of utterance levels in 6-month intervals for children 3 to 9 years with and without language impairments. J Speech Lang Hear Res 2010;53(2):333–349
- 24. Lee LL. Developmental Sentence Analysis: A Grammatical Assessment Procedure for Speech and Language Clinicians. Evanston, IL: Northwestern University Press; 1974
- 25. Brown R. A First Language: The Early Stages. Cambridge, MA: Harvard University Press; 1973
- 26. Malvern D, Richards B. Investigating accommodation in language proficiency interviews using a new measure of lexical diversity. Lang Test 2002;19(1):85–104
- 27. McKee G, Malvern D, Richards B. Measuring vocabulary diversity using dedicated software. Lit Linguist Comput 2000;15(3):323–338
- 28. Durán P, et al. Developmental trends in lexical diversity. Appl Linguist 2004;25(2):220–242
- 29. Silverman S, Ratner NB. Measuring lexical diversity in children who stutter: application of vocd. J Fluency Disord 2002;27(4):289–303, quiz 303–304
- 30. Owen AJ, Leonard LB. Lexical diversity in the spontaneous speech of children with specific language impairment: application of D. J Speech Lang Hear Res 2002;45(5):927–937
- 31. Wong AMY, Klee T, Stokes SF, Fletcher P, Leonard LB. Differentiating Cantonese-speaking preschool children with and without SLI using MLU and lexical diversity (D). J Speech Lang Hear Res 2010;53(3):794–799