Journal of Speech, Language, and Hearing Research (JSLHR)
. 2025 Aug 18;68(9):4337–4357. doi: 10.1044/2025_JSLHR-25-00076

Development and Validation of Nomogram-Based Prediction Models for Developmental Language Disorder in Bilingual Children

Joseph Hin Yan Lam a, Michelle N Ramos b, Jiali Wang c, Aquiles Iglesias d, Elizabeth D Peña a, Lisa M Bedore e, Ronald B Gillam f
PMCID: PMC12453021  PMID: 40824863

Abstract

Purpose:

The challenges of language assessment in bilinguals include a lack of assessment tools and of bilingual speech-language pathology services. Additionally, the weighting of subtests in standardized tests has not been empirically explored to maximize sensitivity and specificity. Language exposure might also inform the decision to diagnose bilinguals with developmental language disorder (DLD). This study uses a nomogram, a user-friendly prediction tool, to explore the weighting of bilingual language tasks for diagnostic accuracy and the feasibility of single-language assessment combined with language exposure information.

Method:

Four hundred nine Spanish–English bilingual children aged 4–7 years completed a standardized bilingual language assessment with six subtests, and caregivers reported language exposure. An additional 326 and 296 Spanish–English bilingual children completed only the Spanish or English portion of the assessment, respectively. Nomogram-based prediction models were constructed to evaluate the probability of DLD. Classification accuracy, calibration curves, and decision curve analysis were reported.

Results:

The nomogram for the bilingual language assessment was generalizable to another sample, with varying subtest weightings. The addition of bilingual exposure did not improve the classification accuracy of the bilingual assessment, but it was an important variable when assessment was in one language only. Spanish-only assessment with bilingual exposure achieved the minimum acceptable sensitivity and specificity, whereas English-only assessment with bilingual exposure did not.

Conclusions:

This study suggests that subtest weighting ratios can help classify Spanish–English bilingual children with DLD. The feasibility of single-language assessment and the clinical use of nomograms are discussed.

Supplemental Material:

https://doi.org/10.23641/asha.29874254


More than one third of school-age children have exposure to a language other than English at home (U.S. Census Bureau, 2019). Accurately diagnosing developmental language disorder (DLD) in bilingual children has been an ongoing challenge (Morgan et al., 2015; Sullivan & Bal, 2013) due to a lack of appropriate standardized language assessment tools and a shortage of bilingual speech-language pathologists (SLPs; American Speech-Language-Hearing Association [ASHA], n.d., 2023; Guiberson & Vigil, 2021). Currently available standardized tests may be improved by using weighted scoring of subtests or a battery of tests. A single-language approach using only English tests for diagnosis may also be feasible when language exposure is considered. The current study examines the classification accuracy of each of these approaches using a standardized bilingual language assessment with young Spanish–English bilinguals and employs predictive modeling with a nomogram, a clinician-friendly tool for visualization and implementation.

Standardized Tests and Weighting

Standardized tests are commonly used by SLPs for diagnosing DLD in children (Fulcher-Rood et al., 2018). School-based SLPs view the use of norm-referenced, standardized assessments as effective and efficient (Denman et al., 2021; Ogiela & Montzka, 2021). Standardized language assessments typically consist of multiple subtests measuring different aspects of language skills. Composite and index scores are then calculated, typically by summing or averaging the subtests, implying an equal weighting among subtests. For example, the Clinical Evaluation of Language Fundamentals–Fifth Edition, which is a commonly used standardized assessment, has 16 subtests that can compute six index scores and a core language score (Wiig et al., 2013). These index scores are based on the sum of the scaled scores of different subtests. Thus, each subtest contributes equally to the index score. Standardized language assessments therefore provide clinicians with a profile of participants' language abilities, highlighting strengths and weaknesses and supporting diagnostic decision-making.

Equal weighting of all subtests may not be optimal. Prior research has shown that some tasks, such as sentence repetition, semantics tasks, and narrative tasks, have good diagnostic accuracy for bilingual children (Boerma et al., 2016; Peña, Bedore, & Kester, 2016; Pratt et al., 2021). On the other hand, vocabulary tasks, including picture naming, might misclassify children with and without DLD (Gray et al., 1999; Rose et al., 2022). Such vocabulary tasks are often included in standardized tests alongside tasks such as sentence repetition. While all of these tasks provide information about the language profile, they might not be equally important for identifying language impairment in a diagnostic decision. Plante and Vance (1994) suggest 80% sensitivity and 80% specificity as the minimum requirement for acceptable discrimination validity. Introducing a weighting scheme might maximize overall diagnostic accuracy by assigning more weight to the tasks that individually have higher sensitivity and specificity. However, differential weighting of subtests and its impact on the classification of DLD have not yet been empirically explored. Therefore, the first research question examines the weighting of different language tasks that maximizes diagnostic accuracy.
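To make the contrast concrete, a minimal sketch of equal versus differential weighting when forming a composite score (subtest names and weights are hypothetical, not derived from this study):

```python
# Hypothetical subtest scores for one child.
scores = {"sentence_repetition": 7, "semantics": 9, "picture_naming": 12}

# Equal weighting: the usual composite, a plain average of subtests.
equal = sum(scores.values()) / len(scores)

# Differential weighting: more weight on tasks with higher individual
# sensitivity/specificity (weights are invented and sum to 1).
weights = {"sentence_repetition": 0.50, "semantics": 0.35, "picture_naming": 0.15}
weighted = sum(weights[k] * scores[k] for k in scores)

print(round(equal, 2), round(weighted, 2))  # 9.33 8.45
```

Under differential weighting, the weaker sentence repetition score pulls the composite down more than the strong picture-naming score can pull it up.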

Single-Language Assessment for Bilinguals

Given the lack of adequate bilingual services and assessments needed to ensure the quality of care and improve health equity for culturally and linguistically diverse populations (Office of Minority Health, 2016), it is critical to identify assessment methods that can be delivered in a single language but still provide valid clinical information. Assessing bilingual children in both of their languages is considered best practice and provides the most accurate diagnosis of DLD (Arias & Friberg, 2017; ASHA, 2019; Peña, Bedore, & Kester, 2016). However, only 8% of SLPs in the United States are able to provide bilingual services (ASHA, 2023; Guiberson & Vigil, 2021), and interpreters and translators are similarly challenging to access in many locations (Kritikos, 2003; Santhanam & Parveen, 2018). Such shortages are also not limited to the United States (e.g., Hassan et al., 2024; Jordaan, 2008; Williams & McLeod, 2012).

One of the primary sources of test bias lies in comparing bilinguals to monolingual standards. One approach for addressing this while adapting standardized monolingual assessment procedures for bilinguals has been to empirically derive cutoffs for bilinguals (e.g., Altman et al., 2021; Gillam et al., 2013). This approach uses the range of bilingual performance to identify a cutoff that accurately differentiates typical and impaired language. Another approach has been to tailor test items according to levels of exposure to the target language (e.g., Bedore et al., 2018; Jasso et al., 2020; Pratt et al., 2024). Rather than administering all items in a subtest to all children, a specified subset of items is given according to the child's level of English exposure. The proposed analysis of single-language assessment data includes elements of both of these approaches: adjusting the weighting of assessment measures according to language exposure (i.e., "tailoring" the weighting rather than the items within tasks) and empirically deriving a cutoff for the composite based on bilingual performance data.

Language Exposure in Bilingual Language Assessment

It is important to consider exposure when determining the weighting of assessment variables because the difficulty of a given language task—and, thus, its diagnostic potential—varies with language experience and proficiency (Paradis, 2010; Pratt et al., 2024). A bilingual's relative exposure to each of their languages has been quantified in different ways across previous studies (e.g., Bedore et al., 2012; Thordardottir, 2015), but despite measurement differences, linguistic performance is consistently and closely associated with exposure to the target language. For this reason, it is important to account for the range of bilingual experience when using their performance on language tasks for classification purposes.

It is especially critical to account for exposure when employing a single-language approach for differentiating DLD. In contrast with a dual-language assessment, which allows the child's best performance across languages to represent their overall language ability, a single-language approach relies on accuracy in only one language, which is highly influenced by exposure. Limited experience is very difficult to disentangle from disorder based on accuracy alone, largely because of the overlap in grammatical characteristics of typical second-language acquisition and language impairment. A single-language approach must therefore base classification on comparisons of speakers with similar levels of language exposure in order to identify language disorder within linguistic variation (Bedore et al., 2018; Oetting et al., 2016). The second research question of the study aims to examine the diagnostic accuracy of single-language assessment with consideration of language exposure.

Nomogram-Based Predictive Modeling

In the current study, nomogram-based prediction models are used. Many studies in the speech-language pathology field have used explanatory modeling, in which diagnostic accuracy is computed on the whole sample. This maximizes statistical power by using all observations in the data set but limits the generalizability of the model, because the model is developed to fit that data set and may not perform as well with new data or another sample (Brooks & Thompson, 2017). Predictive modeling aims to prevent overfitting and increase generalizability by building a model that predicts new data from the observations in the existing data (Brooks & Thompson, 2017). Predictive modeling splits the data set into a training set and a validation set: a model is developed on the training set and then applied to the validation set. Repeating this process generates new training and validation sets each time, and the model is evaluated on both the training (seen) and validation (unseen) data sets with diagnostic accuracy information.
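The split-and-validate procedure described above can be sketched as follows (simulated data and hypothetical variable names, not the authors' code):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100
X = rng.normal(size=(n, 3))        # hypothetical predictor scores (e.g., subtests)
y = rng.integers(0, 2, size=n)     # hypothetical labels: 1 = DLD, 0 = TD

# 80/20 train-validation split.
idx = rng.permutation(n)
split = int(0.8 * n)
train_idx, valid_idx = idx[:split], idx[split:]
X_train, y_train = X[train_idx], y[train_idx]
X_valid, y_valid = X[valid_idx], y[valid_idx]

# 10-fold cross-validation within the training set: each fold serves once
# as the held-out set while the other nine are used for fitting.
folds = np.array_split(rng.permutation(split), 10)
for k, held_out in enumerate(folds):
    fit_idx = np.concatenate([f for j, f in enumerate(folds) if j != k])
    # ...fit on X_train[fit_idx], evaluate on X_train[held_out]...

print(X_train.shape, X_valid.shape)  # (80, 3) (20, 3)
```

The validation set never informs model fitting, which is what allows it to stand in for "unseen" data.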

Nomograms use logistic regression results to construct a graphical tool that provides a visual way to calculate or predict the probability of an outcome from several variables. Nomograms are widely used in other health-related fields for diagnosis in medical and clinical settings because of their simplicity and ease of use for integrating multiple data points (Blasco-Fontecilla et al., 2024; Gao et al., 2024; Parkhurst et al., 2023). A nomogram is based on a statistical model, such as a logistic regression model, and translates complex mathematical formulas into a visual format that simplifies calculation and interpretation for clinicians. Figure 1 shows a guide to interpreting a nomogram. First, the value of each predictor is located on its corresponding line and converted into points by aligning it with the value directly above it on the reference (points) line. Second, the points are summed into a total score. Finally, the total score is converted into a predicted probability of the outcome of interest using the scale at the bottom of the nomogram. In the current study, on the basis of the cutoff value derived from the prediction model, clinicians can interpret the client's predicted probability as indicative of typical language or DLD directly from the nomogram.

Figure 1.

[Image: a hypothetical nomogram with five scales, from top to bottom: points (0–100), test 1 (28–0), test 2 (14–0), total points (0–180), and probability (0.1–0.8). Step 1: convert raw scores to points (a test 1 score of 8 maps to 72 points; a test 2 score of 6 maps to 48 points). Step 2: sum the points (72 + 48 = 120). Step 3: map the total points to a probability (120 maps to .38, i.e., 38%).]

Steps to interpret a hypothetical nomogram.
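The three steps in Figure 1 amount to a linear rescaling of a logistic regression model. A minimal sketch with invented coefficients and score ranges (not the model developed in this study):

```python
import math

# Hypothetical two-predictor nomogram: negative coefficients mean that lower
# test scores raise the predicted risk of the outcome.
coefs = {"test1": -0.25, "test2": -0.50}     # log-odds per raw-score unit
ranges = {"test1": (0, 28), "test2": (0, 14)}
intercept = 3.0

# Built once when constructing the nomogram: each predictor's minimum
# log-odds contribution over its range, and a scale so the predictor with
# the widest log-odds span covers 0-100 points.
mins = {k: min(c * ranges[k][0], c * ranges[k][1]) for k, c in coefs.items()}
spans = {k: abs(c) * (ranges[k][1] - ranges[k][0]) for k, c in coefs.items()}
scale = 100 / max(spans.values())

def to_points(name, score):
    # Step 1: convert a raw score to points on the shared 0-100 scale.
    return (coefs[name] * score - mins[name]) * scale

def probability(scores):
    total = sum(to_points(k, v) for k, v in scores.items())  # Step 2: sum points
    # Step 3: total points back to log-odds, then to a probability.
    logit = intercept + sum(mins.values()) + total / scale
    return 1 / (1 + math.exp(-logit))

print(round(probability({"test1": 8, "test2": 6}), 3))  # 0.119
```

Because the points are just rescaled log-odds contributions, summing points and reading off the probability reproduces the logistic regression prediction exactly.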

Current Study

The current study aims to examine the relative importance of different language tasks and the potential of single-language assessment with language exposure in predicting the presence of DLD. Results are used to develop a nomogram, which is a user-friendly visualization of statistical prediction models, to allow clinicians to evaluate the language performance of bilingual children and predict DLD. Specifically, the current study had two aims:

  1. Develop and validate a nomogram-based prediction model for DLD in children 4–6 years old (a) using a bilingual language standardized test and (b) with the addition of bilingual language exposure.

  2. Develop and validate a nomogram-based prediction model for DLD in children 4–6 years old using (a) the Spanish portion only and (b) the English portion only of the bilingual language standardized test and bilingual language exposure.

Method

Participants

Table 1 shows the participants' demographics for both research questions. Participants were drawn from three data sets from studies of bilingual language performance and development in bilingual children with and without DLD. These three data sets included the target measures for the study. The inclusion criteria for the current study were completion of all six subtests of the Bilingual English–Spanish Assessment (BESA; for Research Question 1) or completion of all subtests in either the Spanish or English part of the BESA (for Research Question 2).

Table 1.

Participants' demographics.

Research Question 1

Variable DTHC data set (n = 114) DM data set (n = 186) CL data set (n = 109) Total (N = 409)
(each group: n, %, M, SD)
Age 69.29 10.08 69.27 4.81 69.85 4.49 69.43 6.63
Female 59 52 91 49 55 51 205 50
Spanish exposure 54.53 21.21 51.01 17.30 45.99 28.00 50.55 21.87
Maternal education 2.75 1.98 2.69 1.65 2.95 1.76 2.76 1.75
Participants with DLD 20 18 21 11 16 15 57 14

Research Question 2

Variable Additional Spanish-only participants (n = 326) Total Spanish participants (N = 735) Additional English-only participants (n = 296) Total English participants (N = 705)
(each group: n, %, M, SD)
Age 64.05 10.07 67.03 8.76 64.44 10.33 67.68 8.46
Female 159 49 364 50 152 51 357 51
Spanish exposure 76.42 20.14 60.10 24.64 19.84 19.46 39.58 25.67
Maternal education 3.29 1.64 2.98 1.73 2.59 1.87 2.71 25.67
Participants with DLD 61 19 118 16 74 25 131 19

Note. Maternal education: 1 = less than seventh-grade education, 2 = ninth-grade education, 3 = partial high school, 4 = high school graduate, 5 = partial college or specialized training, 6 = college degree, and 7 = graduate degree. DTHC = Development of a Test for Hispanic Children in the U.S.; DM = Diagnostic Marker of Language Impairment; CL = Cross-Language Outcomes of Typical and Atypical Development in Bilinguals; DLD = developmental language disorder.

Development of a Test for Hispanic Children in the U.S. (DTHC) Data Set

The DTHC project was designed to develop and validate a new diagnostic language assessment to identify Spanish–English bilingual children with DLD. The institutional review board of Temple University granted this study ethics approval (Project No. 199700578-005). For this study, 1,119 participants aged 4;0–6;11 (years;months) were recruited from school districts in California, Texas, and Pennsylvania, where the majority of the students were from low-income families. A one-gate design was employed, which was intended to select a large and representative sample including both typically developing (TD) children and children with DLD (Dollaghan & Horner, 2011). Children with language impairments and TD peers were recruited from the same classrooms. TD children were identified based on parent and teacher reports, as well as clinical observations, indicating that they were learning language without any difficulties. On the basis of the inclusion criteria for the current study, 114 participants (94 TD children and 20 children with DLD) who completed both the Spanish and English parts of the BESA were included in the analysis for Research Question 1. An additional 326 participants (265 TD children and 61 children with DLD) completed only the Spanish part of the BESA, and another 296 participants (222 TD children and 74 children with DLD) completed only the English part. These participants were included in the analysis for Research Question 2.

Diagnostic Marker of Language Impairment (DM) Data Set

The DM study was a three-phase longitudinal research project in Texas and Utah (Peña et al., 2006). The institutional review board of The University of Texas at Austin granted this study ethics approval (Project No. 2005-09-0096). The screening phase took place when the children were in preschool, with 1,192 participants involved. Children who scored below the 30th percentile on one of two subtests (semantics and morphosyntax) in Spanish and one of two subtests (semantics and morphosyntax) in English in the BESA and who were reported by parents to use both English and Spanish at least 20% of the time were selected for the subsequent two phases. In the longitudinal portion of the study, children completed a series of standardized assessments and language sample tests during kindergarten (Phase 2) and first grade (Phase 3). In the current study, only Phase 2 data were used for analysis as this was the first time the BESA was administered. This consisted of 186 participants (165 TD children and 21 children with DLD).

Cross-Language Outcomes of Typical and Atypical Development in Bilinguals (CL) Data Set

The CL study was a longitudinal, cross-sequential study (Peña et al., 2010). The institutional review board of The University of Texas at Austin granted this study ethics approval (Project No. 2009-11-0110). The screening phase took place when the children were in preschool, first grade, or third grade. Children who (a) had standard scores below 85 on the semantics and/or morphosyntax subtests in their better language on the Bilingual English–Spanish Oral Language Screener (BESOS; Peña et al., 2010), (b) were reported by parents to use both English and Spanish at least 20% of the time, and (c) were below the age of 5 years at first English exposure were selected for the subsequent longitudinal study of up to 4 years. In addition, two matched TD children were recruited for each selected child. The matching criteria included (a) within a 1-point difference in Hollingshead score for maternal education, (b) within a 1-year difference in age at first English exposure, (c) within a 22% difference in language exposure (mean difference of the matched group = 7%), (d) within a 3-month difference in age, and (e) overall counterbalancing of sex. Throughout the longitudinal portion of the study, the children completed a series of standardized assessments and language sample tests once each year. In the current study, children who completed the BESA in the first year of the longitudinal study (1 year after screening) were included in the analysis. This consisted of 109 participants (93 TD children and 16 children with DLD).

Age, language exposure, and maternal education were normally distributed based on skewness and kurtosis. Analyses of variance were conducted on these variables across the three data sets. Language exposure and maternal education did not differ significantly across data sets, ps > .05. Age differed significantly across data sets, p < .001; post hoc analysis with Bonferroni correction showed that the DTHC data set had a significantly lower mean age than the other two data sets, ps < .001.

DLD Diagnostic Criteria

Across the three data sets, children were identified with DLD on the basis of converging evidence, which included an evaluation of their performance in both languages. This evaluation included screening results, narrative samples, morphosyntax, and semantics/vocabulary testing in conjunction with teacher and parent reports. Table 2 shows a summary of the similarities and differences across different data sets.

Table 2.

Diagnostic criteria of developmental language disorder across different data sets.

DTHC data set
Language background information: (1) Parent ITALK (experimental version), binary indication of language concern; (2) Teacher ITALK (experimental version), binary indication of language concern.
Standardized testing: BESA (both languages); TOLD-P:3.
Language sample analysis: (3) a conversational and narrative sample, evaluated for (a) ungrammatical utterances > 20% in the better language, (b) mean length of utterance −1 SD below the mean in the better language, and (c) number of different words.
Observation of learning: (4) clinician rating of responsivity and transfer, ≤ 4 points.
Overall judgment: 3 out of 4 criteria.

DM data set
Language background information: Parent ITALK; Parent BIOS; teacher questionnaire.
Language sample analysis: Test of Narrative Language.
Overall judgment: rating by three bilingual SLPs.

CL data set
Language background information: (1) Parent ITALK (published version), 4.2 or below.
Standardized testing: (2) BESOS, −1 SD below the mean in both languages; (3) morphosyntax subtests of the BESA/BESA-ME; (4) semantics subtests of the BESA/BESA-ME.
Language sample analysis: (5) Test of Narrative Language.
Overall judgment: 4 out of 5 criteria.

Note. DTHC = Development of a Test for Hispanic Children in the U.S.; DM = Diagnostic Marker of Language Impairment; CL = Cross-Language Outcomes of Typical and Atypical Development in Bilinguals; ITALK = Instrument to Assess Language Knowledge (Gutiérrez-Clellen & Kreiter, 2003; Peña et al., 2018); BIOS = Bilingual Input–Output Survey (Peña et al., 2018); BESA = Bilingual English–Spanish Assessment (Peña et al., 2018); BESOS = Bilingual English–Spanish Oral Screener (Peña et al., 2010); TOLD-P:3 = Test of Language Development–Primary: Third Edition (Newcomer & Hammill, 1997); BESA-ME = Bilingual English–Spanish Assessment–Middle Extension (Peña, Bedore, Gutiérrez-Clellen, et al., 2016); SLPs = speech-language pathologists.

DTHC Data Set

Children were identified with DLD on the basis of language sample measures, parent and teacher reports, and clinical observation. Children were identified with language impairment if they met at least three of the following four criteria: (a) in a combined conversational and narrative sample in the better language, more than 20% ungrammatical utterances, or mean length of utterance or number of different words more than 1 SD below the mean for same-age peers; (b) parent report of concern about the child's language development relative to similar-age peers on the experimental version of the Instrument to Assess Language Knowledge (ITALK; Gutiérrez-Clellen & Kreiter, 2003); (c) teacher concern about language development relative to similar-age peers; and (d) clinical observation indicating concerns about language impairment, or clinical diagnosis by bilingual SLPs. ITALK interviews were completed with parents over the phone and with teachers in person, asking them to rate children on a 5-point scale across five language domains: vocabulary, speech production, sentence production, grammar, and comprehension; each domain included language-specific examples. For the clinical observation, SLPs rated responsivity (initiated and maintained interest and attention to the task; highly verbal and attentive during the model and prompts) and transfer (did not need reminders of the task goal, i.e., telling the story) on a 5-point scale; a total score below 4 indicated clinical concern.

DM Data Set

In the DM study, children were identified as having DLD based on an expert review process in Phase 3 (i.e., first grade). Three bilingual SLPs independently examined test protocols, narrative samples, and questionnaires from teachers and parents collected during the study's third phase, when the children were in first grade. Using their experience with bilingual children, the SLPs applied Tomblin et al.'s (1996; Records & Tomblin, 1994) clinical judgment procedure to assess narrative, semantics, and grammar in each language on a 6-point scale (0 = severe/profound impairment, 5 = above normal performance). They then provided a summary rating based on their evaluations and notes. Children were classified as having DLD if two out of the three raters' summary scores indicated mild, moderate, or severe DLD in the child's strongest language (for further details, see Gillam et al., 2013; Wang, Choi-Tucci, et al., 2024).

CL Data Set

The CL study used a set of comprehensive and converging criteria for DLD identification. Bilingual children were classified as having DLD if they met four out of five criteria: (a) below 4.2 on the parent or teacher rating of language concern using the published version of the ITALK (Peña et al., 2018), (b) below −1 SD on the BESOS (Peña et al., 2010) in both languages, (c) below −1 SD on the Test of Narrative Language in both languages (Gillam & Pearson, 2004; Gillam et al., 2017), (d) below −1 SD on morphosyntax in both languages on the BESA or BESA–Middle Extension (BESA-ME; Peña, Bedore, Gutiérrez-Clellen, et al., 2016), and (e) below −1 SD on semantics in both languages on the BESA or BESA-ME.

Measures

Bilingual Language Performance

The BESA was administered to evaluate bilingual language performance. The present study used data from the three phases of test development of the BESA (Peña et al., 2018). The initial language targets were formulated using cross-linguistic developmental patterns observed in Spanish, English, and bilingual Spanish–English speakers, as outlined in existing literature (detailed in the BESA manual; Peña et al., 2018). These language targets were integrated into various elicitation tasks (e.g., cloze tests, sentence repetition, category generation) to minimize systematic measurement error and align with domain sampling theory. Pratt et al. (2024) provided a further description of the dimensionality and hierarchical structure of the BESA.

During the test development phases, language targets that consistently distinguished bilingual children with typical development from those with DLD were kept, while uninformative language targets were removed. The current study included only the items that were administered across all three studies: 25 semantic items each in Spanish and English; 11 and 17 morphosyntax cloze items in Spanish and English, respectively; and 28 and 14 sentence repetition items in Spanish and English, respectively. Note that compared to the published version of the BESA, the current analysis had four and seven fewer morphosyntax cloze items in Spanish and English, respectively, and seven and 16 fewer sentence repetition items in Spanish and English, respectively; these items were only administered in the Temple (DTHC) and CL studies. Supplemental Material S1 shows the items included in the current study in comparison to the published version of the BESA. The published version of the BESA was standardized on a sample of 756 children, aged 4;0–6;11, who exhibited varying levels of bilingualism and dialectal usage across three regions of the United States.

Semantic subtests in Spanish and English. In both languages, semantic concepts were elicited using receptive and expressive questions incorporated into three illustrated stories. The semantic subtests assessed six different skills: categorization, similarities and differences, analogies, linguistic characteristics, functions, and characteristic properties. Each subtest included 25 questions in English and Spanish, targeting parallel concepts such as object shapes, colors, sizes, and functions. These items were validated using teacher–child interaction data (see Peña et al., 2003, for additional information on the semantics measure). The Spanish semantic subtest focused on home-based routines in Spanish-speaking families, while the English semantic subtest focused on academic concepts commonly encountered in English-speaking classrooms. The targets in both subtests were controlled for culturally embedded familiarity. The difficulty of the questions increased throughout the subtests. The interrater reliability for the Spanish and English semantics subtests was above .98. The internal reliability for the Spanish and English semantics subtests was .88 and .87, respectively.

Morphosyntax cloze subtests in Spanish and English. The cloze subtests required the child to complete a stimulus sentence using the correct morpheme(s). In Spanish, the subtests targeted article/noun agreement, preterite verb forms, and subjunctive verb forms. In English, they targeted possessive nouns, third-person singular present tense, regular and irregular past tense verbs, plural nouns, copula verbs, negations with auxiliaries, and passive-voice constructions. Two sample items were administered before the test items for each morphosyntax target. The interrater reliability for the Spanish and English morphosyntax subtests was above .96. The internal reliability for the Spanish and English morphosyntax cloze subtests was .83 and .94, respectively.

Sentence repetition subtests in Spanish and English. The sentence repetition task required the child to repeat a sentence verbatim. Instruction and sample items were first given to the child before the test items. These sentences targeted various morphosyntactic constructions that were difficult to elicit using a cloze task. Although not every word in the sentence was scored, the child was required to repeat the entire sentence. Each sentence was only read once. The interrater reliability for Spanish and English sentence repetition subtests was above .96. The internal reliability for Spanish and English sentence repetition subtests was .92 and .89, respectively.

Bilingual Language Exposure

The Bilingual Input–Output Survey, a parent-reported questionnaire, was used to assess the language exposure of bilinguals (Peña et al., 2018). Caregivers were asked to provide information about the child's daily language exposure in terms of input and output on an hourly basis. The percentage of English exposure was calculated following the guidelines in the manual (Peña et al., 2018). English exposure percentage was calculated by averaging the weekly English input and output. The percentage of Spanish exposure was calculated by subtracting the English exposure percentage from 100%.
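The exposure computation described above can be sketched as follows (the helper function and example percentages are hypothetical; the scoring details follow the BESA manual only as summarized here):

```python
def exposure_percentages(english_input_pct, english_output_pct):
    """Return (English %, Spanish %) exposure from weekly input/output.

    English exposure is the mean of the weekly English input and output
    percentages; Spanish exposure is the remainder out of 100%.
    """
    english = (english_input_pct + english_output_pct) / 2
    spanish = 100 - english
    return english, spanish

eng, spa = exposure_percentages(60, 50)
print(eng, spa)  # 55.0 45.0
```

By construction, the two percentages always sum to 100, so only one of them needs to enter a prediction model.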

Data Analysis

Descriptive statistics, including means and standard deviations, were reported. Independent t tests were performed to examine the difference between TD and DLD groups. The data set was split into 80% training data and 20% validation data. The glmnet package (Version 4.1-8) was utilized to fit the logistic regression (Friedman et al., 2010). A 10-fold cross-validation approach was implemented in the training set to determine the optimal penalty parameter, λ. In 10-fold cross-validation, the training data were iteratively partitioned into 10 subsets. At each iteration, one fold was used as the validation set, while the remaining nine folds were used for training. Variables with nonzero coefficients in the model were included in the nomogram model to predict DLD.
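The authors fit the lasso in R with glmnet; an analogous sketch in Python on simulated data, with scikit-learn's LogisticRegressionCV standing in for glmnet's cross-validated penalty selection:

```python
import numpy as np
from sklearn.linear_model import LogisticRegressionCV

rng = np.random.default_rng(1)
n = 400
X = rng.normal(size=(n, 6))                  # six hypothetical subtest scores
logit = 1.5 * X[:, 0] - 1.0 * X[:, 1]        # only two predictors truly matter
y = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

# L1-penalized (lasso) logistic regression; the penalty strength is chosen
# by 10-fold cross-validation, analogous to selecting lambda in glmnet.
model = LogisticRegressionCV(
    Cs=10, cv=10, penalty="l1", solver="liblinear", scoring="roc_auc"
).fit(X, y)

# Predictors with nonzero coefficients are retained for the nomogram.
selected = np.flatnonzero(model.coef_[0])
print(selected)
```

The lasso penalty shrinks uninformative coefficients exactly to zero, which is what makes it usable as a variable-selection step before building the nomogram.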

To develop a nomogram model for the identification of DLD, we used the rms package (Version 6.8-2; Harrell, 2017). Receiver operating characteristic (ROC) curves were generated using the pROC package (Version 1.18.5) to assess the model's predictive accuracy with the selected variables (Robin et al., 2011). For each ROC curve, the area under the ROC curve (AUC), sensitivity, specificity, positive likelihood ratio (LR+), negative likelihood ratio (LR−), and threshold were reported. Calibration curves were generated with the rms package (Version 6.8-2; Harrell, 2017) to assess the agreement between observed and predicted probabilities. Finally, the DCA package (Version 2.0) was used to perform a decision curve analysis evaluating the net clinical benefit of the model (Vickers & Elkin, 2006). All statistical analyses were performed in R (Version 4.4.1; R Core Team, 2020).
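For readers less familiar with the reported metrics, the likelihood ratios follow directly from sensitivity and specificity (standard definitions; the function below is illustrative only):

```python
def likelihood_ratios(sensitivity, specificity):
    """LR+ = sensitivity / (1 - specificity): how much a positive
    result raises the odds of DLD.
    LR- = (1 - sensitivity) / specificity: how much a negative
    result lowers those odds."""
    lr_pos = sensitivity / (1 - specificity)
    lr_neg = (1 - sensitivity) / specificity
    return lr_pos, lr_neg

# e.g., sensitivity .80 and specificity .90 give LR+ ≈ 8.0 and LR- ≈ 0.22
lr_pos, lr_neg = likelihood_ratios(0.80, 0.90)
```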

Results

Nomogram-Based Prediction Model Using Bilingual Subtests

Table 3 shows the means and standard deviations of the variables. There were significant differences between TD children and children with DLD across all subtests, t(407) = 6.60–11.53, ps < .001. Lasso logistic regression was used, and the coefficients of the variables ranged from −.08 to .02 (Supplemental Material S2 shows coefficient profiles of the predictors). Age, sex, Spanish semantics, Spanish morphosyntax cloze, Spanish sentence repetition, English semantics, and English sentence repetition were selected for the prediction model based on their importance values. Figure 2a presents the final nomogram, which was established according to the multivariate logistic regression; Supplemental Material S3 shows the multivariate logistic regression results on the training set. Among the individual predictors in the nomogram, Spanish sentence repetition had the highest weighting in predicting DLD, followed by English sentence repetition, English semantics, Spanish morphosyntax cloze, age, Spanish semantics, and sex. Supplemental Material S4 shows the conversion of points to the corresponding probability of DLD (see Figure 2a).

Table 3.

Means and standard deviations of variables.

Research Question 1
TD participants (n = 352)
DLD participants (n = 57)
Variable M SD M SD t(407) p
Spanish exposure 50.92 21.63 48.14 23.41 0.86 .39
Spanish semantics 15.67 5.67 10.02 4.81 7.13 < .001
Spanish morphosyntax cloze 5.46 2.18 2.18 2.07 10.05 < .001
Spanish sentence repetition 17.29 7.83 7.61 5.50 11.53 < .001
English semantics 13.52 5.55 8.40 4.60 6.60 < .001
English morphosyntax cloze 7.13 5.24 3.11 3.07 8.16 < .001
English sentence repetition 7.70 4.20 3.25 3.39 8.88 < .001

Research Question 2 with additional Spanish-only participants
TD participants (n = 617)
DLD participants (n = 118)
Variable M SD M SD t(733) p
Spanish exposure 58.28 24.54 61.39 27.00 1.18 .24
Spanish semantics 15.41 5.68 8.69 4.96 12.38 < .001
Spanish morphosyntax cloze 6.15 3.17 2.77 2.18 14.13 < .001
Spanish sentence repetition 17.87 7.99 7.56 6.10 16.40 < .001

Research Question 2 with additional English-only participants
TD participants (n = 574)
DLD participants (n = 131)
Variable M SD M SD t(703) p
Spanish exposure 41.18 25.35 33.53 25.98 3.11 .002
English semantics 14.79 5.57 9.09 5.05 10.94 < .001
English morphosyntax cloze 8.58 5.35 3.79 3.28 13.19 < .001
English sentence repetition 8.58 4.12 4.01 3.40 13.48 < .001

Note. TD = typically developing; DLD = developmental language disorder.

Figure 2.

The image displays 2 nomograms. a. The first nomogram has 10 scales, which are as follows. The first scale ranging from 0 to 100 in increments of 10 represents the points. The second scale ranging from 45 to 95 in increments of 5 represents the Age. The third scale ranges from 1 down to 0 and it represents Female. The fourth scale ranges from 26 down to 2 in decrements of 8 and it represents Sp_Sem. The fifth scale ranges from 24 down to 0 in decrements of 2 and it represents Eng_Sem. The sixth scale ranges from 11 down to 0 in decrements of 1 and it represents Sp_MorCloze. The seventh scale ranges from 28 down to 0 in decrements of 2 and it represents Sp_SR. The eighth scale ranges from 14 down to 0 in decrements of 1 and it represents Eng_SR. The ninth scale ranges from 0 to 400 in increments of 50 and it represents the total points. The tenth scale ranges from 0.1 to 0.9 in increments of 0.1 and it represents the probability. b. The second nomogram has 11 scales. The first 3 scales are identical to those of the first nomogram. The fourth scale ranges from 100 down to 0 in decrements of 10 and it represents Sp_Exposure. The fifth scale ranges from 26 down to 2 in decrements of 4 and it represents Sp_Sem. The sixth scale ranges from 24 down to 0 in decrements of 2 and it represents Eng_Sem. The seventh scale ranges from 11 down to 0 in decrements of 1 and it represents Sp_MorCloze. The eighth scale ranges from 28 down to 0 in decrements of 2 and it represents Sp_SR. The ninth scale ranges from 14 down to 0 in decrements of 1 and it represents Eng_SR. The tenth scale ranges from 0 to 450 in increments of 50 and it represents the total points. The eleventh scale ranges from 0.1 to 0.9 in increments of 0.1 and it represents the probability.

Prediction nomogram for developmental language disorder (a) using Bilingual English–Spanish Assessment subtests and (b) with the addition of bilingual exposure. Sp = Spanish; Eng = English; Sem = semantics; MorCloze = morphosyntax cloze; SR = sentence repetition.

Figures 3a and 3b present the ROC curves of the training and validation sets, respectively, and Table 4 shows their classification accuracy, which suggests that the model has acceptable sensitivity and specificity. The calibration curves showed relatively good agreement between the predicted and observed probabilities in these two sets (see Figures 3c and 3d). In addition, the decision curve analyses showed a net benefit of the prediction model in both the training and validation sets when the high risk threshold was between 7% and 75% (see Figures 3e and 3f). These findings suggest that the model has significant potential for clinical use.
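The net benefit plotted in the decision curves follows the Vickers and Elkin (2006) formulation: the true-positive rate minus the false-positive rate weighted by the odds of the threshold probability. A minimal sketch of the computation at a single threshold (the counts below are invented for illustration):

```python
def net_benefit(true_pos, false_pos, n, threshold):
    """Net benefit at probability threshold p_t:
    TP/n - (FP/n) * (p_t / (1 - p_t)).
    Positive values mean acting on the model beats treating no one."""
    odds = threshold / (1 - threshold)
    return true_pos / n - (false_pos / n) * odds

# hypothetical example: 40 true and 20 false positives among 400
# children, evaluated at a 10% high risk threshold
nb = net_benefit(40, 20, 400, 0.10)
```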

Figure 3.

The image displays 6 line graphs. The first two graphs display ROC curves for the training set and the validation set. In both graphs, the y-axis represents the sensitivity and the x-axis represents the specificity. The area under the curve is 0.904 for the training set and 0.954 for the validation set. The third and fourth graphs show apparent, bias-corrected, and ideal curves for the predicted versus actual probabilities in the training and validation sets. The fifth and sixth graphs plot net benefit against the high risk threshold for treat all, treat none, and the prediction model.

Receiver operating characteristic curves in (a) training and (b) validation sets, calibration curves for predicting the probability of developmental language disorder (DLD) in (c) training and (d) validation sets, and decision curve in children with DLD in (e) training and (f) validation sets based on Bilingual English–Spanish Assessment subtests. AUC = area under the receiver operating characteristic curve.

Table 4.

Classification accuracy of different nomogram prediction models.

Model Threshold Sensitivity Specificity LR+ LR− AUC
Bilingual subtests
 Training set 0.02 0.82 0.82 4.56 0.24 0.91
 Validation set 0.18 0.92 0.87 7.03 0.10 0.95
Bilingual subtests with language exposure
 Training set 0.18 0.82 0.80 4.16 0.22 0.91
 Validation set 0.29 0.92 0.91 10.54 0.09 0.95
Spanish-only subtests with language exposure
 Training set 0.19 0.81 0.80 3.97 0.24 0.86
 Validation set 0.22 0.80 0.79 3.81 0.26 0.87
English-only subtests with language exposure
 Training set 0.07 0.73 0.74 2.79 0.37 0.82
 Validation set 0.27 0.74 0.82 4.03 0.32 0.87

Note. LR+ = positive likelihood ratio; LR− = negative likelihood ratio; AUC = area under the receiver operating characteristic curve.

Nomogram-Based Prediction Model Using Bilingual Subtests and Bilingual Language Exposure

There was no significant difference in Spanish exposure between TD children and children with DLD, t(407) = 0.86, p = .39. Lasso logistic regression was used, and the coefficients ranged from −.08 to .02 (Supplemental Material S5 shows coefficient profiles of the predictors). Age, sex, bilingual language exposure, Spanish semantics, Spanish morphosyntax cloze, Spanish sentence repetition, English semantics, and English sentence repetition were selected for the prediction model based on their importance values. Figure 2b presents the final nomogram established according to the multivariate logistic regression; Supplemental Material S6 shows the multivariate logistic regression results on the training set. Among the individual predictors in the nomogram, English sentence repetition had the highest weighting in predicting DLD, followed by Spanish sentence repetition, English semantics, Spanish morphosyntax cloze, age, Spanish exposure, Spanish semantics, and sex. Supplemental Material S7 shows the conversion of points to the corresponding probability of DLD (see Figure 2b).

Figures 4a and 4b present the ROC curves of the training and validation sets, respectively, and Table 4 shows their classification accuracy, which suggests that the model has acceptable sensitivity and specificity. The calibration curves showed relatively good agreement between the predicted and observed probabilities in these two sets (see Figures 4c and 4d). In addition, the decision curve analyses showed a net benefit of the prediction model in both the training and validation sets when the high risk threshold was between 7% and 80% (see Figures 4e and 4f). These findings suggest that the model has significant potential for clinical use.

Figure 4.

The image displays 6 line graphs. a and b. The first two line graphs depict the ROC curves for the training and validation sets. In both graphs, the x-axis represents the specificity and the y-axis represents the sensitivity. The area under the curve is 0.904 and 0.954 for the training and validation sets, respectively. c and d. The third and fourth graphs plot the predicted probability for the training set and the actual probability for the validation set; curves corresponding to the apparent, bias-corrected, and ideal measures are plotted. e and f. The fifth and sixth graphs plot curves for treat all, treat none, and the prediction model; the y- and x-axes represent net benefit and high risk threshold, respectively.

Receiver operating characteristic curves in (a) training and (b) validation sets, calibration curves for predicting the probability of developmental language disorder (DLD) in (c) training and (d) validation sets, and decision curve in children with DLD in (e) training and (f) validation sets based on Bilingual English–Spanish Assessment subtests and bilingual exposure. AUC = area under the receiver operating characteristic curve.

Nomogram-Based Prediction Model Using Spanish-Only Subtests and Bilingual Language Exposure

Table 3 shows the means and standard deviations of the variables for Spanish-only participants. There were significant differences between TD children and children with DLD across all Spanish subtests, t(733) = 12.38–16.40, ps < .001. However, there was no significant difference in Spanish exposure, t(733) = 1.18, p = .24. Lasso logistic regression was used, and the coefficients ranged from −.12 to .02 (Supplemental Material S8 shows coefficient profiles of the predictors). Sex, bilingual language exposure, Spanish semantics, Spanish morphosyntax cloze, and Spanish sentence repetition were selected for the prediction model based on their importance values. Figure 5a presents the nomogram established according to the multivariate logistic regression; Supplemental Material S9 shows the multivariate logistic regression results on the training set. Among the individual predictors in the nomogram, Spanish sentence repetition had the highest weighting in predicting DLD, followed by Spanish semantics, Spanish exposure, Spanish morphosyntax cloze, and sex. Supplemental Material S10 shows the conversion of points to the corresponding probability of DLD (see Figure 5a).

Figure 5.

The image displays 2 nomograms. a. The first nomogram has 8 scales. The first scale ranges from 0 to 100 in increments of 10 and it represents the points. The second scale ranges from 1 down to 0 and it represents Female. The third scale ranges from 0 to 90 in increments of 30 and it represents Sp_Exposure. The fourth scale ranges from 26 down to 2 in decrements of 4 and it represents Sp_Sem. The fifth scale ranges from 11 down to 3 in decrements of 4 and it represents Sp_MorCloze. The sixth scale ranges from 40 down to 0 in decrements of 5 and it represents Sp_SR. The seventh scale ranges from 0 to 180 in increments of 20 and it represents the total points. The eighth scale ranges from 0.1 to 0.8 in increments of 0.1 and it represents the probability. b. The second nomogram has 8 scales. The first scale represents the points and it is identical to that in nomogram a. The second scale represents the Age and it ranges from 95 down to 40 in decrements of 5. The third scale represents Female and it is identical to that in nomogram a. The fourth scale ranges from 100 down to 0 in decrements of 10 and it represents Sp_Exposure. The fifth scale ranges from 26 down to 0 in decrements of 2 and it represents Eng_Sem. The sixth scale ranges from 14 down to 0 in decrements of 1 and it represents Eng_SR. The seventh scale ranges from 0 to 280 in increments of 20 and it represents the total points. The eighth scale ranges from 0.1 to 0.8 in increments of 0.1 and it represents the probability.

Prediction nomogram for developmental language disorder using (a) Spanish-only subtests and bilingual exposure and (b) English-only subtests and bilingual exposure. Sp = Spanish; Eng = English; Sem = semantics; MorCloze = morphosyntax cloze; SR = sentence repetition.

Figures 6a and 6b present the ROC curves of the training and validation sets, respectively, and Table 4 shows their classification accuracy, which suggests that the model has adequate sensitivity and specificity. The calibration curves showed fair and relatively good agreement between the predicted and observed probabilities in the training and validation sets, respectively (see Figures 6c and 6d). In addition, the decision curve analyses showed a net benefit of the prediction model in both the training and validation sets when the high risk threshold was between 7% and 50% (see Figures 6e and 6f). These findings suggest that the model has significant potential for clinical use.

Figure 6.

The image displays six line graphs. a, b. The first 2 line graphs are ROC curves for the training and validation set. The AUC for the training set is 0.862. The AUC for the validation set is 0.871. c, d. The third and fourth line graphs plot the predicted probability for the training set and the actual probability of the validation set. For the training set, the probability deviates from the linear profile. For the validation set, the actual probability follows the linear profile. Curves for apparent, bias corrected, and ideal results are plotted. e, f. The fifth and sixth graphs plot curves related to the net benefit versus high risk threshold. The curves are treat all, treat none, and prediction model.

Receiver operating characteristic curves in (a) training and (b) validation sets, calibration curves for predicting the probability of developmental language disorder (DLD) in (c) training and (d) validation sets, and decision curve in children with DLD in (e) training and (f) validation sets based on Spanish-only subtests and bilingual exposure. AUC = area under the receiver operating characteristic curve.

Nomogram-Based Prediction Model Using English-Only Subtests and Bilingual Language Exposure

Table 3 shows the means and standard deviations of the variables for English-only participants. There were significant differences between TD children and children with DLD across all English subtests and Spanish exposure, t(703) = 3.11–13.48, ps < .01. Lasso logistic regression was used, and the coefficients ranged from −.11 to −.02 (Supplemental Material S11 shows the coefficient profiles of the predictors). Age, sex, bilingual language exposure, English semantics, and English sentence repetition were selected for the prediction model based on their importance values. Figure 5b presents the nomogram established according to the multivariate logistic regression; Supplemental Material S12 shows the multivariate logistic regression results on the training set. Among the individual predictors in the nomogram, English sentence repetition had the highest weighting in predicting DLD, followed by English semantics, age, Spanish exposure, and sex. Supplemental Material S13 shows the conversion of points to the corresponding probability of DLD (see Figure 5b).

Figures 7a and 7b present the ROC curves of the training and validation sets, respectively, and Table 4 shows their classification accuracy, which suggests that the model does not have adequate sensitivity and specificity. However, the calibration curves showed relatively good agreement between the predicted and observed probabilities in these two sets (see Figures 7c and 7d), and the decision curve analyses showed a net benefit of the prediction model in both the training and validation sets when the high risk threshold was between 7% and 75% (see Figures 7e and 7f). These findings suggest that the model retains some potential clinical utility despite its limited classification accuracy.

Figure 7.

The image displays 6 graphs. a and b. The first 2 graphs plot the ROC curves for the training data set and the validation data set. The area under the ROC curve is 0.824 for the training data set and 0.872 for the validation data set. In both these graphs, the y-axis represents the sensitivity and the x-axis represents the specificity. c and d. The third and fourth graphs plot the predicted probability of the training data set and the actual probability of the validation data set. The apparent, bias-corrected, and ideal results are plotted. In both graphs, the results follow a linear profile. e and f. The fifth and sixth graphs plot the net benefit versus high risk threshold for the training data set and validation data set. Three curves representing the results for treat all, treat none, and prediction model are plotted.

Receiver operating characteristic curves in (a) training and (b) validation sets, calibration curves for predicting the probability of developmental language disorder (DLD) in (c) training and (d) validation sets, and decision curve in children with DLD in (e) training and (f) validation sets based on English-only subtests and bilingual exposure. AUC = area under the receiver operating characteristic curve.

Discussion

The current study used nomogram-based predictive modeling to evaluate the accuracy of classifying DLD using a standardized bilingual language assessment, the assessment with the addition of bilingual language exposure, and portions of the standardized assessment in one language only. Results showed that the weighted bilingual standardized language assessment accurately classified language ability, and the model generalized to another sample. While the addition of bilingual exposure did not improve the classification accuracy of the standardized bilingual language assessment, bilingual exposure was an important factor when assessment was conducted in one language only. For our bilingual samples, Spanish-only assessment with bilingual exposure achieved the minimum acceptable sensitivity and specificity suggested by Plante and Vance (1994); English-only assessment with bilingual exposure did not.

Importance of Different Subtests

Although the current analysis included fewer morphosyntax cloze and sentence repetition items than the final version of the BESA, the Spanish and English sentence repetition subtests had the highest importance values in all analyses. This result is consistent with prior studies suggesting that sentence repetition tasks have good classification accuracy (Archibald & Joanisse, 2009; Conti-Ramsden et al., 2001; Pratt et al., 2021; Wang et al., 2022). Sentence repetition is associated not only with grammatical knowledge, the hallmark impairment of DLD (Leonard, 2014), but also with lexical knowledge and working memory (Pratt et al., 2021), and children with DLD show impairment in both lexical knowledge (Kan & Windsor, 2010) and working memory (Niu et al., 2024) compared to TD children. Therefore, sentence repetition, with its complex task demands, is an informative tool for assessing Spanish–English bilingual children.

It is also worth noting that the semantic tasks in Spanish and English were important in all analyses. This is consistent with prior research showing that semantic tasks, though weaker predictors than morphosyntax tasks, are an important indicator of DLD and help prevent underidentification (Peña, Bedore, Lugo-Neris, & Albudoor, 2020). Semantic tasks, which assess vocabulary depth, can help document the semantic representations that affect language comprehension (Jasso et al., 2020). This differs from morphosyntax cloze or sentence repetition tasks, which focus on clinical markers. The inclusion of semantics in the final nomograms suggests that assessments targeting different language domains can complement each other for diagnosis.

Broadly speaking, the current findings suggest that diagnostic accuracy can be maximized by adjusting the weighting among subtests. In the future, this approach can be applied to other tests and tasks, including standardized assessments of monolinguals. Nomograms provide a way to apply empirically derived subtest weightings in clinical decision making: visually, the scale for each subtest shows its relative weighting in prediction. Clinicians can convert raw scores into weighted points, compute the total weighted score, and then convert the total into a probability of DLD, which can inform clinical decisions.
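The workflow just described (raw scores → per-subtest points → total points → probability of DLD) can be sketched as follows. The intercept and slope of the logistic mapping here are hypothetical placeholders; in practice, the points-to-probability conversion is read directly from the published nomogram:

```python
import math

def probability_of_dld(subtest_points, intercept=-4.0, slope=0.0125):
    """Sum the weighted points read off the nomogram scales, then map
    the total to a probability through a logistic function. The
    intercept and slope are hypothetical placeholders; clinicians
    would instead read the probability from the nomogram itself."""
    total = sum(subtest_points.values())
    prob = 1 / (1 + math.exp(-(intercept + slope * total)))
    return total, prob

# hypothetical per-subtest points for illustration only
total, prob = probability_of_dld({"Sem": 60, "MorCloze": 50, "SR": 90})
```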

Role of Language Exposure in Bilingual Assessment

The current study also explored the role of language exposure in bilingual language assessment for diagnostic purposes and found that language exposure contributed little to classification accuracy when the assessment was administered in both languages. This is not surprising, as prior research suggests that language exposure is not associated with children's score in their better language when they are tested in both languages (Peña et al., 2023). The nomograms in Figure 2 include performance in both languages, so the child's better language is already reflected; therefore, the language exposure information did not significantly change the classification accuracy. The adequate classification accuracy of bilingual assessment also aligns with ASHA guidelines and prior studies suggesting that conducting assessment in both languages is best practice (ASHA, n.d.; Peña, Bedore, & Kester, 2016).

On the other hand, the current study also showed that language exposure was an important predictor of the probability of DLD when assessment was considered in only one language. The importance of language exposure in the Spanish-only and English-only nomograms was higher than in the bilingual nomogram. This is consistent with prior literature on the relationship between language exposure and language performance in bilingual children with and without DLD (Peña, Bedore, Shivabasappa, & Niu, 2020; Smolander et al., 2021; Thordardottir, 2015). Therefore, when assessment data are available in only one language, language exposure can moderate the relationship between the probability of DLD and language performance.

Single-Language Assessments for Bilinguals

An interesting finding of the current study was that Spanish assessment with language exposure achieved the minimum acceptable sensitivity and specificity, while English assessment with language exposure did not. This finding differs from previous studies using single-language assessment to evaluate bilingual language performance, which found low classification accuracy (Altman et al., 2021; Ehl et al., 2020; Rose et al., 2022). The current study had the strengths of using a measure developed for the Spanish–English bilingual population, assessing more than one domain of language, empirically weighting the subtests, and including language exposure information. The adequate classification accuracy of the Spanish assessment with language exposure suggests that single-language assessment of bilinguals may be feasible, and the significance of language exposure in the nomogram indicates that exposure is an important consideration for single-language assessment to yield an accurate diagnosis of DLD in bilinguals. That being said, clinicians should use converging evidence, including sources such as parent report and dynamic assessment, rather than relying solely on a standardized test for clinical decisions. In addition, while this nomogram achieved the minimum acceptable sensitivity and specificity, it only addresses the assessment purpose of making a clinical diagnosis; clinicians may need more evidence to understand the strengths and weaknesses of a child's language profile and to monitor the progress of language development.

On the other hand, the English assessment with language exposure did not achieve the minimum acceptable sensitivity and specificity. This could be related to children's relative dominance in Spanish: most children had exposure to Spanish from birth, with a high degree of variation in their first exposure to English. They therefore had high Spanish exposure and more cumulative exposure at home, and most participants in the current study had received only 1–3 years of English exposure at school. Because many children were still in the process of learning English, English-only testing may not be sufficient to reliably identify DLD for them. Another explanation is that only a few tasks were included in the analysis. Specifically, the tasks used here were decontextualized standardized language tasks: semantic and morphosyntax skills were assessed using elicited responses. This differs from contextualized tasks, such as language sample analysis, which allow the evaluation of language performance in functional contexts. With less cumulative exposure to English than to Spanish, decontextualized standardized language tasks may be more challenging in English for young bilingual children. Regarding the pragmatic concern that motivated this analysis, the limited number of bilingual speech-language pathologists (SLPs), subtest weighting and the inclusion of exposure alone do not make English-only assessment a viable option for young bilingual children with these three language tasks. Additionally, the additional sample in Research Question 2 completed testing in only one language, changing the distribution of age and language exposure in the data set. Other findings, however, suggest that tailoring assessment targets to different levels of exposure can improve English-only assessment (Bedore et al., 2018; Jasso et al., 2020; Pratt et al., 2024). Accurate diagnosis of DLD in bilinguals using English-only data may require accounting for language exposure at the level of items and targets rather than at the level of subtest scores and their weighting. The importance of exposure within the model also reiterates the risk of misidentification when assessing bilingual children in only one language and the potential for creating health disparities with lifelong negative impacts.

Clinical Implications

The current study has two major clinical implications. The first is the generalizability of the prediction model to a new sample. The use of k-fold cross-validation avoids overfitting to a single sample and provides a more robust estimate of performance on new data (Brooks & Thompson, 2017). Clinicians can therefore be confident in using the nomogram to assess the probability of DLD, as it generalizes to new data.

The nomograms from the current study are also easy for clinicians to use in evaluating the probability of DLD. As an example, consider a boy aged 6;5 (years;months) who received the following raw scores on the six subtests of the BESA: 6 in Spanish semantics, 2 in Spanish morphosyntax cloze, 4 in Spanish sentence repetition, 4 in English semantics, 2 in English morphosyntax cloze, and 2 in English sentence repetition. Using Figure 2a, the clinician can convert each raw score into the corresponding points and calculate the total points. Figure 8 shows the conversion process using the nomogram, resulting in a total of 320 points. Thus, this boy has an 80% probability of having DLD, which is high and above the identified threshold in Table 4. Clinicians can consider conducting further assessments to gather additional evidence for diagnosis.

Figure 8.

A nomogram for an example case. The nomogram has 10 scales. The first scale ranges from 0 to 100 in increments of 10 and it represents the points. The second scale ranges from 45 to 95 in increments of 5 and it represents the age. The third scale ranges from 1 down to 0 and it represents Female. The fourth scale ranges from 26 down to 2 in decrements of 8 and it represents Sp_Sem. The fifth scale ranges from 24 down to 0 in decrements of 2 and it represents Eng_Sem. The sixth scale ranges from 11 down to 0 in decrements of 1 and it represents Sp_MorCloze. The seventh scale ranges from 28 down to 0 in decrements of 2 and it represents Sp_SR. The eighth scale ranges from 14 down to 0 in decrements of 1 and it represents Eng_SR. The ninth scale ranges from 0 to 400 in increments of 50 and it represents the total points. The tenth scale ranges from 0.1 to 0.9 in increments of 0.1 and it represents the probability. The age is 65 months. A value of 65 on the age scale is mapped to 22 on the points scale. The gender is male. A value of 0 on the Female scale is mapped to 4 points on the points scale. A value of 6 on the Sp_Sem scale is mapped to 11 on the points scale. A value of 4 on the Eng_Sem scale is mapped to 62 on the points scale. A value of 2 on the Sp_MorCloze scale is mapped to 52 on the points scale. A value of 4 on the Sp_SR scale is mapped to 86 on the points scale. A value of 2 on the Eng_SR scale is mapped to 83 on the points scale. The total points, the sum of 4, 11, 22, 52, 62, 83, and 86, is 320. A value of 320 on the total points scale corresponds to 0.8 on the probability scale.

A case example of using the nomogram for diagnosis. Sp = Spanish; Eng = English; Sem = semantics; MorCloze = morphosyntax cloze; SR = sentence repetition.
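The arithmetic in the case example can be verified directly; the per-subtest points below are those read off the Figure 8 nomogram:

```python
# points read from the Figure 8 nomogram for the example boy
points = {
    "age (65 months)": 22,
    "sex (male)": 4,
    "Sp_Sem = 6": 11,
    "Eng_Sem = 4": 62,
    "Sp_MorCloze = 2": 52,
    "Sp_SR = 4": 86,
    "Eng_SR = 2": 83,
}
total = sum(points.values())  # 320 points, corresponding to a
# probability of about .80 on the nomogram's probability scale
```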

Limitations

The first limitation of the study is the reduced item set compared to the published version of the BESA. Specifically, some Spanish and English morphosyntax cloze targets and some sentence repetition items were not tested in the DM data set and were removed from the current analysis. Thus, the classification accuracy and other psychometric properties are not directly comparable to the published BESA, and the reduction might also lower the classification accuracy and other psychometric properties of the single-language assessment.

The second limitation concerns the characteristics of the additional samples used for single-language assessment. Although including larger samples is statistically rigorous, the additional Spanish-only and English-only samples were distinct from one another, so the classification accuracy and other psychometric properties of the Spanish-only and English-only models are not directly comparable. In addition, the additional samples consisted of Spanish- or English-dominant bilingual children based on reported language exposure. Thus, the skewness of the samples increased, and they may not be truly representative of the population.

The last limitation is the age difference among the data sets. Although all demographic variables were normally distributed, participants in the DTHC data set were significantly younger than the children in the other two data sets. Younger participants likely had less cumulative English exposure, creating challenges for the single-language assessment in English.

Future Directions

This study suggests two future directions. The first is to widen the scope of measurement to examine the relative importance of multiple measures for diagnostic accuracy. The ASHA guidelines and prior studies recommend a converging-evidence framework for diagnosing DLD in bilinguals (ASHA, n.d.; Castilla-Earls et al., 2020). While this study suggests that different subtests within a standardized test can carry different weightings, future studies could examine the weighting of different types of evidence to maximize diagnostic accuracy for bilingual children. For example, it might be useful to include parent and/or teacher rating scales and/or narrative measures in future nomogram studies.

The second direction is to examine the classification accuracy of single-language assessment in older bilingual children. The current study shows that Spanish-only assessment achieved minimum sensitivity and specificity from preschool to early school age, which can be explained by higher cumulative first-language input. Bilingual children receive more exposure to English once they enter school, and dominance patterns can shift toward English (Bedore et al., 2012; Kohnert et al., 1999; Wang et al., 2025). Thus, the feasibility of single-language assessment, especially English-only assessment, could be examined in older bilingual children to address the pragmatic concern of the limited number of bilingual SLPs.

Data Availability Statement

The data that support the findings of this study are available from the corresponding author upon reasonable request.

Supplementary Material

Supplemental Material S1. Items comparison between this study and the published Bilingual English-Spanish Assessment (BESA; Peña et al., 2018).
JSLHR-68-4337-s001.pdf (35.9KB, pdf)
Supplemental Material S2. Lasso logistic regression results on training set for bilingual subtests.
JSLHR-68-4337-s002.pdf (52.5KB, pdf)
Supplemental Material S3. Conversion of points and risk for the nomogram with bilingual subtests.
JSLHR-68-4337-s003.pdf (32.9KB, pdf)
Supplemental Material S4. Lasso logistic regression results on training set for bilingual subtests and language exposure.
JSLHR-68-4337-s004.pdf (54.5KB, pdf)
Supplemental Material S5. Conversion of points and risk for the nomogram with bilingual subtests and language exposure.
JSLHR-68-4337-s005.pdf (33.6KB, pdf)
Supplemental Material S6. Lasso logistic regression results on training set for Spanish subtests and language exposure.
JSLHR-68-4337-s006.pdf (54.2KB, pdf)
Supplemental Material S7. Conversion of points and risk for the Spanish assessment only nomogram.
JSLHR-68-4337-s007.pdf (31.3KB, pdf)
Supplemental Material S8. Lasso logistic regression results on training set for English subtests and language exposure.
JSLHR-68-4337-s008.pdf (54.3KB, pdf)
Supplemental Material S9. Conversion of points and risk for the English assessment only nomogram.
JSLHR-68-4337-s009.pdf (31.4KB, pdf)
Supplemental Material S10. Regression coefficients of predictors for bilingual subtests.
JSLHR-68-4337-s0010.pdf (71.2KB, pdf)
Supplemental Material S11. Regression coefficients of predictors for bilingual subtests and language exposure.
JSLHR-68-4337-s0011.pdf (82.7KB, pdf)
Supplemental Material S12. Regression coefficients of predictors for Spanish subtests and language exposure.
JSLHR-68-4337-s0012.pdf (95.6KB, pdf)
Supplemental Material S13. Regression coefficients of predictors for English subtests and language exposure.
JSLHR-68-4337-s0013.pdf (79.6KB, pdf)

Acknowledgments

This work was supported by National Institute on Deafness and Other Communication Disorders (NIDCD) Grants N01DC82100 (awarded to Aquiles Iglesias), R01 DC007439-01 (awarded to Elizabeth D. Peña), and R01DC010366 (awarded to Elizabeth D. Peña). This report does not necessarily reflect the views or policy of NIDCD.

Funding Statement

This work was supported by National Institute on Deafness and Other Communication Disorders (NIDCD) Grants N01DC82100 (awarded to Aquiles Iglesias), R01 DC007439-01 (awarded to Elizabeth D. Peña), and R01DC010366 (awarded to Elizabeth D. Peña). This report does not necessarily reflect the views or policy of NIDCD.

References

  1. Altman, C., Harel, E., Meir, N., Iluz-Cohen, P., Walters, J., & Armon-Lotem, S. (2021). Using a monolingual screening test for assessing bilingual children. Clinical Linguistics & Phonetics, 36(12), 1132–1152. 10.1080/02699206.2021.2000644 [DOI] [PubMed] [Google Scholar]
  2. American Speech-Language-Hearing Association. (n.d.). Multilingual service delivery in audiology and speech-language pathology [Practice Portal]. http://www.asha.org/Practice-Portal/ProfessionalIssues/Bilingual-Service-Delivery/
  3. American Speech-Language-Hearing Association. (2023). Profile of ASHA multilingual service providers, year-end 2023. https://www.asha.org/siteassets/surveys/2023-profile-of-multilingual-service-providers.pdf [PDF]
  4. Archibald, L. M. D., & Joanisse, M. F. (2009). On the sensitivity and specificity of nonword repetition and sentence recall to language and memory impairments in children. Journal of Speech, Language, and Hearing Research, 52(4), 899–914. 10.1044/1092-4388(2009/08-0099) [DOI] [PubMed] [Google Scholar]
  5. Arias, G., & Friberg, J. (2017). Bilingual language assessment: Contemporary versus recommended practice in American schools. Language, Speech, and Hearing Services in Schools, 48(1), 1–15. 10.1044/2016_LSHSS-15-0090 [DOI] [PubMed] [Google Scholar]
  6. Bedore, L. M., Peña, E. D., Anaya, J. B., Nieto, R., Lugo-Neris, M. J., & Baron, A. (2018). Understanding disorder within variation: Production of English grammatical forms by English language learners. Language, Speech, and Hearing Services in Schools, 49(2), 277–291. 10.1044/2017_LSHSS-17-0027 [DOI] [PMC free article] [PubMed] [Google Scholar]
  7. Bedore, L. M., Peña, E. D., Summers, C. L., Boerger, K. M., Resendiz, M. D., Greene, K., Bohman, T. M., & Gillam, R. B. (2012). The measure matters: Language dominance profiles across measures in Spanish–English bilingual children. Bilingualism: Language and Cognition, 15(3), 616–629. 10.1017/S1366728912000090 [DOI] [PMC free article] [PubMed] [Google Scholar]
  8. Blasco-Fontecilla, H., Li, C., Vizcaino, M., Fernández-Fernández, R., Royuela, A., & Bella-Fernández, M. (2024). A nomogram for predicting ADHD and ASD in Child and Adolescent Mental Health Services (CAMHS). Journal of Clinical Medicine, 13(8), Article 2397. 10.3390/jcm13082397 [DOI] [PMC free article] [PubMed] [Google Scholar]
  9. Boerma, T., Leseman, P., Timmermeister, M., Wijnen, F., & Blom, E. (2016). Narrative abilities of monolingual and bilingual children with and without language impairment: Implications for clinical practice. International Journal of Language & Communication Disorders, 51(6), 626–638. 10.1111/1460-6984.12234 [DOI] [PubMed] [Google Scholar]
  10. Brooks, C., & Thompson, C. (2017). Predictive modelling in teaching and learning. In Lang C., Siemens G., Wise A., & Gašević D. (Eds.), Handbook of learning analytics (pp. 61–68). Society for Learning Analytics Research. 10.18608/hla17.005 [DOI] [Google Scholar]
  11. Castilla-Earls, A., Bedore, L., Rojas, R., Fabiano-Smith, L., Pruitt-Lord, S., Restrepo, M. A., & Peña, E. (2020). Beyond scores: Using converging evidence to determine speech and language services eligibility for dual language learners. American Journal of Speech-Language Pathology, 29(3), 1116–1132. 10.1044/2020_AJSLP-19-00179 [DOI] [PMC free article] [PubMed] [Google Scholar]
  12. Conti-Ramsden, G., Botting, N., & Faragher, B. (2001). Psycholinguistic markers for specific language impairment (SLI). The Journal of Child Psychology and Psychiatry, 42(6), 741–748. 10.1111/1469-7610.00770 [DOI] [PubMed] [Google Scholar]
  13. Denman, D., Cordier, R., Kim, J. H., Munro, N., & Speyer, R. (2021). What influences speech-language pathologists' use of different types of language assessments for elementary school-age children? Language, Speech, and Hearing Services in Schools, 52(3), 776–793. 10.1044/2021_LSHSS-20-00053 [DOI] [PubMed] [Google Scholar]
  14. Dollaghan, C. A., & Horner, E. A. (2011). Bilingual language assessment: A meta-analysis of diagnostic accuracy. Journal of Speech, Language, and Hearing Research, 54(4), 1077–1088. 10.1044/1092-4388(2010/10-0093) [DOI] [PubMed] [Google Scholar]
  15. Ehl, B., Bruns, G., & Grosche, M. (2020). Differentiated bilingual vocabulary assessment reveals similarities and differences compared to monolinguals: Conceptual versus single-language scoring and the relation with home language and literacy activities. International Journal of Bilingualism, 24(4), 715–728. 10.1177/1367006919876994 [DOI] [Google Scholar]
  16. Friedman, J. H., Hastie, T., & Tibshirani, R. (2010). Regularization paths for generalized linear models via coordinate descent. Journal of Statistical Software, 33(1), 1–22. 10.18637/jss.v033.i01 [DOI] [PMC free article] [PubMed] [Google Scholar]
  17. Fulcher-Rood, K., Castilla-Earls, A. P., & Higginbotham, J. (2018). School-based speech-language pathologists' perspectives on diagnostic decision making. American Journal of Speech-Language Pathology, 27(2), 796–812. 10.1044/2018_AJSLP-16-0121 [DOI] [PubMed] [Google Scholar]
  18. Gao, T., Yang, L., Zhou, J., Zhang, Y., Wang, L., Wang, Y., & Wang, T. (2024). Development and validation of a nomogram prediction model for ADHD in children based on individual, family, and social factors. Journal of Affective Disorders, 356, 483–491. 10.1016/j.jad.2024.04.069 [DOI] [PubMed] [Google Scholar]
  19. Gillam, R. B., & Pearson, N. (2004). Test of Narrative Language. Pro-Ed. [Google Scholar]
  20. Gillam, R. B., Peña, E. D., Bedore, L. M., Bohman, T. M., & Mendez-Perez, A. (2013). Identification of specific language impairment in bilingual children: I. Assessment in English. Journal of Speech, Language, and Hearing Research, 56(6), 1813–1823. 10.1044/1092-4388(2013/12-0056) [DOI] [PMC free article] [PubMed] [Google Scholar]
  21. Gillam, R. B., Peña, E. D., Bedore, L. M., & Pearson, N. (2017). Test of Narrative Language, Spanish [Manuscript in preparation]. University of California, Irvine.
  22. Gray, S., Plante, E., Vance, R., & Henrichsen, M. (1999). The diagnostic accuracy of four vocabulary tests administered to preschool-age children. Language, Speech, and Hearing Services in Schools, 30(2), 196–206. 10.1044/0161-1461.3002.196 [DOI] [PubMed] [Google Scholar]
  23. Guiberson, M., & Vigil, D. (2021). Speech-language pathology graduate admissions: Implications to diversify the workforce. Communication Disorders Quarterly, 42(3), 145–155. 10.1177/1525740120961049 [DOI] [Google Scholar]
  24. Gutiérrez-Clellen, V. F., & Kreiter, J. (2003). Understanding child bilingual acquisition using parent and teacher reports. Applied Psycholinguistics, 24(2), 267–288. 10.1017/S0142716403000158 [DOI] [Google Scholar]
  25. Harrell, F. E., Jr. (2017). Package 'rms' (Version 6.8-2). Vanderbilt University. [Google Scholar]
  26. Hassan, F. H. B., Lee, G. Z. H., Razak, R. A., Aziz, M. A. A., & Joginder Singh, S. (2024). The management of multilingual adults with aphasia in Malaysia: Current practices, needs, and challenges. Aphasiology, 38(3), 487–509. 10.1080/02687038.2023.2214299 [DOI] [Google Scholar]
  27. Jasso, J., McMillen, S., Anaya, J. B., Bedore, L. M., & Peña, E. D. (2020). The utility of an English semantics measure for identifying developmental language disorder in Spanish–English bilinguals. American Journal of Speech-Language Pathology, 29(2), 776–788. 10.1044/2020_AJSLP-19-00202 [DOI] [PMC free article] [PubMed] [Google Scholar]
  28. Jordaan, H. (2008). Clinical intervention for bilingual children: An international survey. Folia Phoniatrica et Logopaedica, 60(2), 97–105. 10.1159/000114652 [DOI] [PubMed] [Google Scholar]
  29. Kan, P. F., & Windsor, J. (2010). Word learning in children with primary language impairment: A meta-analysis. Journal of Speech, Language, and Hearing Research, 53(3), 739–756. 10.1044/1092-4388(2009/08-0248) [DOI] [PubMed] [Google Scholar]
  30. Kohnert, K. J., Bates, E., & Hernandez, A. E. (1999). Balancing bilinguals: Lexical-semantic production and cognitive processing in children learning Spanish and English. Journal of Speech, Language, and Hearing Research, 42(6), 1400–1413. 10.1044/jslhr.4206.1400 [DOI] [PubMed] [Google Scholar]
  31. Kritikos, E. P. (2003). Speech-language pathologists' beliefs about language assessment of bilingual/bicultural individuals. American Journal of Speech-Language Pathology, 12(1), 73–91. 10.1044/1058-0360(2003/054) [DOI] [PubMed] [Google Scholar]
  32. Leonard, L. B. (2014). Specific language impairment across languages. Child Development Perspectives, 8(1), 1–5. 10.1111/cdep.12053 [DOI] [PMC free article] [PubMed] [Google Scholar]
  33. Morgan, P. L., Farkas, G., Hillemeier, M. M., Mattison, R., Maczuga, S., Li, H., & Cook, M. (2015). Minorities are disproportionately underrepresented in special education: Longitudinal evidence across five disability conditions. Educational Researcher, 44(5), 278–292. 10.3102/0013189X15591157 [DOI] [PMC free article] [PubMed] [Google Scholar]
  34. Newcomer, P., & Hammill, D. (1997). Test of Language Development–Primary: 3. Pro-Ed. [DOI] [PubMed] [Google Scholar]
  35. Niu, T., Wang, S., Ma, J., Zeng, X., & Xue, R. (2024). Executive functions in children with developmental language disorder: A systematic review and meta-analysis. Frontiers in Neuroscience, 18, Article 1390987. 10.3389/fnins.2024.1390987 [DOI] [PMC free article] [PubMed] [Google Scholar]
  36. Oetting, J. B., Gregory, K. D., & Rivière, A. M. (2016). Changing how speech-language pathologists think and talk about dialect variation. Perspectives of the ASHA Special Interest Groups, 1(16), 28–37. 10.1044/persp1.SIG16.28 [DOI] [PMC free article] [PubMed] [Google Scholar]
  37. Office of Minority Health. (2016). National culturally and linguistically appropriate services standards. U.S. Department of Health and Human Services. https://thinkculturalhealth.hhs.gov/clas/standards [Google Scholar]
  38. Ogiela, D. A., & Montzka, J. L. (2021). Norm-referenced language test selection practices for elementary school children with suspected developmental language disorder. Language, Speech, and Hearing Services in Schools, 52(1), 288–303. 10.1044/2020_LSHSS-19-00067 [DOI] [PubMed] [Google Scholar]
  39. Paradis, J. (2010). Bilingual children's acquisition of English verb morphology: Effects of language exposure, structure complexity, and task type. Language Learning, 60(3), 651–680. 10.1111/j.1467-9922.2010.00567.x [DOI] [Google Scholar]
  40. Parkhurst, J. T., Vesco, A. T., Ballard, R. R., & Lavigne, J. V. (2023). Improving diagnostic accuracy: Comparison of nomograms and classification tree analyses for predicting the diagnosis of oppositional defiant disorder. Psychological Services, 20(S2), 184–195. 10.1037/ser0000670 [DOI] [PubMed] [Google Scholar]
  41. Peña, E. D., Bedore, L. M., & Gillam, R. B. (2006). Diagnostic markers of language impairment in Spanish–English bilinguals. National Institute on Deafness and Other Communication Disorders. [Google Scholar]
  42. Peña, E. D., Bedore, L. M., Gutiérrez-Clellen, V. F., Iglesias, A., & Goldstein, B. A. (2010). Bilingual English–Spanish Oral Screener (BESOS). The University of Texas at Austin. [Google Scholar]
  43. Peña, E. D., Bedore, L. M., Gutiérrez-Clellen, V. F., Iglesias, A., & Goldstein, B. A. (2016). Bilingual English–Spanish Assessment–Middle Extension (BESA-ME) [Unpublished manuscript].
  44. Peña, E. D., Bedore, L. M., & Kester, E. S. (2016). Assessment of language impairment in bilingual children using semantic tasks: Two languages classify better than one. International Journal of Language & Communication Disorders, 51(2), 192–202. 10.1111/1460-6984.12199 [DOI] [PMC free article] [PubMed] [Google Scholar]
  45. Peña, E. D., Bedore, L. M., Lugo-Neris, M. J., & Albudoor, N. (2020). Identifying developmental language disorder in school age bilinguals: Semantics, grammar, and narratives. Language Assessment Quarterly, 17(5), 541–558. 10.1080/15434303.2020.1827258 [DOI] [PMC free article] [PubMed] [Google Scholar]
  46. Peña, E., Bedore, L. M., & Rappazzo, C. (2003). Comparison of Spanish, English, and bilingual children's performance across semantic tasks. Language, Speech, and Hearing Services in Schools, 34(1), 5–16. 10.1044/0161-1461(2003/001) [DOI] [PubMed] [Google Scholar]
  47. Peña, E. D., Bedore, L. M., Shivabasappa, P., & Niu, L. (2020). Effects of divided input on bilingual children with language impairment. International Journal of Bilingualism, 24(1), 62–78. 10.1177/1367006918768367 [DOI] [Google Scholar]
  48. Peña, E. D., Bedore, L. M., & Vargas, A. G. (2023). Exploring assumptions of the bilingual delay in children with and without developmental language disorder. Journal of Speech, Language, and Hearing Research, 66(12), 4739–4755. 10.1044/2023_JSLHR-23-00117 [DOI] [PMC free article] [PubMed] [Google Scholar]
  49. Peña, E. D., Gutiérrez-Clellen, V. F., Iglesias, A., Goldstein, B. A., & Bedore, L. M. (2018). Bilingual English–Spanish Assessment (BESA). Brookes. [Google Scholar]
  50. Plante, E., & Vance, R. (1994). Selection of preschool language tests: A data-based approach. Language, Speech, and Hearing Services in Schools, 25(1), 15–24. 10.1044/0161-1461.2501.15 [DOI] [Google Scholar]
  51. Pratt, A. S., Durant, K., Peña, E. D., & Bedore, L. M. (2024). Modeling dimensionality of bilingual kindergarteners' language knowledge in Spanish and English. Journal of Speech, Language, and Hearing Research, 67(7), 2244–2268. 10.1044/2024_JSLHR-22-00140 [DOI] [PMC free article] [PubMed] [Google Scholar]
  52. Pratt, A. S., Peña, E. D., & Bedore, L. M. (2021). Sentence repetition with bilinguals with and without DLD: Differential effects of memory, vocabulary, and exposure. Bilingualism: Language and Cognition, 24(2), 305–318. 10.1017/S1366728920000498 [DOI] [Google Scholar]
  53. Pratt, A., Ramos, M. N., Peña, E. D., & Bedore, L. (2024). Tailoring grammatical items in English according to language experience improves classification of DLD in bilinguals [Manuscript submitted for publication]. University of Cincinnati College of Allied Health Sciences. [Google Scholar]
  54. R Core Team. (2020). R: A language and environment for statistical computing. R Foundation for Statistical Computing. https://www.R-project.org
  55. Records, N. L., & Tomblin, J. B. (1994). Clinical decision making: Describing the decision rules of practicing speech-language pathologists. Journal of Speech and Hearing Research, 37(1), 144–156. 10.1044/jshr.3701.144 [DOI] [PubMed] [Google Scholar]
  56. Robin, X., Turck, N., Hainard, A., Tiberti, N., Lisacek, F., Sanchez, J.-C., & Müller, M. (2011). pROC: An open-source package for R and S+ to analyze and compare ROC curves. BMC Bioinformatics, 12(1), Article 77. 10.1186/1471-2105-12-77 [DOI] [PMC free article] [PubMed] [Google Scholar]
  57. Rose, K., Armon-Lotem, S., & Altman, C. (2022). Profiling bilingual children: Using monolingual assessment to inform diagnosis. Language, Speech, and Hearing Services in Schools, 53(2), 494–510. 10.1044/2021_LSHSS-21-00099 [DOI] [PubMed] [Google Scholar]
  58. Santhanam, S. P., & Parveen, S. (2018). Serving culturally and linguistically diverse clients: A review of changing trends in speech-language pathologists' self-efficacy and implications for stakeholders. Clinical Archives of Communication Disorders, 3(3), 165–177. 10.21849/cacd.2018.00395 [DOI] [Google Scholar]
  59. Smolander, S., Laasonen, M., Arkkila, E., Lahti-Nuuttila, P., & Kunnari, S. (2021). L2 vocabulary acquisition of early sequentially bilingual children with TD and DLD affected differently by exposure and age of onset. International Journal of Language & Communication Disorders, 56(1), 72–89. 10.1111/1460-6984.12583 [DOI] [PubMed] [Google Scholar]
  60. Sullivan, A. L., & Bal, A. (2013). Disproportionality in special education: Effects of individual and school variables on disability risk. Exceptional Children, 79(4), 475–494. 10.1177/001440291307900406 [DOI] [Google Scholar]
  61. Thordardottir, E. (2015). The relationship between bilingual exposure and morphosyntactic development. International Journal of Speech-Language Pathology, 17(2), 97–114. 10.3109/17549507.2014.923509 [DOI] [PubMed] [Google Scholar]
  62. Tomblin, J. B., Records, N. L., & Zhang, X. (1996). A system for the diagnosis of specific language impairment in kindergarten children. Journal of Speech and Hearing Research, 39(6), 1284–1294. 10.1044/jshr.3906.1284 [DOI] [PubMed] [Google Scholar]
  63. U.S. Census Bureau. (2019). American Community Survey (ACS). https://www.census.gov/programs-surveys/acs
  64. Vickers, A. J., & Elkin, E. B. (2006). Decision curve analysis: A novel method for evaluating prediction models. Medical Decision Making, 26(6), 565–574. 10.1177/0272989X06295361 [DOI] [PMC free article] [PubMed] [Google Scholar]
  65. Wang, D., Choi-Tucci, A., Mendez-Perez, A., Gillam, R. B., Bedore, L. M., & Peña, E. D. (2024). Where to start: Use of the Bilingual Multidimensional Ability Scale (B-MAS) to identify developmental language disorder (DLD) in bilingual children. International Journal of Speech-Language Pathology, 27(2), 172–188. 10.1080/17549507.2024.2322646 [DOI] [PubMed] [Google Scholar]
  66. Wang, D., Lam, J. H. Y., McMillen, S., Su, P. L., Iglesias, A., Bedore, L. M., & Peña, E. D. (2025). Dual language profiles in Spanish–English bilingual children with and without developmental language disorder. Bilingualism. Advance online publication. 10.1017/S1366728925100102. [DOI]
  67. Wang, D., Zheng, L., Lin, Y., Zhang, Y., & Sheng, L. (2022). Sentence repetition as a clinical marker for Mandarin-speaking preschoolers with developmental language disorder. Journal of Speech, Language, and Hearing Research, 65(4), 1543–1560. 10.1044/2021_JSLHR-21-00401 [DOI] [PubMed] [Google Scholar]
  68. Wiig, E. H., Semel, E., & Secord, W. A. (2013). Clinical Evaluation of Language Fundamentals–Fifth Edition (CELF-5). Pearson. [Google Scholar]
  69. Williams, C. J., & McLeod, S. (2012). Speech-language pathologists' assessment and intervention practices with multilingual children. International Journal of Speech-Language Pathology, 14(3), 292–305. 10.3109/17549507.2011.636071 [DOI] [PubMed] [Google Scholar]


