Abstract
Objective:
Broadband parent rating scales are commonly used to assess behavioral problems in children. Multiple rating scales are available, yet agreement between them is not well understood. The objective of this study was to evaluate agreement between the Behavior Assessment System for Children, Third Edition (BASC-3) and Child Behavior Checklist 1 ½ - 5 years (CBCL) in a sample of children born very preterm.
Method:
We assessed 73 children born < 30 weeks gestational age whose caregivers completed the BASC-3 and CBCL at age 4. We examined correlations, within-person differences, and agreement in clinical categorization for all corresponding subscales and composites.
Results:
Comparable subscales on the BASC-3 and CBCL were significantly correlated, albeit to differing magnitudes. Subscales indexing hyperactivity and attention problems were the most comparable across the two measures, evidenced by strong correlations and few to no differences in mean T-scores. Composite scores indexing internalizing, externalizing, and total problems were also strongly correlated, and there were no differences in the mean T-scores for externalizing or total problems across measures. Agreement in clinical classifications was weak to moderate, though again, the highest agreement was found for hyperactivity, attention, externalizing, and total problems.
Conclusion:
Agreement between BASC-3 and CBCL subscales was weak to moderate, with the exception of subscales related to attention and hyperactivity, as well as composite scores indicating overall behavior problems. Researchers and clinicians should consider these discrepancies when interpreting results of behavior rating scales with preschool children, as conclusions could differ based on the assessment that is used.
Broadband parent rating scales are commonly used in clinical and research settings to measure symptoms of behavioral problems in children. Multiple rating scales are available, yet the agreement between similarly named constructs measured on different scales is not well understood. The Behavior Assessment System for Children, Third Edition (BASC-3) Parent Rating Scale - Preschool (PRS-P)1 and Child Behavior Checklist 1 ½ - 5 years (CBCL 1.5–5)2 are two behavior rating scales commonly used to assess children, including typically-developing children and children at elevated risk for psychopathology. Little research has compared results from the two measures, though this information would be useful for interpreting and comparing research findings as well as aiding clinical decision making. The aim of the present study is to evaluate agreement between measures by comparing results from a group of 73 very preterm children who were assessed using both the BASC-3 PRS-P and CBCL 1.5–5 at age 4. This is an especially important group to study given that children born preterm are at increased risk for behavioral difficulties.3
METHODS
The Neonatal Neurobehavior and Outcomes in Very Preterm Infants (NOVI) study recruited infants from nine Neonatal Intensive Care Units (NICUs) affiliated with six universities from April 2014 to June 2016. This paper includes infants from one NOVI site recruited from the NICU of Women & Infants Hospital of Rhode Island (WIHRI) in Providence, RI. Only this site was included because it was the only site that administered both the BASC and the CBCL at age 4. Inclusion criteria for the NOVI study were: (1) born <30 weeks gestational age (GA); (2) parental ability to read and speak English or Spanish; and (3) residence within 3 hours of the NICU and follow-up clinic. Exclusion criteria included major congenital anomalies, maternal age <18 years, cognitive impairment, or death. Families were approached when survival to discharge was determined to be likely by the attending neonatologist, and parents provided informed consent per the WIHRI institutional review board. More information regarding recruitment has been published previously.4
Demographic and medical information about the mother and child was collected at the time of recruitment. Child behavior problems were assessed via parent report at a 4-year follow-up visit. Children were included in this analysis if their caregiver completed both the BASC-3 PRS-P and the CBCL 1.5–5 at the age-4 follow-up (N = 73).
Measures
Behavior Assessment System for Children, Third Edition (BASC-3).1
The BASC-3 Parent Rating Scale-Preschool (PRS-P) is a widely used measure of preschool-aged children’s adaptive and maladaptive behavior in home and community settings for use in children ages 2 through 5. Parents are asked to rate 139 statements describing their child’s behavior on a 4-point rating scale of 0 (“Never”), 1 (“Sometimes”), 2 (“Often”), or 3 (“Almost Always”). Items correspond to 12 subscales indexing Hyperactivity, Aggression, Anxiety, Depression, Somatization, Attention Problems, Atypicality, Withdrawal, Adaptability, Social Skills, Activities of Daily Living, and Functional Communication. Four composite scales are also scored: Externalizing Problems (sum of items from Hyperactivity, Aggression, and Conduct Problems subscales), Internalizing Problems (sum of items from Anxiety, Depression, and Somatization subscales), Behavioral Symptoms Index (BSI; sum of all items in Externalizing and Internalizing composites as well as Attention Problems, Atypicality, and Withdrawal subscales), and Adaptive Skills (sum of Adaptability, Social Skills, Activities of Daily Living, and Functional Communication subscales). Raw scores are summed for each subscale and composite score and transformed into standardized T-scores (M = 50; SD = 10) for interpretation. T-scores in the range of 60–69 are considered “at risk” whereas T-scores of 70 and above are considered “clinically significant.”1 The BASC-3 PRS-P has been shown to have strong internal consistency and test-retest reliability.5
Child Behavior Checklist 1 ½ - 5 years (CBCL 1.5–5).2
The CBCL is another widely used parent-report measure of behavioral problems in preschool-aged children. Parents are asked to rate 99 behaviors on a 3-point rating scale of 0 (“Not True”), 1 (“Somewhat or Sometimes True”), or 2 (“Very True or Often True”). Items correspond to 7 syndrome subscales of Emotionally Reactive, Anxious/Depressed, Somatic Complaints, Withdrawn, Sleep Problems, Attention Problems, and Aggressive Behaviors as well as 5 DSM-Oriented subscales of Affective Problems, Anxiety Problems, Pervasive Developmental Problems (PDD), Attention Deficit/Hyperactivity Problems (ADHD), and Oppositional Defiant Problems. Three composite scales are also scored: Internalizing Problems (sum of items from Emotionally Reactive, Anxious/Depressed, Withdrawn, and Somatic Complaints subscales), Externalizing Problems (sum of items from Attention Problems and Aggressive Behaviors subscales), and Total Problems (sum of all items). As with the BASC-3, raw scores are summed for each subscale and composite score and converted to normalized T-scores. Unlike the BASC-3, CBCL T-scores are truncated to a lower bound of 50 for subscale scores, meaning the distribution of T-scores deviates from the expected M of 50 and SD of 10.2 For the subscales, T-scores in the range of 65–69 are considered to be in the “borderline” clinical range whereas T-scores of 70 and above are considered to fall in the clinical range. For the composite scores, T-scores of 60–63 are considered “borderline” whereas scores of 64 and higher fall in the clinical range. The CBCL has been shown to have strong internal consistency and test-retest reliability.6
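To make the differing cutoffs concrete, the following is a minimal sketch of how a T-score could be mapped to each instrument's interpretive range, using only the ranges stated above. The function names and the "below threshold" label are hypothetical and are not part of either instrument's scoring software.

```python
def classify_basc(t_score):
    """BASC-3 ranges as described in the text: 60-69 at risk, 70+ clinically significant."""
    if t_score >= 70:
        return "clinically significant"
    if t_score >= 60:
        return "at risk"
    return "below threshold"  # hypothetical label for scores under 60


def classify_cbcl(t_score, composite=False):
    """CBCL 1.5-5 ranges as described in the text; composites use lower cutoffs."""
    borderline, clinical = (60, 64) if composite else (65, 70)
    if t_score >= clinical:
        return "clinical"
    if t_score >= borderline:
        return "borderline"
    return "below threshold"  # hypothetical label


# The same T-score can fall into different categories depending on the scale.
for label in (classify_basc(62), classify_cbcl(62), classify_cbcl(62, composite=True)):
    print(label)
```

A T-score of 62, for example, is "at risk" on a BASC-3 scale, below the borderline range on a CBCL subscale, and "borderline" on a CBCL composite.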
Statistical Analysis
We conducted three sets of analyses to assess agreement between the BASC and the CBCL. Of note, some subscales were not compared because they were not assessed on both instruments. These included the BASC subscales Adaptability, Social Skills, Activities of Daily Living, and Functional Communication, and the composite index Adaptive Skills. The CBCL syndrome subscale Sleep Problems was also excluded, along with the DSM-Oriented subscale Oppositional Defiant Problems. Other subscales were used in multiple comparisons due to overlap between scales.
First, we examined the correlations between corresponding subscales and composites. We used Spearman’s rank correlation due to the non-normally distributed data (e.g., truncated CBCL scores, high skewness and kurtosis values for BASC). The magnitude of the correlation coefficient describes the extent to which children who scored high on one measure also scored high on the other measure. However, high correlations do not necessarily mean that scores on the two measures can be used interchangeably.
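As an illustration of this first step, Spearman's rho can be sketched as a Pearson correlation computed on ranks, with tied values (common once scores are floored at 50) sharing their average rank. This is a standard-library sketch, not the authors' analysis code, and the six pairs of T-scores are invented for the example.

```python
def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # positions are 0-based, ranks are 1-based
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation of the tie-averaged ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical T-scores for six children (not study data).
basc = [45, 52, 61, 48, 70, 55]
cbcl = [50, 50, 63, 51, 72, 50]  # three scores sit at the CBCL floor of 50
print(round(spearman_rho(basc, cbcl), 3))
```

Because only ranks enter the calculation, any monotonic relationship yields rho = 1 even when the T-scores themselves differ, which is why a high correlation alone does not show that the two scales are interchangeable.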
Thus, as a second step, we examined the magnitude and significance of within-person differences on the two subscales. That is, we subtracted participants’ CBCL T-score from their BASC T-score for corresponding scales and tested whether the difference was significantly different from 0, using Wilcoxon signed rank tests for paired samples. Again, we used a nonparametric test to account for non-normally distributed data. A significant result would indicate that the two test scores vary systematically, with CBCL scores either being lower than BASC scores (positive test statistic) or CBCL scores being higher than BASC scores (negative test statistic), on average. Because CBCL T-scores are truncated at 50 for subscale scores2, there was a high likelihood that CBCL T-scores would differ systematically from BASC T-scores, which can take on values below 50. Thus, to put the CBCL and BASC T-scores on the same metric, we similarly truncated BASC T-scores to a lower bound of 50 (i.e., all T-scores lower than 50 were recoded as 50) for all subscales and then repeated the Wilcoxon signed rank tests for paired samples.
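The second step can be sketched as follows: floor the BASC T-scores at 50, then compute a signed-rank Z statistic on the BASC minus CBCL differences. The sketch assumes the usual large-sample normal approximation with a tie correction and no continuity correction; statistical packages differ on these details, so it is not guaranteed to reproduce the reported Z values. The data are invented.

```python
import math


def truncate_at_50(t_scores):
    """Recode every T-score below 50 as 50, mirroring the CBCL subscale floor."""
    return [max(50, t) for t in t_scores]


def wilcoxon_z(basc, cbcl):
    """Normal-approximation Z for the signed-rank test on BASC minus CBCL.

    Zero differences are dropped and tied absolute differences share their
    average rank. A negative Z means CBCL scores tend to be higher.
    """
    diffs = sorted((b - c for b, c in zip(basc, cbcl) if b != c), key=abs)
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired scores are identical")
    w_plus = 0.0    # sum of ranks belonging to positive differences
    tie_term = 0.0  # sum of (t^3 - t) over groups of tied |differences|
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[j + 1]) == abs(diffs[i]):
            j += 1
        mean_rank = (i + j) / 2 + 1
        group_size = j - i + 1
        tie_term += group_size ** 3 - group_size
        w_plus += mean_rank * sum(1 for d in diffs[i:j + 1] if d > 0)
        i = j + 1
    mean_w = n * (n + 1) / 4
    var_w = n * (n + 1) * (2 * n + 1) / 24 - tie_term / 48
    return (w_plus - mean_w) / math.sqrt(var_w)


# Hypothetical T-scores for eight children (not study data).
basc = [44, 47, 52, 58, 49, 63, 41, 55]
cbcl = [50, 51, 53, 57, 50, 62, 50, 59]
print(round(wilcoxon_z(basc, cbcl), 2))
print(round(wilcoxon_z(truncate_at_50(basc), cbcl), 2))
```

In this toy example the Z statistic moves closer to zero once the BASC scores are truncated, because children scoring below 50 on the BASC no longer contribute large negative differences that arise purely from the CBCL floor.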
Finally, we assessed agreement in clinical categorization across the two measures. We used cross-tabulation tables to describe the proportion of children categorized as below or above the clinical cutoff on both, neither, or only one of the two measures. Total agreement was defined as the proportion of children who were categorized as below the cutoff on both measures plus the proportion of children who were categorized as above the cutoff on both measures. We also calculated a kappa statistic to further quantify the extent of agreement between the two measures. A kappa value of 0.4–0.6 indicates moderate agreement, whereas a value of 0.6–0.8 indicates substantial agreement and 0.8–1.0 indicates very strong to near perfect agreement.7 We assessed agreement using the borderline/at-risk thresholds defined by the test creators (i.e., T-score ≥ 60 on the BASC but T-score ≥ 65 on the CBCL).1,2 We also assessed agreement when applying a uniform threshold (T-score ≥ 65) to both measures, using a threshold previously reported.8
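The agreement calculation can be sketched as below, assuming the kappa statistic is Cohen's kappa for a 2 × 2 table (the text does not name the variant). The function names are hypothetical, and the cell counts in the usage example are back-calculated from the rounded percentages in the Hyperactivity (BASC-3) with Attention Problems (CBCL) row of Table 4 under N = 73; they are an assumption, not reported data.

```python
def agreement_table(basc_above, cbcl_above):
    """Count the four cells of the 2x2 cross-tabulation from paired booleans."""
    cells = {"both": 0, "neither": 0, "only_cbcl": 0, "only_basc": 0}
    for b, c in zip(basc_above, cbcl_above):
        if b and c:
            cells["both"] += 1
        elif b:
            cells["only_basc"] += 1
        elif c:
            cells["only_cbcl"] += 1
        else:
            cells["neither"] += 1
    return cells


def total_agreement(cells):
    """Proportion of children classified the same way by both measures."""
    n = sum(cells.values())
    return (cells["both"] + cells["neither"]) / n


def cohen_kappa(cells):
    """Observed agreement corrected for the agreement expected by chance."""
    n = sum(cells.values())
    p_observed = total_agreement(cells)
    p_basc = (cells["both"] + cells["only_basc"]) / n  # above cutoff on BASC
    p_cbcl = (cells["both"] + cells["only_cbcl"]) / n  # above cutoff on CBCL
    p_chance = p_basc * p_cbcl + (1 - p_basc) * (1 - p_cbcl)
    return (p_observed - p_chance) / (1 - p_chance)


# Assumed counts (back-calculated, not reported): 11 + 55 + 3 + 4 = 73 children.
cells = {"both": 11, "neither": 55, "only_cbcl": 3, "only_basc": 4}
print(round(total_agreement(cells), 2), round(cohen_kappa(cells), 3))
```

With these assumed counts the sketch gives a total agreement of about 90% and a kappa of about 0.699, in line with the corresponding row of Table 4. The gap between the two figures shows why kappa is reported alongside raw agreement: most children fall below both cutoffs, so a large share of the raw agreement is expected by chance.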
RESULTS
Descriptive Statistics
Our sample included 73 infants from 65 mothers recruited at WIHRI (Table 1). In the included sample, 3.1% of mothers identified as Asian, 12% as Black, 1.5% as Native Hawaiian or other Pacific Islander, 57% as White, 14% as more than one race, and 28% as Hispanic/Latino. Twenty-two percent of mothers had less than a high school degree. A little more than half (56%) of infants were male. Bronchopulmonary dysplasia (BPD) was the most common neonatal medical morbidity, with 63% of infants affected.
Table 1.
Demographic and medical characteristics of study sample
Maternal characteristics (N = 65) | M (SD) or % (n) |
---|---|
| |
Minoritized race or ethnicity | 49% (32/65) |
American Indian / Alaska Native race | 0% (0/65) |
Asian race | 3.1% (2/65) |
Black or African American race | 12% (8/65) |
Native Hawaiian or Other Pacific Islander race | 1.5% (1/65) |
White race | 57% (37/65) |
More than one race | 14% (9/65) |
Unknown/not reported race | 12% (8/65) |
Hispanic/Latino/a ethnicity | 28% (18/65) |
Low SES: Hollingshead level 5 | 17% (11/65) |
Maternal education: < HS/GED | 22% (14/65) |
No partner | 28% (18/65) |
| |
Infant characteristics (N = 73) | M (SD) or % (n) |
| |
Sex = Male | 56% (41/73) |
Multiple gestation | 29% (21/73) |
Vaginal delivery | 30% (22/73) |
Severe retinopathy of prematurity (ROP) | 2.7% (2/73) |
Necrotizing enterocolitis/sepsis | 19% (14/73) |
Bronchopulmonary dysplasia (BPD) | 63% (46/73) |
Serious brain injury | 12% (9/73) |
GA at birth (weeks) | 27.13 (1.9) |
Head circumference (cm) | 24.48 (2.12) |
GA at NICU discharge (weeks) | 41.15 (4.5) |
Length of NICU stay (days) | 99.93 (37.7) |
Birth weight (g) | 914.3 (241) |
Weight at discharge (g) | 3309 (934) |
Note. There were 73 infants born to 65 mothers included in the current analyses. The numbers of infants and mothers differ because of multiple births. Minoritized race or ethnicity was defined as any non-White race (e.g., Black, Asian) or ethnicity (e.g., Hispanic and/or Latino/a). Serious brain injury included parenchymal echodensity, periventricular leukomalacia, or ventricular dilation diagnosed via cranial ultrasound. SES, socioeconomic status; HS, high school; GED, General Equivalency Diploma; GA, gestational age; NICU, neonatal intensive care unit.
Some demographic differences were noted between the included sample and the larger NOVI sample (Supplemental Table 1). Mothers in the included sample were more likely to identify as White (57% vs 42%) and more likely to have less than a high school degree (22% vs 13%) compared with the mothers in the larger NOVI sample (all p < .05). Concerning infant medical characteristics, BPD was more prevalent (63% vs 51%) and infant weight at discharge was larger in the included sample (M = 3309g, SD = 934g) when compared to the overall NOVI sample (M = 3014g, SD = 905g), all p < .05.
Comparisons of BASC and CBCL T-Scores
All comparable scales on the BASC and CBCL were significantly correlated (Table 2), but with differing magnitudes. Overall, BASC and CBCL subscales were modestly correlated (r = .40 to .74, all p < .001). The weakest correlations were observed between the BASC Depression and the CBCL Anxious/Depressed scale (r = .40), the BASC Anxiety and the CBCL Anxious/Depressed scale (r = .41), and the BASC Somatization and the CBCL Somatic Complaints scale (r = .41), all p < .001. The strongest correlations among the subscale scores were observed between BASC Attention Problems and both CBCL Attention Problems (r = .74) and CBCL ADHD Problems (r = .72), all p < .001.
Table 2.
Spearman’s rho correlation coefficients comparing CBCL and BASC-3 subscale and composite scores
CBCL Scales (rows) vs BASC-3 Scales (columns 1–11) | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 |
---|---|---|---|---|---|---|---|---|---|---|---|
Anxious/Depressed | .37** | .30** | .35** | .41 *** | .40 *** | .34** | .39*** | .46*** | .50*** | .39*** | .51*** |
Somatic Complaints | .17 | .23* | .19 | .22 | .24* | .41 *** | .41*** | .18 | .38*** | .20 | .33** |
Withdrawn | .34** | .52*** | .21 | .19 | .25* | .33** | .64*** | .60 *** | .36** | .29* | .55*** |
Attention Problems | .61 *** | .74 *** | .34** | .18 | .35** | .36** | .57*** | .38*** | .38*** | .53*** | .68*** |
Aggressive Behavior | .73*** | .53*** | .56 *** | .37*** | .61*** | .36** | .48*** | .30** | .53*** | .72*** | .76*** |
Affective Problems | .43*** | .53*** | .38*** | .25* | .48 *** | .46*** | .55*** | .25* | .52*** | .45*** | .59*** |
Anxiety Problems | .36** | .24* | .37*** | .50 *** | .34** | .26* | .35** | .47*** | .45*** | .38*** | .47*** |
Pervasive Developmental Problems | .40*** | .42*** | .33** | .29* | .33** | .35** | .63 *** | .68*** | .43*** | .38*** | .62*** |
ADHD Problems | .66 *** | .72 *** | .39*** | .27* | .42*** | .37*** | .55*** | .33** | .43*** | .58*** | .71*** |
Internalizing | .45*** | .50*** | .35** | .44*** | .47*** | .50*** | .60*** | .58*** | .61 *** | .43*** | .67*** |
Externalizing | .70*** | .69*** | .53*** | .37*** | .60*** | .48*** | .61*** | .41*** | .60*** | .69 *** | .83*** |
Total Problems | .62*** | .64*** | .47*** | .38*** | .56*** | .50*** | .65*** | .49*** | .60*** | .61*** | .80 *** |
Note. 1 = Hyperactivity; 2 = Attention; 3 = Aggression; 4 = Anxiety; 5 = Depression; 6 = Somatization; 7 = Atypicality; 8 = Withdrawal; 9 = Internalizing; 10 = Externalizing; 11 = Behavior Symptoms Index. Bolding indicates correlations between corresponding CBCL and BASC-3 scales.
* p ≤ .05; ** p ≤ .01; *** p ≤ .001
We observed stronger correlations between the BASC and CBCL composite scores (r = .61 to .80, all p < .001). The largest correlation was observed between the BASC Behavior Symptoms Index and the CBCL Total Problems score (r = .80, p < .001). After truncating the BASC scores, we observed similar trends (Supplemental Table 2).
When comparing CBCL and BASC scores within individuals, the CBCL almost always resulted in a higher mean score than the BASC, and this was true whether or not the BASC was truncated (Table 3). As expected, truncating the BASC subscale scores led to smaller differences when compared to the CBCL. After truncating BASC subscale scores, several subscales were found to be comparable, including Hyperactivity (BASC) with ADHD Problems (CBCL), Attention Problems (BASC) with ADHD Problems (CBCL), Aggression (BASC) with Aggressive Behavior (CBCL), Anxiety (BASC) with Anxious/Depressed (CBCL), and Depression (BASC) with Anxious/Depressed (CBCL). We also found that the composite scores of Externalizing and BSI/Total Problems were comparable between the two measures (all p > .05). By comparable, we mean that on average, scores obtained on one measure were not significantly different from scores obtained on the other measure; for example, truncated Hyperactivity scores on the BASC did not differ from ADHD Problems scores on the CBCL, Z = −0.88, p > .05 (Table 3).
Table 3.
Mean T-Scores and Within-Child Comparisons for Comparable BASC-3 and CBCL subscale and composite scores
BASC-3 Scale—CBCL Scale | BASC-3 | BASC-3 (Truncated) | CBCL | Z Values (BASC-3 vs CBCL) | Z Values (BASC-3 Truncated vs CBCL) |
---|---|---|---|---|---|
Hyperactivity—ADHD Problems | 51.3 | 54.6 | 55.2 | −4.2*** | −.88 |
Hyperactivity—Attention Problems | 51.3 | 54.6 | 57.4 | −5.8*** | −4.0*** |
Attention Problems—Attention Problems | 52.0 | 54.8 | 57.4 | −5.3*** | −3.0** |
Attention Problems—ADHD Problems | 52.0 | 54.8 | 55.2 | −3.2*** | −.34 |
Aggression—Aggressive Behavior | 48.7 | 53.2 | 54.4 | −5.4*** | −1.7 |
Anxiety—Anxious/Depressed | 46.2 | 52.3 | 53.2 | −5.3*** | −1.7 |
Anxiety—Anxiety Problems | 46.2 | 52.3 | 54.2 | −6.0*** | −2.5* |
Depression—Anxious/Depressed | 45.9 | 52.8 | 53.2 | −5.3*** | −.98 |
Depression—Affective Problems | 45.9 | 52.8 | 54.5 | −5.8*** | −3.0** |
Somatization—Somatic Complaints | 47.0 | 52.8 | 54.8 | −5.6*** | −2.5* |
Atypicality—Pervasive Developmental Problems | 49.2 | 52.8 | 56.3 | −6.4*** | −4.5*** |
Withdrawal—Withdrawn | 48.5 | 52.9 | 55.4 | −5.8*** | −3.4*** |
Externalizing—Externalizing | 50.0 | 50.0 | 48.6 | −1.2 | −1.2 |
Internalizing—Internalizing | 45.6 | 45.6 | 48.9 | −2.6** | −2.6** |
Behavioral Symptoms Index—Total Problems | 49.1 | 49.1 | 48.9 | −0.4 | −0.4 |
Note. Z Values were obtained from nonparametric Wilcoxon Signed Ranks Tests. Truncated BASC-3 scores are those in which T-scores < 50 were set to 50, for maximum comparability with CBCL T-scores which have a lower bound of 50.
* p ≤ .05; ** p ≤ .01; *** p ≤ .001
Agreement in Clinical Classifications using BASC and CBCL
Finally, we assessed the consistency of clinical classifications using clinical thresholds described by the test creators (Table 4) and using a uniform threshold (Table 5). Overall, kappa values were low (range = .07 to .70 across Tables 4 and 5), indicating slight to moderate agreement between most similarly named scales. However, with the test creator thresholds (Table 4), kappa values ≥ 0.5, indicative of stronger concordance, were found for the subscales Hyperactivity (BASC) with ADHD (CBCL; kappa = .51) and Hyperactivity (BASC) with Attention (CBCL; kappa = .70), and for the composite indexes of Externalizing (kappa = .50) and Behavioral Symptoms Index (BASC) with Total Problems (CBCL; kappa = .51). When the uniform threshold was introduced, the subscales of Depression (BASC) and Affective Problems (CBCL) also demonstrated moderate agreement (kappa = .51). When using the uniform threshold, agreement between the Externalizing composites (kappa = .67) and between the Behavioral Symptoms Index (BASC) and Total Problems (CBCL; kappa = .58) was higher than when using the test creator thresholds and indicated moderate to substantial agreement. It is noteworthy that agreement between the BASC and CBCL Internalizing composites was much lower (kappa = .13) when the uniform threshold was applied than when we used the test creator thresholds (kappa = .42).
Table 4.
Consistency of Ratings Above Clinically Significant Threshold as Defined by Test Creators Between Comparable BASC-3 and CBCL subscales and composites
BASC-3—CBCL | Both Scales Above Threshold | Both Scales Below Threshold | Only CBCL Above Threshold | Only BASC-3 Above Threshold | Overall Agreement | Kappa Value |
---|---|---|---|---|---|---|
Hyperactivity—ADHD | 10% | 77% | 3% | 11% | 87% | 0.507 |
Hyperactivity—Attention | 15% | 75% | 4% | 6% | 90% | 0.699 |
Attention—Attention | 11% | 67% | 8% | 14% | 78% | 0.362 |
Attention—ADHD | 10% | 73% | 3% | 15% | 83% | 0.424 |
Aggression—Aggressive | 6% | 81% | 7% | 7% | 87% | 0.366 |
Anxiety—Anxious/Depressed | 1% | 84% | 7% | 8% | 85% | 0.072 |
Anxiety—Anxiety | 3% | 84% | 7% | 7% | 87% | 0.210 |
Depression—Anxious/Depressed | 3% | 82% | 6% | 10% | 85% | 0.186 |
Depression—Affective | 4% | 84% | 4% | 8% | 88% | 0.334 |
Somatization—Somatic | 3% | 82% | 8% | 7% | 85% | 0.183 |
Atypicality—PDD | 6% | 81% | 10% | 4% | 87% | 0.371 |
Withdrawal—Withdrawn | 3% | 85% | 8% | 4% | 88% | 0.244 |
Externalizing—Externalizing | 8% | 80% | 7% | 6% | 88% | 0.500 |
Internalizing—Internalizing | 7% | 80% | 10% | 4% | 87% | 0.424 |
BSI—Total Problems | 8% | 80% | 12% | 0% | 88% | 0.514 |
Note. BASC-3 threshold is T ≥ 60 whereas CBCL threshold is T ≥ 65 for subscale scores and T ≥ 64 for composite scores. Total percentages in each row may not add to 100% due to rounding. BSI = Behavioral Symptoms Index.
Table 5.
Consistency of Ratings Above Clinically Significant Threshold (T ≥ 65) Between Comparable BASC-3 and CBCL subscales and composites
BASC-3—CBCL | Both Scales ≥65 | Both Scales <65 | Only CBCL ≥65 | Only BASC-3 ≥65 | Overall Agreement | Kappa Value |
---|---|---|---|---|---|---|
Hyperactivity—ADHD | 7% | 85% | 6% | 3% | 92% | 0.580 |
Hyperactivity—Attention | 10% | 81% | 10% | 0% | 91% | 0.618 |
Attention—Attention | 7% | 78% | 12% | 3% | 85% | 0.399 |
Attention—ADHD | 6% | 84% | 7% | 4% | 90% | 0.440 |
Aggression—Aggressive | 3% | 85% | 10% | 3% | 88% | 0.251 |
Anxiety—Anxious/Depressed | 1% | 86% | 7% | 6% | 87% | 0.116 |
Anxiety—Anxiety | 3% | 86% | 7% | 4% | 89% | 0.275 |
Depression—Anxious/Depressed | 3% | 88% | 6% | 4% | 91% | 0.312 |
Depression—Affective | 4% | 89% | 4% | 3% | 93% | 0.509 |
Somatization—Somatic | 3% | 85% | 8% | 4% | 88% | 0.244 |
Atypicality—PDD | 4% | 84% | 11% | 1% | 88% | 0.348 |
Withdrawal—Withdrawn | 3% | 85% | 8% | 4% | 88% | 0.244 |
Externalizing—Externalizing | 8% | 85% | 6% | 1% | 93% | 0.668 |
Internalizing—Internalizing | 1% | 86% | 10% | 3% | 87% | 0.130 |
BSI—Total Problems | 6% | 88% | 7% | 0% | 94% | 0.584 |
Note. Total percentages in each row may not add to 100% due to rounding. BSI = Behavioral Symptoms Index.
Sensitivity Analysis
Because not all subscales of the CBCL and BASC were significantly skewed, we examined whether any of our conclusions would change when using parametric statistics (i.e., Pearson correlations and paired samples t-tests instead of Spearman’s rho correlations and Wilcoxon signed rank tests for paired samples). Our findings regarding the magnitude and significance of the discrepancies between the CBCL and BASC were unchanged when data were re-analyzed using parametric statistical tests.
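For comparison with the nonparametric statistics, the parametric counterparts named above can be sketched in a few lines: a Pearson correlation on the raw T-scores and a paired-samples t statistic on the within-child differences. The function names and data are hypothetical, and the sketch stops at the test statistic (df = n − 1) rather than computing a p-value.

```python
import math


def pearson_r(x, y):
    """Pearson correlation computed on the raw scores rather than on ranks."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def paired_t(x, y):
    """Paired-samples t statistic for the mean within-person difference x - y."""
    diffs = [a - b for a, b in zip(x, y)]
    n = len(diffs)
    mean_d = sum(diffs) / n
    sd_d = math.sqrt(sum((d - mean_d) ** 2 for d in diffs) / (n - 1))
    return mean_d / (sd_d / math.sqrt(n))


# Hypothetical truncated BASC and CBCL T-scores for eight children (not study data).
basc = [50, 52, 61, 58, 50, 63, 50, 55]
cbcl = [50, 51, 63, 57, 53, 62, 50, 59]
print(round(pearson_r(basc, cbcl), 2), round(paired_t(basc, cbcl), 2))
```

As with the signed-rank Z, a negative t here indicates that CBCL scores are higher on average than BASC scores.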
DISCUSSION
The objective of this study was to describe agreement between the BASC-3 PRS-P and CBCL 1.5–5, two commonly used assessments of behavior problems in children. We were specifically interested in testing this question in a sample of children born very preterm, who are at elevated risk of behavior problems in preschool. We found that all comparable subscales on the BASC and CBCL were significantly correlated, albeit to differing magnitudes. Subscales indexing hyperactivity and attention problems were the most comparable across the two measures, evidenced by strong correlations (r’s > .60) and few to no differences in mean T-scores. Composite scores indexing internalizing, externalizing, and total problems were also strongly correlated (r’s > .60), and there were no differences in the mean T-scores for externalizing or total problems across measures. Agreement between clinical classifications was weak to moderate regardless of the threshold used, though again, the highest agreement was found for hyperactivity, attention, externalizing, and total problems (kappa ≥ .58).
Our study is the first to examine concordance between the BASC and CBCL in a very preterm sample and our findings are similar to those reported in prior studies conducted with different samples. In a study of clinically referred preschool children, correlations between comparable subscales were also found to be statistically significant, though with varying magnitudes.8 Notably, those authors also reported attention and hyperactivity scales to be among the most highly correlated subscales, and similarly reported high agreement for composite scores, particularly total problems. Interestingly, this prior study noted that mean T-scores were not comparable for either the externalizing or internalizing composites, even after truncating BASC T-scores to be more similar to the CBCL. Another prior study done with typically developing preschool children found that only the mean T-scores for attention and hyperactivity subscales, and for the internalizing composite score, were comparable.9 Therefore, while there appears to be consensus that there is high agreement for attention and hyperactivity subscales on the BASC and CBCL, there is less consensus over which composite scores show the greatest similarity across the two measures. It is notable that the prior studies comparing BASC and CBCL used an earlier version of the BASC (the BASC-2), whereas the present study used the more recent BASC-3, a difference that may explain our disparate findings. Additionally, it is hard to compare findings across samples of children with varying characteristics and varied baseline prevalence of behavior problems (e.g., children born preterm versus children who were clinically referred versus typically-developing children).
As opposed to generally high consistency across subscales indexing attention problems, we found that subscales relating to depression, anxiety, and somatization were the most discordant across the BASC and CBCL. Similar results have been noted previously.8 These differences may be a result of the specific content assessed by the two measures. That is, although the subscales are similarly named, they include different items that assess different child emotions and behaviors. For example, the CBCL Anxious/Depressed scale includes items such as “nervous” and “fearful” while the BASC Anxiety scale assesses more specific scenarios such as “worries about making mistakes” and “worries about things that cannot be changed”. It is possible that the more general wording of the CBCL items could prompt higher scores than the more specific scenarios assessed by the BASC. The combination of anxiety and depression into a single scale on the CBCL could also lead to poor correspondence compared to BASC scales assessing depression and anxiety separately. Moreover, many of the subscales with the weakest correlations across measures were those related to internalizing behaviors, which refer to problems internal to the self that may be more subtle and nuanced, and thus more difficult for parents to report on compared to the disruptive behaviors included in the attention, hyperactivity, and externalizing scores. In fact, externalizing problems are defined as behavior problems that are directed toward the external environment (e.g., other people) and thus by definition are more visible to outside observers.
This study adds to the literature suggesting that the BASC and CBCL are consistent measures of certain subscales (e.g., those related to attention and hyperactivity) and possibly of overall behavior problems, but that the two measures lack consistency on other scales, particularly those assessing internalizing behaviors. Without an external criterion (i.e., clinical diagnosis), we cannot currently determine whether the BASC or CBCL provides a more accurate result. Instead, our study and others show that there are differences between certain BASC and CBCL scales, and that these differences are apparent among multiple unique populations.8,9 It is also possible that different measures may be more accurate for screening children at different levels of the behavior problem spectrum (i.e., those with subclinical symptoms versus those with clinically significant symptoms).
From a research perspective, our results suggest that it will be hard to draw conclusions about the behavior problems of children born very preterm (e.g., prevalence, predictors of, developmental trajectories) based on studies that use different parent-report measures. Further work should explore whether there are differences in how well the BASC and CBCL predict clinical diagnoses in this population. Harmonization procedures could also be explored in an effort to better equate scores derived from each measure. Until then, caution should be used in interpreting the results of studies using different screening measures, as our results and others show that use of the CBCL may lead to higher estimates of behavior problems in preterm children compared to the BASC.8
From a clinical perspective, our findings suggest clinicians should use caution and employ clinical judgement when interpreting children’s scores on different parent-report measures, and that they should avoid making decisions solely based on one parent-report measure. Differences in time constraints and provider preferences could play a role in which parent-report measure(s) are used. Depending on the goal of the clinical assessment, one measure may be preferred over another. For example, for follow-up clinics where the goal is to identify children in need of further screening and/or services, a measure such as the CBCL which tends to result in higher scores than the BASC could be preferable, as it will result in a larger pool of children who are identified as potentially needing further screening.
Our population of very preterm infants provides insight into a less studied group who are at higher risk of behavior problems compared to children born at term, and it is notable that our results were similar to those found with a population of clinically referred children. However, our use of this specific sample potentially limits the generalizability of our findings. Our small sample size and demographic differences in our sample as compared to the overall NOVI sample could further limit generalizability. As pointed out previously8, both parent and child language skills could contribute to parent ratings of child behaviors (e.g., via parental comprehension of rating scale items or applicability of items depending on child verbal ability), yet we did not account for these differences in this analysis. This is a particularly relevant point given that very preterm children are at higher risk of language delays compared to term-born peers.10 Future research is needed to confirm our findings using a larger sample with greater demographic diversity. Future research should also determine whether the use of different rating scales leads to differing conclusions about prevalence, predictors, or sequelae associated with behavior problems in preschool samples, including in preschoolers born preterm.
In conclusion, we found that overall agreement between BASC and CBCL subscales was weak to moderate, with the exception of subscales related to attention and hyperactivity, as well as composite scores indicating overall behavior problems that were at least moderately consistent. Researchers and clinicians should keep these discrepancies in mind and use caution when interpreting results of any single behavior rating scale with preschool children, as conclusions could differ based on the assessment that is used.
Supplementary Material
Funding Statement:
Research reported in this publication was funded by the National Institutes of Health Eunice Kennedy Shriver National Institute of Child Health and Human Development (NICHD) under award number R01HD072267 and the Office of the Director under award numbers UG3OD023347 and UH3OD023347. The lead author was additionally supported by a career development award from the National Institute of Mental Health (NIMH), grant K01MH129510. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.
Footnotes
Conflicts of Interest Statement: The authors declare no conflicts of interest.
REFERENCES
- 1. Reynolds C, Kamphaus R. Behavior Assessment System for Children, Third Edition. Circle Pines, MN: Pearson Assessment; 2015.
- 2. Achenbach T, Rescorla L. Manual for the ASEBA Preschool Forms & Profiles. Burlington, VT: Achenbach System of Empirically Based Assessment (ASEBA); 2000.
- 3. Delobel-Ayoub M, Kaminski M, Marret S, et al. Behavioral outcome at 3 years of age in very preterm infants: The EPIPAGE study. Pediatrics. 2006;117(6):1996–2005. doi: 10.1542/peds.2005-2310
- 4. McGowan EC, Hofheimer JA, O’Shea TM, et al. Sociodemographic and medical influences on neurobehavioral patterns in preterm infants: A multi-center study. Early Human Development. 2020;142:104954. doi: 10.1016/j.earlhumdev.2020.104954
- 5. Sato J, Vandewouw MM, Safar K, et al. Social-cognitive network connectivity in preterm children and relations with early nutrition and developmental outcomes. Front Syst Neurosci. 2022;16:812111. doi: 10.3389/fnsys.2022.812111
- 6. Frazier JA, Wood ME, Ware J, et al. Antecedents of the Child Behavior Checklist–dysregulation profile in children born extremely preterm. J Am Acad Child Adolesc Psychiatry. 2015;54(10):816–823. doi: 10.1016/j.jaac.2015.07.008
- 7. Landis JR, Koch GG. The measurement of observer agreement for categorical data. Biometrics. 1977;33(1):159–174.
- 8. Myers CL, Bour JL, Sidebottom KJ, et al. Same constructs, different results: Examining the consistency of two behavior-rating scales with referred preschoolers. Psychol Sch. 2010;47(3):205–216. doi: 10.1002/pits.20465
- 9. England-Mason G, Martin JW, MacDonald A, et al. Similar names, different results: Consistency of the associations between prenatal exposure to phthalates and parent-ratings of behavior problems in preschool children. Environ Int. 2020;142:105892. doi: 10.1016/j.envint.2020.105892
- 10. Van Noort-van Der Spek IL, Franken MCJP, Weisglas-Kuperus N. Language functions in preterm-born children: A systematic review and meta-analysis. Pediatrics. 2012;129(4):745–754. doi: 10.1542/peds.2011-1728