Author manuscript; available in PMC: 2026 Mar 11.
Published before final editing as: Psychol Assess. 2026 Mar 9. doi: 10.1037/pas0001447

Taking measurement-based care to school: Evaluating teacher-report versions of the Behavior and Feelings Survey and the Top Problems Assessment

Spencer C Evans 1, Ashley R Karlovich 1, Katherine A Corteselli 2, Amanda Jensen-Doss 1, John R Weisz 2
PMCID: PMC12974236  NIHMSID: NIHMS2141814  PMID: 41801753

Abstract

There is a need for free, brief, psychometrically sound measures to support youth measurement-based care across settings. Two instruments, the standardized Behavior and Feelings Survey (BFS) and the idiographic Top Problems Assessment (TPA), have shown evidence of validity, reliability, and sensitivity to change by parent- and youth-report. Extending this measurement suite, we examined the psychometric properties of new teacher-report versions of the BFS and TPA. Participants were 110 youths (ages 7–13, 41.6% female) referred for school-based therapy for internalizing and/or externalizing problems. Multi-informant BFS and TPA measures were administered at pre- and post-treatment and repeatedly throughout treatment. Analyses evaluated the teacher-reported BFS-T and TPA-T across informants (relative to the parallel parent and youth versions), within-informant (relative to the Teacher Report Form), and longitudinally throughout treatment. Results showed that the BFS-T scale scores (internalizing, externalizing, and total) had good internal consistency and distributions comparable to parent/youth data. Factor analyses supported the BFS’s original correlated two-factor structure with minor informant-specific modifications that do not impact scoring or interpretation. The BFS-T and TPA-T showed evidence of convergent and discriminant validity with other multi-informant measures. Longitudinal multilevel models documented sensitivity to change via negative loglinear slopes (ps < .005). Mean teacher-report trajectories resembled parent- and youth-report trajectories, but cross-informant correlations were mostly not significant, suggesting incremental utility of teacher-report. The BFS and TPA are free, brief, incrementally useful, and psychometrically sound tools for measurement-based care in youth therapy—now including teacher-report forms to complement the original parent- and youth-report forms.

Keywords: Measurement, evidence-based assessment, psychometrics, idiographic, school mental health, multi-informant, measurement-based care, youth psychotherapy, internalizing and externalizing, child and adolescent mental health

Introduction

School is integral in the lives of children and adolescents (hereafter youths). School fosters youths’ social-emotional development alongside their academic learning (National Academies, 2018) and occupies a large portion of their waking hours (Hall and Nielson, 2020). It is therefore not surprising that school is also where youths’ emotional and behavioral difficulties often emerge and get noticed, which can lead to early identification and treatment (Bradshaw et al., 2008). Although most youths with clinical needs do not receive care (Bitsko et al., 2022), those who do are often treated in the school context (Farmer et al., 2003; Green et al., 2013). One recent meta-analysis of service patterns found that 22.1% of youth with elevated symptoms or diagnoses were served by school-based mental health services, surpassing outpatient care (20.6%) or any other sector (Duong et al., 2021). Even when services are provided out in the community, the referrals often originate from the school (Yeh et al., 2002). Regardless of where the care occurs, school is an important domain for measuring youths’ clinical outcomes, including social, behavioral, emotional, and academic functioning. For all these reasons, assessment of the school context is essential for youth mental health. The current study evaluates new teacher-report versions of two established parent-/youth-report measures: The Behavior and Feelings Survey (BFS) and Top Problems Assessment (TPA).

Best practices and decades of evidence point to the importance of multi-informant assessment in youth mental health (De Los Reyes et al., 2015). These informants may include youths, parents, clinicians, and teachers. Because parents and clinicians typically cannot be present observers throughout a child’s day at school, the perspective of school professionals is critical. In particular, teachers have a unique and valuable perspective on their students’ daily functioning, which can be used to support treatment in many ways (e.g., assessing youth functioning, identifying goals, monitoring response) for school- and community-based services alike. Unfortunately, it is common for pediatric mental health providers to omit teacher-report rating scales from their initial assessment (Wolraich et al., 2010) and even more so for progress-monitoring (Epstein et al., 2014). Clinicians may perceive multi-informant measures to be infeasible or burdensome (Burns & Rapee, 2022; Corteselli et al., 2020), despite best practice recommendations that they be collected (De Los Reyes et al., 2015).

The practice of collecting repeated assessments throughout treatment (in addition to at the beginning and end) is referred to here as progress monitoring, and it is a central component of both measurement-based care (MBC) and evidence-based practice in psychology more broadly (American Psychological Association [APA] Task Force on Professional Practice Guidelines on MBC, 2025). This Task Force describes MBC as a best-practice clinical process that involves using progress-monitoring data to inform treatment planning and enhance communication, which is thought to lead to better clinical outcomes. Unfortunately, clinicians who routinely engage in MBC generally remain the exception rather than the rule (Keepers et al., 2023). Meta-analyses show that those who do implement routine progress-monitoring assessments often use youth-, parent-, and clinician-rated instruments, but not teacher-report (Jensen-Doss et al., 2018; Lewis et al., 2019; Tam & Ronan, 2017). Barriers to MBC in school-based care include cost, difficulty accessing measures, and limited time to learn, administer, and interpret measures; but school-based clinicians also see the benefits for tracking progress and communicating with other professionals (Connors et al., 2015; Lyon et al., 2016). Indeed, when teacher-report is incorporated into MBC and feedback systems, evidence suggests students experience improvements in psychological distress (e.g., Cooper et al., 2013). Further, students report that frequent idiographic assessments promoted problem-solving, increased their self-awareness, and helped them reach their behavioral goals (Duong et al., 2016).

One important factor restricting the uptake of MBC involving schools is the limited availability of teacher-report measures—particularly free, brief rating scales that can be given repeatedly to guide treatment. Many teacher-report measures are administered in the context of screening, baseline, or diagnostic assessments. Examples include the Teacher Report Form (TRF; Achenbach & Rescorla, 2001), Behavior Assessment System for Children (BASC; Reynolds et al., 2015), Conners Teacher Rating Scale (CTRS; Conners, 2022), Social Skills Improvement System Rating Scales (Elliott et al., 1988; Gresham & Elliott, 2008), Vanderbilt ADHD Diagnostic Teacher Rating Scale (Wolraich et al., 1998), and Strengths and Difficulties Questionnaire (SDQ; Goodman, 1997). However, far fewer are freely available and suitable for repeated teacher-report in progress-monitoring or MBC.

What is Needed?

Several considerations highlight specifically what kinds of measures and evidence are needed to advance MBC. First, sensitivity to change is critical. Most teacher-report measures were designed to be administered at a single time-point or repeatedly over long intervals (e.g., annually, pre- to post-treatment) and therefore are not optimally sensitive to change. This is evident in that the written instructions often ask the teacher to rate the child’s behavior over a long period of time (e.g., the last 6 months). Second, brief and free measures are needed. Extant teacher measures can be lengthy (e.g., 100+ items), requiring valuable time and attention from a teacher’s busy workday. Relatedly, some of the strongest measures are copyrighted and available commercially, requiring financial investment as a precondition for their use, thus limiting or restricting access by low-resource providers, schools, and clinics.

Third, idiographic and nomothetic perspectives are both important. The standard practice with most teacher-report measures, and most psychological measures in general, is nomothetic—respondents rate the presence and/or severity of a standard list of items representing behavioral or emotional problems (e.g., anxiety, depression, or ADHD symptoms). In contrast, idiographic measures tap constructs that are unique to a person and measured within each person (e.g., fights with sister, feels lonely, forgets math homework). Guidelines for MBC in youth psychotherapy recommend a combination of both idiographic and nomothetic measures, emphasizing the importance of multi-informant perspectives (Connors et al., 2025). Both approaches have shown evidence of incremental validity and sensitivity to change (Weisz et al., 2011, 2020b).

Fourth, multi-informant measures are needed. Many of the above-mentioned standard measures (e.g., ASEBA, BASC, Conners) have an important strength: They are part of multi-informant systems that include teacher report along with parent and youth perspectives. While other multi-informant MBC tools are available (e.g., Youth Outcome Questionnaire, PROMIS, Peabody Treatment Progress Battery), most offer only two informant perspectives (usually parent and youth), which often do not align (De Los Reyes et al., 2015; De Los Reyes & Epkins, 2023; Hawley & Weisz, 2003; Yeh & Weisz, 2001). Thus, there is a need for progress-monitoring tools that measure parent, youth, and teacher perspectives in parallel fashion.

Finally, beyond what kinds of measures are needed, there remains an important substantive question: Does adding more informants add value to MBC? Across many other assessment contexts (e.g., screening, intake), evidence shows that informant discrepancies are common, that valid conclusions can be drawn from disagreement, and that three perspectives are better than two (De Los Reyes, 2024). However, less is known about the extent and nature of multi-informant discrepancies in the context of MBC and frequently repeated measures. The present study represents a key first step in this direction, which could clarify how discrepant perspectives on MBC may produce valid conclusions about youth treatment response.

BFS, TPA, and Current Study

The present study addresses these issues by extending two existing parent- and youth-report progress-monitoring measures—the TPA and BFS—into teacher-report versions, which could inform MBC in school-based and community-based services. Given that the BFS and TPA are complementary measures, their teacher-report versions are evaluated together here.

The TPA was originally developed by Weisz et al. (2011) to complement nomothetic measures of internalizing and externalizing problems. In separate interviews, parents and youth are asked to identify their top three problems to target for treatment. Their answers are recorded by the examiner and turned into a three-item personalized scale where they rate the current severity of those three problems. This allows for the problems of utmost importance to the family—which could be missed by nomothetic scales—to be assessed repeatedly (e.g., weekly, every session) for MBC and clinical outcomes assessment. Initial TPA research supported its validity, incremental utility, and test-retest reliability (Weisz et al., 2011). The TPA has since been used to document trajectories of change in clinical trials in the U.S. (e.g., Chorpita et al., 2017; Weisz et al., 2012, 2020a) and internationally (e.g., Michelson et al., 2020).

The BFS, like the TPA, was designed to be free and brief, for repeated administration throughout youth mental health treatment. Complementing the TPA, the BFS is a nomothetic measure consisting of internalizing (6 items), externalizing (6 items), and total (12 items) problem scales. Developed by distilling parent and child reports down to a small item set via factor analysis and multi-sample validation, the BFS overcomes the limitations of other measures that are longer (thus burdensome and infeasible for repeated administration), commercial, not sensitive to change, or not capturing youth problem domains broadly and succinctly. In a series of studies, Weisz et al. (2020b) tested the parent- and youth-report BFS, establishing its psychometric structure, validity, consistency, and sensitivity to change in youth outpatient treatment.

Both the BFS and TPA have advanced youth treatment from a multi-informant MBC perspective. However, they (like other similar instruments) have thus far left out the critical perspective of teachers. In addition to multi-informant considerations, it has been recommended that MBC integrate idiographic and nomothetic data (Ashworth et al., 2019; Green, 2016; Sales et al., 2012). The present study extends the BFS and TPA to include teacher-report forms (BFS-T and TPA-T), complementing the parent/youth versions (BFS-P/Y and TPA-P/Y). Data were collected in school-based mental health services, where teachers were naturally key partners and stakeholders in their students’ treatment. The BFS and TPA were administered at baseline and repeatedly throughout treatment. The resulting data are well-suited to support this initial psychometric evaluation of the teacher-report BFS and TPA oriented toward supporting MBC via multiple informants (teacher, parent, youth) and methods (idiographic, nomothetic).

Methods

Participants and Procedures

Data collection occurred within a larger effectiveness trial evaluating the effects of Modular Approach to Therapy for Children (MATCH) in school-based clinical settings. The study design (Harmon et al., 2021) and results (Weisz et al., under review) have been detailed elsewhere and are summarized here. Youths were referred for school-based mental health services and had elevated internalizing or externalizing symptoms by parent or youth-report. Informed consent was collected from all parents, teachers, and counselors, and assent from all youths. This study was approved by the authors’ institutional review board and carried out in partnership with participating schools, districts, and agencies. Initial results showed clinically and statistically significant improvement on all symptom measures, with no differences between the two treatment conditions (Weisz et al., under review).

The present analysis includes data only from cases for whom teacher-report data were available (N = 110, or 77.5% of the full trial sample). Youth participants were 41.6% female, 7–13 years old (M age = 9.73, SD = 1.76), and in grades 1–7 (M grade = 4.29, SD = 1.78). The sample represents 24 schools in five socioeconomically diverse school districts in a large metropolitan area in the Northeastern U.S. Youths’ racial/ethnic backgrounds were as follows: 56.4% White, 10.9% Black, 5.5% Hispanic, 5.5% Asian, 18.2% multi-racial/ethnic, and 3.6% other. Treatment consisted of MATCH (n = 66) or Usual Care (n = 44), which varied in length (2–36 weeks, M = 17.90, SD = 8.86) and was delivered by 51 counselors already embedded in the school context. Trained bachelor’s-level research assistants administered measures to parents, youths, and teachers. All three informants completed the BFS and TPA (as described below) at baseline, and then repeatedly throughout treatment. Parents and youths completed these surveys weekly, while teachers completed them monthly.

Transparency and Openness

The multi-informant BFS and TPA measures are included in online supplemental materials. Additional information and assessment manuals can be found at https://weiszlab.fas.harvard.edu/measures. The verbatim top problems reported by all teachers, youths, and parents in this study are provided in the supplemental materials. Analytic code and deidentified analytic data are available on reasonable request to the corresponding author. The present analyses were not pre-registered. The larger effectiveness trial was pre-registered (https://clinicaltrials.gov/study/NCT02877875) and outlined in a protocol paper (Harmon et al., 2021). The current manuscript was posted as a preprint at the time of submission (Evans et al., 2025), using a masked view-only link for peer review.

Measures

Behavior and Feelings Survey (BFS; Weisz et al., 2020).

As described above, the BFS is a 12-item rating scale in which the respondent is asked to rate youth internalizing and externalizing problems on a 5-point scale (0 = Not a problem, 4 = A very big problem). Items are summed to produce internalizing, externalizing, and total scores, with higher scores indicating greater symptom severity. Across a series of studies, the BFS has shown evidence of internal consistency, test-retest reliability, convergent validity with established measures, and a replicated factor structure among outpatient clinical youth samples, for both caregiver and youth reports (Rognstad et al., 2022; Weisz et al., 2020b). The BFS is routinely used in conjunction with the TPA (described next) in MBC. Both can be found in the supplemental materials.
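For readers implementing BFS scoring, the summing procedure described above can be sketched in plain Python; the function name and input format here are illustrative conventions of ours, not part of the published measure:

```python
def score_bfs(items):
    """Score one informant's 12-item BFS response (each item rated 0-4).

    Per the scoring described above: items 1-6 form the Internalizing
    scale, items 7-12 the Externalizing scale, and Total sums all 12.
    """
    if len(items) != 12 or any(not 0 <= x <= 4 for x in items):
        raise ValueError("BFS requires 12 item ratings on a 0-4 scale")
    internalizing = sum(items[:6])    # possible range 0-24
    externalizing = sum(items[6:])    # possible range 0-24
    return {"internalizing": internalizing,
            "externalizing": externalizing,
            "total": internalizing + externalizing}
```

Higher scores on each scale indicate greater symptom severity.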

Top Problems Assessment (TPA; Weisz et al., 2011).

Also described above, the TPA is an idiographic progress-monitoring tool for youth psychotherapy. Informants (originally just parents and youths) are asked to identify their top three problems for treatment, which are then rated repeatedly (e.g., every week or session) using a 5-point scale (0 = Not at all a problem, 4 = A very big problem). In its initial validation study, the TPA showed convergent and discriminant validity as well as test-retest reliability (Weisz et al., 2011); however, no known studies have examined the psychometric properties of the teacher-report version (see supplemental materials). The present analyses use the standard TPA scoring practice of computing a mean severity for each informant at each occasion; thus, scores reflect each informant’s perspective on the current severity of the specific idiographic problems they identified as most important to them.
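The standard TPA scoring practice noted above (a mean severity per informant per occasion) is simple enough to state as code; this sketch assumes three problem ratings on the TPA's 0–4 scale, with the function name chosen for illustration:

```python
def score_tpa(ratings):
    """Mean current severity across an informant's three top problems,
    each rated on the TPA's 0-4 scale (0 = Not at all a problem,
    4 = A very big problem)."""
    if len(ratings) != 3 or any(not 0 <= r <= 4 for r in ratings):
        raise ValueError("TPA uses three problem ratings on a 0-4 scale")
    return sum(ratings) / 3
```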

Teacher Report Form (TRF; Achenbach & Rescorla, 2001).

The TRF is a 113-item rating scale in which teachers are asked to rate youth behaviors on a 3-point scale (0 = Not True; 2 = Very True or Often True). It has broadband scales for internalizing, externalizing, and total problems, as well as narrowband scales for anxious/depressed, withdrawn/depressed, somatic complaints, social problems, thought problems, attention problems, rule breaking, and aggressive behavior. The TRF has extensive evidence for its norms, factor structure, reliability, and validity in clinical and non-clinical samples (Achenbach, 2019; Achenbach & Rescorla, 2001).

Analytic Plan

Analyses were designed to evaluate the essential psychometric properties of the BFS-T and TPA-T for MBC, following an approach similar to the original BFS and TPA studies (Weisz et al., 2011, 2020b). Given their complementary (nomothetic and idiographic) nature, analyses were carried out in parallel for the TPA-T and BFS-T, allowing us to document their properties relative to one another and to their P/Y counterparts. We used SPSS Version 29 for descriptive statistics and correlations, Mplus Version 8 for factor analyses, SAS Version 9.4 for longitudinal models, and Excel for data visualization and calculations. Across informants, data were available for 99–100% of cases (Ns = 109–110; one case was missing youth data; parent and teacher data were complete). Ratings came from 110 unique parents, 109 unique youths, and 95 unique teachers, 13 of whom rated more than one youth in the sample. Because instances of multiple youths nested1 within the same informant were relatively few (25.5% of youths, 13.7% of teachers), we treated case-level data as independent observations in analyses.

Descriptive, Reliability, and Factor Analyses.

First, we examined univariate distributions and scale characteristics of the BFS and TPA across informants. Alpha and omega coefficients were used to assess internal consistency for the BFS scales.2 Alpha is conventional, while omega does not make the (often untenable) assumption of tau-equivalence (Dunn et al., 2014). Next, to evaluate the new informant against the established BFS factor structure, we conducted exploratory and confirmatory factor analyses (EFA/CFA) with the 12-item BFS data for all informants. EFA provided a general picture of all BFS factor loadings across informants, while CFA showed how well the correlated two-factor model fit the data. Factor models were evaluated partly in relation to the established measures, as parallel item content and scoring is critical for multi-informant measures (De Los Reyes et al., 2013). Beyond that, we followed recommendations (Kline, 2023) for evaluating model fit holistically, based on standardized factor loadings and model fit statistics, with thresholds for acceptable/good fit at CFI/TLI ≥ 0.90/0.95 and RMSEA (and 90% CIs) ≤ 0.08/0.05. Models were estimated using robust maximum likelihood (MLR), appropriate for CFA with 5-point scale data (Rhemtulla et al., 2012) and moderate departures from normality (Muthén & Muthén, 2017). Modification indices >10 were inspected, and changes were made when justified theoretically and empirically.
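As a concrete illustration of the internal-consistency analyses, Cronbach's alpha can be computed directly from item-level data. The sketch below implements only the standard alpha formula (omega, which requires estimating a factor model, is omitted); the function name and data layout are our own:

```python
def cronbach_alpha(data):
    """Cronbach's alpha for `data`, a list of respondents' item-score
    lists: alpha = k/(k-1) * (1 - sum(item variances) / var(total)),
    where k is the number of items. Sample (n-1) variances are used.
    """
    k = len(data[0])

    def var(xs):
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

    item_vars = [var([row[j] for row in data]) for j in range(k)]
    total_var = var([sum(row) for row in data])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)
```

Because alpha rests on tau-equivalence, it can understate reliability when items load unequally, which is the rationale for reporting omega alongside it.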

Validity and Multi-Informant Correlations.

Convergent and discriminant validity were examined using Pearson’s r correlations. Cross-informant correlations were handled similarly. For discriminant validity, we used Fisher’s r-to-z transformation and Steiger’s (1980) formulas to test whether BFS-T Internalizing Problems would correlate less strongly with BFS-T Externalizing Problems than with other measures of internalizing-related problems, and vice-versa for BFS-T Externalizing Problems. Discriminant validity was assessed in terms of low or nonsignificant correlations with scales designed to measure constructs that differed from the BFS and TPA scales (e.g., TRF Social Problems and Thought Problems).
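The dependent-correlation test just described can be made concrete. The sketch below implements one common form of Steiger's (1980) Z test for two correlations that share a variable (here labeled j), using Fisher's r-to-z transform; variable names are ours, and vetted implementations in standard statistical packages should be preferred in practice:

```python
import math

def steiger_z(r_jk, r_jh, r_kh, n):
    """Test H0: rho_jk == rho_jh for two correlations sharing variable j.

    One common form of Steiger's (1980) Z* statistic: Fisher-transform
    both correlations, estimate the covariance of the transformed values
    using the average of the two correlations (Steiger's simplification),
    and refer the scaled difference to the standard normal distribution.
    Returns (Z, two-tailed p).
    """
    z_jk, z_jh = math.atanh(r_jk), math.atanh(r_jh)   # Fisher r-to-z
    rbar = (r_jk + r_jh) / 2
    psi = (r_kh * (1 - 2 * rbar ** 2)
           - 0.5 * rbar ** 2 * (1 - 2 * rbar ** 2 - r_kh ** 2))
    cbar = psi / (1 - rbar ** 2) ** 2                 # cov of the two z's
    z = (z_jk - z_jh) * math.sqrt((n - 3) / (2 - 2 * cbar))
    p = math.erfc(abs(z) / math.sqrt(2))              # two-tailed normal p
    return z, p
```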

Longitudinal Multilevel Models (MLMs).

We modeled teacher-reported BFS-T and TPA-T scale scores (and corresponding parent and youth data) as repeated measures during treatment. These data3 included 592 monthly teacher ratings (M [SD] = 5.38 [2.10] per youth), 2,005 weekly parent ratings (18.23 [8.49] per youth), and 1,886 weekly youth ratings (17.15 [8.21] per youth). Longitudinal outcomes were modeled as log-linear trajectories.4 These MLMs were decomposed into a model for the means (i.e., fixed intercepts and log-linear slopes describing average trajectories over time) and a model for the variance (i.e., random effects for slopes, intercepts, and residuals; Hoffman, 2015). Higher intercepts represent greater severity at baseline, and more negative slope values indicate faster clinical improvement. A significant negative slope of time constitutes evidence of sensitivity to change (Youngstrom et al., 2017).
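In equation form, the decomposition just described corresponds to a two-level log-linear growth model; the parameterization below is one plausible rendering of that description (symbols are illustrative, and the authors' exact coding of time may differ):

```latex
% Level 1 (within youth): occasion t for youth i
y_{ti} = \beta_{0i} + \beta_{1i}\,\ln(t_{ti} + 1) + e_{ti}
% Level 2 (between youths): fixed effects (model for the means)
% plus random effects (model for the variance)
\beta_{0i} = \gamma_{00} + u_{0i}, \qquad \beta_{1i} = \gamma_{10} + u_{1i}
```

Here \gamma_{00} and \gamma_{10} are the fixed intercept and log-linear slope; u_{0i}, u_{1i}, and e_{ti} are the random intercept, random slope, and residual. A significantly negative \gamma_{10} indicates average improvement over time, i.e., sensitivity to change.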

Lastly, parallel-process multivariate models were estimated to examine cross-informant patterns of change over time on corresponding scales (e.g., BFS-T Total Problems with BFS-P Total Problems). These analyses were not meant to evaluate whether a particular psychometric criterion was satisfied, but rather to offer key insights regarding the distinct information contributed by each informant. Specifically, we re-estimated the above MLMs using a double-stacked long format appropriate for unbalanced and incomplete longitudinal data, allowing us to examine correlations between MLM growth trajectories on two different outcome variables (Hoffman, 2015). Two results were of interest: (a) intercept-intercept correlations, indicating the degree to which informants agreed about severity at baseline; and (b) slope-slope correlations, indicating the degree to which informants agreed about the direction and magnitude of change over time. Additional sensitivity analyses are described in results.
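The "double-stacked" long format mentioned above can be illustrated with a small data-reshaping sketch: each (occasion × outcome) pair becomes its own row, with an indicator column identifying the outcome so that a multivariate MLM can assign each outcome its own intercept and slope. All names here are illustrative, not the study's actual variable names:

```python
def double_stack(youth_id, times, teacher_scores, parent_scores):
    """Reshape two informants' repeated measures into double-stacked
    long format: one row per (occasion x outcome), tagged with a `dv`
    indicator. Missing occasions (None) are dropped, so the format
    naturally accommodates unbalanced, incomplete longitudinal data.
    """
    rows = []
    for dv, scores in (("BFS_T", teacher_scores), ("BFS_P", parent_scores)):
        for t, y in zip(times, scores):
            if y is not None:
                rows.append({"id": youth_id, "time": t, "dv": dv, "y": y})
    return rows
```

With teachers rating monthly and parents weekly, the two outcomes contribute different numbers of rows per youth, which the stacked format handles without imputation.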

Results

Descriptive Characteristics and Internal Consistency

As shown in Table 1, univariate and scale characteristics of the BFS-T and TPA-T largely resembled those of the parent- and youth-report versions. Teacher BFS scales (internalizing, externalizing, and total) showed good to excellent internal consistency (α/ω = 0.80–0.92), comparable to the BFS internal consistencies observed for parent (α/ω = 0.76–0.91) and youth (α/ω = 0.77–0.90) reports. The data approximated a normal distribution for all four scales and all three informants (skewness = 0.07 to 1.46; kurtosis = −1.00 to 1.84). As anticipated, BFS scales showed a slight right skew (common for mental health symptoms), whereas the TPA showed a slight left skew (common for idiographic tools, which pull for high-severity problems at baseline). Observed scores occupied most of the possible score ranges on all measures and were similar across informants. Teacher BFS ratings trended slightly lower than parent ratings and slightly higher than youth ratings. Teacher TPA ratings were comparable to youth ratings and slightly lower than parent ratings. Overall, these features suggest broad similarities and expected or desirable properties of the data across scales and informants.

Table 1.

Univariate and scale characteristics

M SD Min Max Mdn Skew Kurt α ω N
Teacher Report
 BFS-T Internalizing 6.83 5.04 0 22 6 0.77 0.14 .82 .80 110
 BFS-T Externalizing 6.82 6.18 0 24 5.5 0.96 0.28 .92 .92 110
 BFS-T Total 13.64 8.85 0 42 13 0.71 0.43 .85 .84 110
 TPA-T Severity 2.89 0.65 1 4 3 −0.27 −0.29 - - 110
Parent Report
 BFS-P Internalizing 8.45 5.47 0 21 8 0.31 −0.83 .81 .76 110
 BFS-P Externalizing 10.96 6.77 0 24 11 0.07 −1.00 .91 .91 110
 BFS-P Total 19.41 9.24 0 41 19 0.22 −0.37 .82 .80 110
 TPA-P Severity 3.26 0.59 1.33 4 3.33 −0.62 0.01 - - 110
Youth Report
 BFS-Y Internalizing 6.61 5.12 0 23 6 0.80 0.35 .77 .78 109
 BFS-Y Externalizing 5.15 5.76 0 24 3 1.46 1.84 .90 .90 109
 BFS-Y Total 11.75 8.66 0 37 10 0.78 0.04 .84 .81 109
 TPA-Y Severity 2.80 0.28 1 4 3 −0.30 −0.78 - - 109

One unique result from the TPA is the idiographic character of the items. Teachers often identified top problems like those reported by parents and youths in previous studies, including irritability, depressed mood, anxiety, attention problems, disruptive behavior, and adjustment problems (Evans et al., 2023; Weisz et al., 2011). At the same time, teachers also identified many problems that were unique to the school or classroom setting. Example teacher-reported top problems include: “He shuts down if a task is too difficult or is not interesting to him;” “She prioritizes social groups over academics;” “He blurts things out in class and talks with peers at inappropriate times;” and “She is easily distracted, especially during difficult academic tasks.” See supplemental materials for all top problems reported by teachers, parents, and youths.

Factor Structure

As shown in Table 2, EFAs of BFS data from all three informants largely replicated the original correlated two-factor structure, where items 1–6 are internalizing problems and 7–12 are externalizing. These results were particularly consistent for the externalizing factor, which across all informants showed loadings that were consistently high, positive, and significant (loadings = 0.64 to 0.87; cross-loadings to internalizing all ≤ |0.10|). For the internalizing items, however, a few signs of informant-specific deviations emerged. For parent report, internalizing loadings showed inconsistencies that aligned with item content: Items 1–3, pertaining to depression, loaded strongly onto the internalizing factor (loadings = 0.66–0.90), whereas the loadings for items 4–6, pertaining to anxiety, were weaker (loadings = 0.31–0.44). For teacher- and youth-report, these internalizing loadings were more consistent (0.46–0.77). There were also small cross-loadings (|0.24| to |0.40|) suggesting that the externalizing factor was positively correlated with items 1–3 (depression content) and negatively correlated with item 4 (anxiety content).

Table 2.

Exploratory factor analysis results for two-factor BFS model across all informants

Teacher Report Parent Report Youth Report
Item Content Factor 1 Factor 2 Factor 1 Factor 2 Factor 1 Factor 2
BFS01 Feel sad .51*** .31* .88*** .10 .65*** −.25*
BFS02 Feel bad about self .46** .40** .66*** .01 .54*** .03
BFS03 Down or depressed .51*** .31+ .90*** −.05 .53** .02
BFS04 Nervous or afraid .77*** −.25 .31** −.01 .78*** −.24+
BFS05 Worry bad things .77*** −.05 .44*** −.04 .65*** .01
BFS06 Sad/scary thoughts .76*** .03 .37*** .00 .61** .06
BFS07 Talks back −.10 .85*** .05 .86*** −.03 .73***
BFS08 Refuse to do what told .02 .85*** .00 .87*** .10 .64***
BFS09 Do things not supposed to −.04 .73*** −.02 .71*** .09 .79***
BFS10 Rude or disrespectful −.00 .87*** −.05 .85*** −.01 .83***
BFS11 Argue with people .00 .83*** .04 .83*** −.02 .78***
BFS12 Break rules .07 .76*** −.05 .66*** .05 .86***

Note. By convention and irrespective of sample size, loadings with absolute values > 0.4 are considered stable, those < 0.3 are suppressible, and those between 0.3 and 0.4 are potentially adequate (Guadagnoli and Velicer, 1988). Analyses used an oblique geomin rotation, which is appropriate when factors may be correlated and cross-loadings may be present (Muthén and Muthén, 2017).

+ p ≤ .10, * p < .05, ** p < .01, *** p < .001

Similarly, the correlated two-factor BFS CFA models showed poor/marginal initial fit.5 Inspection of parameter estimates and modification indices led to 2–3 changes per model, which varied by informant. To summarize (see bottom of Table 3 for details), three types of changes were made: (a) allowing residual variance to correlate for conceptually similar items (e.g., item 9, “Doing things he/she is not supposed to do,” with item 12, “Breaking rules at home or at school”) in all models; (b) freeing a negative cross-loading of item 4 (“Feeling nervous or afraid”) with the externalizing factor in the teacher model; and (c) adding sub-factors for depression and anxiety beneath a superordinate internalizing factor in the parent and teacher models.6 Results (see Table 3) showed good fit (RMSEAs < 0.08; CFIs > 0.95; TLIs > 0.93) for these final modified versions of the original two-factor structure, shedding light on relations among items in this sample without requiring changes to overall scoring or interpretation.

Table 3.

Confirmatory factor analysis results for final modified two-factor BFS models

Teacher Report Parent Report Youth Report
Estimate Factor Estimate Factor Estimate Factor
Loadings
 01. Feel sad .88*** Depression .91*** Depression .45*** Internalizing
 02. Feel bad about self .69*** Depression .65*** Depression .57*** Internalizing
 03. Down or depressed .85*** Depression .88*** Depression .41*** Internalizing
 04. Nervous or afraid .81*** Anxiety .65*** Anxiety .62*** Internalizing
 05. Worry bad things .83*** Anxiety .94*** Anxiety .72*** Internalizing
 06. Sad/scary thoughts .80*** Anxiety .72*** Anxiety .72*** Internalizing
 07. Talks back .87*** Externalizing .80*** Externalizing .75*** Externalizing
 08. Refuse to do what told .82*** Externalizing .80*** Externalizing .70*** Externalizing
 09. Do things not supposed to .68*** Externalizing .68*** Externalizing .76*** Externalizing
 10. Rude or disrespectful .89*** Externalizing .88*** Externalizing .84*** Externalizing
 11. Argue with people .84*** Externalizing .88*** Externalizing .81*** Externalizing
 12. Break rules .74*** Externalizing .64*** Externalizing .83*** Externalizing
 BFS04 cross-loadinga −.30*** Externalizing - - - -
 Depression latent factor .81*** Internalizing .61*** Internalizing - -
 Anxiety latent factor .71*** Internalizing .71*** Internalizing - -
Correlations
 INT with EXT .39* .16 .32**
 BFS01 with BFS03b - - .37**
 BFS07 with BFS08c - .52*** -
 BFS09 with BFS12d .48*** .39** .49***
Model Fit
 χ2 (df) 82.42 (50) ** 53.36 (50) 63.80 (51)
 MLR Correction 1.097 1.080 1.223
 RMSEA (90%CI) 0.077 (0.045, 0.106) 0.025 (0.000, 0.068) 0.048 (0.000, 0.082)
 CFI 0.951 0.995 0.970
 TLI 0.935 0.993 0.961

Note. INT = internalizing problems. EXT = externalizing problems. Depression and anxiety latent factors both load onto a superordinate internalizing factor (parent- and teacher-report models only).

a BFS04 cross-loading: the externalizing factor cross-loaded negatively on BFS04, “Feeling nervous or afraid.”

b BFS01 with BFS03: “Feeling sad” with “Feeling down or depressed.”

c BFS07 with BFS08: “Talking back or arguing with parents or other adults” with “Refusing to do what adults tell him/her to do.”

d BFS09 with BFS12: “Doing things he/she is not supposed to do” with “Breaking rules at home or at school.”

+ p ≤ .10, * p < .05, ** p < .01, *** p < .001

Validity Correlations Across Measures and Informants

Table 4 presents correlation coefficients across all BFS/TPA scales and all three informants. As shown, BFS-T Internalizing Problems were marginally associated with youth-reported BFS Internalizing Problems (r = .16) but not with parent-reported Internalizing Problems (r = .08). Teacher-rated Externalizing Problems were positively correlated with the corresponding parent-report scale (r = .34) and marginally with the youth-report scale (r = .18). For BFS Total Problems, a significant cross-informant correlation emerged between teachers and parents (r = .20) but not between teachers and youths (r = .06). Additionally, teacher-reported TPA severity was positively but non-significantly associated with parent- and youth-report (rs = .09 and .15, respectively). Taken together, cross-informant agreement was modest and similar across all informant pairs, though it varied by scale: agreement was consistently higher for BFS Externalizing (cross-informant rs = .18 to .35) and lower for BFS Internalizing (rs = −.02 to .16), consistent with the more observable nature of externalizing problems relative to internalizing problems. The cross-informant TPA Severity correlations were also relatively low (rs = .07 to .15), which may reflect greater variability in which problems the different informants were rating and in how those problems manifest within and across contexts. In summary, teachers’ baseline ratings on the BFS-T and TPA-T showed positive associations, of negligible to medium effect sizes, with corresponding ratings from other informants.

Table 4.

Cross-informant correlations among BFS and TPA scales

Teacher Report Parent Report Youth Report
1 2 3 4 5 6 7 8 9 10 11 12
1 BFS-T Internalizing -
2 BFS-T Externalizing .24* -
3 BFS-T Total .74*** .83*** -
4 TPA-T Severity .29** .42*** .46*** -
5 BFS-P Internalizing .08 .02 .06 .07 -
6 BFS-P Externalizing −.04 .34 *** .22* .10 .13 -
7 BFS-P Total .02 .27** .20 * .11 .69*** .81*** -
8 TPA-P Severity .11 .00 .06 .09 .29** .22* .33*** -
9 BFS-Y Internalizing .16 + −.03 .07 .07 .02 .03 .01 −.01 -
10 BFS-Y Externalizing −.17+ .18 + .03 .04 −.06 .35 *** .22* −.01 .26** -
11 BFS-Y Total −.02 .10 .06 .07 −.05 .25** .15 −.01 .77*** .82*** -
12 TPA-Y Severity .14 −.02 .07 .15 .18+ .01 .11 .07 .40*** .14 .33*** -

Note. Cross-informant correlations for each scale are presented in bold on the diagonals.

+ p ≤ .10, * p < .05, ** p < .01, *** p < .001

Table 5 presents correlations of the BFS-T and TPA-T scales with TRF scales, including hypothesized convergent and discriminant validity coefficients. Results showed evidence of convergent validity. For example, BFS-T Internalizing Problems were significantly correlated with the TRF Internalizing Problems, Anxious/Depressed, and Withdrawn/Depressed scales, with large, positive effect sizes (rs = .49–.60), all significantly greater than the correlations with externalizing-type TRF scales. Conversely, BFS-T Externalizing Problems were correlated with the TRF Externalizing Problems, Rule Breaking, and Aggressive Behavior scales, with large, positive effect sizes (rs = .66–.78), all significantly greater than the correlations with the corresponding internalizing-type TRF scales. Correlations with other scales (e.g., Somatic Complaints, Social Problems) were smaller, offering support for discriminant validity. The composite teacher-report scales (BFS-T Total and TPA-T) were correlated with one another (r = .46; Table 4) and with TRF Total Problems (rs = .73 and .38, respectively; Table 5). At the same time, correlations across informants were negligible to small (rs = .06 to .20; see Table 4). In other words, the three informants’ perspectives appear to capture mostly distinct portions of the variance in youths’ total, internalizing, externalizing, and top problems.

Table 5.

Convergent and discriminant validity with Teacher Report Form (TRF) scales

TRF Scale BFS-T Internalizing (I) BFS-T Externalizing (E) BFS-T Total TPA-T Severity Discriminant Validity (BFS-T)
H 1 z
Broadband
 Internalizing Problems .60 *** .02 .32*** .17+ I > E 6.04***
 Externalizing Problems .32 *** .75 *** .70*** .29*** I < E 5.18***
 Total Problems .59*** .57*** .73 *** .38 ***
Narrowband
 Anxious/Depressed .51 *** .11 .21* .07 I > E 5.74***
 Withdrawn/Depressed .49 *** .05 .31*** .22* I > E 4.11***
 Rule Breaking .22* .66 *** .59*** .24* I < E 4.57***
 Aggressive Behavior .34*** .78 *** .74*** .29** I < E 5.45***
 Somatic Complaints .24* .03 .16 .04 - -
 Social Problems .41*** .41*** .52*** .26** - -
 Thought Problems .35*** .16+ .31*** .21 - -
 Attention Problems .36*** .53*** .57*** .33*** - -

Note. Convergent validity correlation coefficients (shown in bold) test the general hypothesis that two scales purporting to measure similar constructs (e.g., BFS-T Total Problems and TRF Total Problems) are positively and strongly correlated with one another. Discriminant validity analyses (paired coefficients shown in italics and compared using z tests) test the general hypothesis that internalizing-type TRF scales (e.g., TRF Internalizing, Anxious/Depressed) correlate more strongly with BFS-T Internalizing Problems than with BFS-T Externalizing Problems, and vice versa for externalizing-type TRF scales. Discriminant validity tests were estimated using Fisher’s r-to-z transformation while accounting for the shared variance (r = .24) between the BFS-T Internalizing and Externalizing scales (Lee & Preacher, 2013).

+ p ≤ .10, * p < .05, ** p < .01, *** p < .001
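The dependent-correlation comparison described in the Table 5 note can be sketched in code. The following is a minimal Python illustration of Steiger’s (1980) z test for two correlations sharing one variable, one of the tests implemented in Lee and Preacher’s (2013) calculator; the pooled-r variant used here and the sample size n = 95 (the number of teacher raters) are assumptions for illustration, so the resulting z need not exactly reproduce the values reported in Table 5.

```python
import math

def steiger_z(r_jk: float, r_jh: float, r_kh: float, n: int) -> float:
    """Compare two dependent correlations r_jk and r_jh that share
    variable j, given r_kh (the correlation between the two non-shared
    variables k and h) and sample size n. Pooled-r variant of
    Steiger's (1980) z test."""
    z_jk = math.atanh(r_jk)  # Fisher r-to-z transformation
    z_jh = math.atanh(r_jh)
    r_bar = (r_jk + r_jh) / 2
    # Approximate covariance between the two z-transformed correlations
    psi = (r_kh * (1 - 2 * r_bar ** 2)
           - 0.5 * r_bar ** 2 * (1 - 2 * r_bar ** 2 - r_kh ** 2))
    cov = psi / (1 - r_bar ** 2) ** 2
    return math.sqrt(n - 3) * (z_jk - z_jh) / math.sqrt(2 - 2 * cov)

# Illustration with the first row of Table 5: TRF Internalizing
# correlates .60 with BFS-T Internalizing (r_jk) and .02 with BFS-T
# Externalizing (r_jh); the two BFS-T scales correlate .24 (r_kh).
z = steiger_z(r_jk=0.60, r_jh=0.02, r_kh=0.24, n=95)
print(round(z, 2))  # well beyond the 1.96 cutoff for two-tailed p < .05
```

Under these assumed inputs, the test confirms that the internalizing convergent correlation significantly exceeds the discriminant one.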

Monitoring Change Over Time

Figure 1 presents the model-implied average trajectories of change on the teacher-report measures plotted alongside corresponding trajectories for the parent- and youth-report measures (see Table 6 for the corresponding teacher MLM parameter estimates). As shown, BFS-T Internalizing, BFS-T Total, and TPA-T mean scale scores all started in the moderate-to-high range at baseline (ps < .001) and showed log-linear decreases over time that were both statistically significant (ps < .005; Table 6) and clinically meaningful. For example, TPA-T scores declined from about 3.82 at baseline (close to the ceiling of 4, or “a very big problem”) by more than 2 scale points over 6 months, ending closer to the scale’s floor of 0 (“not a problem at all”). These significant negative slopes demonstrate sensitivity to change for the TPA-T and for BFS-T Internalizing and Total Problems. The BFS-T Externalizing Problems slope was negative but non-significant, suggesting this scale did not detect change over time. However, given that BFS-T Externalizing scores were not elevated at baseline, and that teachers’ observations occur primarily within the classroom setting, these scores may have had little room to decline in this sample.7
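The model-implied decline can be reproduced arithmetically from the fixed effects. A minimal Python sketch, using the TPA-T intercept (3.82) and slope (−0.39 per log-day) reported in Table 6; the 180-day (~6-month) horizon is an assumption for illustration.

```python
import math

def model_implied_score(intercept: float, slope: float, day: int) -> float:
    """Model-implied scale score on a loglinear trajectory:
    intercept + slope * log-days, where log-days = ln(days + 1)."""
    return intercept + slope * math.log(day + 1)

# TPA-T fixed effects from Table 6: baseline intercept 3.82,
# loglinear slope -0.39 per log-day.
baseline = model_implied_score(3.82, -0.39, 0)       # 3.82 at day 0
six_months = model_implied_score(3.82, -0.39, 180)   # ~1.79
decline = baseline - six_months                      # just over 2 scale points
```

This matches the description in the text: a drop of more than 2 points on the 0–4 TPA scale over roughly six months.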

Figure 1. Loglinear trajectories of multi-informant TPA and BFS scores over time

Table 6.

Unconditional models of loglinear change in teacher-rated scores over time

Models Terms Est SE df t or z r
BFS-T Internalizing Trajectories
 Model for the means: Intercept (baseline value) 8.86 0.92 86.7 9.60*** -
Slope (change per log-day) −0.85 0.20 79.5 −4.29*** -
 Model for the variance: Intercept 43.57 13.05 - 3.34*** -
Intercept-slope −6.67 2.69 - −2.48* −.86
Slope 1.39 0.58 - 2.38** -
Residual 8.69 0.61 - 14.15*** -
BFS-T Externalizing Trajectories
 Model for the means: Intercept (baseline value) 6.90 1.11 101.0 6.24*** -
Slope (change per log-day) −0.20 0.24 98.0 −0.83 -
 Model for the variance: Intercept 75.11 17.68 - 4.25*** -
Intercept-slope −11.83 3.51 - −3.37** −.84
Slope 2.64 0.76 - 3.46*** -
Residual 9.51 0.68 - 14.00*** -
BFS-T Total Problems Trajectories
 Model for the means: Intercept (baseline value) 15.89 1.64 92.0 9.68*** -
Slope (change per log-day) −1.09 0.36 91.9 −3.00** -
 Model for the variance: Intercept 151.44 40.53 - 3.74*** -
Intercept-slope −24.76 8.30 - −2.98** −.84
Slope 5.69 1.82 - 3.13*** -
Residual 23.80 1.69 - 14.05*** -
TPA-T Mean Severity Trajectories
 Model for the means: Intercept (baseline value) 3.82 0.17 96.4 22.49*** -
Slope (change per log-day) −0.39 0.04 95.4 −9.08*** -
 Model for the variance: Intercept 1.05 0.39 - 2.66** -
Intercept-slope −0.23 0.09 - −2.42* −.82
Slope 0.07 0.02 - 3.01** -
Residual 0.36 0.03 - 14.13*** -

Note. Parameter estimates describe the log-linear trajectories of change in each progress monitoring scale score by informant. Each model is decomposed into two parts: (a) the model for the means, which estimates the average score at baseline (fixed intercept) and average rate of change over time (log-linear slope), with t values reported; and (b) the model for the variance, which characterizes patterns of variation, covariation, and residual variance by which individuals deviate from these average trajectories, with z values reported. For interpretability, covariance terms are also reported as correlations.

+ p ≤ .10, * p < .05, ** p < .01, *** p < .001

To examine cross-informant relations in outcome trajectories, we re-estimated the MLMs reported above as parallel-process multivariate models that paired teacher ratings with the corresponding parent or youth ratings.8 As shown in Table 7, growth terms were mostly positively correlated across informants, with negligible to medium effect sizes. In other words, when a teacher’s ratings suggested a particular clinical pattern (e.g., high baseline severity, favorable treatment response), the corresponding parent and youth ratings tended to follow a similar pattern, with some variation by scale and informant. Specifically, parent- and teacher-reports agreed in their trajectories of TPA severity (intercept-intercept r = .46; slope-slope r = .41) and BFS externalizing problems (intercept-intercept r = .32; slope-slope r = .16). Youth-report, by contrast, was significantly correlated with teacher-report for BFS internalizing problems over time (intercept-intercept r = .37; slope-slope r = .25), and for other measures at baseline only (intercept-intercept: BFS externalizing = .20, TPA severity = .21). All other cross-informant growth-term correlations were nonsignificant (rs = −.15 to .14). Critically, the smaller correlations in growth terms across informants are consistent with the view that different informants offer different perspectives on youth problems and treatment response across settings and over time, supporting the incremental utility of teacher-report.

Table 7.

Cross-informant correlations between loglinear trajectories in parallel process models

Parent intercept with teacher intercept Parent slope with teacher slope Youth intercept with teacher intercept Youth slope with teacher slope
Main Analysis a
 BFS Internalizing .08 −.04 .37 *** .25 **
 BFS Externalizing .32 *** .16+ .20 * .14
 BFS Total Problems .14 .02 .12 .08
 TPA Mean Severity .46 *** .41 *** .21 * .01
Sensitivity Analysis b
 BFS Internalizing .04 −.15 .29 ** .13
 BFS Externalizing .38 *** .26 ** .18+ .08
 BFS Total Problems .20 * .07 .06 .00
 TPA Mean Severity .83 *** .49 *** .32 *** .07

Note. Correlation coefficients are based on the standardized covariance matrix within the cross-informant multilevel models for each outcome variable. p-values are two-tailed, based on the t statistic with df = 108.

a

Main analysis used all available measurements from weekly parent-report (M = 18.2 occasions per person) and youth-report data (M = 17.2 occasions) to examine cross-informant trajectory correlations with monthly teacher-report data (M = 5.4 occasions).

b

Sensitivity analysis is identical to main analysis but used simulated monthly measurements selected from the available parent- (M = 5.7 occasions) and youth-report data (M = 5.6 occasions) to more closely correspond to the monthly teacher-report data.

+ p ≤ .10, * p < .05, ** p < .01, *** p < .001

To summarize the longitudinal results, BFS-T and TPA-T scores showed significant reductions on most scales over time, providing evidence of sensitivity to change. Across informants, however, patterns of change and correlations between trajectory parameters varied: teachers corresponded more with youths on internalizing problems and more with parents on externalizing and top problems. Sensitivity analyses (Table 7, bottom) showed that these cross-informant trajectory correlations were robust, remaining largely the same regardless of whether the teacher data were collected on a measurement schedule similar to that of the parent and youth data.9 Thus, each informant appears to offer a different perspective on youths’ problem severity at baseline and over the course of treatment.

Discussion

We examined the psychometric properties of two new teacher-report measures for repeated measurement of youth mental health concerns throughout treatment. Both tools, the Behavior and Feelings Survey (BFS) and the Top Problems Assessment (TPA), have been used by parent-report and youth-report in previous studies; this study was the first to extend them to a third informant, teachers (i.e., the BFS-T and TPA-T). Results supported the validity, reliability, and incremental utility of the BFS-T and TPA-T. For all informants, the BFS scales (internalizing, externalizing, and total problems) showed good internal consistency, and the correlated two-factor structure fit the data well (with minor modifications, discussed below). The BFS-T and TPA-T demonstrated convergent and discriminant validity via correlations with established teacher-rated scales (TRF), and modest correlations with the corresponding parent- and youth-rated scales (BFS/TPA). Sensitivity to change was evident in statistically significant negative slopes in the trajectories for BFS-T internalizing, BFS-T total, and TPA-T severity; that is, youths’ scores on these scales declined throughout treatment. Further, the average patterns of baseline severity and longitudinal change seen by teacher-report were similar to those seen by parent- and youth-report. Overall, the TPA and BFS detected baseline and longitudinal variations in key dimensions of youth mental health symptoms across all three informants.

Findings speak to the unique perspectives, observational contexts, and discrepancies across informants, supporting the incremental utility of teacher-report. In this regard, these results mirror meta-analyses of therapy effectiveness trials (typically focused on fixed time points, like baseline to post-treatment or follow-up), where effect size is moderated by choice of informant (Weisz et al., 2017). Findings are also consistent with decades of evidence on multi-informant assessment, where meta-analyses consistently estimate cross-informant agreement at a correlation of about .27 overall, higher for externalizing and lower for internalizing (De Los Reyes et al., 2015). Our findings were broadly consistent with this pattern, but with weaker overall correlations (median rs: total = .15, externalizing = .34, internalizing = .08, top problems = .09). These lower correlations could be explained by several features of our measures: brevity (just 3–6 items per unique scale, whereas more items would reduce measurement error), timescale (items are rated over the past week rather than a longer period), context (parent, youth, and teacher each reference different settings and experiences), breadth of coverage of the BFS (broadband internalizing or externalizing problems rather than specific problems), and the idiographic character of the TPA (different informants are likely rating different problems). While these properties could attenuate cross-informant correlations, they are also strengths for MBC (as discussed in the Introduction).

Our findings also resonate with the past literature in terms of the tradeoffs between multiple domains of measurement vs. multiple perspectives on those domains. Consider Table 4. Cross-informant correlations tended to be negligible or small overall, even when focusing only on the same scale rated by different informants (median rs: teacher–parent = .15, teacher–youth = .16, parent–youth = .11). These results stand in contrast to the much larger correlations among different scales rated by the same informant (median rs: teacher = .44, parent = .31, youth = .37). This pattern illustrates the value of multi-informant assessment. All else being equal, if the goal is to obtain more information about a child’s functioning, then finding a new informant to complete the same scale will go much farther than finding a new scale to give to the original informant.

The longitudinal results were particularly intriguing. Parent, youth, and teacher reports on the TPA and BFS scales all declined over time, on average (Figure 1). But similarity of average results is not the same as high agreement on individual cases. Parent and teacher reports were correlated on trajectories of externalizing and top-problem severity, while teachers and youths tended to agree on trajectories of internalizing problems (Table 7). As meaningful as these significant cross-informant correlations may be, the weak and non-significant correlations are equally meaningful: if all informants agreed strongly, collecting data from two or three of them would be redundant, and one would suffice. The present data suggest that teachers were observing clinical trajectories different from those reported by parents and youths, supporting the incremental utility of teacher-report.

Overall, the multi-informant patterns are consistent with the literature and suggest that teachers, parents, and youths contribute different pieces of information to the assessment process (De Los Reyes et al., 2015). Indeed, the three informants represent different perspectives, different settings, and different samplings of experience on which to base their ratings. Given that teachers see the child only at school, we would not necessarily expect their ratings to correlate with parent- and self-report perspectives that draw on observations outside of school. In this regard, patterns of convergence and divergence are both meaningful.

Now that we have parallel three-informant measures, one nomothetic and one idiographic, the implications for clinical and research applications are considerable. Clinicians and families involved in MBC stand to benefit from an increased number of datapoints reflecting how treatment is going, particularly when those datapoints represent the primary contexts in which youths live their lives: their own self-perceptions, home, and school. The BFS and TPA thus provide a more complete understanding of youth functioning across contexts. Between these two brief instruments (the 3-item TPA and the 12-item BFS with its three scales) and three informants (parent, youth, teacher), clinicians can have as many as 12 dynamic navigational indicators by which to “guide the voyage of treatment” (Youngstrom et al., 2017). This is relevant to treatment in school-based mental health care and beyond (e.g., outpatient, inpatient).

Yet, with more data comes more complexity, including important questions regarding how to integrate and act upon the data. This is particularly true when two or three informants differ markedly in their reports. For example, if a clinician sees that teacher-report shows improvement, but parent-report does not, they might conclude that school-based aspects of treatment are succeeding but more work is needed at home. These decision-making processes might include reliance on clinical judgment, knowledge of the specific case details (for which idiographic TPA data could be useful), and a case-conceptualization that includes hypothesized explanations for the informant discrepancies (De Los Reyes et al., 2015). Future measurement validation work could test interpretations of informant discrepancies in treatment outcomes. For example, Kraemer et al.’s (2003) “Satellite Model” allows users to synthesize parent, youth, and teacher data into interpretable component scores that can be studied in relation to criterion variables (e.g., home- and school-specific indicators of functioning). Prior research provides some examples (e.g., Charamut et al., 2022; De Los Reyes et al., 2022), but future research could extend this work into MBC and repeated measures designs.

Findings also point to other tasks for future research. In particular, the informant-specific variations in the modified factor structure suggest that not all BFS items may “work” equally well for measuring internalizing and externalizing problems across all three informants. This is not terribly surprising, as multi-informant measures (e.g., Achenbach & Rescorla, 2001; Weisz et al., 2020) prioritize keeping item content, factor structure, and instruction/response characteristics parallel across informants (De Los Reyes et al., 2013). This comes with the tradeoff of having to optimize the instrument in aggregate. Alternatively, attempting to optimize measurement for each informant can lead to different items, scales, and factor structures across raters (e.g., Reynolds & Kamphaus, 2015), posing challenges for multi-informant integration (De Los Reyes et al., 2013). The slight modifications made to the BFS factor structure here (e.g., correlated residuals, anxiety and depression subordinate to internalizing problems) may be sample- and informant-specific rather than meaningful for applied measurement (e.g., scale-level internal consistency and validity correlations were still supported). These modifications did not affect the overall scoring and interpretation of the BFS and might not be necessary in larger samples. Nevertheless, further examination of the BFS items and factor structure across informants remains an important task for future research.

Relatedly, the slope for BFS-T externalizing problems, while negative, did not reach statistical significance. This probably reflects the data more than the scale itself. First, teacher-rated externalizing scores were relatively low at baseline and therefore had little room to decrease over time. Moreover, the classroom context (sitting at desks, direct instruction, routines, etc.) may be less conducive to disruptive behavior than other settings (e.g., recess, lunch, free time). Evidence for the treatment sensitivity of BFS-T Externalizing Problems thus remains to be established in future research. Finally, it is important to acknowledge that gathering MBC data via teacher-report, or from three or more informants, could be logistically complicated, particularly in routine clinical care contexts. Future work should focus on developing and evaluating strategies for gathering, integrating, and acting upon these data to guide the course of treatment.

Strengths and Limitations

Study strengths include the use of three informants rating the same scales on the same youths. In support of generalizability, the sample was referred for school-based treatment via usual channels across many different schools and districts in a large metropolitan area that reflected the socioeconomic and demographic diversity of the population. Another strength is that longitudinal and cross-sectional data allowed us to look at two important aspects of treatment: clinical presentation at initial referral and trajectories of change thereafter. Finally, the measures themselves (BFS, TPA) possess many strengths, which are detailed in the Introduction and Measures sections (e.g., free, brief, multi-informant, idiographic, nomothetic).

This study also has limitations. First, regarding constraints on generality, participants were mostly White (56.4%) and male (58.4%), and all were in grades 1–7. It is unclear to what extent these findings might generalize to other developmental periods (high school, early childhood) or to youths from specific minoritized backgrounds (race, ethnicity, language, sex, gender). Relatedly, given the focus on school mental health counseling, generalizability to other care settings (e.g., inpatient) is not clear. Research is needed to replicate and extend these findings across diverse populations and settings. Second, we had a moderately sized sample of 110 parents, 109 youths, and 95 teachers. Although these numbers were sufficient to detect the observed effects and larger than average for the field (Weisz et al., 2017), larger samples yield more reliable and conclusive estimates. Third, analyses treated all case-level data as independent observations even though only 95 students (86.4%) were rated by distinct teacher informants. Future studies with larger school-based samples may be better suited to probe effects associated with teacher, classroom, grade level, school, and district. Fourth, assessment schedules were not aligned between teachers (monthly) and parents/youths (weekly), which limited our longitudinal cross-informant analyses. Sensitivity analyses partly addressed this issue, but the study was not designed to rigorously characterize cross-informant patterns on a week-to-week or month-to-month basis. These are important directions for future research.

In summary, the BFS and TPA are free, brief, incrementally useful tools designed to guide MBC in youth mental health. While the parent- and youth-report versions of these measures have been validated previously (Weisz et al., 2011, 2020), the current study provides psychometric support for new teacher-report forms of these two measures. Taken together, these findings support the BFS/TPA as a multi-informant (parent, youth, teacher) and multi-method (idiographic top problems, nomothetic internalizing and externalizing problems) measurement system for research and clinical use.

Supplementary Material


Supplemental materials include the following:

S1: Behavior and Feelings Survey (BFS): parent-, youth-, and teacher-report

S2: Top Problems Assessment (TPA): parent-, youth-, and teacher-report

S3: Top problems as reported by teachers, parents, and youths in present sample

Public Significance Statement.

We examined two brief and free questionnaires—the Behavior and Feelings Survey (BFS) and the Top Problems Assessment (TPA)—used to measure youth mental health symptoms throughout treatment and from multiple perspectives. Our findings provide initial support for new teacher-report versions of these measures and further support for the established parent- and youth-report versions—all of which can now be used to guide clinical care.

Acknowledgments

The authors thank the youths, caregivers, teachers, and clinicians who participated in this research, as well as the research team members who helped collect the data.

This work was supported by a grant to JRW from the Institute of Education Sciences (grant number R305A140253). During the preparation of this manuscript, the authors received support from their respective institutions, the National Institute of Mental Health (R01MH124965 to JRW; L30MH120708 to SCE), AIM Clinical Science Fellowship (SCE), and the Manton Foundation (JRW). JRW is a co-author of the MATCH treatment protocol and receives royalties for its sales. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest. The present results have not been reported in any previous publications.

Footnotes

1

There were zero instances of multiple youths nested within parent rater. Regarding teachers, 82 (86.3%) rated exactly one student, 11 (11.6%) rated two, and two (2.1%) rated three. Put differently, 74.5% of cases (n = 82) had no dependencies. We concluded that the hierarchical nesting inherent to schools (i.e., students within classrooms, within schools, within districts) was minimal in this dataset. Our analytic approach does not eliminate the potential effects of dependencies in 25.5% of cases (n = 28), as noted in the limitations (see Discussion). It does, however, allow for parallel analyses and interpretation of results across parent, youth, and teacher data.

2

Consistent with prior studies using the TPA (e.g., Weisz et al., 2011, 2020b), internal consistency and factor analyses were not appropriate for the TPA, given that its item content is idiosyncratic to each person.

3

It is difficult to quantify the percentage of missing data because treatment episode and study participation length varied across cases by design, and therefore so did the number of repeated BFS/TPA occasions. Further, it was not possible to ascertain whether an expected weekly/monthly assessment was truly “missing,” as there could be multiple explanations for it not being recorded (e.g., prompt not received, occurred during breaks/holidays, technical errors/difficulties).

4

Specifically, log-days = ln(observation date – baseline date + 1). This approach was selected because loglinear trends fit the data better than linear or polynomial (i.e., smaller −2LL, AIC, and BIC) and have also been used by convention in prior RCTs (e.g., Weisz et al., 2012, 2020a).
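The log-days metric defined in this footnote can be computed directly from assessment dates. A minimal Python sketch; the dates are hypothetical, for illustration only.

```python
from datetime import date
import math

def log_days(observation: date, baseline: date) -> float:
    """Time metric from footnote 4: ln(days since baseline + 1),
    so the baseline occasion maps to ln(1) = 0."""
    return math.log((observation - baseline).days + 1)

# Hypothetical assessment dates for illustration
t0 = log_days(date(2015, 9, 1), date(2015, 9, 1))   # 0.0 at baseline
t1 = log_days(date(2015, 10, 1), date(2015, 9, 1))  # ln(31), about 3.43
```

Because equal increments in log-days correspond to progressively longer spans of calendar time, a linear slope on this metric captures the decelerating (loglinear) improvement typical of treatment trajectories.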

5

Fit statistics for each were as follows. BFS-P: χ2 (df=53) = 171.883, p<.001; MLR correction: 1.1072; RMSEA(90% CI) = 0.143 (0.119, 0.167), CFI = 0.807, TLI = 0.760; BFS-Y: χ2 (df=53) = 89.676, p<.001; MLR correction: 1.2146; RMSEA(90% CI) = 0.080 (0.050, 0.108), CFI = 0.914, TLI = 0.893; BFS-T: χ2 (df=53) = 197.932, p<.001; MLR correction: 1.1008; RMSEA(90% CI) = 0.158 (0.135, 0.181), CFI = 0.780, TLI = 0.727.

6

The separation of internalizing item content into depression and anxiety is consistent with the original design of the measure, although it was not part of the original factor structure (Weisz et al., 2020b).

7

Specifically, the BFS-T Externalizing baseline score (intercept = 6.90) corresponds to an item mean of 1.15 on the BFS’s 0–4 response scale, where 0 represents “not a problem.” This interpretation is also supported by the very strong negative slope-intercept correlations in all teacher-rated trajectories (rs = −.86 to −.82; Table 6), suggesting that much of the variance in youths’ slopes of symptom improvement might be explained by their severity at baseline.

8

The fixed effect results for the parallel process models largely did not differ from those presented in Table 6 and Figure 1, so they are not reported here. What is of primary interest from these results, and the reason for estimating them, is the magnitude of the correlations between slopes with slopes and intercepts with intercepts, when rating the same construct by different informants. These correlations are presented in Table 7.

9

We re-ran the parallel-process models, restricting the parent- and youth-reported data to a subset of observations sampled at approximately 30-day intervals, for closer correspondence to the monthly teacher-report data. Most results showed little to no change in significance (i.e., ps < .10 and ps ≥ .10 remained so) or magnitude (changes in r ranged from −.12 to .12, computed using Fisher’s r-to-z transformation). Specifically, of the 16 correlation coefficients from the main analyses (Table 7, top), only 3 showed notable changes in the sensitivity analyses (Table 7, bottom), and these were in mixed directions: BFS total parent and teacher intercepts (increased from r = .14, p ≥ .10 to r = .20, p < .05); BFS internalizing youth and teacher slopes (decreased from r = .25, p < .01 to r = .13, p ≥ .10); and TPA parent and teacher intercepts (increased from r = .46, p < .001 to r = .83, p < .001).
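Comparing correlation magnitudes on Fisher’s z scale, as described above, can be illustrated as follows (a minimal sketch; `r_change` is a hypothetical helper, and the paper does not specify this exact implementation):

```python
import math

def fisher_z(r: float) -> float:
    """Fisher r-to-z transformation: z = atanh(r)."""
    return math.atanh(r)

def r_change(r_main: float, r_sensitivity: float) -> float:
    """Difference between two correlations taken on the z scale,
    mapped back to the r metric via tanh."""
    return math.tanh(fisher_z(r_sensitivity) - fisher_z(r_main))

# e.g., r = .25 (main) vs. r = .13 (sensitivity):
# r_change(0.25, 0.13) -> ~ -0.12
```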

References

  1. Achenbach TM (2019). International findings with the Achenbach System of Empirically Based Assessment (ASEBA): Applications to clinical services, research, and training. Child and Adolescent Psychiatry and Mental Health, 13, 1–10. 10.1186/s13034-019-0291-2
  2. Achenbach TM, & Rescorla LA (2001). Manual for the ASEBA school-age forms & profiles: An integrated system of multi-informant assessment. Burlington, VT: University of Vermont, Research Center for Children, Youth, & Families.
  3. American Psychological Association Task Force on Professional Practice Guidelines on Measurement-Based Care. (2025). APA Professional Practice Guidelines on Measurement-Based Care. Retrieved from https://www.apa.org/about/policy/
  4. Ashworth M, Guerra D, & Kordowicz M (2019). Individualised or standardised outcome measures: A co-habitation? Administration and Policy in Mental Health and Mental Health Services Research, 46, 425–428. 10.1007/s10488-019-00928-z
  5. Bitsko RH, Claussen AH, Lichstein J, et al. (2022). Mental health surveillance among children—United States, 2013–2019. MMWR Supplements, 71(Suppl-2), 1–42. 10.15585/mmwr.su7102a1
  6. Bradshaw CP, Buckley JA, & Ialongo NS (2008). School-based service utilization among urban children with early onset educational and mental health problems: The squeaky wheel phenomenon. School Psychology Quarterly, 23(2), 169–186.
  7. Burns JR, & Rapee RM (2022). Barriers to universal mental health screening in schools: The perspective of school psychologists. Journal of Applied School Psychology, 38(3), 223–240. 10.1080/15377903.2021.1941470
  8. Charamut NR, Racz SJ, Wang M, & De Los Reyes A (2022). Integrating multi-informant reports of youth mental health: A construct validation test of Kraemer and colleagues’ (2003) Satellite Model. Frontiers in Psychology, 13, 911629. 10.3389/fpsyg.2022.911629
  9. Chorpita BF, Daleiden EL, Park AL, Ward AM, Levy MC, Cromley T, … Krull JL (2017). Child STEPs in California: A cluster randomized effectiveness trial comparing modular treatment with community implemented treatment for youth with anxiety, depression, conduct problems, or traumatic stress. Journal of Consulting and Clinical Psychology, 85(1), 13–25. 10.1037/ccp0000133
  10. Conners CK (2022). Conners 4th Edition (Conners 4®). Multi-Health Systems.
  11. Connors EH, Arora P, Curtis L, & Stephan SH (2015). Evidence-based assessment in school mental health. Cognitive and Behavioral Practice, 22(1), 60–73.
  12. Connors EH, Childs AW, Douglas S, & Jensen-Doss A (2025). Data-informed communication: How measurement-based care can optimize child psychotherapy. Administration and Policy in Mental Health and Mental Health Services Research, 52, 179–193. 10.1007/s10488-024-01372-4
  13. Cooper M, Stewart D, Sparks J, & Bunting L (2013). School-based counseling using systematic feedback: A cohort study evaluating outcomes and predictors of change. Psychotherapy Research, 23(4), 474–488.
  14. Corteselli KA, Hollinsaid NL, Harmon SL, Bonadio FT, Westine M, Weisz JR, & Price MA (2020). School counselor perspectives on implementing a modular treatment for youth. Evidence-Based Practice in Child and Adolescent Mental Health, 5(3), 271–287. 10.1080/23794925.2020.1765434
  15. De Los Reyes A (2024). Discrepant results in mental health research: What they mean, why they matter, and how they inform scientific practices. Oxford University Press.
  16. De Los Reyes A, & Epkins CC (2023). Introduction to the special issue. A dozen years of demonstrating that informant discrepancies are more than measurement error: Toward guidelines for integrating data from multi-informant assessments of youth mental health. Journal of Clinical Child & Adolescent Psychology, 52(1), 1–18. 10.1080/15374416.2022.2158843
  17. De Los Reyes A, Augenstein TM, Wang M, Thomas SA, Drabick DAG, Burgers DE, & Rabinowitz J (2015). The validity of the multi-informant approach to assessing child and adolescent mental health. Psychological Bulletin, 141(4), 858–900.
  18. De Los Reyes A, Cook CR, Sullivan M, Morrell N, Hamlin C, Wang M, Gresham FM, Makol BM, Keeley LM, & Qasmieh N (2022). The Work and Social Adjustment Scale for Youth: Psychometric properties of the teacher version and evidence of contextual variability in psychosocial impairments. Psychological Assessment, 34(8), 777–790. 10.1037/pas0001139
  19. De Los Reyes A, Thomas SA, Goodman KL, & Kundey SM (2013). Principles underlying the use of multiple informants’ reports. Annual Review of Clinical Psychology, 9(1), 123–149. 10.1146/annurev-clinpsy-050212-185617
  20. Dunn TJ, Baguley T, & Brunsden V (2014). From alpha to omega: A practical solution to the pervasive problem of internal consistency estimation. British Journal of Psychology, 105(3), 399–412. 10.1111/bjop.12046
  21. Duong MT, Bruns EJ, Lee K, Cox S, Coifman J, Mayworm A, & Lyon AR (2021). Rates of mental health service utilization by children and adolescents in schools and other common service settings: A systematic review and meta-analysis. Administration and Policy in Mental Health and Mental Health Services Research, 48, 420–439. 10.1007/s10488-020-01080-9
  22. Duong MT, Lyon AR, Ludwig K, Wasse JK, & McCauley E (2016). Student perceptions of the acceptability and utility of standardized and idiographic assessment in school mental health. International Journal of Mental Health Promotion, 18(1), 49–63.
  23. Elliott SN, Gresham FM, Freeman T, & McCloskey G (1988). Teacher and observer ratings of children’s social skills: Validation of the Social Skills Rating Scales. Journal of Psychoeducational Assessment, 6(2), 152–161.
  24. Epstein JN, Kelleher KJ, Baum R, Brinkman WB, Peugh J, Gardner W, Lichtenstein P, & Langberg J (2014). Variability in ADHD care in community-based pediatrics. Pediatrics, 134(6), 1136–1143. 10.1542/peds.2014-1500
  25. Evans SC, Corteselli KA, Edelman A, Scott H, & Weisz JR (2023). Is irritability a top problem in youth mental health care? A multi-informant, multi-method investigation. Child Psychiatry & Human Development, 54, 1027–1041.
  26. Evans SC, Karlovich AR, Corteselli KA, Jensen-Doss A, & Weisz JR (2025). Taking measurement-based care to school: Evaluating teacher-report versions of the Behavior and Feelings Survey and the Top Problems Assessment (Preprint). OSF. 10.17605/OSF.IO/FS2EW
  27. Farmer EM, Burns BJ, Phillips SD, Angold A, & Costello EJ (2003). Pathways into and through mental health services for children and adolescents. Psychiatric Services, 54(1), 60–66. 10.1176/appi.ps.54.1.60
  28. Goodman R (1997). The Strengths and Difficulties Questionnaire: A research note. Journal of Child Psychology and Psychiatry, 38(5), 581–586.
  29. Green D (2016). Making the case for using personalised outcome measures to track progress in psychotherapy. European Journal of Psychotherapy & Counselling, 18(1), 39–57.
  30. Green JG, McLaughlin KA, Alegría M, Costello EJ, Gruber MJ, Hoagwood K, … Kessler RC (2013). School mental health resources and adolescent mental health service use. Journal of the American Academy of Child & Adolescent Psychiatry, 52(5), 501–510. 10.1016/j.jaac.2013.03.002
  31. Gresham FM, & Elliott SN (2008). Social Skills Improvement System—Rating Scales. Pearson Assessments.
  32. Guadagnoli E, & Velicer WF (1988). Relation of sample size to the stability of component patterns. Psychological Bulletin, 103(2), 265–275.
  33. Hall H, & Nielsen E (2020). How do children spend their time? Time use and skill development in the PSID. FEDS Notes. Board of Governors of the Federal Reserve System. 10.17016/2380-7172.2577
  34. Harmon SL, Price MA, Corteselli KA, Lee EH, Metz K, Bonadio FT, Hersh J, Marchette LK, Rodríguez GM, Raftery-Helmer J, Thomassin K, Bearman SK, Jensen-Doss AJ, Evans SC, & Weisz JR (2021). Evaluating a modular approach to therapy for children with anxiety, depression, trauma, and conduct problems (MATCH) in school-based mental health care: Study protocol for a randomized controlled trial. Frontiers in Psychology, 12, 639493. 10.3389/fpsyg.2021.639493
  35. Hawley KM, & Weisz JR (2003). Child, parent and therapist (dis)agreement on target problems in outpatient therapy: The therapist’s dilemma and its implications. Journal of Consulting and Clinical Psychology, 71(1), 62–70.
  36. Hoffman L (2015). Longitudinal analysis: Modeling within-person fluctuation and change. Routledge.
  37. Jensen-Doss A, Haimes EMB, Smith AM, Lyon AR, Lewis CC, Stanick CF, & Hawley KM (2018). Monitoring treatment progress and providing feedback is viewed favorably but rarely used in practice. Administration and Policy in Mental Health and Mental Health Services Research, 45, 48–61. 10.1007/s10488-016-0763-0
  38. Keepers BC, Easterly CW, Dennis N, Domino ME, & Bhalla IP (2023). A survey of behavioral health care providers on use and barriers to use of measurement-based care. Psychiatric Services, 74(4), 439–357. 10.1176/appi.ps.202100735
  39. Kline RB (2023). Principles and practice of structural equation modeling (5th ed.). Guilford.
  40. Kraemer HC, Measelle JR, Ablow JC, Essex MJ, Boyce WT, & Kupfer DJ (2003). A new approach to integrating data from multiple informants in psychiatric assessment and research: Mixing and matching contexts and perspectives. American Journal of Psychiatry, 160(9), 1566–1577.
  41. Lee IA, & Preacher KJ (2013). Calculation for the test of the difference between two dependent correlations with one variable in common [software]. http://quantpsy.org
  42. Lewis CC, Boyd M, Puspitasari A, Navarro E, Howard J, Kassab H, … Douglas S (2019). Implementing measurement-based care in behavioral health: A review. JAMA Psychiatry, 76(3), 324–335. 10.1001/jamapsychiatry.2018.3329
  43. Lyon AR, Ludwig K, Wasse JK, Bergstrom A, Hendrix E, & McCauley E (2016). Determinants and functions of standardized assessment use among school mental health clinicians: A mixed methods evaluation. Administration and Policy in Mental Health and Mental Health Services Research, 43(1), 122–134.
  44. Michelson D, Malik K, Parikh R, Weiss HA, Doyle AM, Bhat B, … Krishna M (2020). Effectiveness of a brief lay counsellor-delivered, problem-solving intervention for adolescent mental health problems in urban, low-income schools in India: A randomised controlled trial. The Lancet Child & Adolescent Health, 4(8), 571–582. 10.1016/s2352-4642(20)30173-5
  45. Muthén LK, & Muthén BO (1998–2017). Mplus user’s guide (8th ed.). Los Angeles, CA: Muthén & Muthén.
  46. National Academies of Sciences, Engineering, and Medicine (2018). How people learn II: Learners, contexts, and cultures. The National Academies Press. 10.17226/24783
  47. Reynolds CR, & Kamphaus RW (2015). Behavior Assessment System for Children (3rd ed.). Bloomington, MN: Pearson.
  48. Rhemtulla M, Brosseau-Liard PÉ, & Savalei V (2012). When can categorical variables be treated as continuous? A comparison of robust continuous and categorical SEM estimation methods under suboptimal conditions. Psychological Methods, 17(3), 354–373. 10.1037/a0029315
  49. Rognstad K, Helland SS, Neumer SP, Baardstu S, & Kjøbli J (2022). Short measures of youth psychopathology: Psychometric properties of the Brief Problem Monitor (BPM) and the Behavior and Feelings Survey (BFS) in a Norwegian clinical sample. BMC Psychology, 10(1), 182. 10.1186/s40359-022-00894-6
  50. Sales CMD, & Alves PCG (2012). Individualized patient-progress systems: Why we need to move towards a personalized evaluation of psychological treatments. Canadian Psychology / Psychologie Canadienne, 53(2), 115–121.
  51. Steiger JH (1980). Tests for comparing elements of a correlation matrix. Psychological Bulletin, 87(2), 245.
  52. Tam H, & Ronan K (2017). The application of a feedback-informed approach in psychological service with youth: Systematic review and meta-analysis. Clinical Psychology Review, 55, 41–55. 10.1016/j.cpr.2017.04.005
  53. Weisz JR, Chorpita BF, Frye A, Ng MY, Lau N, Bearman SK, … Hoagwood KE (2011). Youth top problems: Using idiographic, consumer-guided assessment to identify treatment needs and to track change during psychotherapy. Journal of Consulting and Clinical Psychology, 79(3), 369–380. 10.1037/a0023307
  54. Weisz JR, Chorpita BF, Palinkas LA, Schoenwald SK, Miranda J, Bearman SK, … The Research Network on Youth Mental Health. (2012). Testing standard and modular designs for psychotherapy treating depression, anxiety, and conduct problems in youth: A randomized effectiveness trial. Archives of General Psychiatry, 69(3), 274–282.
  55. Weisz JR, Evans SC, Price MA, Hersh JA, Lee EH, Hollinsaid NL, Harmon SL, Rodriguez GM, Corteselli KA, Bearman SK, Bonadio FT, Marchette LK, Thomassin K, Raftery-Helmer J, Metz K, & Jensen-Doss A (2025). Testing Child STEPs in school-based mental health care: Cluster randomized controlled effectiveness trial in five school districts. Manuscript submitted for publication.
  56. Weisz JR, Kuppens S, Ng MY, Eckshtain D, Ugueto AM, Vaughn-Coaxum R, … & Fordwood SR (2017). What five decades of research tells us about the effects of youth psychological therapy: A multilevel meta-analysis and implications for science and practice. American Psychologist, 72(2), 79–117.
  57. Weisz JR, Thomassin K, Hersh J, Santucci LC, MacPherson HA, Rodriguez GM, … Evans SC (2020a). Clinician training, then what? Randomized clinical trial of Child STEPs psychotherapy using lower-cost implementation supports with versus without expert consultation. Journal of Consulting and Clinical Psychology, 88(12), 1065–1078. 10.1037/ccp0000536
  58. Weisz JR, Vaughn-Coaxum RA, Evans SC, Thomassin K, Hersh J, Ng MY, … Mair P (2020b). Efficient monitoring of treatment response during youth psychotherapy: The Behavior and Feelings Survey. Journal of Clinical Child & Adolescent Psychology, 49(6), 737–751. 10.1080/15374416.2018.1547973
  59. Wolraich ML, Bard DE, Stein MT, Rushton JL, & O’Connor KG (2010). Pediatricians’ attitudes and practices on ADHD before and after the development of ADHD pediatric practice guidelines. Journal of Attention Disorders, 13(6), 563–572.
  60. Wolraich ML, Feurer ID, Hannah JN, Baumgaertel A, & Pinnock TY (1998). Obtaining systematic teacher reports of disruptive behavior disorders utilizing DSM-IV. Journal of Abnormal Child Psychology, 26, 141–152.
  61. Yeh M, & Weisz JR (2001). Why are we here at the clinic? Parent-child (dis)agreement on referral problems at outpatient treatment entry. Journal of Consulting and Clinical Psychology, 69(6), 1018–1025. 10.1037//0022-006x.69.6.1018
  62. Yeh M, McCabe K, Hurlburt M, Hough R, Hazen A, Culver S, Garland A, & Landsverk J (2002). Referral sources, diagnoses, and service types of youth in public outpatient mental health care: A focus on ethnic minorities. The Journal of Behavioral Health Services & Research, 29, 45–60. 10.1007/bf02287831
  63. Youngstrom EA, Van Meter A, Frazier TW, Hunsley J, Prinstein MJ, Ong M-L, & Youngstrom JK (2017). Evidence-based assessment as an integrative model for applying psychological science to guide the voyage of treatment. Clinical Psychology: Science and Practice, 24(4), 331–363. 10.1111/cpsp.12207


Supplementary Materials

Supplemental BFS materials
Supplemental TPA materials
Supplemental problems materials
