Author manuscript; available in PMC: 2021 Jul 1.
Published in final edited form as: Adm Policy Ment Health. 2020 Jul;47(4):581–596. doi: 10.1007/s10488-020-01020-7

Knowledge of Evidence-Based Services Questionnaire: Development and Validation of a Short Form

Gwendolyn M Lawson 1, Tyler M Moore 1, Kelsie H Okamura 4, Emily M Becker-Haimes 1, Rinad S Beidas 1,2,3
PMCID: PMC7260077  NIHMSID: NIHMS1566416  PMID: 32076887

Abstract

The Knowledge of Evidence-Based Services Questionnaire (KEBSQ) is an objective measure of therapist knowledge of practices derived from the evidence base for the treatment of youth psychopathology. However, the length of this measure (i.e., 40 items) and the respondent demands associated with each item make it burdensome for researchers and clinicians. This study developed and validated a Short Form of the KEBSQ using Item Response Theory (IRT) measurement models. The Short Form consists of 17 items and generates two separate scores: Correct Endorsements and Correct Rejections. The Short Form was found to correlate highly with and perform similarly to the Full Form, providing preliminary validity evidence.

Keywords: Knowledge, item response theory, evidence-based practices


Although evidence-based practices (EBPs) for the treatment of youth psychiatric disorders exist (Chorpita et al., 2011), therapists in community settings often do not use these practices (Garland, Bickman, & Chorpita, 2010; McHugh & Barlow, 2010; Weersing, Weisz, & Donenberg, 2002). This is a critical issue, given evidence that youth who receive EBPs demonstrate better outcomes than those who receive usual care (Weisz et al., 2017). As such, it is important to understand and address factors associated with therapists’ use of EBPs. Many factors, including characteristics of the setting, intervention, and individuals, may influence therapists’ use of EBPs (Damschroder et al., 2009). One individual characteristic, therapist knowledge about EBPs, is theoretically considered to be a necessary precursor to EBP use (Rogers, 2003) and has been empirically identified as a barrier to EBP delivery (Dearing, 2009; Sanders, Prinz, & Shapiro, 2009; Seng, Prinz, & Sanders, 2006). Therapist EBP knowledge can include both declarative knowledge (e.g., factual knowledge regarding what practices are evidence based) and procedural knowledge (e.g., process knowledge about how to use an evidence-based practice).

Given the importance of therapist EBP knowledge, it is critical for researchers and practitioners to validly measure this construct. The field of implementation science has been limited in its ability to validly and reliably measure relevant constructs, including therapist-level factors (Lewis et al., 2015). This limits the ability to test mechanistic approaches to implementation, a critical goal for the field (Lewis et al., 2018). As such, refining and improving measures of implementation constructs, such as therapist EBP knowledge, is an important step for the field. Importantly, measures need to be pragmatic (i.e., free and brief), while retaining strong psychometric properties, if they are to be used widely while remaining useful (Glasgow & Riley, 2013).

To date, there have been serious limitations to the measurement of therapist knowledge (Beidas & Kendall, 2010). Knowledge is often measured using therapist self-report of perceived knowledge (Beidas & Kendall, 2010), which may be subject to a number of biases, including social desirability bias (Rosenman, Tennekoon, & Hill, 2011). Other measures objectively assess therapist knowledge (e.g., via multiple choice questions that are scored according to the number of correct answers); however, these measures are often specific to a given EBP (Beidas & Kendall, 2010). These idiographic measures may have limited utility in community settings, where most therapists treat a wide variety of presenting problems and clients often present with comorbidities (Weisz, Ugueto, Cheron, & Herren, 2013). Therefore, there is a great need for the development and validation of brief measures that assess therapist EBP knowledge across a range of therapeutic practices.

The Knowledge of Evidence-Based Services Questionnaire (KEBSQ) is an objective measure of therapist declarative knowledge of practices derived from the evidence base for the treatment of youth psychopathology (Stumpf, Higa-McMillan, & Chorpita, 2009). The development of the KEBSQ was informed by distillation methodology (Chorpita & Daleiden, 2009; Chorpita, Daleiden, & Weisz, 2005), which defines and organizes EBPs by discrete techniques, rather than by specific treatment programs. This approach aggregates findings across studies to examine the frequency of treatment components of evidence-based mental health interventions. This allows the identification of common “practice elements” that are frequently included in evidence-based treatment protocols for specific problem domains (e.g., Chorpita, Becker, & Daleiden, 2007). These practice elements represent the discrete practices (e.g., exposure, problem solving) that comprise the evidence-based protocols for a given problem domain. Each item on the KEBSQ consists of a practice element and queries whether that practice element is included in evidence-based treatment protocols for any of the following problem domains: anxious/avoidant, depressed/withdrawn, disruptive behavior, attention/hyperactivity, or none. The broad, transtheoretical nature of the KEBSQ makes it a good fit for measuring knowledge among community therapists, who treat a range of presenting problems using a variety of approaches (Garland et al., 2010; Weisz et al., 2013). The KEBSQ shows a number of promising characteristics, including sensitivity to change following training, discriminant validity (Stumpf et al., 2009), and convergent validity (Nakamura, Higa-McMillan, Okamura, & Shimabukuro, 2011).

Although there are benefits to the KEBSQ relative to other knowledge measures, its utility for researchers and practitioners is limited by the assessment burden associated with the measure (i.e., 40 practices × 4 problem areas=160 separate yes/no decisions; see Okamura, Nakamura, Mueller, Hayashi, & Higa-McMillan, 2016; Weist et al., 2009). Furthermore, because the KEBSQ is a test of knowledge with correct and incorrect responses, the respondent burden associated with each yes/no decision is greater than it would be for making a decision on a rating scale regarding one’s attitude or opinion. Indeed, research participants have reported that the length of the KEBSQ makes it difficult to complete (Weist et al. 2009). To address this limitation, a previous study used a subset of KEBSQ items that represent the practices most frequently included in EBPs for the four problem areas (Okamura, Hee, Jackson, & Nakamura, 2018); however, there has not yet been an effort to empirically derive and validate a KEBSQ short form, to address the need for a pragmatic measure of therapist knowledge.

The primary aim of this study was to develop and validate a short form of the KEBSQ. To do so, we employed Item Response Theory (IRT) measurement models, a frequently used technique for developing short forms of questionnaires (Moore et al., 2015; Roalf et al., 2016) that allows a systematic approach to select items that are most predictive of overall test performance and of optimal difficulty (Embretson & Reise, 2000). To create this KEBSQ Short Form, we addressed the following primary questions in a large sample of community mental health therapists: (a) Which KEBSQ items are most informative about overall performance and should be retained in a short form and (b) Does a short form version of the KEBSQ perform comparably to the traditional KEBSQ with respect to internal consistency and concurrent validity? We also performed two additional sensitivity analyses to test the robustness of our results.

Method

Study Design and Participants

Setting

This work was conducted as part of a larger study that examined how therapists employed in community mental health clinics in Philadelphia implemented EBPs over time given the creation of a centralized infrastructure to support EBP use by the Philadelphia Department of Behavioral Health and Intellectual disAbility Services (DBHIDS; Beidas et al., 2013; Powell et al., 2016). The larger study used a prospective, sequential design to survey 499 therapists working within the 31 organizations that serve 80% of youth in the publicly funded mental health system. Therapists working in each organization were surveyed at three time points, each approximately two years apart (2013, 2015, 2017). Researchers scheduled group meetings with all clinicians working with the organizations; during these meetings, the research team described the study, obtained written informed consent, and collected measures in person. Therapists received $50 at each time point; therapists participating in all three waves received an additional $50.

Since 2007, DBHIDS has supported EBPs via separate “EBP initiatives” that included training and expert consultation for enrolled clinicians lasting approximately one year, as recommended by treatment developers (Powell et al., 2016). Between 2007 and 2019 through these initiatives, DBHIDS supported the implementation of a variety of cognitive behavioral-therapy focused practices. In 2013, the Evidence-based Practice and Innovation Center (EPIC), an entity intended to provide a centralized infrastructure for EBP administration, was launched. In addition to supporting the EBP initiatives, which predated the creation of this centralized infrastructure, EPIC aligned policy, fiscal, and operational approaches by developing systematic processes to contract for EBP delivery, hosting events to publicize EBP delivery, designating providers as EBP agencies, and creating enhanced rates for the delivery of some EBPs. All organizations and clinicians were exposed to system-level support provided by EPIC from 2013–2017. This study was approved by appropriate Institutional Review Boards.

Participants

This study employed data from unique therapists across the three time points of the larger study. In part due to high rates of turnover (Beidas et al., 2016), the majority (N = 388; 77.9%) of therapists provided data at only one time point; 94 (18.9%) therapists provided data at two time points and 16 (3.2%) therapists provided data at all three time points. The analytic sample (N = 465), which was used to develop the KEBSQ Short Form, used data from the first time point that each therapist completed the KEBSQ. The validation sample (N = 92), which was used to compare the performance of the KEBSQ Short Form with the KEBSQ Full Form, employed data from the second time point that each therapist completed the KEBSQ.

Sample size recommendations for IRT measurement models (and the Graded Response Model [GRM] in particular) vary substantially, with no widely accepted cutoffs for adequate sample size. The most frequently cited recommendation for the Graded Response Model is that of Reise and Yu (1990), who recommended N = 500 for Graded Response Model estimation in one of the original IRT packages, MULTILOG.

Demographics of therapists who were included in the analytic and validation samples are summarized in Table 1. Most therapists in the sample were female and held a master's degree.

Table 1.

Sample Demographics

Variable                                                    Analytic Sample (N = 465)   Validation Sample (N = 92)
Age (in years), M (SD)                                      36.69 (11.32)               39.45 (11.59)
Years experience in full-time human services work, M (SD)   7.62 (7.62)                 10.98 (8.56)
Years experience in present agency, M (SD)                  2.63 (3.80)                 5.01 (5.04)
Gender, N (%)
 Female                                                     374 (80.4)                  72 (78.3)
 Male                                                       85 (18.3)                   18 (19.6)
 Not provided                                               6 (1.3)                     2 (2.2)
Educational attainment, N (%)
 Bachelors degree                                           38 (8.2)                    1 (1.1)
 Masters degree                                             375 (80.6)                  75 (81.5)
 Doctoral degree                                            43 (9.2)                    15 (16.3)
 Not provided                                               9 (1.9)                     1 (1.1)
Position Type, N (%)
 Social Worker                                              59 (12.7)                   7 (7.6)
 Psychologist                                               19 (4.1)                    4 (4.3)
 Psychiatrist                                               1 (.2)                      0 (0)
 Marriage and Family Therapist                              37 (8)                      1 (9.8)
 Masters Level Clinician                                    301 (64.7)                  62 (67.4)
 Other                                                      53 (11.4)                   9 (9.8)
 Not provided                                               7 (1.5)                     1 (1.1)
Race/Ethnicity, N (%)
 Asian                                                      28 (6)                      6 (6.5)
 African American                                           128 (27.5)                  23 (25)
 White                                                      223 (48)                    42 (45.7)
 Hispanic/Latino                                            70 (15.5)                   18 (19.8)
 Multiracial                                                13 (2.8)                    2 (2.2)
 Pacific Islander                                           1 (.2)                      0 (0)
 American Indian                                            4 (.9)                      0 (0)
 Other                                                      22 (4.7)                    12 (13)
 Not provided                                               16 (3.5)                    2 (2.2)

Measures

Knowledge

The full 40-item KEBSQ (Stumpf et al., 2009) requires therapists to identify whether practice element descriptions (e.g., “Training the parent(s) to give directions and commands effectively”) are included in evidence-based treatment protocols within the four domains of youth clinical presentations: anxious/avoidant, depressed/withdrawn, disruptive behavior, and hyperactivity/inattention problems. Respondents could also endorse “none.” Of note, the same practice element (e.g., Psychoeducation for Caregiver) could be included in a treatment protocol for more than one of the four domains of youth clinical presentation.

On this original scale, each item is scored from 0 to 4 using a scoring key, with one point assigned for each correct endorsement (i.e., correctly endorsing a domain for which a practice is derived from the evidence base) and one point for each correct rejection (i.e., correctly rejecting a domain for which a practice is not derived from the evidence base). For example, the KEBSQ item about commands/limit setting represents a practice that is derived from the evidence base for disruptive behavior and hyperactivity/inattention problems, but not for anxious/avoidance problems and depressed/withdrawn problems. Thus, to receive the maximum four points for this item, the respondent would need to correctly endorse the item as derived from the evidence base for disruptive behavior problems and hyperactivity/inattention problems (two correct endorsements) and correctly reject limit setting as derived from the evidence base for anxiety and depressed/withdrawn problems (two correct rejections). Thus, scoring follows a “multiple true false” approach (Kreiter & Frisbie, 1989), where respondents can receive between zero and four points, as opposed to dichotomous scoring for each item (i.e., correct vs. incorrect). The total scale is scored from 0–160, and higher scores indicate more EBP knowledge. To examine relationships between the pattern of errors and related constructs, such as attitudes, previous studies have also scored the KEBSQ in terms of commission errors (i.e., the total number of practices incorrectly endorsed as being derived from the evidence base when they actually are not) and omission errors (i.e., incorrectly indicating that a practice is not derived from the evidence base when it actually is; Nakamura et al., 2011). Note that correct endorsements are the inverse of omission errors, and correct rejections are the inverse of commission errors.
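To make the multiple true-false scoring concrete, the following minimal sketch scores a single item in R (the language of the analysis packages named later in this paper). The answer key and response vector shown are illustrative placeholders, not the actual PracticeWise-derived key.

```r
# Hypothetical illustration of KEBSQ multiple true/false scoring for one item.
# Domains: anxious/avoidant, depressed/withdrawn, disruptive behavior, hyperactivity.
# key:  TRUE = practice is derived from the evidence base for that domain (illustrative)
# resp: TRUE = respondent endorsed that domain for the practice (illustrative)
key  <- c(anx = FALSE, dep = FALSE, dis = TRUE, hyp = TRUE)   # e.g., a limit-setting-type item
resp <- c(anx = TRUE,  dep = FALSE, dis = TRUE, hyp = FALSE)

correct_endorsements <- sum(resp & key)    # endorsed a domain the key endorses
correct_rejections   <- sum(!resp & !key)  # rejected a domain the key rejects
item_total           <- correct_endorsements + correct_rejections  # traditional 0-4 item score

correct_endorsements  # 1 (disruptive behavior)
correct_rejections    # 1 (depressed/withdrawn)
item_total            # 2 of a possible 4
```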

The KEBSQ scoring key is determined by the PracticeWise Evidence-Based Services Database (PWEBS), a comprehensive database of the child mental health treatment literature, and is informed by ongoing coding of the literature. PWEBS includes the practice elements (see Chorpita, Daleiden & Weisz, 2005) included in each treatment protocol. Following previous studies utilizing the KEBSQ (Nakamura et al. 2011; Stumpf et al. 2009), a practice was considered derived from the evidence base for a particular problem area if the practice was included in at least 10% of all treatment protocols receiving Best (Level 1) or Good (Level 2) Support for that specific problem area. The KEBSQ is dynamic in its scoring approach such that the scoring key is revised based on the most up to date coding of the extant treatment literature through distillation methodology (Chorpita et al., 2005). For the purpose of the current study, the KEBSQ scoring was informed by the 2017 update (PracticeWise, 2017), and scoring was consistent across time points. We also conducted a sensitivity analysis in which the KEBSQ was scored according to the most recent available scoring key at the time of data collection (i.e., the 2013 scoring key for data from 2013 and 2015, and the 2017 scoring key for data from 2017) to ensure the validity of our interpretations.

Predictors used in the validation sample

We examined convergent validity of the newly developed KEBSQ Short Form by computing correlations between the KEBSQ Short Form and theoretically relevant constructs such as attitudes toward evidence-based practices and use of therapy strategies; we compared these with the correlations between the original KEBSQ Full Form and the same constructs. Similar correlations provide convergent validity evidence. Selected measures were those that have previously been empirically linked to KEBSQ performance and those indexing constructs theoretically associated with clinician EBP knowledge. These measures are described below.

Attitudes

The Evidence-Based Practice Attitude Scale (EBPAS; Aarons, 2004) is a 15-item self-report measure of therapists’ attitudes toward EBPs. Therapist performance on the KEBSQ has been linked to scores on the EBPAS (Nakamura et al, 2011). The EBPAS includes four subscales: appeal, requirements, openness, and divergence. The range of each subscale is 0–4, where 0 = not at all; 1 = slight extent; 2 = moderate extent; 3 = great extent; and 4 = very great extent. Higher scores indicate more positive attitudes, with the exception of divergence, which is reverse coded. The EBPAS has good internal consistency (Aarons et al., 2010). In the current validation sample, the alpha for the full scale was .68, and the alphas for the subscales were: .94 (Requirements), .62 (Appeal), .78 (Openness), and .71 (Divergence).

Use of Therapy Strategies

The Therapist Procedures Checklist-Family Revised (TPC-FR; Do, Warnick, & Weersing, 2012; Kolko, Cohen, Mannarino, Baumann, & Knudsen, 2009; Weersing et al., 2002) is a 62-item self-report measure of therapist-reported use of strategies from four modalities: cognitive, behavioral, family, and psychodynamic. Given theories of behavior change stating that therapist knowledge informs therapist-reported EBP use (Rogers, 2003), we hypothesized that higher KEBSQ scores would be related to higher reported use of cognitive-behavioral strategies, which tend to have the greatest support from the evidence base (Weisz, Jensen-Doss, & Hawley, 2006). Therapists were asked to select a representative client (i.e., a client that they had been seeing for some time who was most similar to their typical caseload), and rate on a scale of 1 (Rarely) to 5 (Most of the time) the extent to which they used each of 62 specific therapy strategies with the client across sessions to date. Higher scores are indicative of more use of the set of strategies for a given domain. This measure has shown good internal consistency, test-retest reliability, and sensitivity to change (Weersing et al., 2002). For this study, we used scores from the composite cognitive-behavioral scale (α in the current validation sample = .91), family (α = .93), and psychodynamic (α = .87) scales.

Organizational Culture and Climate

The Organizational Social Context (OSC; Glisson et al., 2008) is a 105-item measure of the culture and climate of mental health organizations. The current study employed the two OSC subscales that are most conceptually related to EBP knowledge: proficiency culture (i.e., the extent to which therapists in organizations are expected to keep up to date and be competent) and functional climate (i.e., the extent to which therapists in organizations believe they are able to do their jobs effectively; Glisson et al., 2008). Scores on this measure were aggregated across therapists at the agency level for most agencies. At time points one, two, and three, the mean numbers of therapists reporting OSC across agencies were 6 (SD 3.1, range 2–14), 9 (SD 6.7, range 2–31), and 10 (SD 6.7, range 2–26), respectively. Two agencies did not employ enough front line providers to create OSC profiles; to allow these two agencies to be included in analyses, scores were created by aggregating both therapist and agency leadership responses. This decision is justified by findings that suggest that in smaller programs employing fewer therapists, therapist and agency leaders are concordant in their report of OSC (Beidas et al., 2017). Rwg values for the OSC supported aggregation, as all were above the suggested 0.60 level (Bliese, 2000; Brown & Hauenstein, 1993). OSC subscales have strong internal consistency, within-system agreement, and between-system differences (Glisson et al., 2008). For the full sample for the larger study, alphas for proficiency culture ranged from .87 to .91 across waves, and alphas for functional climate ranged from .89 to .92 across waves.

Implementation Climate

The Implementation Climate Scale (ICS; Ehrhart, Aarons, & Farahnak, 2014) is an 18-item self-report measure of organizational-level factors that contribute to successful implementation. Given that education is theoretically linked to therapist knowledge, the current study employed the educational support for EBP subscale (α in the current validation sample = .91), which measures the extent to which organizations provide educational support (e.g., trainings or training materials) for EBP. Scores on this measure were aggregated across therapists at the agency level. Awgs for the ICS supported aggregation as all were above the suggested 0.60 level (Bliese, 2000; Brown & Hauenstein, 1993). This subscale has shown good internal consistency and evidence for shared variance at the agency level (Ehrhart et al., 2014).

Demographics and other characteristics

Therapists completed a demographics questionnaire that included background demographic information (e.g., age, gender, race/ethnicity), educational and employment information (e.g., highest degree, position type, licensure status, years of experience), and other clinical characteristics (e.g., active cases, hours of supervision, theoretical orientation, extent of feelings of professional burnout).

Statistical Analyses

Examining assumptions for Item Response Theory

We first performed factor analysis and non-parametric item analysis (Mokken, 1971) using the mokken and mirt R packages to determine whether the KEBSQ data met assumptions to apply Item Response Theory measurement models. Specifically, we examined whether the assumptions of monotonicity (i.e., as the overall latent construct level increases, the probability of a correct response to each item also increases), unidimensionality (i.e., only one latent construct is measured by the set of items), and local independence (i.e. items are not related to each other, other than by performance on the overall latent construct; Embretson & Reise, 2000) were met. Factor analyses were performed on polychoric correlations.
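As an illustration, these assumption checks might look like the sketch below in R. The data frame name kebsq_items is a hypothetical placeholder for the item-level scores, and the polychoric factor analysis shown uses the psych package as one possible implementation alongside the mokken package named above; this is a sketch, not the study's exact code.

```r
library(mokken)  # non-parametric item analysis (Mokken, 1971)
library(psych)   # polychoric correlations and factor analysis

# Monotonicity check: flags items whose probability of a higher score
# does not increase with the latent construct level.
mono <- check.monotonicity(as.matrix(kebsq_items))
summary(mono)

# Unidimensionality: factor analysis on polychoric correlations,
# extracting a single factor via (minimum residual) least squares.
poly <- polychoric(kebsq_items)
fa1  <- fa(poly$rho, nfactors = 1, fm = "minres", n.obs = nrow(kebsq_items))
fa1$loadings  # negative loadings signal violations of the unidimensional model
```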

Factor analysis – problem domains

We also performed an exploratory factor analysis (EFA) to test whether responses cluster by specific problem areas. The EFA is described in further detail in the Supplementary Materials.

Identifying items for the KEBSQ Short Form

We next employed Item Response Theory (IRT; Embretson & Reise, 2000; Lord, 1980) measurement models to identify individual KEBSQ items that are most informative regarding total KEBSQ performance. IRT is a psychometric method that focuses on various characteristics of individual test or scale items (rather than a test/scale as a whole). The most common IRT model (two-parameter, or “2PL”) has two key parameters: item discrimination (i.e., how precisely the item can place an individual on the overall spectrum of knowledge) and item difficulty (i.e., how high on the spectrum of knowledge one has to be at the time of answering the item in order to have a 50% chance of answering the item correctly). These parameters are used to predict the probability of a correct response for a person at a given level of knowledge at the time of answering the item. An advantage of IRT is that its emphasis on individual items allows using the parameter estimates produced by the model to assess the overall quality of the items. Here, “quality” is determined by the amount of information provided by the item at any given level of knowledge.
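For reference, the 2PL expresses the probability that a person with knowledge level $\theta$ answers item $j$ correctly in terms of the discrimination ($a_j$) and difficulty ($b_j$) parameters:

$$P_j(\theta) = \frac{1}{1 + \exp\left[-a_j(\theta - b_j)\right]}$$

so that the probability equals .50 exactly when $\theta = b_j$, matching the definition of difficulty given above.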

The 2PL described above is intended for dichotomous (correct/incorrect) responses, but there is a direct extension of the 2PL for polytomous items called the Graded Response Model (Samejima, 1969), used here. For the KEBSQ specifically, each item is graded on four response options (i.e., whether the practice was correctly identified as derived from the evidence base for each of the four problem domains). We chose to use the Graded Response Model here because it is an optimal combination of parsimony (e.g. a third parameter would add too much complexity for this sample size) and flexibility (e.g. it allows items to have unique discrimination and difficulty parameters). Further details about IRT and this modification are provided in the Supplementary Materials.
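A minimal sketch of fitting such a Graded Response Model with the mirt package is shown below; the data frame name item_scores is a hypothetical placeholder for the polytomous item scores, and the call shown is a generic unidimensional fit rather than a reproduction of the study's exact estimation settings.

```r
library(mirt)

# Fit a unidimensional Graded Response Model to the polytomous item scores.
grm_fit <- mirt(item_scores, model = 1, itemtype = "graded")

# Discrimination (a) and threshold/difficulty (b) parameters on the usual IRT metric.
coef(grm_fit, IRTpars = TRUE, simplify = TRUE)$items

# Latent trait (theta) estimates for each respondent, e.g., EAP scores.
theta_hat <- fscores(grm_fit, method = "EAP")
```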

Item-calibration and Computer Adaptive Testing (CAT) simulation were used to determine which KEBSQ items provide the most information about overall KEBSQ performance. The item-calibration and CAT-simulation methods used here have been described previously (see Moore et al., 2015; Roalf et al., 2016). Computer Adaptive Testing is a method of item administration that updates trait level estimates of an examinee as s/he responds to items. These estimates are used to determine which item (from the full KEBSQ bank) will provide the most information about that examinee, and the most informative item at that specific estimated trait level is administered. For example, if an examinee responds correctly to an item of average difficulty, the adaptive algorithm will temporarily “assume” the examinee is of above- average knowledge, and will select a more difficult item to administer next. If the examinee responds correctly to that second, more difficult item, the algorithm will update its estimate of the examinee’s knowledge to be even higher, and will administer an even more difficult item. This process continues until the examinee responds incorrectly to an item, at which point the algorithm will administer items around that difficulty range until a stopping criterion is met (e.g., the examinee’s standard error of measurement reaches some lower threshold).

While the above application is focused on the examinee, simulated CAT sessions can be used to determine which items within the full form of the test would be administered most frequently across a sample. For example, if there are some items in the full form that are never administered in the simulated CAT sessions—either because they are too difficult/easy or because they are not very discriminating—those items might be removed from the test form with no loss in information. Thus, simulated CAT sessions can be used to determine which items within the full form of the test would be administered most frequently across a sample of participants based on, (a) item discriminations, (b) item difficulties, and (c) overall performance levels of the participants. The most frequently administered items in the simulation were retained in the short version of the test.
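The following self-contained sketch illustrates this CAT logic in R for a simplified dichotomous 2PL model (the study itself used Firestar with Graded Response Model parameters, so this is conceptual rather than a reproduction): the most informative remaining item at the current trait estimate is administered, the estimate and its standard error are updated, and testing stops once the standard error falls below a threshold. All item parameters and the examinee's true ability are simulated placeholders.

```r
# Simplified, self-contained CAT simulation under a dichotomous 2PL model.
set.seed(1)
a <- runif(40, 0.5, 2.5)               # hypothetical item discriminations
b <- rnorm(40, 0, 1)                   # hypothetical item difficulties
true_theta <- 0.5                      # simulated examinee ability
grid  <- seq(-4, 4, length.out = 121)  # quadrature grid for EAP scoring
prior <- dnorm(grid)

p2pl <- function(theta, a, b) 1 / (1 + exp(-a * (theta - b)))

administered <- integer(0)
responses    <- integer(0)
theta_hat <- 0
sem <- Inf

while (sem > 0.39 && length(administered) < length(a)) {
  remaining <- setdiff(seq_along(a), administered)
  # Fisher information of each remaining item at the current theta estimate
  p    <- p2pl(theta_hat, a[remaining], b[remaining])
  info <- a[remaining]^2 * p * (1 - p)
  item <- remaining[which.max(info)]           # administer the most informative item
  administered <- c(administered, item)
  # Simulate the examinee's response from the "true" ability
  responses <- c(responses, rbinom(1, 1, p2pl(true_theta, a[item], b[item])))
  # EAP update of theta and its standard error over the grid
  lik <- sapply(grid, function(th) {
    pr <- p2pl(th, a[administered], b[administered])
    prod(pr^responses * (1 - pr)^(1 - responses))
  })
  post <- lik * prior
  post <- post / sum(post)
  theta_hat <- sum(grid * post)
  sem <- sqrt(sum((grid - theta_hat)^2 * post))
}
administered                         # items "used" by this simulated examinee
c(theta_hat = theta_hat, sem = sem)  # final estimate and its standard error
```

Running many such simulated sessions and tallying how often each item is administered yields the administration percentages reported in Table 4.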

Item parameters (discriminations and difficulties) were inputted to Firestar software (Choi, 2009) to simulate CAT sessions. The simulated CAT sessions stopped when the standard error of measurement (SEM) reached 0.39 or less, because this value corresponds to a Classical Test Theory-based reliability (Cronbach’s α) of 0.85, which is in the middle of the “Good” range (0.80 – 0.90).
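This correspondence follows from the classical relation between reliability and the standard error of measurement on a standardized (unit-variance) trait scale:

$$\rho = 1 - \mathrm{SEM}^2 = 1 - (0.39)^2 \approx 0.85$$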

Sensitivity analysis

In order to examine whether results were affected by use of a different year's scoring key, we also conducted a sensitivity analysis in which the KEBSQ was scored according to the most recent available scoring key at the time of data collection (i.e., the 2013 scoring key for data from 2013 and 2015, and the 2017 scoring key for data from 2017).

Instrument validation

We examined validity evidence for the KEBSQ Short Form using the separate validation sample (N = 92) by examining three questions. First, we assessed the correlations between the Short and Full Forms of the KEBSQ, with a high correlation providing validity evidence for the Short Form. Second, we examined the internal consistency of the KEBSQ Short Form as compared to the Full Form, with similar internal consistency across the forms providing validity evidence for the Short Form. Finally, we examined relationships between each form of the KEBSQ and the measures of theoretically related individual and organizational constructs described above, with similar correlations for the Short Form and for the Full Form providing validity evidence for the Short Form.

Relationships among test forms and validity criteria were assessed using linear mixed models accounting for within-organization covariance. Differences between short-form and full-form effects were tested for statistical significance using bootstrapping (Efron, 1979). Specifically, for each bootstrapped sample, the signed difference between short-form and full-form effects was recorded, resulting in a distribution of 1000 differences; if zero lay between the 5th and 95th percentile of the distribution, the null hypothesis of zero difference could not be rejected. If zero lay outside the 5th-95th percentile range, the null hypothesis was rejected. Note that the percentile range used here could justifiably have been 2.5 to 97.5 rather than five to 95, because we had no directional hypothesis about differences between forms. However, use of the narrower (less conservative) percentile range makes the findings of no differences between forms especially compelling.
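A hedged sketch of this procedure in R is shown below, using the lme4 package for the organization-level random intercept and a simple nonparametric bootstrap of the signed difference between forms. The data frame d and the variable names (crit, kebsq_sf, kebsq_ff, org) are placeholders rather than the study's actual variable names, and resampling is done at the therapist level for simplicity.

```r
library(lme4)

# Standardized mixed-model effect of one KEBSQ score on one criterion,
# with a random intercept for organization (placeholder column names).
std_beta <- function(data, score) {
  f <- as.formula(paste0("scale(crit) ~ scale(", score, ") + (1 | org)"))
  fixef(lmer(f, data = data))[2]
}

obs_diff <- std_beta(d, "kebsq_sf") - std_beta(d, "kebsq_ff")

# Bootstrap the signed difference between short- and full-form effects.
set.seed(1)
boot_diff <- replicate(1000, {
  d_star <- d[sample(nrow(d), replace = TRUE), ]
  std_beta(d_star, "kebsq_sf") - std_beta(d_star, "kebsq_ff")
})

# If zero lies inside the 5th-95th percentile interval, the forms do not differ.
quantile(boot_diff, c(.05, .95))
```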

Note that, while IRT models were used for identification of items for the Short Form, analyses examining reliability and validity evidence for the Short Form used Classical Test Theory. The complexity of IRT scoring and standard error calculation could create a barrier to use of the Short Form by some researchers. For the sake of simplicity and reader familiarity, scoring of the Short Form is done using unit-weighting (sum score), and reliability of the Short Form is assessed using Cronbach’s α. Importantly, the switch from IRT to Classical Test Theory in evaluating the Short Form does not change the fact that the items selected provide the most information, even for a sum score. For example, the Roalf-Moore short-form Montreal Cognitive Assessment (MoCA; Roalf et al., 2016) was developed using the same method used here, including use of sum scoring for the short form, and a recent evaluation of MoCA short forms found the Roalf-Moore form to be the best available (Liew, 2019).
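Consistent with this Classical Test Theory approach, the Short Form can be scored and its reliability estimated with nothing more than a sum score and Cronbach's α. In the sketch below, the data frame d and the item column names are hypothetical placeholders for the eight Correct Endorsement items identified in the Results.

```r
library(psych)

# Placeholder column names for the Short Form Correct Endorsement items
ce_short_items <- c("kebsq_4", "kebsq_9", "kebsq_11", "kebsq_14",
                    "kebsq_17", "kebsq_18", "kebsq_23", "kebsq_27")

# Unit-weighted (sum) Correct Endorsement score for the Short Form
d$ce_short <- rowSums(d[, ce_short_items])

# Internal consistency (Cronbach's alpha) of the Short Form Correct Endorsement items
alpha(d[, ce_short_items])$total$raw_alpha
```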

Results

Descriptive statistics

Descriptive statistics of scores for all instruments in the study for the analytic and validation samples are in Table 2.

Table 2.

Descriptive Statistics for the KEBSQ and Related Measures

Variable                          Analytic Sample (N = 465)                              Validation Sample (N = 92)
                                  Possible Range   Observed Range   M (SD)               Possible Range   Observed Range   M (SD)
Individual Level
KEBSQ Total Score 0–160 49–122 95.12 (9.76) 0–160 65–117 96.76 (9.59)
EBPAS
 Requirements 0–4 0–4 2.79 (1.02) 0–4 0–4 2.74 (1.13)
 Appeal 0–4 .5–4 3.20 (.68) 0–4 2–4 3.24 (.57)
 Openness 0–4 .75–4 3.04 (.69) 0–4 1.5–4 3.14 (.61)
 Divergence (Reverse Scored) 0–4 0–4 2.73 (.71) 0–4 .5–4 2.91 (.67)
 Total 0–4 1.44–3.94 2.94 (.49) 0–4 1.88–3.94 3.01 (.46)
Therapist Procedures Checklist
 Psychodynamic 1–5 1.44–3.94 3.40 (.65) 1–5 1.13–4.87 3.01 (.46)
 Family 1–5 1.07–5 3.34 (.89) 1–5 1–5 3.30 (1.02)
 Cognitive Behavioral 1–5 1.73–4.79 3.24 (.64) 1–5 1.18–4.85 3.32 (.65)
Burnout 0–10 0–10 0–10 4.52 (2.69)
Organizational Level
ICS - Educational Support 0–4 0–4 2.05 (1.18) 0–4 1–3.48 2.30 (.69)
OSC - Proficiency Culture 12.52–70.07 51.82 (9.17) 26.01–70.07 52.73 (9.51)
OSC - Functionality Climate 15.41–84.43 61.54 (10.49) 43.49–81.48 62.43 (9.46)

Note. KEBSQ = Knowledge of Evidence-Based Services Questionnaire. EBPAS = Evidence-Based Practice Attitude Scale. ICS = Implementation Climate Scale. OSC = Organizational Social Context.

Examining Assumptions for Item Response Theory

Factor analysis – unidimensional model

Because the KEBSQ was designed to assess a single latent construct (knowledge of evidence-based practices), we first extracted a single factor (unidimensional model) via least squares. Results indicated violations of the unidimensionality and local independence assumptions of IRT; the assumption of monotonicity was also violated for 27 (68%) of the items (see Supplementary Table 1). Notably, 17 of 40 items loaded negatively on the KEBSQ total score (factor loadings ranged from −.76 [Item 22; Supportive listening] to .78 [Item 11; Psychoeducation for Caregiver]). This indicates the presence of items for which a correct response should result in a lower overall score, but in the current scoring model results in a higher score. This provides further evidence for violations of the assumption of non-decreasing monotonicity (Embretson & Reise, 2000). The opposite loadings in the unidimensional model results provide evidence that items tended to cluster by whether they warranted more endorsements or non-endorsements (“rejections”); items warranting correct endorsements tended to load positively while items warranting correct rejections tended to load negatively.

Factor analysis – alternative scoring model

Based on these results, we rejected the unidimensional model and separated analyses into two scores: Correct Endorsements (i.e., items that were correctly endorsed as deriving from the evidence base for a particular domain) and Correct Rejections (i.e., items that were correctly rejected as not deriving from the evidence base for a particular domain). An individual’s overall Correct Endorsement score can be conceptualized as one’s ability to correctly endorse practices as derived from the evidence base, and one’s Correct Rejection score can be conceptualized as the ability to correctly reject practices as being not derived from the evidence base. These models showed acceptable performance for use in IRT models, including assumptions of monotonicity, unidimensionality, and local independence; this scoring was thus used in all subsequent analyses (see Supplementary Table 2 and Supplementary Results for more details and results of the two separate unidimensional factor analyses). Thus, the KEBSQ Short Form developed through IRT models and CAT-simulation generates two scores: Correct Endorsements and Correct Rejections. Note that this method of scoring the KEBSQ provides the same information conceptually as scoring according to omission errors and commission errors; Correct Endorsement and Correct Rejection scoring was used here, rather than omission and commission errors, for ease of use with the current analytic methods.

Factor analysis – problem domains

The above findings draw attention to another potential phenomenon discoverable in these data. Specifically, in addition to the general tendency to either endorse or not endorse, are examinees able to endorse practices as derived from the evidence base for specific problem areas? For example, if someone endorses a practice as derived from the evidence base for treating “Disruptive Behavior”, is s/he more likely to endorse “Disruptive Behavior” as the problem domain for other practices as well? To investigate this, we performed an exploratory factor analysis (EFA). The EFA is described in further detail in the Supplementary Materials. The overall results of the EFA do not provide evidence that therapists are more likely to endorse practices as derived from the evidence base for specific problem areas: none of the five factors that emerged corresponded to a specific problem area.

Identifying Items for the KEBSQ Short Form

Item Response Theory

Table 3 shows the item parameters estimated using the IRT Graded Response Model. For the Correct Endorsements score, the most highly discriminating item was Item 10 (Commands/limit setting); the least highly discriminating item was Item 1 (Exposure). The most difficult item was Item 8 (Skill building/behavioral rehearsal), and the least difficult item was Item 1 (Exposure).

Table 3.

Graded Response Model IRT Parameter Estimates (Logistic Metric) for KEBSQ Correct Endorsements and Correct Rejections

Item  Practice element          Correct Endorsements                    Correct Rejections
                                α    β1    β2    β3    β4               α    β1    β2    β3    β4
1 Exposure 0.57 −4.52 0.65 −5.69 −3.80 −1.40
2 Modeling 1.32 −3.83 −0.88 1.14 1.87
3 Relaxation 1.34 −4.62 −1.28 −0.07 1.29
4 Therapist praise/rewards 1.69 −2.79 −0.82 0.87 1.22
5 Self-monitoring 1.39 −2.62 −0.63 0.54 0.68 −0.11
6 Psychoeducation - child 1.28 −2.33 −0.92 0.06 0.43
7 Activity scheduling 1.63 −1.80 1.09 −1.57 −1.04 0.68
8 Skill building/behavioral rehearsal 1.52 −0.11 1.29 −0.15 0.87 2.51
9 Self-reward/self-praise 1.60 −1.28 0.26 0.54 1.08 1.81
10 Commands/limit setting 2.36 −2.13 −1.06 1.16 −1.49 −1.02
11 Psychoeducation - parent 2.29 −2.05 −1.01 −0.45 −0.34
12 Response cost 1.01 −2.64 −0.81 0.44 −7.74 −5.17
13 Tangible rewards 1.70 −2.87 −1.07 1.40 1.08 −2.13
14 Parent praise 1.91 −2.46 −1.06 0.47 1.28 −0.31
15 Parent-monitoring 1.04 −0.80 −0.02 1.30 0.81 1.71
16 Directed play 1.14 −1.26 0.09 1.24 −0.32 0.75
17 Stimulus/antecedent control 1.63 −2.69 −0.75 0.86 1.17 −0.14
18 Social skills training 1.82 −3.02 −1.00 0.08 0.62
19 Family engagement 1.55 −1.01 1.64 0.31 0.64 1.34
20 Crisis management 1.85 −0.54 −0.25 0.61 1.79
21 Play therapy 1.30 −0.83 −0.51 0.61 1.85
22 Supportive listening 2.00 0.23 0.34 0.95 2.09
23 Parent coping 2.02 −1.89 −0.85 −0.05 1.46 0.66
24 Emotional processing 1.11 −2.08 −1.34 0.72 2.23
25 Mentoring 1.46 −1.22 −0.85 0.20 1.45
26 Family therapy 1.57 −1.12 2.01 −0.08 0.54 1.48
27 Relationship/rapport building 2.04 −1.94 −1.22 −0.79 −0.63
28 Educational support 0.82 −2.36 0.04 1.02 −1.72 −1.13
29 Maintenance/relapse prevention 1.43 −2.10 −1.00 −0.33 −0.12
30 Peer modeling/pairing 1.14 −2.30 −1.68 0.01 1.49
31 Cognitive/coping 1.13 −2.94 −1.05 1.04 1.27 −1.25
32 Natural/logical consequences 1.05 −2.18 −0.12 0.97 −2.46 −1.84
33 Insight building 1.49 −1.68 −0.05 1.92 −0.22 0.93
34 Assertiveness training 1.16 −1.57 1.28 1.41 −1.40 1.15
35 Problem solving 1.42 −2.69 −0.87 0.35 0.82
36 Time-out 1.04 −2.89 −0.33 0.68 −5.33 −3.83
37 Ignoring or DRO 1.14 −2.73 −0.59 2.40 1.06 −2.59
38 Communication skills 1.40 −2.24 −0.40 0.40 1.31 0.37
39 Line of sight supervision 1.43 −1.40 −0.86 0.36 1.35
40 Milieu therapy 0.95 −1.42 −1.00 0.09 1.60

Note. α = discrimination parameter, i.e., the slope of the item characteristic curve at its inflection point. Difficulty parameters represent thresholds [on a standard normal (z) scale] between item performance of 0 vs. 1, 1 vs. 2, 2 vs. 3, etc. The number of correct endorsement and correct rejection difficulty parameters varies between items because difficulty parameters are available for each possible correct endorsement and correct rejection, and items vary in the number of correct endorsements and correct rejections that are possible. Items with more response options (higher possible total score) will contribute more information (on average), although there are diminishing returns on the additional information gained from adding response options.

For the Correct Rejections score, the most highly discriminating item was Item 26 (Family therapy) and the least highly discriminating item was Item 12 (Response cost). The most difficult item was Item 9 (Self-reward/self-praise) and the least difficult item was Item 12 (Response cost).

CAT-simulation

Table 4 shows the item-administration statistics for the KEBSQ Correct Endorsement and Correct Rejection CAT simulations, specifically the percentage of examinees who were administered each item during the simulation. For the Correct Endorsements score, the most frequently administered (i.e., most informative) item was Item 11 (Psychoeducation – caregiver), administered to 100% of examinees, and the three items with the lowest administration frequencies were Item 1 (Exposure), Item 7 (Activity scheduling), and Item 26 (Family therapy), each administered to 1.8% of examinees. For the Correct Rejections score, the three most frequently administered items were Item 20 (Crisis management), Item 26 (Family therapy), and Item 33 (Insight building), and the two least frequently administered items were Item 9 (Self-reward/self-praise) and Item 12 (Response cost). Based on established cut-offs (i.e., stopping testing when the SEM reached .39 or less; see Statistical Analyses), eight items were retained to generate the Correct Endorsement Score and nine items were retained to generate the Correct Rejection Score on the KEBSQ Short Form.

Table 4.

CAT Simulation Item-Administration Percentage for KEBSQ Correct Endorsements and Correct Rejections

KEBSQ Correct Endorsements KEBSQ Correct Rejections

Item Practice element Percent Of Examinees Administered Item Practice element Percent of Examinees Administered
1 Exposure 1.8 1 Exposure 4.4
2 Modeling 17.9 2 Modeling
3 Relaxation 21.7 3 Relaxation
4* Therapist praise/rewards 69.4 4 Therapist praise/rewards
5 Self-monitoring 12.2 5 Self-monitoring 2.6
6 Psychoeducation - child 4.3 6 Psychoeducation - child
7 Activity scheduling 1.8 7 Activity scheduling 15.9
8 Skill building/behavioral rehearsal 3.3 8 Skill building/behavioral rehearsal 24.9
9* Self-reward/self-praise 54.3 9 Self-reward/self-praise 2.1
10 Commands/limit setting 35.5 10 Commands/limit setting 15.1
11* Psychoeducation - parent 100.0 11 Psychoeducation - parent
12 Response cost 2.0 12 Response cost 2.1
13 Tangible rewards 33.4 13 Tangible rewards 9.7
14* Parent praise 86.7 14 Parent praise 15.6
15 Parent-monitoring 3.1 15 Parent-monitoring 6.4
16 Directed play 3.3 16 Directed play 18.5
17* Stimulus/antecedent control 51.5 17 Stimulus/antecedent control 4.9
18* Social skills training 81.9 18 Social skills training
19 Family engagement 2.0 19* Family engagement 74.9
20 Crisis management 20* Crisis management 100.0
21 Play therapy 21* Play therapy 55.1
22 Supportive listening 22* Supportive listening 89.7
23* Parent coping 87.8 23 Parent coping 5.4
24 Emotional processing 24 Emotional processing 19.2
25 Mentoring 25* Mentoring 83.8
26 Family therapy 1.8 26* Family therapy 100.0
27* Relationship/rapport building 53.6 27 Relationship/rapport building
28 Educational support 2.6 28 Educational support 11.8
29 Maintenance/relapse prevention 3.3 29 Maintenance/relapse prevention
30 Peer modeling/pairing 30 Peer modeling/pairing 25.9
31 Cognitive/ coping 7.1 31 Cognitive/ coping 18.5
32 Natural/logical consequences 2.8 32 Natural/logical consequences 7.9
33 Insight building 3.3 33* Insight building 100.0
34 Assertiveness training 8.2 34* Assertiveness training 38.2
35 Problem solving 31.6 35 Problem solving
36 Time-out 2.0 36 Time-out 2.8
37 Ignoring or DRO 6.9 37 Ignoring or DRO 6.4
38 Communication skills 8.9 38 Communication skills 4.1
39 Line of sight supervision 39* Line of sight supervision 72.6
40 Milieu therapy 40 Milieu therapy 10.8

Note. CAT stopping rule was SEM < 0.39, equivalent to a CTT-based alpha of 0.85.

* indicates item retained in the Short Form

Based on these results, a total of 17 items were retained on the KEBSQ Short Form. Eight items (4, 9, 11, 14, 17, 18, 23, 27) are used to generate the Correct Endorsement score, and nine items (19, 20, 21, 22, 25, 26, 33, 34, and 39) are used to generate the Correct Rejection score. Among the validation sample, the mean Correct Endorsement score on the Short Form was 19.09 (SD = 5.84, range 0 to 28), and the mean Correct Rejection score on Short Form was 11.99 (SD = 7.09, range 0 to 30). Means and standard deviations for the items retained in the KEBSQ Short Form are shown in Table 5.

Table 5.

Means and Standard Deviations for KEBSQ Short Form Items

KEBSQ Correct Endorsements
Item  Practice element                Mean   SD
4     Therapist praise/rewards        2.13   1.33
9     Self-reward/self-praise         1.54   1.08
11    Psychoeducation - parent        3.15   1.08
14    Parent praise                   2.17   .70
17    Stimulus/antecedent control     2.00   .81
18    Social skills training          2.42   1.15
23    Parent coping                   2.33   .93
27    Relationship/rapport building   3.33   1.25

KEBSQ Correct Rejections
Item  Practice element                Mean   SD
19    Family engagement               .74    1.09
20    Crisis management               1.27   1.39
21    Play therapy                    1.72   1.54
22    Supportive listening            1.10   1.28
25    Mentoring                       2.16   1.34
26    Family therapy                  1.10   1.19
33    Insight building                .85    .77
34    Assertiveness training          .99    .65
39    Line of sight supervision       2.13   1.33

Sensitivity analysis

In order to examine whether results were affected by use of a different years’ scoring key, we also conducted a sensitivity analysis in which the KEBSQ was scored according to the most recent available scoring key at the time of data collection. When this scoring approach was used, the set of items identified for the Correct Endorsement Short Form was identical to the items identified in the primary analysis. However, it was not possible to use this scoring approach for the Correct Rejection analysis, due to the presence of items that had no Correct Rejection score (i.e., the answer key indicated that all four problem areas were correct responses) for only one of the years.

Instrument Validation

Each of the analyses to examine validity evidence for the KEBSQ Short Form was conducted separately for the Correct Endorsement score and the Correct Rejection score. The original KEBSQ Full Form was also scored according to Correct Endorsements and Correct Rejections to facilitate comparison between the performance of the Short and Full Forms.

Correlations between the KEBSQ Short and Full Forms

Scores on the KEBSQ Short Form - Correct Endorsement Score correlated highly with scores on the KEBSQ Full Form - Correct Endorsement Score (Levy-corrected r = .85, p < .001). Additionally, scores on the KEBSQ Short Form - Correct Rejection Score correlated highly with scores on the KEBSQ Full Form - Correct Rejection Score (Levy-corrected r = .84, p < .001).

Internal consistency

The Short Form showed a Cronbach’s alpha of .79 when scored according to Correct Endorsements and .83 when scored according to Correct Rejections. If the Short Form was scored according to total points, using the traditional scoring approach for the KEBSQ, it showed an alpha of .25.

Similarly, the Full Form showed a Cronbach’s alpha of .93 when scored according to Correct Endorsements, .88 when scored according to Correct Rejections, and .46 when scored according to total points using traditional KEBSQ scoring.

Relationships with related constructs

Table 6 shows mixed model standardized β coefficients between both forms (i.e., Short and Full) of the KEBSQ and criterion measures of theoretically relevant constructs. Results for the Correct Endorsement Scoring of the KEBSQ are shown on the left side of the table, and results for Correct Rejection Scoring are shown on the right side of the table. Both the Short Form – Correct Endorsement Score and the Full Form – Correct Endorsement Score related positively and significantly with the EBPAS Appeal scale and Total EBPAS score. That is, therapists who rate EBPs as more appealing tend to be more likely to correctly endorse practices as being derived from the evidence base. For the Correct Endorsement Scoring, three criterion measures (EBPAS Divergence, Burnout, OSC Proficiency) related differently to the Short Form KEBSQ, compared to the Full Form KEBSQ. Specifically, the mixed model effect size between the EBPAS Divergence score and the KEBSQ Short Form – Correct Endorsement Score (β = .20) was significantly higher than the corresponding effect for the KEBSQ Full Form – Correct Endorsement Score (β = .09; p-value for bootstrapped difference = 0.015). Additionally, the relationship between the measure of burnout and the KEBSQ Full Form – Correct Endorsement Score was significantly stronger (β = −.14) than the relationship between burnout and the KEBSQ Short Form – Correct Endorsement Score (β = −.08; p-value = 0.049). Finally, there was a significant difference between forms for the OSC Proficiency score (β = .01 with the Short Form and β = −.01 with the Full Form; p-value = 0.047). None of the other criterion measures showed a different relationship with the KEBSQ Short Form – Correct Endorsement Scoring compared to the KEBSQ Full Form – Correct Endorsement Scoring.

Table 6.

Mixed Model Standardized Beta Coefficients Predicting Validity Criteria Using the KEBSQ Correct Endorsement Short Form, Correct Rejection Short Form, and KEBSQ Full Forms Using the Validation Sample (N = 92)

Correct Endorsement Scoring
Correct Rejection Scoring
Criterion Short Form Full Form Significant Difference Between Forms? Short Form Full Form Significant Difference Between Forms?
Individual Level
EBPAS Requirements .20 .20 No −.11 −.18 No
EBPAS Appeal .27 .29* No −.19 −.18 No
EBPAS Openness −.09 −.05 No .02 .04 No
EBPAS Divergence .20 .09 Yes; p = .015 −.05 .003 No
EBPAS Total .24* .22* No −.13 −.15 No
TPC Psychodynamic −.05 .003 No −.26* −.30* No
TPC Family .06 .09 No −.21* −.20* No
TPC Cognitive Behavioral .07 .09 No −.27* −.31* No
Burnout −.08 −.14 Yes; p = .049 .05 .05 No
Organizational Level
ICS – Educational Support .05 .05 No −.05 −.05 No
OSC- Proficiency .01 −.01 Yes; p = .047 −.002 .006 No
OSC - Functionality −.01 −.02 No −.004 .01 No

Note. KEBSQ = Knowledge of Evidence-Based Services Questionnaire. EBPAS = Evidence-Based Practice Attitude Scale. TPC = Therapist Procedures Checklist. ICS = Implementation Climate Scale. OSC = Organizational Social Context.

* p < .05

For the Correct Rejection Scoring, both the Short Form and the Full Form related negatively and significantly with the Therapist Procedures Checklist Psychodynamic (Short Form β = −.26; Full Form β = −.30), Family (Short Form β = −.21; Full Form β = −.20), and Cognitive Behavioral (Short Form β = −.27; Full Form β = −.31) scales. None of the criterion measures showed a different relationship with the KEBSQ Short Form - Correct Rejection Scoring compared to the KEBSQ Full Form - Correct Rejection Scoring.

Discussion

This study empirically derived and provides validity evidence for a 17-item KEBSQ Short Form that generates two separate scores: Correct Endorsements (based on eight items) and Correct Rejections (based on nine items). Analyses indicated that the Short Form correlates highly with and performs similarly to the Full Form, providing preliminary validity evidence. This Short Form offers two primary advantages over the original KEBSQ. First, the Short Form is less burdensome to administer than the Full Form, which offers pragmatic advantages for research and practice. Second, scoring the KEBSQ according to Correct Endorsements and Correct Rejections, as opposed to according to a single total score, offers a more nuanced understanding of therapist knowledge by distinguishing between awareness knowledge of practices that are derived from the evidence base, and awareness knowledge of practices that are not derived from the evidence base.

Although our original goal was simply to develop an overall KEBSQ Short Form, tests of the psychometric properties of the KEBSQ revealed that such an approach would require violating numerous assumptions of IRT measurement models. Furthermore, when either the Short Form or the Full Form of the KEBSQ was scored using traditional scoring (i.e., a single total score), it showed rather low internal consistency, while internal consistency was higher when the KEBSQ was scored according to Correct Endorsements and Correct Rejections. Therefore, we instead examined KEBSQ scores for the Full Form and Short Form according to Correct Endorsements and Correct Rejections. Given the clear dimensionality of the KEBSQ, it may be advisable for future studies to explore the potential use of scores that capture these dimensions (e.g., correct endorsements and correct rejections, or omission errors and commission errors), rather than the total score.

These results highlight some limitations of the KEBSQ. When it was scored according to traditional scoring, the KEBSQ did not seem to measure one construct. Similarly, when the KEBSQ was scored according to Correct Rejections and Correct Endorsements, these scores may capture therapists’ tendencies to endorse or reject practices as derived from the evidence base, as well as their knowledge of the practices. Scoring the KEBSQ according to Correct Rejections and Correct Endorsements may, in fact, provide useful information to inform training and implementation efforts. For example, therapists who tend to over endorse practices as derived from the evidence base may benefit from different implementation strategies than those who tend to under endorse. However, these results also suggest the need for additional measure development to assess provider knowledge. In particular, measures assessing other aspects of knowledge (e.g., procedural knowledge about how to implement EBPs of interest) may provide useful information above and beyond the declarative knowledge assessed by the KEBSQ.

We also examined the related question of whether therapists were more likely to endorse practices as derived from the evidence base for specific problem areas, and did not find evidence that this was the case. These results are consistent with the prior finding that the factors that emerge for KEBSQ items can be best described by the extensiveness (i.e., the degree to which each of the items within the factor was considered to be derived from EBP for each problem) and coverage (i.e., the extent to which an item on the KEBSQ was considered to be derived from EBP across the four problem areas) of the practice, rather than by particular problem areas (Okamura et al., 2016).

The items retained on the KEBSQ Short Form were those that were most informative. For items that generate the Correct Endorsement score, retained items (e.g., therapist praise/rewards, self-reward, stimulus/antecedent control, parent praise) tended to be behavioral in nature. Several of these items (e.g., parent psychoeducation, parent praise, parent coping) focused on parental involvement. Interestingly, exposure was the least discriminating and least difficult item. The latter, low difficulty, indicates that the majority of therapists in the sample were able to correctly indicate that exposure is derived from the evidence base for anxiety, despite evidence that exposure is rarely used in routine clinical or community mental health settings (Becker-Haimes et al., 2017). For items that generate the Correct Rejection score, retained items (e.g., supportive listening, family therapy, family engagement) tended to be items that reflected general support, rather than specific skill building. The item reflecting response cost was the least difficult item, indicating that most therapists correctly identified that response cost is not derived from the evidence base for anxiety or depression.

To examine validity evidence for the KEBSQ Short Form, we assessed relationships with theoretically related constructs. We found that the KEBSQ Correct Endorsements score related positively and significantly to therapist attitudes toward evidence-based practices, as measured by the EBPAS Total Score and Appeal Scale Score. This is consistent with the finding that therapists who make more errors in which they fail to correctly endorse a practice as derived from the evidence base tend to show more negative attitudes toward EBPs (Nakamura et al., 2011). It is perhaps not surprising that therapists who are more likely to correctly endorse that practices are derived from the evidence base are also more likely to have more positive attitudes toward EBPs; although the direction of causality is not clear from the current results, this suggests the possibility that increasing therapist knowledge that certain practices are derived from evidence may be a useful target when aiming to improve attitudes toward EBPs. We also found that KEBSQ Correct Rejections related negatively and significantly with therapist self-reported use of psychodynamic, family, and cognitive behavioral strategies, as measured by the TPC. This raises the possibility that therapists who are more skeptical about EBPs tend to make more correct rejections on the KEBSQ and also tend to report less use of therapy strategies across modalities.

The current study should be considered within the context of its limitations. First, analyses relied on a sample of therapists from a single behavioral health system, in the context of a centralized system infrastructure supporting the use of evidence-based practices in the public community mental health agencies. Therefore, it is possible that results may be specific to this sample. For example, therapists may have shown greater knowledge about practices that were more heavily emphasized in system-wide initiatives (e.g., the City of Philadelphia has supported a number of cognitive-behavioral therapy initiatives). Additionally, the analytic sample size was slightly below the commonly recommended sample size of 500. Furthermore, the validation sample used here consisted of the same therapists used in the analytic sample, with validation sample data drawn from a different time point; we would not expect the repeated-measures aspect of the validation sample to bias the validation results, although it is possible that this may have led to unknown complications. It will therefore be important for future research to validate the Short Form in other samples. It will also be important for future research to examine the extent to which the KEBSQ Short Form predicts implementation success.

One important caveat regarding the pragmatic nature of the KEBSQ relates to the dynamic scoring key, which is informed by the PracticeWise (2017) evidence-based services database. Currently, researchers and community stakeholders can request the scoring key. The field of implementation science will continue to grapple with the choice between static and dynamic knowledge instruments, and future studies should examine the predictive utility of these different types of measures. The current study used data from unique therapists across three time points but scored the KEBSQ at all time points according to the 2017 scoring key. Although the scoring procedures varied little across these time points, scores for therapists who completed the KEBSQ at earlier time points may not fully reflect the evidence available at that time. A sensitivity analysis provided evidence that using the time-varying scoring key did not change results for the Correct Endorsement models, although it was not possible to conduct these analyses for the Correct Rejection models. Additionally, there may be unmodeled multidimensionality in the KEBSQ related to the four problem domains; for example, some therapists may have greater knowledge regarding a specific problem domain. This was not captured by the current analyses and is an important direction for future research.

Additionally, therapist use of strategies was measured by self-report (i.e., without verification by independent observers), and prior evidence on the concordance between therapist self-report and observer ratings of therapy processes has been mixed (Hurlburt, Garland, Nguyen, & Brookman-Frazee, 2010; Ward et al., 2013). Furthermore, when therapists completed the TPC-FR, they were instructed to “select a representative client” from their current caseload and indicate their use of strategies with that particular client; however, we did not verify the representativeness of the selected clients and did not collect key information about them (e.g., demographics, symptom severity, impairment, or treatment progress). It is important for future work to rigorously examine the psychometric properties of this method of measuring therapist-reported use of strategies.

Finally, relationships between the KEBSQ and theoretically related constructs tended to be low to moderate. This is consistent with previous findings that therapist knowledge does not relate to variables such as years of training and theoretical orientation (Nakamura et al., 2011). Indeed, the theoretically related constructs examined here may not be strong indicators of convergent validity for the KEBSQ; for instance, a therapist may correctly demonstrate declarative knowledge of a practice on the KEBSQ but choose not to use that practice for a number of reasons. Nonetheless, the key finding, that differences between the Short Form and Full Form correlations were consistently non-significant, can reasonably be expected to extend to criteria that correlate more strongly with knowledge (e.g., we would expect an alternative measure of knowledge to relate equally strongly to the Short Form and the Full Form of the KEBSQ).
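
As an illustration of how such a comparison can be made, the sketch below implements one standard option for testing the difference between two dependent correlations that share a criterion variable (a Steiger-style z test for overlapping correlations). This is a hedged example only: the function name, the hypothetical input values, and the choice of this particular test are assumptions for illustration and do not reproduce the specific analyses reported in the Results.

```python
import math

def dependent_corr_z(r_short_crit, r_full_crit, r_short_full, n):
    """Steiger-style z test comparing two dependent correlations that share a criterion.

    r_short_crit: correlation of the Short Form score with the criterion
    r_full_crit:  correlation of the Full Form score with the same criterion
    r_short_full: correlation between the Short Form and Full Form scores
    n:            number of therapists
    Returns (z, two_sided_p). Illustrative only; the study's exact procedure may differ.
    """
    z1, z2 = math.atanh(r_short_crit), math.atanh(r_full_crit)
    rbar = (r_short_crit + r_full_crit) / 2.0
    # Covariance term for the two overlapping correlations (Steiger's simplification)
    cov = (r_short_full * (1 - 2 * rbar**2)
           - 0.5 * rbar**2 * (1 - 2 * rbar**2 - r_short_full**2)) / (1 - rbar**2) ** 2
    z = (z1 - z2) * math.sqrt((n - 3) / (2 - 2 * cov))
    p = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))  # two-sided normal p-value
    return z, p

# Hypothetical values for illustration only (not estimates from this study):
print(dependent_corr_z(r_short_crit=0.20, r_full_crit=0.24, r_short_full=0.85, n=250))
```

When the Short Form and Full Form scores are highly correlated, such a test has reasonable power to detect even small differences in their criterion correlations, which is why consistently non-significant differences are informative.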

Despite these limitations, this study addresses an important gap in the field of implementation science: the need for valid, reliable, and practical measures of therapist factors (Lewis et al., 2015). Therapist knowledge is theorized to be a necessary precursor to EBP use, and it is therefore important to measure how knowledge changes over time in response to implementation efforts; valid, reliable, and practical measures of knowledge are needed to do so. Our results also suggest the possibility of developing targeted implementation strategies for different groups of therapists; for instance, therapists with lower Correct Endorsement scores may benefit from different strategies than therapists with lower Correct Rejection scores. An organization might therefore administer the KEBSQ Short Form and use the results to inform overall training needs or to plan differentiated trainings or consultations for different groups of therapists (e.g., those with high KEBSQ scores, those with low Correct Endorsement scores, and those with low Correct Rejection scores).

Our results were strengthened by the use of separate analytic and validation samples, which is considered the gold standard approach for measure development and validation (Chernyshenko, Stark, Chan, Drasgow, & Williams, 2001; Edelen & Reeve, 2007). Furthermore, this study demonstrates how IRT with CAT simulation offers a useful approach to develop and validate a shorter version of a lengthy measure. This approach has the potential to be applied to other measures of therapist and organizational factors, in order to answer the call for the development of pragmatic measures in implementation science.
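
To make the logic of this approach concrete, the sketch below shows a highly simplified, non-adaptive analogue: under an assumed 2PL model, items are ranked by their Fisher information aggregated over a grid of knowledge levels, and the most informative items are retained. The item parameters, grid, and selection rule are hypothetical placeholders; the Short Form itself was derived with the CAT simulation procedure described in the Methods, not with this code.

```python
import numpy as np

def p_correct(theta, a, b):
    """2PL probability of a correct response at knowledge level theta."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

def item_information(theta, a, b):
    """Fisher information contributed by a 2PL item at theta."""
    p = p_correct(theta, a, b)
    return a ** 2 * p * (1.0 - p)

def select_short_form(a, b, n_keep, theta_grid=np.linspace(-3, 3, 61)):
    """Rank items by total information across the theta grid and keep the top n_keep.

    A simplified stand-in for a CAT simulation: it captures the idea of retaining
    the most informative items but ignores adaptive item selection and stopping rules.
    """
    total_info = np.array([item_information(theta_grid, ai, bi).sum()
                           for ai, bi in zip(a, b)])
    return np.sort(np.argsort(total_info)[::-1][:n_keep])

# Hypothetical item parameters for a 40-item pool (not the KEBSQ estimates):
rng = np.random.default_rng(0)
a = rng.uniform(0.5, 2.5, size=40)   # discriminations
b = rng.normal(0.0, 1.0, size=40)    # difficulties
print(select_short_form(a, b, n_keep=17))
```

The appeal of the CAT-simulation approach over a simple information ranking like this one is that it evaluates how well a shortened instrument recovers scores from the full item pool across the range of the latent trait, which maps directly onto the validation comparisons reported here.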

Supplementary Material

Funding

This study was supported by the following grants from the US National Institute of Mental Health: K23 MH099179 (Beidas) and T32 MH109433.

Footnotes

Conflicts of Interest: Dr. Beidas reported receiving consulting fees from Merck Sharp & Dohme and the Camden Coalition of Healthcare Providers; and royalties from the Oxford University Press. Drs. Lawson, Moore, Okamura, and Becker-Haimes declare that they have no conflicts of interest.

Compliance with Ethical Standards:

Ethical Approval: All procedures performed in studies involving human participants were in accordance with the ethical standards of the institutional and/or national research committee and with the 1964 Helsinki declaration and its later amendments or comparable ethical standards.

Informed Consent: Informed consent was obtained from all individual participants included in the study.

This is a post-peer-review, pre-copyedit version of an article published in Administration and Policy in Mental Health and Mental Health Services Research. The final authenticated version is available online at: http://dx.doi.org/10.1007/s10488-020-01020-7

References

1. Aarons GA (2004). Mental Health Provider Attitudes Toward Adoption of Evidence-Based Practice: The Evidence-Based Practice Attitude Scale (EBPAS). Mental Health Services Research, 6(2), 61–74. 10.1023/B:MHSR.0000024351.12294.65
2. Becker-Haimes EM, Okamura KH, Wolk CB, Rubin R, Evans AC, & Beidas RS (2017). Predictors of clinician use of exposure therapy in community mental health settings. Journal of Anxiety Disorders, 49, 88–94. 10.1016/j.janxdis.2017.04.002
3. Beidas RS, & Kendall PC (2010). Training therapists in evidence-based practice: A critical review of studies from a systems-contextual perspective. Clinical Psychology: Science and Practice, 17(1), 1–30. 10.1111/j.1468-2850.2009.01187.x
4. Beidas RS, Aarons GA, Barg F, Evans A, Hadley T, Hoagwood K, Marcus S, Schoenwald S, Walsh L, & Mandell DS (2013). Policy to implementation: Evidence-based practice in community mental health – study protocol. Implementation Science, 8(1), 38. 10.1186/1748-5908-8-38
5. Beidas RS, Marcus S, Wolk CB, Powell B, Aarons GA, Evans AC, Hurford MO, Hadley T, Adams DR, Walsh LM, Babbar S, Barg F, & Mandell DS (2016). A prospective examination of clinician and supervisor turnover within the context of implementation of evidence-based practices in a publicly-funded mental health system. Administration and Policy in Mental Health and Mental Health Services Research, 43(5), 640–649. 10.1007/s10488-015-0673-6
6. Beidas RS, Williams NJ, Green PD, Aarons GA, Becker-Haimes EM, Evans AC, … Marcus SC (2018). Concordance between administrator and clinician ratings of organizational culture and climate. Administration and Policy in Mental Health and Mental Health Services Research, 45(1), 142–151. 10.1007/s10488-016-0776-8
7. Bliese P (2000). Within-group agreement, non-independence, and reliability: Implications for data aggregation and analysis. In Klein K, & Kozlowski S (Eds.), Multilevel Theory, Research, and Methods in Organizations (pp. 349–380). San Francisco, CA: Jossey-Bass.
8. Brown RD, & Hauenstein NMA (1993). Rwg: An assessment of within-group interrater agreement. Organizational Research Methods, 78(2), 306–309.
9. Chernyshenko OS, Stark S, Chan K-Y, Drasgow F, & Williams B (2001). Fitting Item Response Theory models to two personality inventories: Issues and insights. Multivariate Behavioral Research, 36(4), 523–562. 10.1207/S15327906MBR3604_03
10. Choi SW (2009). Firestar: Computerized Adaptive Testing simulation program for polytomous Item Response Theory models. Applied Psychological Measurement, 33(8), 644–645. 10.1177/0146621608329892
11. Chorpita BF, Becker KD, & Daleiden EL (2007). Understanding the common elements of evidence-based practice. Journal of the American Academy of Child & Adolescent Psychiatry, 46(5), 647–652. 10.1097/chi.0b013e318033ff71
12. Chorpita BF, & Daleiden EL (2009). Mapping evidence-based treatments for children and adolescents: Application of the distillation and matching model to 615 treatments from 322 randomized trials. Journal of Consulting and Clinical Psychology, 77(3), 566–579. 10.1037/a0014565
13. Chorpita BF, Daleiden EL, Ebesutani C, Young J, Becker KD, Nakamura BJ, … Starace N (2011). Evidence-based treatments for children and adolescents: An updated review of indicators of efficacy and effectiveness. Clinical Psychology: Science and Practice, 18(2), 154–172. 10.1111/j.1468-2850.2011.01247.x
14. Chorpita BF, Daleiden EL, & Weisz JR (2005). Modularity in the design and application of therapeutic interventions. Applied and Preventive Psychology, 11(3), 141–156. 10.1016/j.appsy.2005.05.002
15. Damschroder LJ, Aron DC, Keith RE, Kirsh SR, Alexander JA, & Lowery JC (2009). Fostering implementation of health services research findings into practice: A consolidated framework for advancing implementation science. Implementation Science, 4(1). 10.1186/1748-5908-4-50
16. Dearing JW (2009). Applying diffusion of innovation theory to intervention development. Research on Social Work Practice, 19(5), 503–518. 10.1177/1049731509335569
17. Do M, Warnick E, & Weersing V (2012). Examination of the psychometric properties of the Therapy Procedures Checklist-Family Revised. Poster presented at the annual meeting of the Association for Behavioral and Cognitive Therapies (ABCT), National Harbor, MD.
18. Edelen MO, & Reeve BB (2007). Applying item response theory (IRT) modeling to questionnaire development, evaluation, and refinement. Quality of Life Research, 16(1), 5–18.
19. Efron B (1979). Bootstrap methods: Another look at the jackknife. The Annals of Statistics, 7(1), 1–26.
20. Ehrhart MG, Aarons GA, & Farahnak LR (2014). Assessing the organizational context for EBP implementation: The development and validity testing of the Implementation Climate Scale (ICS). Implementation Science, 9(1). 10.1186/s13012-014-0157-1
21. Embretson SE, & Reise SP (2000). Item Response Theory for psychologists. Mahwah, NJ: Lawrence Erlbaum.
22. Garland AF, Bickman L, & Chorpita BF (2010). Change what? Identifying quality improvement targets by investigating usual mental health care. Administration and Policy in Mental Health and Mental Health Services Research, 37(1–2), 15–26. 10.1007/s10488-010-0279-y
23. Glasgow RE, & Riley WT (2013). Pragmatic measures. American Journal of Preventive Medicine, 45(2), 237–243. 10.1016/j.amepre.2013.03.010
24. Glisson C, Landsverk J, Schoenwald S, Kelleher K, Hoagwood KE, Mayberg S, … The Research Network on Youth Mental Health (2008). Assessing the Organizational Social Context (OSC) of mental health services: Implications for research and practice. Administration and Policy in Mental Health and Mental Health Services Research, 35(1–2), 98–113. 10.1007/s10488-007-0148-5
25. Hurlburt MS, Garland AF, Nguyen K, & Brookman-Frazee L (2010). Child and family therapy process: Concordance of therapist and observational perspectives. Administration and Policy in Mental Health and Mental Health Services Research, 37(3), 230–244.
26. Kolko DJ, Cohen JA, Mannarino AP, Baumann BL, & Knudsen K (2009). Community treatment of child sexual abuse: A survey of practitioners in the National Child Traumatic Stress Network. Administration and Policy in Mental Health and Mental Health Services Research, 36(1), 37–49.
27. Kreiter CD, & Frisbie DA (1989). Effectiveness of multiple true-false items. Applied Measurement in Education, 2, 207–216.
28. Lewis CC, Fischer S, Weiner BJ, Stanick C, Kim M, & Martinez RG (2015). Outcomes for implementation science: An enhanced systematic review of instruments using evidence-based rating criteria. Implementation Science, 10(1). 10.1186/s13012-015-0342-x
29. Lewis CC, Klasnja P, Powell BJ, Lyon AR, Tuzzio L, Jones S, … Weiner B (2018). From classification to causality: Advancing understanding of mechanisms of change in implementation science. Frontiers in Public Health, 6. 10.3389/fpubh.2018.00136
30. Liew TM (2019). The optimal short version of Montreal Cognitive Assessment in diagnosing mild cognitive impairment and dementia. Journal of the American Medical Directors Association.
31. Lord FM (1980). Applications of Item Response Theory to practical testing problems. Hillsdale, NJ: Erlbaum.
32. McHugh RK, & Barlow DH (2010). The dissemination and implementation of evidence-based psychological treatments: A review of current efforts. American Psychologist, 65(2), 73–84. 10.1037/a0018121
33. Mokken RJ (1971). A Theory and Procedure of Scale Analysis. Berlin, Germany: De Gruyter.
34. Moore TM, Scott JC, Reise SP, Port AM, Jackson CT, Ruparel K, … Gur RC (2015). Development of an abbreviated form of the Penn Line Orientation Test using large samples and computerized adaptive test simulation. Psychological Assessment, 27(3), 955–964. 10.1037/pas0000102
35. Nakamura BJ, Higa-McMillan CK, Okamura KH, & Shimabukuro S (2011). Knowledge of and attitudes towards evidence-based practices in community child mental health practitioners. Administration and Policy in Mental Health and Mental Health Services Research, 38(4), 287–300. 10.1007/s10488-011-0351-2
36. Okamura KH, Hee PJ, Jackson D, & Nakamura BJ (2018). Furthering our understanding of therapist knowledge and attitudinal measurement in youth community mental health. Administration and Policy in Mental Health and Mental Health Services Research, 45(5), 699–708. 10.1007/s10488-018-0854-1
37. Okamura KH, Nakamura BJ, Mueller C, Hayashi K, & Higa-McMillan CK (2016). An exploratory factor analysis of the Knowledge of Evidence-Based Services Questionnaire. The Journal of Behavioral Health Services & Research, 43(2), 214–232. 10.1007/s11414-013-9384-5
38. Powell BJ, Beidas RS, Rubin RM, Stewart RE, Wolk CB, Matlin SL, … Mandell DS (2016). Applying the policy ecology framework to Philadelphia's behavioral health transformation efforts. Administration and Policy in Mental Health and Mental Health Services Research, 43(6), 909–926. 10.1007/s10488-016-0733-6
39. PracticeWise, LLC (2017). PracticeWise evidence-based services database. Retrieved from https://www.practicewise.com
40. Reise SP, & Yu J (1990). Parameter recovery in the graded response model using MULTILOG. Journal of Educational Measurement, 27(2), 133–144.
41. Roalf DR, Moore TM, Wolk DA, Arnold SE, Mechanic-Hamilton D, Rick J, … Moberg PJ (2016). Defining and validating a short form Montreal Cognitive Assessment (s-MoCA) for use in neurodegenerative disease. Journal of Neurology, Neurosurgery & Psychiatry, 87(12), 1303–1310. 10.1136/jnnp-2015-312723
42. Rogers EM (2003). Diffusion of innovations (5th ed.). New York: Free Press.
43. Rosenman R, Tennekoon V, & Hill LG (2011). Measuring bias in self-reported data. International Journal of Behavioural and Healthcare Research, 2(4), 320. 10.1504/IJBHR.2011.043414
44. Samejima F (1969). Estimation of latent ability using a response pattern of graded scores. Chicago, IL: Psychometric Society.
45. Sanders MR, Prinz RJ, & Shapiro CJ (2009). Predicting utilization of evidence-based parenting interventions with organizational, service-provider and client variables. Administration and Policy in Mental Health and Mental Health Services Research, 36(2), 133–143. 10.1007/s10488-009-0205-3
46. Seng AC, Prinz RJ, & Sanders MR (2006). The role of training variables in effective dissemination of evidence-based parenting interventions. International Journal of Mental Health Promotion, 8(4), 20–28. 10.1080/14623730.2006.9721748
47. Stumpf RE, Higa-McMillan CK, & Chorpita BF (2009). Implementation of evidence-based services for youth: Assessing provider knowledge. Behavior Modification, 33(1), 48–65. 10.1177/0145445508322625
48. Weersing VR, Weisz JR, & Donenberg GR (2002). Development of the Therapy Procedures Checklist: A therapist-report measure of technique use in child and adolescent treatment. Journal of Clinical Child & Adolescent Psychology, 31(2), 168–180. 10.1207/S15374424JCCP3102_03
49. Ward AM, Regan J, Chorpita BF, Starace N, Rodriguez A, Okamura K, … The Research Network on Youth Mental Health (2013). Tracking evidence based practice with youth: Validity of the MATCH and Standard Manual Consultation Records. Journal of Clinical Child & Adolescent Psychology, 42(1), 44–55.
50. Weist M, Lever N, Stephan S, Youngstrom E, Moore E, Harrison B, … Stiegler K (2009). Formative evaluation of a framework for high quality, evidence-based services in school mental health. School Mental Health, 1(4), 196–211. 10.1007/s12310-009-9018-5
51. Weisz JR, Jensen-Doss A, & Hawley KM (2006). Evidence-based youth psychotherapies versus usual clinical care: A meta-analysis of direct comparisons. American Psychologist, 61(7), 671–689. 10.1037/0003-066X.61.7.671
52. Weisz JR, Kuppens S, Ng MY, Eckshtain D, Ugueto AM, Vaughn-Coaxum R, … Fordwood SR (2017). What five decades of research tells us about the effects of youth psychological therapy: A multilevel meta-analysis and implications for science and practice. American Psychologist, 72(2), 79–117. 10.1037/a0040360
53. Weisz JR, Ugueto AM, Cheron DM, & Herren J (2013). Evidence-based youth psychotherapy in the mental health ecosystem. Journal of Clinical Child & Adolescent Psychology, 42(2), 274–286. 10.1080/15374416.2013.764824
