Author manuscript; available in PMC: 2013 Aug 12.

Published in final edited form as: Soc Work Ment Health. 2012 Mar 16;10(3):205–232. doi: 10.1080/15332985.2011.628602

Table 1.

Overview of Outcome Measures for Children and/or Adolescents in Inpatient Psychiatric Settings

Measure	Measure Properties and Use	Psychometric Properties	Strengths	Limitations
Achenbach (1991, 1992) Child Behavior Checklist/4–18 (CBCL/4–18); Child Behavior Checklist/2–3 (CBCL/2–3)	Behavior problem checklist consisting of 113 items that measures emotional and behavioral problems in children and adolescents. Respondents choose among three options: 0 = not true of their child; 1 = somewhat or sometimes true; and 2 = very true or often true.	Reliability: Internal consistencies ranged from .92 to .96; for the internalizing scale, reliability ranged from .88 to .92. Validity: Strong discriminant validity. High concurrent correlations with related instruments and the CBCL.	Used in large national studies. Translated into 50 different languages.	The large number of items makes the instrument challenging to administer in a short-term inpatient setting.
Bracken & Keith (2004) Clinical Assessment of Behavior (CAB)	Designed to aid in the assessment, diagnosis, screening, and treatment of children between the ages of 2 and 18 years. Consists of 3 rating forms: the Parent Rating Form (CAB-P), consisting of 70 questions; the Parent Extended Rating Form (CAB-PX), consisting of 170 questions; and the Teacher Rating Form (CAB-T), consisting of 70 questions. Items on all scales use a 5-point Likert scale.	Reliability: Test-retest reliability ranged between .75 and .93. Inter-rater reliability for the scales and subscales ranged between .40 and .58. Validity: Correlation coefficients ranged between .57 and .77. Construct validity coefficients ranged between .71 and .95.	Can be used to track the progress of clients during treatment. Designed to closely reflect current literature on behavioral disorders and child/adolescent psychopathology. Possesses strong psychometric properties.	Cannot be used with individuals who lack proficiency in English.
Burlingame, Jasper, Peterson, Wells, Reisinger, & Brown (n.d.) Youth Outcome Measure (YOQ 30.1)	A shortened version of the Youth Outcome Questionnaire, consisting of 30 Likert-scale items that may be completed by parents, clinicians, adolescents, or teachers. Designed to be used both at intake and at the end of treatment. Parent-report and self-report versions of the scale exist.	Reliability: The scale has high internal consistency, with coefficient alpha ranging between .92 and .94. The self-report version has a test-retest reliability of .91 and the parent-report version a test-retest reliability of .80. Inter-rater reliability was in the low to moderate range (.32 to .77).	Enables clinicians to track the progress and overall distress of children and adolescents.	Evidence supporting the validity of the YOQ 30.1 is limited, so it does not give clinicians the confidence needed to base decisions on this measure.
Eisen, Dill, & Grob (1994) Behavior and Symptom Identification Scale (BASIS-32)	Consists of 32 items, each rated on a 5-point scale. Used to measure outcomes in inpatient settings.	Reliability: Good test-retest reliability and internal consistency were established (Klinkenberg, Cho, & Vieweg, 1998). Validity: Good concurrent and discriminant validity.	Its use has been examined in both inpatient and outpatient settings. It obtains a patient report and can be easily administered and scored (Hoffmann, Capelli, & Mastrianni, 1997).	Takes approximately 20–30 minutes to complete. Questionable utility in measuring outcomes among adolescents (Hoffmann et al., 1997; Klinkenberg et al., 1998).
Eyberg & Pincus (1999) Eyberg Child Behavior Inventory (ECBI); Sutter-Eyberg Student Behavior Inventory Revised (SESBI-R)	Examines conduct problems in children between 2 and 16 years old. Consists of a total of 38 items, 13 of which are unique to the SESBI-R. Behaviors are rated on two scales: a Yes-No Problem scale, which identifies problematic behaviors, and a 7-point Intensity scale, which rates the frequency of the child’s behavior.	Reliability: Internal consistency ranged from .95 to .98. Test-retest reliability ranged from the .70s to the .80s. Validity: Discriminant and convergent validity were established.	Takes into account the perspectives of both parents and teachers in assessing conduct problems among children. Can serve as initial screening devices.	Does not adequately account for developmental changes in conduct-related behaviors between the ages of 2 and 16. Lacks age-specific norms, limiting the utility of these scales. Few validation studies support these scales.
Gadow & Sprafkin (2002) Child Symptom Inventory-4	Screens for 13 major DSM-IV childhood psychiatric disorders using a 4-point scale on which each item is rated never, sometimes, often, or very often.	Reliability: Test-retest reliability ranged from .70 to .87. Validity: Convergent validity was established through multiple studies, and criterion validity was established in one study.	User-friendly scale that is easy to interpret and administer. Closely linked to the DSM, making it suitable for both practical and research use.	The scale’s validity depends on that of the DSM-IV.
Gowers et al. (1999) Health of the Nation Outcome Scales for Children and Adolescents (HoNOSCA)	Comprises 15 scales, of which the first 13 are used to compute the total score. The remaining two items relate to the family’s lack of knowledge about the nature of the child’s disorder and lack of information about services. Each scale is rated from 0 (‘no problems’) to 4 (‘severe problems’).	Reliability: Inter-rater reliability was established for 20 cases in one study. Validity: Face validity was supported, and satisfactory sensitivity to change was demonstrated in one study.	Found to be feasible in clinical settings and acceptable for use by a range of clinicians.	Requires a significant amount of training to be used effectively as an outcome measure (Bebbington et al., 1999). Also, certain subscales were found to be unsuitable for preschool-aged children.
Hodges, Wong, & Latessa (1998) The Child & Adolescent Functional Assessment Scale (CAFAS)	Measures the degree to which emotional, behavioral, or psychiatric problems affect the functioning of youth. Consists of 315 items in 11 subscales. Severity is rated at four levels: severe, moderate, mild, and minimal or no impairment.	Reliability: Internal consistency ranged from .73 to .78. Test-retest reliability was .78 and inter-rater reliability was .92. Validity: Acceptable criterion validity existed.	Seeks to reduce rater bias by requiring raters to justify their ratings with behavioral descriptions of the youth.	Burdensome for clinicians, who must provide justifications as part of the rating process and may therefore see the measure as added paperwork.
Kronenberger, Carter, & Thomas (1997) Pediatric Inpatient Behavior Scale	47-item scale completed by nurses. Behaviors observed during hospitalization are rated on a 0–3 frequency scale. Consists of 10 subscales: Positive-Sociable, Distress, Elimination Problem, Withdrawal, Oppositional-Noncompliant, Conduct Problem, Anxiety, Self-Harm, Self-Stimulation, and Overactive.	Reliability: Strong inter-rater reliability, with correlations greater than .70, was found for all subscales except Withdrawal. Strong internal consistency was found for 8 of the 10 subscales. Validity: Strong construct validity was established in a sample of child inpatients.	Good psychometric properties for 8 of the 10 subscales. Designed specifically for children with primary childhood diagnoses.	Because of the number of items and subscales, it may require extensive time to complete. Also, 2 of the 10 subscales have poor psychometric properties.
LeBuffe & Naglieri (2003) Devereux Early Childhood Assessment-Clinical Form	Assesses the behavior of children aged 2 to 5 years, focusing on emotional and social resilience as well as other behavioral concerns. Comprises 62 items in 7 subscales. Each item is rated on a 5-point frequency scale ranging from never to very frequently.	Reliability: Internal reliability for the two total scores, Protective Factors and Behavioral Concerns, ranged from .88 to .94. Validity: Because the items are based on the DECA, the DSMD, and the DSM-IV, the scale’s content validity is considered demonstrated.	Assesses not only behavioral concerns but also the child’s risk and resiliency factors, which is useful for developing individual treatment plans and for outcomes research.	Suitable only for children aged 2 to 5 years; cannot be used to assess children older than 5.
Lyons (1998) Severity and Acuity of Psychiatric Illness-Child and Adolescent Version (CAPI)	Comprises 20 anchored ratings ranging from 0 to 3, with 0 representing the extremely healthy pole and 3 the extremely unhealthy pole (Lyons, McCulloch, & Romansky, 2006; Lyons, Terry, Martinovich, Peterson, & Bouska, 2001). Covers four domains: Symptoms, High Risk Behaviors, Functioning, and System Support (Lyons et al., 2001).	Reliability: Reliability for all subscales was .70 or higher. Inter-rater reliability was also established, with an average kappa of .76 (Lyons et al., 2001). Validity: The CAPI correlated strongly with the CAFAS and the Child Behavior Checklist (Lyons et al., 2001).	Valid and reliable (Lyons et al., 2006). Brief, taking approximately 5 to 10 minutes to complete.	Does not measure educational attainment, healthy development, or functional skills (Lyons et al., 2006).
Shaffer et al. (1983) Children’s Global Assessment Scale (CGAS)	Measures psychiatric severity in children aged 4 to 16 years. Scores range from 1 to 100, and each decile is accompanied by descriptions of behavioral functioning across various life situations.	Reliability: Three studies established test-retest reliability. Several earlier studies also established inter-rater reliability (Schorre & Vandvik, 2004). Validity: One study examined concurrent validity and supported it.	The CGAS is the most studied scale. Easy to administer and can also be used with nonclinical populations. Also used to measure psychosocial functioning in somatic patients (Schorre & Vandvik, 2004).	Psychometric properties have not been established for children under 4 years old (Schorre & Vandvik, 2004). The scale relies exclusively on clinician report, so ratings may be biased (Gold et al., 2009).
Williams & Bloomer (1978–1987) Bay Area Functional Performance Evaluation (BaFPE)	Consists of two components: the Social Interaction Scale (SIS) and the Task-Oriented Assessment (TOA). The SIS rates seven aspects of social behavior based on observations in five different social settings. The TOA assesses performance, affective, and cognitive functioning through the administration of five tasks.	Reliability: Inter-item reliability ranged from .73 to .89. Test-retest reliability was reported, but supporting psychometric data were not provided. Validity: Concurrent validity was established with the CGAS and the Functional Life Scale. Initial construct validity was also established in some studies of the scale (Hemphill-Pearson, 2007).	The BaFPE can be quickly administered and scored. It can complement other life skills assessments and support clinical observations. Because it can serve as a therapeutic medium, it may also enhance patients’ self-esteem (Hemphill-Pearson, 2007).	The TOA may measure constructs not intended by the authors, and its constructs were not clearly described or formulated.