Published in final edited form as: J Autism Dev Disord. 2018 Jun;48(6):1932–1944. doi: 10.1007/s10803-017-3458-9

Table 2.

Quality Standards for Psychometric Measure Evaluation

Quality standards and their descriptions
1. Describe the measure
2. Describe the examiner qualifications
  • Characteristics of the examiners and raters, including level of experience (Kottner et al., 2011).

  • Assessor training required to administer the measure reliably.

  • Disclose if raters were blind to the child’s diagnosis (Stone et al., 2000).

3. Explain test procedures
  • Adequate procedural description for a clinician to replicate the testing environment (Friberg, 2010).

4. Describe the sample population
  • Sample population used to obtain psychometric data and determine appropriateness of the test for particular children; include: geographic region, age, gender, ethnicity, socioeconomic status, and diagnostic/disability categories.

  • Recruitment method and justification of sample size.

  • A sample size of at least 100 children per relevant population group (e.g., gender, race; Friberg, 2010).

5. Examine and interpret psychometric properties of reliability and validity
Reliability
Internal consistency
  • Guidelines for interpreting Cronbach’s coefficient alpha: <.70 = unacceptable, .70–.79 = fair, .80–.89 = good, ≥.90 = excellent (Cicchetti and Sparrow, 1990); see the sketch after this list.

  • Type and purpose of the measure considered when interpreting reliability (Bracken, 1987).
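A minimal sketch of how coefficient alpha could be computed and banded against the cutoffs above, assuming item responses are stored as a respondents-by-items NumPy array; the function names and simulated data are illustrative only, not part of the original table.

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Cronbach's coefficient alpha for a respondents-by-items array.

    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total score))
    """
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)       # variance of each item
    total_var = items.sum(axis=1).var(ddof=1)   # variance of the summed scale score
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

def interpret_alpha(alpha: float) -> str:
    """Band alpha with the cutoffs cited in the table (Cicchetti and Sparrow, 1990)."""
    if alpha < 0.70:
        return "unacceptable"
    if alpha < 0.80:
        return "fair"
    if alpha < 0.90:
        return "good"
    return "excellent"

# Illustrative use with simulated 5-item Likert responses from 200 respondents.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 1))
responses = np.clip(np.round(3 + latent + rng.normal(scale=0.8, size=(200, 5))), 1, 5)
a = cronbach_alpha(responses)
print(f"alpha = {a:.2f} ({interpret_alpha(a)})")
```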

Test-retest and Interrater reliability
  • Guidelines for interpreting reliability coefficients for both test-retest and interrater reliability, typically reported using Cohen’s Kappa (Cohen, 1960; for nominal data) or an Intraclass Correlation Coefficient (ICC; for ordinal, interval, or ratio data): <.40 = poor, .40–.59 = fair, .60–.74 = good, .75–1.00 = excellent (Cicchetti and Sparrow, 1981); see the sketch after this list.

  • A test-retest interval of 2–4 weeks recommended for young children (Bracken, 1987; Cicchetti, 1994).
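A minimal sketch of banding chance-corrected interrater agreement against the cutoffs above, assuming two independent raters assign nominal codes to the same children; the rater data are hypothetical, and scikit-learn is assumed to be available for the kappa computation.

```python
from sklearn.metrics import cohen_kappa_score

def interpret_agreement(coef: float) -> str:
    """Band a kappa or ICC value with the table's cutoffs (Cicchetti and Sparrow, 1981)."""
    if coef < 0.40:
        return "poor"
    if coef < 0.60:
        return "fair"
    if coef < 0.75:
        return "good"
    return "excellent"

# Hypothetical nominal codes assigned by two independent raters to the same 10 children.
rater_a = ["pass", "fail", "pass", "pass", "fail", "pass", "fail", "pass", "pass", "fail"]
rater_b = ["pass", "fail", "pass", "fail", "fail", "pass", "fail", "pass", "pass", "pass"]

kappa = cohen_kappa_score(rater_a, rater_b)   # chance-corrected agreement for nominal data
print(f"kappa = {kappa:.2f} ({interpret_agreement(kappa)})")

# For ordinal, interval, or ratio ratings, an ICC would be computed instead (e.g., with a
# package such as pingouin) and banded with the same cutoffs.
```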

Validity
Concurrent and Divergent validity
  • Correlation of 1.00 or −1.00 not expected or desired; new test should not be an exact replica of the other measure (Cicchetti, 1994; Crocker & Algina, 1986).
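A minimal sketch of a concurrent validity check, assuming total scores on the new and the established measure are available for the same children; all scores here are simulated purely for illustration.

```python
import numpy as np
from scipy.stats import pearsonr

# Hypothetical total scores for the same 30 children on a new measure and an established one.
rng = np.random.default_rng(1)
established = rng.normal(loc=50, scale=10, size=30)
new_measure = 0.8 * established + rng.normal(scale=6.0, size=30)  # related, but not identical

r, p = pearsonr(new_measure, established)
print(f"concurrent validity: r = {r:.2f} (p = {p:.3f})")
# A strong but imperfect r (well below 1.00) is the expected pattern: the new test should
# track the established measure without simply duplicating it.
```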

6. Examine the dimensionality of the measure
  • Factor structure of the measure considered when evaluating subscales.

  • Construct validity supported with item total correlations greater than .30, inter-item correlations between .30 and .70, and factor analysis with factor loadings greater than .40 (DeVon et al., 2007).
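A minimal sketch of the item-level checks named above, assuming item responses are held in a pandas DataFrame; the data are simulated, and the factor-analysis step is only noted in a comment rather than implemented.

```python
import numpy as np
import pandas as pd

# Simulated 6-item scale completed by 150 respondents (illustrative data only).
rng = np.random.default_rng(2)
latent = rng.normal(size=(150, 1))
items = pd.DataFrame(
    np.round(3 + latent + rng.normal(scale=0.9, size=(150, 6))).clip(1, 5),
    columns=[f"item{i}" for i in range(1, 7)],
)

# Corrected item-total correlations: each item against the total of the remaining items.
total = items.sum(axis=1)
item_total = {col: items[col].corr(total - items[col]) for col in items.columns}
print({k: round(v, 2) for k, v in item_total.items()})   # look for values greater than .30

# Inter-item correlation matrix: values between .30 and .70 support construct validity
# without redundancy (DeVon et al., 2007).
print(items.corr().round(2))

# Factor loadings (greater than .40 per the table) would come from an exploratory factor
# analysis, e.g., with a package such as factor_analyzer, which is not shown here.
```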