Table 3.
Validity inference | Definition (assumptions)ᵃ | Examples of evidence |
---|---|---|
Scoring | The score or written narrative from a given observation adequately captures key aspects of performance | Procedures for creating and empirically evaluating item wording, response options, and scoring options; Rater selection and training |
Generalization | The total score or synthesis of narratives reflects performance across the test domain | Sampling strategy (e.g., test blueprint) and sample size; Internal consistency reliability; Interrater reliability |
Extrapolation | The total score or synthesis in a test setting reflects meaningful performance in a real-life setting | Authenticity of context; Correlation with tests measuring similar constructs, especially in a real-life context; Correlation (or lack thereof) with tests measuring different constructs; Expert-novice comparisons; Factor analysis |
Implications/decisions | Measured performance constitutes a rational basis for meaningful decisions and actions | See Table 2, “Consequences” |