Author manuscript; available in PMC: 2015 Oct 8.
Published in final edited form as: Med Decis Making. 2014 Apr 8;34(5):560–566. doi: 10.1177/0272989X14528381

Table 1. Elements of measure development and psychometric performance: abstraction criteria and descriptions.

Measure Development
Item generation: How were content items developed and by whom?
Cognitive testing: Was the measure tested for understandability before use?
Pilot studies: Were pilot studies (of any type) conducted to pretest the measure?
Measure Performance
Reliability: Were appropriate assessments of the reliability of the measure reported? If so, was there evidence of adequate reliability?
Examples of assessments: internal consistency reliability (e.g., Cronbach's alpha, Kuder-Richardson coefficient); test-retest reliability; inter-rater reliability (e.g., percentage agreement, kappa coefficient, intra-class correlation coefficient)
Validity (the extent to which the measure assesses what it is intended to assess): Were appropriate assessments of the validity of the measure reported? If so, was there evidence of adequate validity?
Types of validity assessment for self-report measures: content validity (e.g., Content Validity Index); criterion-related validity (e.g., correlations demonstrating concurrent or predictive validity); construct validity (e.g., factor analysis demonstrating the predicted convergence/divergence of constructs and/or structural invariance of the measure, discriminant analysis, known-groups analysis)
Measure Performance (other)
Responsiveness (sensitivity): Is there evidence that the measure is sensitive to changes that matter to patients and clinicians?
Accuracy and precision: What is known about the measure's performance compared with “gold standard” measures (accuracy) and/or about the number of distinctions it makes or the extent of random error in its use (precision)?
Interpretability: Are the scores meaningful to clinicians and patients?
Acceptability: Does the measure appear to be acceptable to respondents (usually patients, but possibly others)? For example, are there patterns of missing data or low response rates that could signal a problem with the measure's acceptability?
Feasibility of administration: Are there indicators that the effort, burden, or disruption required of the clinical or research team to administer the measure is appropriate?
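Table 1 lists Cronbach's alpha and the Kuder-Richardson coefficient as examples of internal consistency reliability. The article does not prescribe how to compute them; as an illustration only, the sketch below applies the standard formula alpha = k/(k-1) * (1 - sum of item variances / variance of total scores) to a respondents-by-items score matrix, and computes KR-20 for items scored 0/1. The function names and data are hypothetical. It assumes complete data (no missing responses) and uses population variances throughout; because the same convention appears in numerator and denominator, that choice does not change the result.

```python
from statistics import pvariance


def cronbach_alpha(scores):
    """Cronbach's alpha for a complete respondents-by-items score matrix.

    scores: list of rows, one row per respondent, one numeric score per item.
    alpha = k/(k-1) * (1 - sum(item variances) / variance(total scores))
    """
    k = len(scores[0])
    if k < 2:
        raise ValueError("alpha requires at least two items")
    items = list(zip(*scores))              # one tuple of scores per item
    totals = [sum(row) for row in scores]   # each respondent's total score
    total_var = pvariance(totals)
    if total_var == 0:
        raise ValueError("total scores have zero variance")
    item_var_sum = sum(pvariance(col) for col in items)
    return k / (k - 1) * (1 - item_var_sum / total_var)


def kr20(binary_scores):
    """Kuder-Richardson formula 20 for items scored 0 or 1.

    Each item's variance is p * (1 - p), where p is the proportion scoring 1.
    """
    n = len(binary_scores)
    k = len(binary_scores[0])
    if k < 2:
        raise ValueError("KR-20 requires at least two items")
    p = [sum(col) / n for col in zip(*binary_scores)]
    pq_sum = sum(pi * (1 - pi) for pi in p)
    total_var = pvariance([sum(row) for row in binary_scores])
    return k / (k - 1) * (1 - pq_sum / total_var)


# Hypothetical data: 4 respondents answering 3 dichotomous items.
binary = [[1, 1, 0], [1, 0, 0], [1, 1, 1], [0, 0, 0]]
print(round(kr20(binary), 3), round(cronbach_alpha(binary), 3))  # 0.75 0.75
```

For 0/1 items the population variance of each item is exactly p(1 - p), so KR-20 is simply alpha applied to dichotomous data, which is why the two values printed above agree.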
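For inter-rater reliability, Table 1 names percentage agreement and the kappa coefficient. A minimal sketch, assuming two raters who each assign one nominal category to the same set of items, computes observed agreement p_o and Cohen's kappa, kappa = (p_o - p_e) / (1 - p_e), where p_e is the agreement expected by chance from each rater's marginal category proportions. The function names and ratings are hypothetical; weighted kappa for ordinal scales, multi-rater variants such as Fleiss' kappa, and the intra-class correlation coefficient are not covered.

```python
from collections import Counter


def percent_agreement(r1, r2):
    """Proportion of items to which two raters assigned the same category."""
    if len(r1) != len(r2) or not r1:
        raise ValueError("ratings must be non-empty and of equal length")
    return sum(a == b for a, b in zip(r1, r2)) / len(r1)


def cohens_kappa(r1, r2):
    """Cohen's kappa for two raters assigning nominal categories to the same items."""
    p_o = percent_agreement(r1, r2)          # observed agreement
    n = len(r1)
    c1, c2 = Counter(r1), Counter(r2)        # each rater's category counts
    categories = c1.keys() | c2.keys()
    # Chance agreement: sum over categories of the product of marginal proportions.
    p_e = sum(c1[c] * c2[c] for c in categories) / (n * n)
    if p_e == 1:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p_o - p_e) / (1 - p_e)


# Hypothetical ratings of 10 records (1 = criterion met, 0 = not met).
rater_a = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
rater_b = [1, 1, 1, 1, 0, 0, 0, 0, 0, 1]
print(percent_agreement(rater_a, rater_b))        # 0.8
print(round(cohens_kappa(rater_a, rater_b), 2))   # 0.6
```

The example shows why the table lists both statistics: the raters agree on 80% of records, but because half that level of agreement would be expected by chance here, kappa is a more modest 0.6.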
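Table 1 cites the Content Validity Index as an example of content validity assessment. The sketch below assumes one widely used convention that the article itself does not specify: experts rate each item's relevance on a 4-point scale, the item-level index (I-CVI) is the proportion of experts rating the item 3 or 4, and the scale-level index is summarized either as the mean of the I-CVIs (S-CVI/Ave) or as the proportion of items rated relevant by every expert (S-CVI/UA). Function names and ratings are hypothetical.

```python
def item_cvi(ratings, relevant=(3, 4)):
    """I-CVI: share of experts rating an item 3 or 4 on a 4-point relevance scale."""
    if not ratings:
        raise ValueError("need at least one expert rating")
    return sum(r in relevant for r in ratings) / len(ratings)


def scale_cvi_ave(ratings_by_item):
    """S-CVI/Ave: mean of the item-level CVIs."""
    icvis = [item_cvi(r) for r in ratings_by_item]
    return sum(icvis) / len(icvis)


def scale_cvi_ua(ratings_by_item):
    """S-CVI/UA: share of items that every expert rated as relevant."""
    return sum(item_cvi(r) == 1 for r in ratings_by_item) / len(ratings_by_item)


# Hypothetical ratings: 3 items (rows), each rated by 5 experts.
panel = [[4, 4, 3, 4, 3], [4, 3, 2, 4, 4], [2, 3, 2, 1, 4]]
print([item_cvi(r) for r in panel])     # [1.0, 0.8, 0.4]
print(round(scale_cvi_ave(panel), 3))   # 0.733
print(round(scale_cvi_ua(panel), 3))    # 0.333
```

The two scale-level summaries can diverge sharply: averaging tolerates a few dissenting experts, whereas the universal-agreement version counts only items with unanimous relevance ratings.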