2020 Aug 11;10(8):e036038. doi: 10.1136/bmjopen-2019-036038

Table 2.

Tools to assess measurement properties: characteristics and comparison to testing standards

Tools | COSMIN | Terwee’s criteria | Attributes and criteria | Economic evaluation | Guidance for industry | Fitzpatrick’s criteria | ICF, ICFCY | EMPRO | SCI criteria | Andresen’s tool | CanChild outcomes | OMERACT | Testing standards
Development | Delphi | Author criteria | Expert panel | Literature | Consensus | Literature | Expert panel | Expert panel, literature | Literature | Expert panel | Expert panel | Delphi | Consensus
Sponsor/s | COSMIN initiative | Author | SACMOT working group | Standing group of health technology | FDA staff | Standing group of health technology | WHO member states | IRYSS committee | SCIRE working group | Author | CanChild centre staff | OMERACT initiative | AERA, APA, NCME
Approval updates | 2010, 2018 | 2007 | 1996, 2002, 2013 | 1999, 2017 | 2006, 2009 | 1998 | 2001, 2019* | 2008 | 2008, 2016 | 2000 | 1987†, 2004 | 1992, 1998, 2007, 2014, 2019 | 1954, 1966, 1974, 1985, 1999, 2014
Items (scoring) | 5–18 items/box (+/−/?) | 8–9 items total (+/−/?) | Not item structured (no scoring) | Not item structured (no scoring) | Not item structured (no scoring) | Not item structured (no scoring) | Not item structured (no scoring) | 39 items (strongly agree, agree, disagree, strongly disagree) | 3–5 items/box (++++/+++/++/+) | Eleven items total (A, B, C) | 2–6 items/box (excellent, adequate, poor) | 2–5 items/box (green, amber, red, white) | Not item structured (no scoring)
Measurement properties
Validity | Content; Construct (Int. structure, cross-cultural, hypotheses test); Criterion (Gold standard); Responsiveness | Content; Construct (Hypotheses test); Criterion (Gold standard); Floor/Ceiling; Responsiveness | Conceptual and measurement model; Content; Construct (Hypotheses test); Criterion (Gold standard); Responsiveness | Descriptive (Content, Face, Construct); Preference-based valuation; Empirical (Criterion) | Conceptual model; Content; Construct (Hypothesis test, discriminant, convergent, known groups); Responsiveness | Use Content/face; Construct (convergent, discriminant, int. structure); Criterion (Predictive); Cut-score precision; Responsiveness | Content | Conceptual and measurement model; Content; Construct (Hypotheses test); Criterion; Responsiveness | Content; Criterion (concurrent, predictive, ‘discriminant’); Clinical utility (consequential validity); Floor/Ceiling; Responsiveness | Conceptual and measurement model; Instrument bias; Int. structure; Convergent; Discriminant; Responsiveness | Use scale construction; Content; Construct (Hypotheses test); Criterion (Gold standard); Responsiveness | Content, face; Construct (Convergent, divergent); Criterion (Accuracy); Discrimination (Sensitivity over time and over treatment) | Content; Response process; Int. structure (Dimensions, DIF); Relations to other variables (Hypotheses test, convergent, discriminant, criterion, responsiveness); Consequences
Reliability | Int. consistency; Measurement error (Test retest, agreement) | Int. consistency; Reproducibility (Agreement, relative measurement error) | Int. consistency; Reproducibility (Test retest, inter-rater) | Test retest; Inter-rater | Test retest; Inter-rater; Int. consistency | Int. consistency; Reproducibility (Test retest) |  | Int. consistency; Reproducibility (Test retest, inter-rater) | Int. consistency; Test retest | Int. consistency; Test retest | Int. consistency; Intra/inter-rater; Test retest | Reproducibility; Test retest | Int. consistency; Test retest; Alternate forms; Scorers and decision consistency/accuracy
Fairness Equivalence of accommodations
Other characteristics Norms Norms, standard values Norms standardisation Scales, norms, Score comparability
Interpretability Interpretability Interpretability Interpretability Interpretability Interpretability Test development and revision
Burden Burden Acceptability (Burden) Burden Burden Burden
Administration accessible forms Administration accessible forms Administration Administration accessible forms Administration accessible forms
Feasibility Cultural adaptations Practicality Feasibility cultural adaptations Cultural adaptations Applicability cultural adaptations Cultural adaptations Clinical utility (Feasibility) Feasibility
Frequency of use (%) | 61 (30.4) | 45 (22.4) | 33 (16.4) | 17 (8.4) | 14 (6.9) | 14 (6.9) | 7 (3.4) | 4 (2.0) | 2 (1.0) | 2 (1.0) | 1 (0.5) | 1 (0.5) | 0

*Updated version at website.

†Reference at 2004.

AERA, American Educational Research Association; APA, American Psychological Association; COSMIN, Consensus-based Standards for the selection of health Measurement Instruments; DIF, differential item functioning; EMPRO, Evaluating Measures of Patient Reported Outcomes; FDA, Food and Drug Administration; ICF, International Classification of Functioning; ICFCY, International Classification of Functioning for Children and Youth; IRYSS, Investigation Network for Health and Health Service Outcomes Research; NCME, National Council on Measurement in Education; OMERACT, Outcomes Measures in Rheumatology Clinical Trials; SACMOT, Scientific Advisory Committee Medical Outcomes Trust; SCI, spinal cord injury; SCIRE, Spinal Cord Injury Rehabilitation Evidence.
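
The percentages in the "Frequency of use" row can be recomputed from the counts with a short script. This is a minimal sketch, not the authors' method: it assumes each percentage is the tool's count divided by the sum of the thirteen counts in that row, since the table does not state the denominator. The dictionary keys are labels taken from the column headings, and `share` is a hypothetical helper name. Under this assumption a few recomputed values differ from the printed ones by 0.1, which may reflect a different denominator or rounding rule.

```python
# Counts copied from the "Frequency of use" row of Table 2.
# Keys are the table's column headings, in column order.
counts = {
    "COSMIN": 61,
    "Terwee's criteria": 45,
    "Attributes and criteria": 33,
    "Economic evaluation": 17,
    "Guidance for industry": 14,
    "Fitzpatrick's criteria": 14,
    "ICF, ICFCY": 7,
    "EMPRO": 4,
    "SCI criteria": 2,
    "Andresen's tool": 2,
    "CanChild outcomes": 1,
    "OMERACT": 1,
    "Testing standards": 0,
}

# Assumption: the denominator is the sum of the counts in this row.
total = sum(counts.values())


def share(count, denominator):
    """Return count as a percentage of denominator, rounded to one decimal."""
    return round(100 * count / denominator, 1)


# Recomputed percentage for every tool.
shares = {tool: share(n, total) for tool, n in counts.items()}

for tool, pct in shares.items():
    print(f"{tool}: {counts[tool]} ({pct}%)")
```
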