2022 Mar 15;12(3):e059360. doi: 10.1136/bmjopen-2021-059360

Table 3. Criteria for good measurement properties

| Measurement property | Rating | Criteria |
| --- | --- | --- |
| Structural validity | + | CTT: CFA: CFI or TLI or comparable measure >0.95 OR RMSEA <0.06 OR SRMR <0.08*. IRT/Rasch: No violation of unidimensionality†: CFI or TLI or comparable measure >0.95 OR RMSEA <0.06 OR SRMR <0.08; AND no violation of local independence: residual correlations among the items after controlling for the dominant factor <0.20 OR Q3 <0.37; AND no violation of monotonicity: adequate looking graphs OR item scalability >0.30; AND adequate model fit (IRT: χ² >0.001; Rasch: infit and outfit mean squares ≥0.5 and ≤1.5 OR Z-standardised values >−2 and <2). |
| | ? | CTT: not all information for ‘+’ reported. IRT/Rasch: model fit not reported. |
| | − | Criteria for ‘+’ not met. |
| Internal consistency | + | At least low evidence‡ for sufficient structural validity§ AND Cronbach’s alpha ≥0.70 for each unidimensional scale or subscale¶. |
| | ? | Criteria for ‘At least low evidence‡ for sufficient structural validity§’ not met. |
| | − | At least low evidence‡ for sufficient structural validity§ AND Cronbach’s alpha <0.70 for each unidimensional scale or subscale¶. |
| Reliability | + | ICC or weighted kappa ≥0.70. |
| | ? | ICC or weighted kappa not reported. |
| | − | ICC or weighted kappa <0.70. |
| Measurement error | + | SDC or LoA <MIC§. |
| | ? | MIC not defined. |
| | − | SDC or LoA >MIC§. |
| Hypotheses testing for construct validity | + | The result is in accordance with the hypothesis**. |
| | ? | No hypothesis defined (by the review team). |
| | − | The result is not in accordance with the hypothesis**. |
| Cross-cultural validity/measurement invariance | + | No important differences found between group factors (such as age, gender, language) in multiple group factor analysis OR no important DIF for group factors (McFadden’s R² <0.02). |
| | ? | No multiple group factor analysis OR DIF analysis performed. |
| | − | Important differences between group factors OR DIF were found. |
| Criterion validity | + | Correlation with gold standard ≥0.70 OR AUC ≥0.70. |
| | ? | Not all information for ‘+’ reported. |
| | − | Correlation with gold standard <0.70 OR AUC <0.70. |
| Responsiveness | + | The result is in accordance with the hypothesis** OR AUC ≥0.70. |
| | ? | No hypothesis defined (by the review team). |
| | − | The result is not in accordance with the hypothesis** OR AUC <0.70. |
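As a rough illustration, some of the single-threshold rules in the table can be written as small rating functions. This is a minimal sketch, not part of Prinsen et al.'s criteria: the function names are hypothetical, missing statistics are assumed to arrive as `None`, and two cases the table leaves open (what counts as "not all information for '+' reported" under CTT, and an SDC exactly equal to the MIC) are resolved by assumption, as the comments note.

```python
from typing import Optional

# Rating symbols used in Table 3; U+2212 is the typographic minus sign.
SUFFICIENT, INSUFFICIENT, INDETERMINATE = "+", "\u2212", "?"


def rate_ctt_structural_validity(cfi: Optional[float] = None,
                                 tli: Optional[float] = None,
                                 rmsea: Optional[float] = None,
                                 srmr: Optional[float] = None) -> str:
    """CFA rule: CFI or TLI >0.95 OR RMSEA <0.06 OR SRMR <0.08.

    Assumption: '?' is returned only when no fit index is reported at all;
    if at least one index is reported and none meets its cut-off, the
    result is rated insufficient.
    """
    if all(v is None for v in (cfi, tli, rmsea, srmr)):
        return INDETERMINATE
    meets = ((cfi is not None and cfi > 0.95)
             or (tli is not None and tli > 0.95)
             or (rmsea is not None and rmsea < 0.06)
             or (srmr is not None and srmr < 0.08))
    return SUFFICIENT if meets else INSUFFICIENT


def rate_reliability(icc_or_kappa: Optional[float]) -> str:
    """ICC or weighted kappa >=0.70 is '+'; an unreported value is '?'."""
    if icc_or_kappa is None:
        return INDETERMINATE
    return SUFFICIENT if icc_or_kappa >= 0.70 else INSUFFICIENT


def rate_measurement_error(sdc_or_loa: float, mic: Optional[float]) -> str:
    """SDC (or LoA, passed as one magnitude) below the MIC is '+'.

    No defined MIC gives '?'. Assumption: a tie (SDC == MIC) is rated
    insufficient, because the table does not cover that case.
    """
    if mic is None:
        return INDETERMINATE
    return SUFFICIENT if sdc_or_loa < mic else INSUFFICIENT
```

For example, `rate_ctt_structural_validity(cfi=0.97, rmsea=0.08)` returns '+', because under the OR rule one index meeting its cut-off is enough, while `rate_reliability(0.64)` returns '−'.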

From Prinsen et al.¹¹

‘+’, sufficient; ‘−’, insufficient; ‘?’, indeterminate.

*To rate the quality of the summary score, the factor structures should be equal across studies.

†Unidimensionality refers to a factor analysis per subscale, while structural validity refers to a factor analysis of a (multidimensional) PROM.

‡As defined by grading the evidence according to the GRADE approach.

§This evidence may come from different studies.

¶The criterion ‘Cronbach’s alpha <0.95’ was deleted, as it is relevant in the development phase of a PROM and not when evaluating an existing PROM.
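With the upper bound removed, the internal consistency rating reduces to a lower threshold on alpha once the structural validity requirement is met. The following minimal Python sketch computes alpha for one unidimensional subscale with the standard formula and applies that rule. The function names are hypothetical, and the boolean `structural_validity_ok` is a simplifying stand-in for the GRADE-based judgement "at least low evidence for sufficient structural validity", which the sketch does not model.

```python
from statistics import variance
from typing import Sequence


def cronbach_alpha(scores: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha for one unidimensional (sub)scale.

    `scores` has one row per respondent and one column per item:
    alpha = k / (k - 1) * (1 - sum of item variances / variance of total score)
    """
    k = len(scores[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    items = list(zip(*scores))             # transpose rows into item columns
    totals = [sum(row) for row in scores]  # total score per respondent
    item_var_sum = sum(variance(col) for col in items)
    return k / (k - 1) * (1 - item_var_sum / variance(totals))


def rate_internal_consistency(alpha: float, structural_validity_ok: bool) -> str:
    """Table 3 rule for one subscale, without the deleted 'alpha <0.95' cap.

    `structural_validity_ok` is a hypothetical stand-in for 'at least low
    evidence for sufficient structural validity'.
    """
    if not structural_validity_ok:
        return "?"
    return "+" if alpha >= 0.70 else "\u2212"  # U+2212 minus sign
```

With invented toy data of five respondents answering three items, `[[1, 2, 2], [2, 2, 3], [3, 3, 3], [4, 4, 5], [5, 4, 5]]`, `cronbach_alpha` gives about 0.963. That value is rated '+' here but would have failed the deleted ‘Cronbach’s alpha <0.95’ criterion.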

**The results of all studies should be taken together, and it should then be decided whether 75% of the results are in accordance with the hypotheses.
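The pooling step in this footnote amounts to a proportion check across all results. The minimal Python sketch below is hypothetical. It reads "75%" as "at least 75%" and maps a lower share to '−'. It also returns '?' when no hypotheses were defined. The footnote does not state these choices, so they are assumptions.

```python
from typing import Sequence


def pool_hypothesis_results(in_accordance: Sequence[bool],
                            threshold: float = 0.75) -> str:
    """Pool per-result judgements across all studies of one PROM.

    Each entry is True when a result agreed with the review team's hypothesis.
    Assumptions: share >= threshold is '+', a lower share is rated with the
    minus sign, and an empty list (no hypotheses defined) is '?'.
    """
    if not in_accordance:
        return "?"
    share = sum(in_accordance) / len(in_accordance)
    return "+" if share >= threshold else "\u2212"  # U+2212 minus sign
```

For instance, `pool_hypothesis_results([True, True, True, False])` returns '+' (three of four results, exactly 75%), whereas two results in accordance out of four gives '−'.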

AUC, area under the curve; CFA, confirmatory factor analysis; CFI, comparative fit index; CTT, classical test theory; DIF, differential item functioning; GRADE, Grading of Recommendations, Assessment, Development and Evaluation; ICC, intraclass correlation coefficient; IRT, item response theory; LoA, limits of agreement; MIC, minimal important change; PROM, patient-reported outcome measure; RMSEA, root mean square error of approximation; SDC, smallest detectable change; SRMR, standardised root mean square residual; TLI, Tucker-Lewis index.