2022 Mar 15;12(3):e059360. doi: 10.1136/bmjopen-2021-059360

Table 3.

Criteria for good measurement properties

Measurement property	Rating	Criteria
Structural validity	+	CTT: CFA: CFI or TLI or comparable measure >0.95 OR RMSEA <0.06 OR SRMR <0.08. IRT/Rasch*: no violation of unidimensionality† (CFI or TLI or comparable measure >0.95 OR RMSEA <0.06 OR SRMR <0.08) AND no violation of local independence (residual correlations among the items after controlling for the dominant factor <0.20 OR Q3 <0.37) AND no violation of monotonicity (adequate looking graphs OR item scalability >0.30) AND adequate model fit (IRT: χ² >0.001; Rasch: infit and outfit mean squares ≥0.5 and ≤1.5 OR Z-standardised values >−2 and <2).
	?	CTT: not all information for ‘+’ reported. IRT/Rasch: model fit not reported.
	−	Criteria for ‘+’ not met.
Internal consistency	+	At least low evidence‡ for sufficient structural validity§ AND Cronbach’s alpha ≥0.70 for each unidimensional scale or subscale¶.
	?	Criteria for ‘At least low evidence‡ for sufficient structural validity§’ not met.
	−	At least low evidence‡ for sufficient structural validity§ AND Cronbach’s alpha <0.70 for each unidimensional scale or subscale¶.
Reliability	+	ICC or weighted kappa ≥0.70.
	?	ICC or weighted kappa not reported.
	−	ICC or weighted kappa <0.70.
Measurement error	+	SDC or LoA <MIC§.
	?	MIC not defined.
	−	SDC or LoA >MIC§.
Hypotheses testing for construct validity	+	The result is in accordance with the hypothesis**.
	?	No hypothesis defined (by the review team).
	−	The result is not in accordance with the hypothesis**.
Cross-cultural validity/measurement invariance	+	No important differences found between group factors (such as age, gender, language) in multiple group factor analysis OR no important DIF for group factors (McFadden’s R² <0.02).
	?	No multiple group factor analysis OR DIF analysis performed.
	−	Important differences between group factors OR DIF were found.
Criterion validity	+	Correlation with gold standard ≥0.70 OR AUC ≥0.70.
	?	Not all information for ‘+’ reported.
	−	Correlation with gold standard <0.70 OR AUC <0.70.
Responsiveness	+	The result is in accordance with the hypothesis** OR AUC ≥0.70.
	?	No hypothesis defined (by the review team).
	−	The result is not in accordance with the hypothesis** OR AUC <0.70.

From Prinsen et al.¹¹

‘+’, sufficient; ‘−’, insufficient; ‘?’, indeterminate.

*To rate the quality of the summary score, the factor structures should be equal across studies.

†Unidimensionality refers to a factor analysis per subscale, while structural validity refers to a factor analysis of a (multidimensional) PROM.

‡As defined by grading the evidence according to the GRADE approach.

§This evidence may come from different studies.

¶The criterion ‘Cronbach’s alpha <0.95’ was deleted, as it is relevant in the development phase of a PROM and not when evaluating an existing PROM.

**The results of all studies should be taken together, and it should then be decided whether at least 75% of the results are in accordance with the hypotheses.
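The pooling rule in footnote ** can be sketched in a few lines of Python. This is only an illustration: the function name `rate_hypotheses` and the boolean input format are hypothetical, and the sketch assumes that the 75% threshold is a minimum and that an empty list means no hypotheses were defined.

```python
from typing import Sequence


def rate_hypotheses(results: Sequence[bool], threshold: float = 0.75) -> str:
    """Pool hypothesis results across studies and return '+', '-' or '?'.

    Each entry in `results` is True when a single result was in accordance
    with its hypothesis. Assumption: the 75% threshold is inclusive.
    """
    if not results:
        return "?"  # no hypothesis defined (by the review team)
    share_confirmed = sum(results) / len(results)
    return "+" if share_confirmed >= threshold else "-"


# Three of four results confirm their hypotheses: 75% pooled across studies.
print(rate_hypotheses([True, True, True, False]))
```

The ASCII hyphen stands in for the table's '−' rating.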

AUC, area under the curve; CFA, confirmatory factor analysis; CFI, comparative fit index; CTT, classical test theory; DIF, differential item functioning; GRADE, Grading of Recommendations, Assessment, Development and Evaluation; ICC, intraclass correlation coefficient; IRT, item response theory; LoA, limits of agreement; MIC, minimal important change; PROM, patient-reported outcome measure; RMSEA, root mean square error of approximation; SDC, smallest detectable change; SRMR, standardised root mean square residual; TLI, Tucker-Lewis index.
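Several rows of the table reduce to simple threshold checks, which the following Python sketch illustrates. All function names are hypothetical, and the sketch makes assumptions the table does not state: a CFA result is rated '?' only when no fit index at all is reported, and an SDC or LoA exactly equal to the MIC is rated '?' because the table covers only the strict inequalities. The ASCII hyphen stands in for the '−' rating.

```python
from typing import Optional


def rate_threshold(value: Optional[float], cutoff: float = 0.70) -> str:
    """Rate a 'higher is better' statistic: ICC, weighted kappa,
    correlation with the gold standard, or AUC (cutoff 0.70 in the table)."""
    if value is None:
        return "?"  # statistic not reported
    return "+" if value >= cutoff else "-"


def rate_measurement_error(sdc_or_loa: float, mic: Optional[float]) -> str:
    """Rate measurement error by comparing SDC or LoA with the MIC."""
    if mic is None:
        return "?"  # MIC not defined
    if sdc_or_loa < mic:
        return "+"
    if sdc_or_loa > mic:
        return "-"
    return "?"  # assumption: equality is not covered by the table


def rate_cfa_fit(
    cfi: Optional[float] = None,
    tli: Optional[float] = None,
    rmsea: Optional[float] = None,
    srmr: Optional[float] = None,
) -> str:
    """Rate structural validity from CFA fit indices (CTT row of the table)."""
    meets_criterion = [
        cfi is not None and cfi > 0.95,
        tli is not None and tli > 0.95,
        rmsea is not None and rmsea < 0.06,
        srmr is not None and srmr < 0.08,
    ]
    if any(meets_criterion):
        return "+"  # the criteria are joined by OR, so one is enough
    if all(v is None for v in (cfi, tli, rmsea, srmr)):
        return "?"  # assumption: nothing reported means indeterminate
    return "-"


print(rate_threshold(0.82))              # e.g. an ICC of 0.82
print(rate_measurement_error(4.1, 3.0))  # SDC larger than MIC
print(rate_cfa_fit(cfi=0.93, rmsea=0.05))
```

Because the fit criteria are joined by OR, a single index that passes its cutoff is enough for a '+' in this sketch, even when another reported index falls short.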