Table 3. Criteria for good measurement properties
| Measurement property | Rating | Criteria |
| --- | --- | --- |
| Structural validity | + | CTT: CFA: CFI or TLI or comparable measure >0.95 OR RMSEA <0.06 OR SRMR <0.08*. IRT/Rasch: no violation of unidimensionality† (CFI or TLI or comparable measure >0.95 OR RMSEA <0.06 OR SRMR <0.08) AND no violation of local independence (residual correlations among the items after controlling for the dominant factor <0.20 OR Q3 <0.37) AND no violation of monotonicity (adequate-looking graphs OR item scalability >0.30) AND adequate model fit (IRT: χ² >0.001; Rasch: infit and outfit mean squares ≥0.5 and ≤1.5 OR Z-standardised values >−2 and <2). |
| | ? | CTT: not all information for ‘+’ reported. IRT/Rasch: model fit not reported. |
| | − | Criteria for ‘+’ not met. |
| Internal consistency | + | At least low evidence‡ for sufficient structural validity§ AND Cronbach’s alpha ≥0.70 for each unidimensional scale or subscale¶. |
| | ? | Criteria for ‘At least low evidence‡ for sufficient structural validity§’ not met. |
| | − | At least low evidence‡ for sufficient structural validity§ AND Cronbach’s alpha <0.70 for each unidimensional scale or subscale¶. |
| Reliability | + | ICC or weighted kappa ≥0.70. |
| | ? | ICC or weighted kappa not reported. |
| | − | ICC or weighted kappa <0.70. |
| Measurement error | + | SDC or LoA <MIC§. |
| | ? | MIC not defined. |
| | − | SDC or LoA >MIC§. |
| Hypotheses testing for construct validity | + | The result is in accordance with the hypothesis**. |
| | ? | No hypothesis defined (by the review team). |
| | − | The result is not in accordance with the hypothesis**. |
| Cross-cultural validity/measurement invariance | + | No important differences found between group factors (such as age, gender, language) in multiple group factor analysis OR no important DIF for group factors (McFadden’s R² <0.02). |
| | ? | No multiple group factor analysis OR DIF analysis performed. |
| | − | Important differences between group factors OR DIF were found. |
| Criterion validity | + | Correlation with gold standard ≥0.70 OR AUC ≥0.70. |
| | ? | Not all information for ‘+’ reported. |
| | − | Correlation with gold standard <0.70 OR AUC <0.70. |
| Responsiveness | + | The result is in accordance with the hypothesis** OR AUC ≥0.70. |
| | ? | No hypothesis defined (by the review team). |
| | − | The result is not in accordance with the hypothesis** OR AUC <0.70. |
From Prinsen et al. [11]
‘+’, sufficient; ‘−’, insufficient; ‘?’, indeterminate.
*To rate the quality of the summary score, the factor structures should be equal across studies.
†Unidimensionality refers to a factor analysis per subscale, while structural validity refers to a factor analysis of a (multidimensional) PROM.
‡As defined by grading the evidence according to the GRADE approach.
§This evidence may come from different studies.
¶The criterion ‘Cronbach’s alpha <0.95’ was deleted because it is relevant in the development phase of a PROM, not when evaluating an existing PROM.
**The results of all studies should be taken together and it should then be decided if 75% of the results are in accordance with the hypotheses.
AUC, area under the curve; CFA, confirmatory factor analysis; CFI, comparative fit index; CTT, classical test theory; DIF, differential item functioning; GRADE, Grading of Recommendations, Assessment, Development and Evaluation; ICC, intraclass correlation coefficient; IRT, item response theory; LoA, limits of agreement; MIC, minimal important change; PROM, patient-reported outcome measure; RMSEA, root mean square error of approximation; SDC, smallest detectable change; SRMR, standardised root mean residuals; TLI, Tucker-Lewis index.
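
As an illustration only, a few of the threshold rules in Table 3 can be written as simple rating functions. The sketch below is not part of the source and rests on several assumptions: the function names are hypothetical; a plain ASCII `-` stands in for the ‘−’ rating; for CTT/CFA, ‘?’ is returned only when no fit index at all is reported; an SDC or LoA exactly equal to the MIC (a case the table does not cover) is rated ‘-’; and the 75% rule in the footnote is read as "at least 75%".

```python
from typing import Optional, Sequence


def rate_reliability(icc_or_kappa: Optional[float]) -> str:
    """'+' if ICC or weighted kappa >= 0.70, '-' if lower, '?' if not reported."""
    if icc_or_kappa is None:
        return "?"
    return "+" if icc_or_kappa >= 0.70 else "-"


def rate_measurement_error(sdc_or_loa: float, mic: Optional[float]) -> str:
    """'+' if SDC or LoA < MIC, '?' if MIC is not defined, otherwise '-'.

    Assumption: equality (SDC or LoA == MIC) is not covered by the table
    and is rated '-' here.
    """
    if mic is None:
        return "?"
    return "+" if sdc_or_loa < mic else "-"


def rate_structural_validity_cfa(
    cfi: Optional[float] = None,
    tli: Optional[float] = None,
    rmsea: Optional[float] = None,
    srmr: Optional[float] = None,
) -> str:
    """CTT/CFA rule: CFI or TLI > 0.95 OR RMSEA < 0.06 OR SRMR < 0.08.

    Assumption: '?' is returned only when none of the indices is reported.
    """
    checks = []
    if cfi is not None:
        checks.append(cfi > 0.95)
    if tli is not None:
        checks.append(tli > 0.95)
    if rmsea is not None:
        checks.append(rmsea < 0.06)
    if srmr is not None:
        checks.append(srmr < 0.08)
    if not checks:
        return "?"
    return "+" if any(checks) else "-"


def rate_hypotheses(results: Sequence[bool]) -> str:
    """Pool results across studies; '+' if at least 75% match the hypotheses.

    Each entry is True when a result is in accordance with its hypothesis.
    An empty sequence means no hypothesis was defined, which gives '?'.
    """
    if len(results) == 0:
        return "?"
    confirmed = sum(1 for r in results if r)
    # Integer comparison avoids floating-point issues at exactly 75%.
    return "+" if 4 * confirmed >= 3 * len(results) else "-"


if __name__ == "__main__":
    print(rate_reliability(0.82))
    print(rate_measurement_error(6.0, 5.0))
    print(rate_structural_validity_cfa(cfi=0.96, rmsea=0.07))
    print(rate_hypotheses([True, True, True, False]))
```

The IRT/Rasch criteria, internal consistency, and measurement invariance are left out of the sketch because they combine several conditions (or depend on graded evidence for structural validity) that need reviewer judgement rather than a single cut-off.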