2019 May 5;24(3):187–218. doi: 10.1016/j.bjpt.2019.04.002

Table 1.

Definitions of the measurement properties and the criteria adopted for assessing methodological quality and results.

Measurement property	Definition¹²	Methodological quality assessment¹²	Quality criteria assessment¹²
Content validity	The degree to which the content of an instrument is an adequate reflection of the construct to be measured.	Assessment of general requirements (e.g. relevance of items, comprehensiveness of the instrument and any important flaws in the design or methods of the study)	(+) A clear description is provided of the measurement aim, the target population, the concepts that are being measured, and the item selection AND target population and (investigators OR experts) were involved in item selection; (?) A clear description of above-mentioned aspects is lacking OR only target population involved OR doubtful design or method; (−) No target population involvement¹⁵

Structural validity	The degree to which the scores of an instrument are an adequate reflection of the dimensionality of the construct to be measured.	Assessment of design requirements and statistical methods (e.g. adequate sample size, information on exploratory factor analysis or IRT tests and any important flaws in the design or methods of the study)	(+) CTT: CFA: CFI or TLI or comparable measure >0.95 OR RMSEA <0.06 OR SRMR <0.08²; IRT/Rasch: no violation of unidimensionality (CFI or TLI or comparable measure >0.95 OR RMSEA <0.06 OR SRMR <0.08) AND no violation of local independence (residual correlations among the items after controlling for the dominant factor <0.20 OR Q3's <0.37) AND no violation of monotonicity (adequate-looking graphs OR item scalability >0.30) AND adequate model fit (IRT: χ² >0.01; Rasch: infit and outfit mean squares ≥0.5 and ≤1.5 OR Z-standardized values >−2 and <2); (?) CTT: not all information for ‘+’ reported; IRT/Rasch: model fit not reported; (−) Criteria for ‘+’ not met

Internal consistency	The degree of the interrelatedness among the items.	Assessment of design requirements and statistical methods (e.g. information on Cronbach's alpha analysis and any important flaws in the design or methods of the study)	(+) At least low evidence for sufficient structural validity AND Cronbach's alpha(s) ≥0.70 for each unidimensional scale or subscale; (?) Criteria for “At least low evidence for sufficient structural validity” not met; (−) At least low evidence for sufficient structural validity AND Cronbach's alpha(s) <0.70 for each unidimensional scale or subscale

Cross-cultural validity	The degree to which the performance of the items on a translated or culturally adapted instrument is an adequate reflection of the performance of the items of the original version of the instrument.	Assessment of design requirements and statistical methods (e.g. adequate sample size, similarity of sample characteristics and whether regression analysis or IRT was performed)	(+) No important differences found between group factors (such as age, gender, language) in multiple group factor analysis OR no important DIF for group factors (McFadden's R² <0.02); (?) No multiple group factor analysis OR DIF analysis performed; (−) Important differences between group factors OR DIF was found

Reliability	The proportion of the total variance in the measurements which is due to true differences between individuals. The extent to which scores for individuals who have not changed are the same for repeated measurement under several conditions.	Assessment of design requirements and statistical methods (e.g. test conditions, information on time interval, ICC or Kappa analysis assessment)	(+) ICC or weighted Kappa >0.70; (?) ICC or weighted Kappa not reported; (−) ICC or weighted Kappa <0.70

Measurement error	The systematic and random error of an individual's score that is not attributed to true changes in the construct to be measured.	Assessment of design requirements (e.g. information on time interval, test conditions, SEM, SDC or LoA analysis assessment and any important flaws in the design or methods of the study)	(+) SDC or LoA < MIC; (?) MIC not defined; (−) SDC or LoA > MIC⁵

Criterion validity	The degree to which the scores of an instrument are an adequate reflection of a ‘gold standard’.	Assessment of design requirements and statistical methods (e.g. AUC analysis, sensitivity and specificity determined and any important flaws in the design or methods of the study)	(+) Correlation with gold standard ≥0.70 OR AUC ≥0.70; (?) Not all information for ‘+’ reported; (−) Correlation with gold standard <0.70 OR AUC <0.70

Construct validity	The degree to which the scores of an instrument are consistent with hypotheses (for instance with regard to internal relationships, relationships to scores of other instruments, or differences between relevant groups) based on the assumption that the instrument validly measures the construct to be measured.	Assessment of design requirements and statistical methods (e.g. measurement properties of comparator instrument, comparison between subgroups and any important flaws in the design or methods of the study)	(+) The result is in accordance with the hypothesis; (?) No hypothesis defined (by the review team); (−) The result is not in accordance with the hypothesis

Responsiveness	The ability of an instrument to detect change over time in the construct to be measured.	Assessment of design requirements and statistical methods (e.g. gold standard use, ROC curve calculated, sensitivity and specificity determined, measurement properties of comparator instrument and any important flaws in the design or methods of the study)	(+) The result is in accordance with the hypothesis⁷ OR AUC ≥0.70; (?) No hypothesis defined (by the review team); (−) The result is not in accordance with the hypothesis⁷ OR AUC <0.70

AUC, area under the curve; CFA, confirmatory factor analysis; CFI, comparative fit index; CTT, classical test theory; CTV, content validity; DIF, differential item functioning; ICC, intraclass correlation coefficient; IRT, item response theory; LoA, limits of agreement; MIC, minimal important change; RMSEA, root mean square error of approximation; ROC, receiver operating characteristic; SDC, smallest detectable change; SEM, standard error of measurement; SRMR, standardized root mean square residual; TLI, Tucker–Lewis index.

(+) = sufficient rating, (?) = indeterminate rating, (−) = insufficient rating.
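
As an illustration only, the threshold-based rows of Table 1 (reliability, measurement error and criterion validity) can be written as simple decision rules. The sketch below is not part of the review's methods. The function names and the `"+"`, `"?"`, `"-"` return values are hypothetical, chosen to mirror the legend above. Two assumptions go beyond the table: a value exactly equal to a strict cut-off (ICC = 0.70, or SDC = MIC), which the table does not cover, is returned as indeterminate, and for criterion validity the (+) rule is checked before the (−) rule when both a correlation and an AUC are supplied.

```python
from typing import Optional

# Rating symbols, mirroring the table legend (hypothetical constants).
SUFFICIENT = "+"
INDETERMINATE = "?"
INSUFFICIENT = "-"


def rate_reliability(icc_or_kappa: Optional[float]) -> str:
    """Reliability row: (+) if > 0.70, (-) if < 0.70, (?) if not reported."""
    if icc_or_kappa is None:
        return INDETERMINATE
    if icc_or_kappa > 0.70:
        return SUFFICIENT
    if icc_or_kappa < 0.70:
        return INSUFFICIENT
    # Assumption: exactly 0.70 is not covered by the table; treated as (?).
    return INDETERMINATE


def rate_measurement_error(sdc_or_loa: float, mic: Optional[float]) -> str:
    """Measurement error row: compare SDC (or LoA) with the MIC."""
    if mic is None:
        return INDETERMINATE  # MIC not defined
    if sdc_or_loa < mic:
        return SUFFICIENT
    if sdc_or_loa > mic:
        return INSUFFICIENT
    # Assumption: SDC or LoA equal to MIC is not covered; treated as (?).
    return INDETERMINATE


def rate_criterion_validity(correlation: Optional[float] = None,
                            auc: Optional[float] = None) -> str:
    """Criterion validity row: correlation with gold standard or AUC >= 0.70."""
    reported = [v for v in (correlation, auc) if v is not None]
    if not reported:
        return INDETERMINATE  # no information for '+' reported
    # Assumption: the (+) rule takes precedence if the two statistics disagree.
    if any(v >= 0.70 for v in reported):
        return SUFFICIENT
    return INSUFFICIENT


if __name__ == "__main__":
    # Made-up input values, for demonstration only.
    print(rate_reliability(0.85))
    print(rate_measurement_error(sdc_or_loa=4.0, mic=3.5))
    print(rate_criterion_validity(auc=0.72))
```

The hypothesis-based rows (construct validity and responsiveness) and content validity depend on reviewer judgement and are deliberately left out of this sketch.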