Eur J Rheumatol. 2020 Apr 1;7(2):71–78. doi: 10.5152/eurjrheum.2020.19043

Supplementary File 1.

COSMIN Study Design checklist for patient-reported outcome measurement instruments

 Internal Consistency (Yes / No / ?)
 Design requirements
 Was the percentage of missing items given?
 Was there a description of how missing items were handled?
 Was the sample size included in the internal consistency analysis adequate?
 Was the unidimensionality of the scale checked? i.e. was factor analysis or IRT model applied?
 Was the sample size included in the unidimensionality analysis adequate?
 Was an internal consistency statistic calculated for each (unidimensional) (sub)scale separately?
 Were there any important flaws in the design or methods of the study?
 Statistical methods (Yes / No / NA)
 for Classical Test Theory (CTT): Was Cronbach’s alpha calculated?
 for dichotomous scores: Was Cronbach’s alpha or KR-20 calculated?
 for IRT: Was a goodness of fit statistic at a global level calculated? e.g. χ², reliability coefficient of estimated latent trait value (index of (subject or item) separation)
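For readers unfamiliar with the statistics the items above refer to, the following is a minimal NumPy sketch (not part of the COSMIN checklist) of Cronbach's alpha; applied to dichotomous 0/1 items the same formula reduces to KR-20. The score matrix is invented for illustration.

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Cronbach's alpha for an (n_persons, n_items) score matrix.

    For dichotomous (0/1) items this is equivalent to KR-20, because the
    item variances then equal p * (1 - p).
    """
    items = np.asarray(items, dtype=float)
    n_items = items.shape[1]
    item_variances = items.var(axis=0, ddof=1)        # variance of each item
    total_variance = items.sum(axis=1).var(ddof=1)    # variance of the total score
    return n_items / (n_items - 1) * (1.0 - item_variances.sum() / total_variance)

# Hypothetical 5-item subscale scored 0-4 by 6 respondents (illustrative data only).
scores = np.array([
    [4, 3, 4, 4, 3],
    [2, 2, 1, 2, 2],
    [3, 3, 3, 2, 3],
    [1, 0, 1, 1, 0],
    [4, 4, 3, 4, 4],
    [2, 1, 2, 2, 1],
])
print(f"Cronbach's alpha: {cronbach_alpha(scores):.2f}")
```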
 Reliability: relative measures, including test-retest reliability, inter-rater reliability, and intra-rater reliability (Yes / No / NA / ?)
 Design requirements
 Was the percentage of missing items given?
 Was there a description of how missing items were handled?
 Was the sample size included in the analysis adequate?
 Were at least two measurements available?
 Were the administrations independent?
 Was the time interval stated?
 Were patients stable in the interim period on the construct to be measured?
 Was the time interval appropriate?
 Were the test conditions similar for both measurements? e.g. type of administration, environment, instructions
 Were there any important flaws in the design or methods of the study?
 Statistical methods
 for continuous scores: Was an intraclass correlation coefficient (ICC) calculated?
 for dichotomous/nominal/ordinal scores: Was kappa calculated?
 for ordinal scores: Was a weighted kappa calculated?
 for ordinal scores: Was the weighting scheme described? e.g. linear, quadratic
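As an illustration of the reliability statistics named above (again, not part of the checklist): a sketch of ICC(2,1), a two-way random-effects, absolute-agreement, single-measurement ICC that is one common choice for test-retest designs, together with Cohen's weighted kappa using linear or quadratic weights. The choice of ICC model and the example data are assumptions made for this illustration.

```python
import numpy as np

def icc_2_1(scores: np.ndarray) -> float:
    """ICC(2,1): two-way random effects, absolute agreement, single measurement.

    `scores` has shape (n_subjects, n_measurements), e.g. test and retest columns.
    """
    x = np.asarray(scores, dtype=float)
    n, k = x.shape
    grand = x.mean()
    ss_rows = k * ((x.mean(axis=1) - grand) ** 2).sum()   # between-subject sum of squares
    ss_cols = n * ((x.mean(axis=0) - grand) ** 2).sum()   # between-measurement sum of squares
    ss_err = ((x - grand) ** 2).sum() - ss_rows - ss_cols  # residual sum of squares
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n)

def weighted_kappa(r1, r2, n_categories: int, weights: str = "quadratic") -> float:
    """Cohen's weighted kappa for two ratings on an ordinal scale 0..n_categories-1."""
    r1, r2 = np.asarray(r1), np.asarray(r2)
    observed = np.zeros((n_categories, n_categories))
    for a, b in zip(r1, r2):
        observed[a, b] += 1
    observed /= observed.sum()
    expected = np.outer(observed.sum(axis=1), observed.sum(axis=0))
    i, j = np.indices((n_categories, n_categories))
    d = np.abs(i - j) / (n_categories - 1)
    w = d if weights == "linear" else d ** 2               # disagreement weights
    return 1.0 - (w * observed).sum() / (w * expected).sum()

# Hypothetical test-retest scores for 6 patients and two ordinal ratings (illustrative only).
test_retest = np.array([[12, 13], [20, 19], [15, 15], [8, 10], [25, 24], [17, 18]])
print(f"ICC(2,1): {icc_2_1(test_retest):.2f}")
print(f"Quadratic weighted kappa: "
      f"{weighted_kappa([0, 1, 2, 2, 1, 0], [0, 1, 1, 2, 1, 0], n_categories=3):.2f}")
```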
 Measurement error: absolute measures
 Design requirements
 Was the percentage of missing items given?
 Was there a description of how missing items were handled?
 Was the sample size included in the analysis adequate?
 Were at least two measurements available?
 Were the administrations independent?
 Was the time interval stated?
 Were patients stable in the interim period on the construct to be measured?
 Was the time interval appropriate?
 Were the test conditions similar for both measurements? e.g. type of administration, environment, instructions
 Were there any important flaws in the design or methods of the study?
 Statistical methods
 for CTT: Was the Standard Error of Measurement (SEM), Smallest Detectable Change (SDC) or Limits of Agreement (LoA) calculated?
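One common way to obtain the absolute measurement-error statistics listed above is via the distribution-based formulas SEM = SD × √(1 − ICC), SDC = 1.96 × √2 × SEM, and the Bland-Altman 95% limits of agreement. The sketch below (not part of the checklist) uses these formulas, a pooled-SD approximation, and invented test-retest data.

```python
import numpy as np

def measurement_error_stats(test, retest, icc: float):
    """SEM, smallest detectable change, and 95% limits of agreement for a test-retest pair.

    Distribution-based formulas:
      SEM = SD_pooled * sqrt(1 - ICC)
      SDC = 1.96 * sqrt(2) * SEM   (change needed to exceed measurement error in an individual)
      LoA = mean difference +/- 1.96 * SD of the differences (Bland-Altman)
    """
    test, retest = np.asarray(test, float), np.asarray(retest, float)
    pooled_sd = np.concatenate([test, retest]).std(ddof=1)
    sem = pooled_sd * np.sqrt(1.0 - icc)
    sdc = 1.96 * np.sqrt(2.0) * sem
    diffs = retest - test
    loa = (diffs.mean() - 1.96 * diffs.std(ddof=1),
           diffs.mean() + 1.96 * diffs.std(ddof=1))
    return sem, sdc, loa

# Hypothetical test-retest scores and an assumed ICC of 0.85 (illustrative only).
test = [12, 20, 15, 8, 25, 17]
retest = [13, 19, 15, 10, 24, 18]
sem, sdc, loa = measurement_error_stats(test, retest, icc=0.85)
print(f"SEM = {sem:.1f}, SDC = {sdc:.1f}, 95% LoA = ({loa[0]:.1f}, {loa[1]:.1f})")
```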
 Hypotheses testing (Yes / No / ?)
 Design requirements
 Was the percentage of missing items given?
 Was there a description of how missing items were handled?
 Was the sample size included in the analysis adequate?
 Were hypotheses regarding correlations or mean differences formulated a priori (i.e. before data collection)?
 (The following items are scored Yes / No / NA)
 Was the expected direction of correlations or mean differences included in the hypotheses?
 Was the expected absolute or relative magnitude of correlations or mean differences included in the hypotheses?
 for convergent validity: Was an adequate description provided of the comparator instrument(s)?
 for convergent validity: Were the measurement properties of the comparator instrument(s) adequately described?
 Were there any important flaws in the design or methods of the study?
 Statistical methods (Yes / No / NA)
 Were design and statistical methods adequate for the hypotheses to be tested?
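To make the idea of a priori hypotheses concrete, the sketch below (not part of the checklist) states two hypothetical hypotheses about the expected direction and magnitude of correlations with comparator instruments and then checks them against invented data with a Spearman correlation from SciPy. The instrument names, thresholds, and scores are all assumptions for illustration.

```python
import numpy as np
from scipy.stats import spearmanr

# A priori hypotheses, formulated before data collection (hypothetical examples):
#   H1: correlation with a comparator measuring a similar construct >= 0.50
#   H2: correlation with a comparator measuring a different construct < 0.30 (absolute value)
hypotheses = [
    ("comparator_similar", lambda r: r >= 0.50),
    ("comparator_different", lambda r: abs(r) < 0.30),
]

# Invented scores for the new instrument and the two comparator instruments.
data = {
    "new_instrument":       np.array([10, 14, 18, 22, 25, 30, 33, 37]),
    "comparator_similar":   np.array([12, 13, 20, 21, 27, 29, 35, 36]),
    "comparator_different": np.array([ 5, 30, 11, 22,  8, 27, 14, 19]),
}

for name, confirmed in hypotheses:
    r, _ = spearmanr(data["new_instrument"], data[name])
    status = "confirmed" if confirmed(r) else "not confirmed"
    print(f"{name}: r = {r:.2f} -> hypothesis {status}")
```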
 Interpretability (Yes / No / NA)
 Was the percentage of missing items given?
 Was there a description of how missing items were handled?
 Was the sample size included in the analysis adequate?
 Was the distribution of the (total) scores in the study sample described?
 Was the percentage of the respondents who had the lowest possible (total) score described?
 Was the percentage of the respondents who had the highest possible (total) score described?
 Were scores and change scores (i.e. means and SD) presented for relevant (sub)groups? e.g. for normative groups, subgroups of patients, or the general population
 Was the minimal important change (MIC) or the minimal important difference (MID) determined?
 Were there any important flaws in the design or methods of the study?
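The interpretability items above can be illustrated as follows (not part of the checklist): the sketch reports the score distribution, the percentages of respondents at the lowest and highest possible scores (floor and ceiling effects), and one simple anchor-based MIC estimate taken as the mean change score of respondents who report being "slightly improved" on a global rating-of-change anchor. The score range, anchor categories, and all data are invented for illustration.

```python
import numpy as np

def floor_ceiling(total_scores, min_score, max_score):
    """Percentage of respondents at the lowest and highest possible total score."""
    scores = np.asarray(total_scores)
    floor_pct = 100.0 * (scores == min_score).mean()
    ceiling_pct = 100.0 * (scores == max_score).mean()
    return floor_pct, ceiling_pct

# Invented total scores on a hypothetical 0-100 instrument (illustrative only).
totals = np.array([0, 5, 12, 34, 40, 55, 61, 78, 100, 100, 97, 23])
floor_pct, ceiling_pct = floor_ceiling(totals, min_score=0, max_score=100)
print(f"mean = {totals.mean():.1f}, SD = {totals.std(ddof=1):.1f}")
print(f"floor effect: {floor_pct:.1f}%, ceiling effect: {ceiling_pct:.1f}%")

# A simple anchor-based MIC estimate: the mean change score of respondents who
# rate themselves "slightly improved" on a global rating-of-change anchor.
change_scores = np.array([3, 5, 8, 10, 6, 12, 2, 9])
anchor = np.array(["stable", "slightly improved", "slightly improved", "much improved",
                   "slightly improved", "much improved", "stable", "slightly improved"])
mic = change_scores[anchor == "slightly improved"].mean()
print(f"anchor-based MIC estimate: {mic:.1f} points")
```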