Eur J Rheumatol. 2020 Apr 1;7(2):71–78. doi: 10.5152/eurjrheum.2020.19043

Supplementary File 1.

COSMIN Study Design checklist for patient-reported outcome measurement instruments

 Internal Consistency (Yes / No / ?)
 Design requirements
 Was the percentage of missing items given?
 Was there a description of how missing items were handled?
 Was the sample size included in the internal consistency analysis adequate?
 Was the unidimensionality of the scale checked? i.e. was factor analysis or IRT model applied?
 Was the sample size included in the unidimensionality analysis adequate?
 Was an internal consistency statistic calculated for each (unidimensional) (sub)scale separately?
 Were there any important flaws in the design or methods of the study?
 Statistical methods (Yes / No / NA)
 for Classical Test Theory (CTT): Was Cronbach’s alpha calculated?
 for dichotomous scores: Was Cronbach’s alpha or KR-20 calculated?
 for IRT: Was a goodness of fit statistic at a global level calculated? e.g. χ², reliability coefficient of estimated latent trait value (index of (subject or item) separation)
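For readers unfamiliar with the statistics the items above refer to, the following is a minimal NumPy sketch (not part of the COSMIN checklist) of Cronbach's alpha; applied to dichotomous 0/1 items the same formula reduces to KR-20. The score matrix is invented for illustration.

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Cronbach's alpha for an (n_persons, n_items) score matrix.

    For dichotomous (0/1) items this is equivalent to KR-20, because the
    item variances then equal p * (1 - p).
    """
    items = np.asarray(items, dtype=float)
    n_items = items.shape[1]
    item_variances = items.var(axis=0, ddof=1)        # variance of each item
    total_variance = items.sum(axis=1).var(ddof=1)    # variance of the total score
    return n_items / (n_items - 1) * (1.0 - item_variances.sum() / total_variance)

# Hypothetical 5-item subscale scored 0-4 by 6 respondents (illustrative data only).
scores = np.array([
    [4, 3, 4, 4, 3],
    [2, 2, 1, 2, 2],
    [3, 3, 3, 2, 3],
    [1, 0, 1, 1, 0],
    [4, 4, 3, 4, 4],
    [2, 1, 2, 2, 1],
])
print(f"Cronbach's alpha: {cronbach_alpha(scores):.2f}")
```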
 Reliability: relative measures, including test-retest reliability, inter-rater reliability, and intra-rater reliability (Yes / No / NA / ?)
 Design requirements
 Was the percentage of missing items given?
 Was there a description of how missing items were handled?
 Was the sample size included in the analysis adequate?
 Were at least two measurements available?
 Were the administrations independent?
 Was the time interval stated?
 Were patients stable in the interim period on the construct to be measured?
 Was the time interval appropriate?
 Were the test conditions similar for both measurements? e.g. type of administration, environment, instructions
 Were there any important flaws in the design or methods of the study?
 Statistical methods
 for continuous scores: Was an intraclass correlation coefficient (ICC) calculated?
 for dichotomous/nominal/ordinal scores: Was kappa calculated?
 for ordinal scores: Was a weighted kappa calculated?
 for ordinal scores: Was the weighting scheme described? e.g. linear, quadratic
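As an illustration of the reliability statistics named above (again, not part of the checklist): a sketch of ICC(2,1), a two-way random-effects, absolute-agreement, single-measurement ICC that is one common choice for test-retest designs, together with Cohen's weighted kappa using linear or quadratic weights. The choice of ICC model and the example data are assumptions made for this illustration.

```python
import numpy as np

def icc_2_1(scores: np.ndarray) -> float:
    """ICC(2,1): two-way random effects, absolute agreement, single measurement.

    `scores` has shape (n_subjects, n_measurements), e.g. test and retest columns.
    """
    x = np.asarray(scores, dtype=float)
    n, k = x.shape
    grand = x.mean()
    ss_rows = k * ((x.mean(axis=1) - grand) ** 2).sum()   # between-subject sum of squares
    ss_cols = n * ((x.mean(axis=0) - grand) ** 2).sum()   # between-measurement sum of squares
    ss_err = ((x - grand) ** 2).sum() - ss_rows - ss_cols  # residual sum of squares
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n)

def weighted_kappa(r1, r2, n_categories: int, weights: str = "quadratic") -> float:
    """Cohen's weighted kappa for two ratings on an ordinal scale 0..n_categories-1."""
    r1, r2 = np.asarray(r1), np.asarray(r2)
    observed = np.zeros((n_categories, n_categories))
    for a, b in zip(r1, r2):
        observed[a, b] += 1
    observed /= observed.sum()
    expected = np.outer(observed.sum(axis=1), observed.sum(axis=0))
    i, j = np.indices((n_categories, n_categories))
    d = np.abs(i - j) / (n_categories - 1)
    w = d if weights == "linear" else d ** 2               # disagreement weights
    return 1.0 - (w * observed).sum() / (w * expected).sum()

# Hypothetical test-retest scores for 6 patients and two ordinal ratings (illustrative only).
test_retest = np.array([[12, 13], [20, 19], [15, 15], [8, 10], [25, 24], [17, 18]])
print(f"ICC(2,1): {icc_2_1(test_retest):.2f}")
print(f"Quadratic weighted kappa: "
      f"{weighted_kappa([0, 1, 2, 2, 1, 0], [0, 1, 1, 2, 1, 0], n_categories=3):.2f}")
```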
 Measurement error: absolute measures
 Design requirements
 Was the percentage of missing items given?
 Was there a description of how missing items were handled?
 Was the sample size included in the analysis adequate?
 Were at least two measurements available?
 Were the administrations independent?
 Was the time interval stated?
 Were patients stable in the interim period on the construct to be measured?
 Was the time interval appropriate?
 Were the test conditions similar for both measurements? e.g. type of administration, environment, instructions
 Were there any important flaws in the design or methods of the study?
 Statistical methods
 for CTT: Was the Standard Error of Measurement (SEM), Smallest Detectable Change (SDC) or Limits of Agreement (LoA) calculated?
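One common way to obtain the absolute measurement-error statistics listed above is via the distribution-based formulas SEM = SD × √(1 − ICC), SDC = 1.96 × √2 × SEM, and the Bland-Altman 95% limits of agreement. The sketch below (not part of the checklist) uses these formulas, a pooled-SD approximation, and invented test-retest data.

```python
import numpy as np

def measurement_error_stats(test, retest, icc: float):
    """SEM, smallest detectable change, and 95% limits of agreement for a test-retest pair.

    Distribution-based formulas:
      SEM = SD_pooled * sqrt(1 - ICC)
      SDC = 1.96 * sqrt(2) * SEM   (change needed to exceed measurement error in an individual)
      LoA = mean difference +/- 1.96 * SD of the differences (Bland-Altman)
    """
    test, retest = np.asarray(test, float), np.asarray(retest, float)
    pooled_sd = np.concatenate([test, retest]).std(ddof=1)
    sem = pooled_sd * np.sqrt(1.0 - icc)
    sdc = 1.96 * np.sqrt(2.0) * sem
    diffs = retest - test
    loa = (diffs.mean() - 1.96 * diffs.std(ddof=1),
           diffs.mean() + 1.96 * diffs.std(ddof=1))
    return sem, sdc, loa

# Hypothetical test-retest scores and an assumed ICC of 0.85 (illustrative only).
test = [12, 20, 15, 8, 25, 17]
retest = [13, 19, 15, 10, 24, 18]
sem, sdc, loa = measurement_error_stats(test, retest, icc=0.85)
print(f"SEM = {sem:.1f}, SDC = {sdc:.1f}, 95% LoA = ({loa[0]:.1f}, {loa[1]:.1f})")
```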
 Hypotheses testing (Yes / No / ?)
 Design requirements
 Was the percentage of missing items given?
 Was there a description of how missing items were handled?
 Was the sample size included in the analysis adequate?
 Were hypotheses regarding correlations or mean differences formulated a priori (i.e. before data collection)?
 (The following items are scored Yes / No / NA)
 Was the expected direction of correlations or mean differences included in the hypotheses?
 Was the expected absolute or relative magnitude of correlations or mean differences included in the hypotheses?
 for convergent validity: Was an adequate description provided of the comparator instrument(s)?
 for convergent validity: Were the measurement properties of the comparator instrument(s) adequately described?
 Were there any important flaws in the design or methods of the study?
 Statistical methods (Yes / No / NA)
 Were design and statistical methods adequate for the hypotheses to be tested?
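To make the idea of a priori hypotheses concrete, the sketch below (not part of the checklist) states two hypothetical hypotheses about the expected direction and magnitude of correlations with comparator instruments and then checks them against invented data with a Spearman correlation from SciPy. The instrument names, thresholds, and scores are all assumptions for illustration.

```python
import numpy as np
from scipy.stats import spearmanr

# A priori hypotheses, formulated before data collection (hypothetical examples):
#   H1: correlation with a comparator measuring a similar construct >= 0.50
#   H2: correlation with a comparator measuring a different construct < 0.30 (absolute value)
hypotheses = [
    ("comparator_similar", lambda r: r >= 0.50),
    ("comparator_different", lambda r: abs(r) < 0.30),
]

# Invented scores for the new instrument and the two comparator instruments.
data = {
    "new_instrument":       np.array([10, 14, 18, 22, 25, 30, 33, 37]),
    "comparator_similar":   np.array([12, 13, 20, 21, 27, 29, 35, 36]),
    "comparator_different": np.array([ 5, 30, 11, 22,  8, 27, 14, 19]),
}

for name, confirmed in hypotheses:
    r, _ = spearmanr(data["new_instrument"], data[name])
    status = "confirmed" if confirmed(r) else "not confirmed"
    print(f"{name}: r = {r:.2f} -> hypothesis {status}")
```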
 Interpretability (Yes / No / NA)
 Was the percentage of missing items given?
 Was there a description of how missing items were handled?
 Was the sample size included in the analysis adequate?
 Was the distribution of the (total) scores in the study sample described?
 Was the percentage of the respondents who had the lowest possible (total) score described?
 Was the percentage of the respondents who had the highest possible (total) score described?
 Were scores and change scores (i.e. means and SD) presented for relevant (sub)groups? e.g. for normative groups, subgroups of patients, or the general population
 Was the minimal important change (MIC) or the minimal important difference (MID) determined?
 Were there any important flaws in the design or methods of the study?
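The interpretability items above can be illustrated as follows (not part of the checklist): the sketch reports the score distribution, the percentages of respondents at the lowest and highest possible scores (floor and ceiling effects), and one simple anchor-based MIC estimate taken as the mean change score of respondents who report being "slightly improved" on a global rating-of-change anchor. The score range, anchor categories, and all data are invented for illustration.

```python
import numpy as np

def floor_ceiling(total_scores, min_score, max_score):
    """Percentage of respondents at the lowest and highest possible total score."""
    scores = np.asarray(total_scores)
    floor_pct = 100.0 * (scores == min_score).mean()
    ceiling_pct = 100.0 * (scores == max_score).mean()
    return floor_pct, ceiling_pct

# Invented total scores on a hypothetical 0-100 instrument (illustrative only).
totals = np.array([0, 5, 12, 34, 40, 55, 61, 78, 100, 100, 97, 23])
floor_pct, ceiling_pct = floor_ceiling(totals, min_score=0, max_score=100)
print(f"mean = {totals.mean():.1f}, SD = {totals.std(ddof=1):.1f}")
print(f"floor effect: {floor_pct:.1f}%, ceiling effect: {ceiling_pct:.1f}%")

# A simple anchor-based MIC estimate: the mean change score of respondents who
# rate themselves "slightly improved" on a global rating-of-change anchor.
change_scores = np.array([3, 5, 8, 10, 6, 12, 2, 9])
anchor = np.array(["stable", "slightly improved", "slightly improved", "much improved",
                   "slightly improved", "much improved", "stable", "slightly improved"])
mic = change_scores[anchor == "slightly improved"].mean()
print(f"anchor-based MIC estimate: {mic:.1f} points")
```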