
Appendix 2.

COSMIN Risk of Bias ratings for PPT studies. (Ratings: V = Very good; A = Adequate; D = Doubtful; I = Inadequate; N/A = not applicable. R1 = Rater 1; R2 = Rater 2; C = Consensus. Each table cell shows R1 / R2 / C.)

Intrarater Reliability

| Design requirements | Study 1 De Groef et al. 2016 | Study 2 Nascimento et al. 2019 | Study 3 Wang-Price et al. 2019 | Study 4 Vaegter et al. 2018 | Study 5 Jones et al. 2007 | Study 6 Persson et al. 2004 | Study 7 Vanderweeën et al. 1996 | Study 8 Levoska et al. 1993 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 Stability of the patients | D/D/D | D/D/D | D/D/D | V/V/V | D/D/D | V/V/V | D/D/D | |
| 2 Time interval | D/D/D | V/V/V | V/V/V | V/V/V | V/V/V | D/D/D | V/V/V | |
| 3 Similarity of measurement condition | D/D/D | D/D/D | V/V/V | D/D/D | V/V/V | V/V/V | D/D/D | |
| 4 Administration without knowledge of scores or values | V/V/V | D/D/D | D/D/D | D/D/D | D/D/D | D/D/D | D/D/D | |
| 5 Score assignment or determination without knowledge of the scores or values | V/V/V | D/D/D | D/D/D | I/I/I | D/D/D | D/D/D | D/D/D | |
| 6 Other important flaws | V/V/V | D/D/D | D/D/D | D/D/D | D/D/D | D/D/D | D/D/D | |
| Statistical Methods | | | | | | | | |
| 7 For continuous scores: ICC | V/V/V | A/D/D | A/A/A | A/A/A | V/V/V | A/A/A | A/A/A | |
| 8 For ordinal scores: Kappa | - | - | - | - | - | - | - | |
| 9 For dichotomous/nominal scores: Kappa for each category against the other categories | - | - | - | - | - | - | - | |
| FINAL RATING (Lowest score of items) | D/D/D | D/D/D | D/D/D | I/I/I | D/D/D | D/D/D | D/D/D | |
Interrater Reliability

| Design requirements | Study 1 De Groef et al. 2016 | Study 2 Nascimento et al. 2019 | Study 6 Persson et al. 2004 | Study 8 Levoska et al. 1993 |
| --- | --- | --- | --- | --- |
| 1 Stability of the patients | A/A/A | A/D/D | D/D/D | D/D/D |
| 2 Time interval | D/D/D | V/V/V | V/V/V | D/D/D |
| 3 Similarity of measurement condition | D/D/D | D/D/D | V/V/V | D/D/D |
| 4 Administration without knowledge of scores or values | V/V/V | V/V/V | D/D/D | D/D/D |
| 5 Score assignment or determination without knowledge of the scores or values | V/V/V | V/V/V | D/D/D | D/D/D |
| 6 Other important flaws | D/D/D | V/V/V | D/D/D | D/D/D |
| Statistical Methods | | | | |
| 7 For continuous scores: ICC | V/V/V | V/V/V | I/I/I | A/A/A |
| 8 For ordinal scores: Kappa | - | - | - | - |
| 9 For dichotomous/nominal scores: Kappa for each category against the other categories | - | - | - | - |
| FINAL RATING (Lowest score of items) | D/D/D | D/D/D | I/I/I | D/D/D |
Intrarater Measurement Error

| Design requirements | Study 1 De Groef et al. 2016 | Study 2 Nascimento et al. 2019 | Study 3 Wang-Price et al. 2019 | Study 4 Vaegter et al. 2018 | Study 6 Persson et al. 2004 |
| --- | --- | --- | --- | --- | --- |
| 1 Stability of the patients | A/A/A | A/D/D | D/D/D | D/D/D | D/D/D |
| 2 Time interval | D/D/D | V/V/V | V/V/V | V/V/V | V/V/V |
| 3 Similarity of measurement condition | D/D/D | D/D/D | D/D/D | V/V/V | V/V/V |
| 4 Administration without knowledge of scores or values | V/V/V | V/V/V | D/D/D | D/D/D | D/D/D |
| 5 Score assignment or determination without knowledge of the scores or values | V/V/V | V/V/V | D/D/D | D/D/D | D/D/D |
| 6 Other important flaws | D/D/D | V/V/V | D/D/D | D/D/D | D/D/D |
| Statistical Methods | | | | | |
| 7 For continuous scores: SEM, SDC, LoA or CV calculated? | V/V/V | V/V/V | V/V/V | A/A/A | D/A/A |
| 8 For dichotomous/nominal/ordinal scores: percentage specific (e.g. positive and negative) agreement | - | - | - | - | - |
| FINAL RATING (Lowest score of items) | D/D/D | D/D/D | D/D/D | D/D/D | D/D/D |
Hypotheses testing for construct validity (comparison between subgroups)

| Design requirements | Study 3 Wang-Price et al. 2019 |
| --- | --- |
| 1 Adequate description of important characteristics of the subgroups | V/V/V |
| Statistical Methods | |
| 2 Appropriate statistical method for the hypothesis to be tested | V/V/V |
| 3 Other important flaws | V/V/V |
| FINAL RATING (Lowest score of items) | V/V/V |
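The FINAL RATING rows apply the rule stated in their label: a study's overall rating for a box is the lowest rating obtained across its applicable items (COSMIN's "worst score counts" principle). Below is a minimal Python sketch of that rule, purely for illustration; the function name, the rating encoding, and the assumed ordering Very good > Adequate > Doubtful > Inadequate are not part of the appendix.

```python
# Illustrative sketch of the "lowest score of items" rule used in the FINAL RATING rows,
# assuming the ordering Very good (V) > Adequate (A) > Doubtful (D) > Inadequate (I).
RANK = {"V": 4, "A": 3, "D": 2, "I": 1}

def final_rating(item_ratings):
    """Return the lowest (worst) rating among the applicable items; "-" (N/A) items are ignored."""
    applicable = [r for r in item_ratings if r in RANK]
    return min(applicable, key=RANK.get)

# Example: consensus ratings for Study 4 (Vaegter et al. 2018), intrarater reliability items 1-7
print(final_rating(["V", "V", "D", "D", "I", "D", "A"]))  # -> I
```

Applying the same rule to the consensus columns reproduces the FINAL RATING entries in the tables above.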