2019 Mar 6;19:48. doi: 10.1186/s12874-019-0688-x

Table 4.

Descriptive characteristics of tools used to assess the quality of a peer review report

Journal or Company Name ^a	First Author, Year	Format	Quality defined ^b	Overall quality assessment	Items (n)	Item weights ^c	Scoring range ^d	Scoring system instruction ^e	Scale/Checklist Development ^f	Validity ^g	Reliability ^h	Internal consistency	RCTs ⁱ
Advances in Nursing Science; Issues in Mental Health Nursing; The Journal of Holistic Nursing	Shattell 2010 [33]	Scale	N	Summary Score	6	S	1–10	N	NR	NR	NR	NR	0
American Journal of Roentgenology	Friedman 1995 [22]	Scale	N	Single Score	1	NA	1–4	N	NR	NR	NR	NR	0
American Journal of Roentgenology	Kliewer 2005 [49]	Scale	N	Summary Score	4	NA	1–4	N	NR	NR	NR	NR	0
American Journal of Roentgenology	Rajesh 2013 [32]	Scale	N	Single Score	1	NA	1–4	P	NR	NR	NR	NR	0
American Journal of Roentgenology	Berquist 2017 [50]	Scale	N	Summary Score	4	NA	0–4	Y	NR	NR	NR	NR	0
Annals of Emergency Medicine	Callaham 1998 [25]	Scale	N	Single Score	1	NA	1–5	N	NR	NR	Inter-Rater (ICC = 0.44, 0.24, 0.12) ^l	NR	2 ^m
Annals of Emergency Medicine	Callaham 2002 [26, 51]	Scale	N	Summary Score	6	NA	1–5	N	NR	NR	Inter-Rater (ICC = 0.44, 0.24, 0.12) ^l	NR	1
Annals of Emergency Medicine; Annals of Internal Medicine; JAMA; Obstetrics & Gynecology; Ophthalmology	Justice 1998 [35]	Scale	N	Summary Score	4	S	1–5	N	NR	NR	NR	NR	0
British Journal of General Practice	Moore 2014 [29]	Scale	N	Single Score	1	NA	A–E	Y	NR	NR	NR		0
British Medical Journal	Black 1998 (RQI 3.2) [23, 39]	Scale	N	Summary Score	7	S	1–5	N	Y	Face (N = 20)	Test-Retest (Kw = 1.00)	Internal Consistency (Cronbach’s alpha = 0.84)	5
				Mean						Content (N = 20); Construct	Inter-Rater (Kw = 0.83)
British Medical Journal	Van Rooyen 1999 (RQI 4) [27]	Scale	N	Mean ⁿ	8	S	1–5	N	NR	NR	Inter-Rater (Kw = 0.38–0.67) ^o		2
Chinese Journal of Tuberculosis and Respiratory Diseases	Yang 2009 [52]	Checklist	N	NA	5	NA	NA	N	NR	NR	NR		0
Journal of Clinical Investigation	Stossel 1985 [30]	Scale	N	Single Score	1	NA	Good/Fair/Poor	Y	NR	NR	NR		0
Journal of General Internal Medicine	McNutt 1990 [28, 40]	Scale	N	Summary Score	9	S	1–5	N	NR	Construct	NR		1
Journal of Vascular and Interventional Radiology	Feurer 1994 [41]	Scale	N	Sum	7	D	0–14	N	NR	Content (N = 2); Preliminary Criterion (N = 2) (Kendall = 0.94)	Inter-Rater (ICC = 0.84)		0
NA	Review Quality Collector (RQC) 2012 [53]	Scale	N	Mean	4	User-defined weights	0–100	N	NR	NR	NR		0
Nursing Research	Henly 2009 [24]	Scale	N	Mean (CAS, GAS scale)	15	S	1–5	P	NR	NR	Inter-Rater (ICC = 0.79) ^p		0
				Summary Score (OAS scale)			1–5
				Summary Score (GRQ scale)			0–100
Nursing Research	Henly 2010 [36]	Scale	N	Mean (CAS, GAR, SARNR scale)	26	S	1–5	P	NR	NR	Inter-Rater (ICC = 0.75)^p		0
				Summary Score (GRQ scale)			0–100
Obstetrics & Gynecology; Dutch Journal of Medicine	Landkroon 2006 [42]	Scale	N	Summary Score	5	NA	1–5	Y	NR	NR	Test-Retest (ICC = 0.66–0.88); Inter-Rater (ICC = 0.62)		0
Pakistan Journal of Medical Sciences	Jawaid 2006 [34]	Scale	N	NR ^q	5	S	1–5	N	NR	NR	NR		0
Peerage of Science	Peerage Essay Quality (PEQ) 2011 [37]	Scale	N	Mean	3	S	1–5	N	NR	NR	NR		0
Publons Academy	Review Rating and Feedback Form 2016 [38]	Scale	N	Sum	4	S	0–3 per item (total 0–12)	N	NR	NR	NR		0
The Journal of Bone and Joint Surgery	Thompson 2016 [31]	Scale	N	Single Score	1	NA	80–100	Y	NR	NR	Inter-Rater (ICC = -4.5 to 0.99) ^r		0
The National Medical Journal of India	Das Sinha 1999 [54]	Scale	N	Sum	5	D	0–100	N	NR	NR	NR		0

^aName of the journal or company/organization that used the tool to assess the quality of its peer review reports

^bNone of the reports clearly defined the quality of a peer review report

^cNA Not applicable, S Same weight for each item, D Different weight for each item

^dNA Not applicable

^eY Yes defined, P Partially defined, N Not defined

^{f, g, h}NR Not reported

ⁱNumber of randomized controlled trials (RCTs) in which the tool was used as an outcome measure

^lThe ICC was 0.44 for reviewers, 0.24 for editors, and 0.12 for manuscripts

^mOne article reports two studies; the first is not an RCT, whereas the second is an RCT [55]

ⁿThe overall quality score is the mean of the first seven items; the item on the tone of the review was not included

^oInter-rater reliability was measured with the weighted kappa (Kw) for items 1–7, based on two editors’ independent assessments

^pThe tool includes more than one scale; we report inter-rater reliability only for the General Review Quality (GRQ) scale

^qNot reported. Although the authors state that reviewers were rated as excellent, good, or average based on the quality of their reviews, they do not report how the overall quality of the peer review reports was assessed

^rICC range across 11 manuscripts; removing one outlier manuscript narrowed the range to 0.87–0.99
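Footnotes c and n describe how item ratings are combined into an overall score: an equal-weight total, a total with a different weight per item, or a mean over a subset of items. The Python sketch below illustrates those three aggregation patterns, assuming ratings are supplied as a list in questionnaire order. The function names and the weights in the usage lines are hypothetical and are not taken from any of the cited tools.

```python
from statistics import mean


def rqi_overall(item_scores: list[int]) -> float:
    """Overall score as the mean of items 1-7; item 8 (tone of the review) is ignored.

    Assumes the eight item ratings are given in questionnaire order on a 1-5 scale.
    """
    if len(item_scores) != 8:
        raise ValueError("expected eight item scores")
    return float(mean(item_scores[:7]))


def summed_score(item_scores: list[int], item_max: int = 3) -> int:
    """Equal-weight ('S') total, e.g. four items scored 0-3 give a total of 0-12."""
    if any(not 0 <= s <= item_max for s in item_scores):
        raise ValueError("item score out of range")
    return sum(item_scores)


def weighted_sum(item_scores: list[int], weights: list[int]) -> int:
    """Different-weight ('D') total: each item rating multiplied by its own weight."""
    if len(item_scores) != len(weights):
        raise ValueError("one weight per item is required")
    return sum(s * w for s, w in zip(item_scores, weights))


print(rqi_overall([4, 4, 4, 5, 3, 4, 4, 1]))  # 4.0
print(summed_score([3, 2, 3, 1]))  # 9
print(weighted_sum([2, 1, 2], [2, 2, 3]))  # hypothetical weights -> 12
```

Because the tools in the table rarely define their scoring instructions fully (column e), the range and weights passed to these functions would need to come from each tool's own documentation.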
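Several rows report inter-rater agreement as a weighted kappa (Kw; footnote o), but the table does not say which weighting scheme each study used. The sketch below computes Cohen's weighted kappa for two raters, assuming ordinal ratings drawn from a known category list and either linear or quadratic disagreement weights. The function name and signature are illustrative only.

```python
from collections.abc import Hashable, Sequence


def weighted_kappa(
    rater_a: Sequence[Hashable],
    rater_b: Sequence[Hashable],
    categories: Sequence[Hashable],
    weighting: str = "linear",
) -> float:
    """Cohen's weighted kappa for two raters scoring the same items.

    `categories` lists the ordinal scale in order (e.g. [1, 2, 3, 4, 5]).
    Disagreement weights are |i - j| / (k - 1) ("linear") or its square ("quadratic").
    """
    if weighting not in ("linear", "quadratic"):
        raise ValueError("weighting must be 'linear' or 'quadratic'")
    n, k = len(rater_a), len(categories)
    if n == 0 or n != len(rater_b) or k < 2:
        raise ValueError("need paired, non-empty ratings and at least two categories")
    index = {c: i for i, c in enumerate(categories)}

    # Observed joint proportions of (rater A category, rater B category).
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        observed[index[a]][index[b]] += 1 / n
    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    def weight(i: int, j: int) -> float:
        d = abs(i - j) / (k - 1)
        return d if weighting == "linear" else d * d

    # Weighted disagreement: observed vs. expected under independent marginals.
    obs = sum(weight(i, j) * observed[i][j] for i in range(k) for j in range(k))
    exp = sum(weight(i, j) * row[i] * col[j] for i in range(k) for j in range(k))
    if exp == 0:
        raise ValueError("kappa is undefined when both raters use the same single category")
    return 1 - obs / exp


print(weighted_kappa([1, 1, 2, 3], [1, 2, 2, 3], [1, 2, 3]))
```

Linear and quadratic weights can give noticeably different values for the same ratings, which is one reason Kw figures from different studies are hard to compare directly.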
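Most reliability entries are intraclass correlation coefficients (ICC), yet the table does not record which ICC model each study used. As one common choice, the sketch below assumes ICC(2,1) (two-way random effects, absolute agreement, single rater), computed from two-way ANOVA mean squares. The cited studies may have used a different form, and the ratings in the usage lines are illustrative numbers (six targets, four raters), not data from any study in the table.

```python
def icc_2_1(ratings: list[list[float]]) -> float:
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    `ratings[i][j]` is the score rater j gave to target i (e.g. a review report).
    Every target must be rated by every rater (no missing cells).
    """
    n = len(ratings)
    k = len(ratings[0]) if ratings else 0
    if n < 2 or k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need a complete table with at least 2 targets and 2 raters")

    grand = sum(map(sum, ratings)) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Two-way ANOVA sums of squares (one observation per cell).
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    denominator = ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    if denominator == 0:
        raise ValueError("ICC is undefined when all ratings are identical")
    return (ms_rows - ms_error) / denominator


example = [[9, 2, 5, 8], [6, 1, 3, 2], [8, 4, 6, 8], [7, 1, 2, 6], [10, 5, 6, 9], [6, 2, 4, 7]]
print(round(icc_2_1(example), 2))  # 0.29
```

With these illustrative ratings the raters rank the targets similarly but differ systematically in level, so an absolute-agreement ICC such as ICC(2,1) stays low; a consistency-type ICC on the same data would be considerably higher.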