2023 Sep 25;6(9):e2335377. doi: 10.1001/jamanetworkopen.2023.35377

Table 3. Interrater and Intrarater Reliability of APPRAISE-AI Items, Domains, and Overall Score Determined by ICCs.

Variable                                  Interrater ICC (95% CI)a    Intrarater ICC (95% CI)a
Item
Title                                     0.76 (0.61-0.86)            0.76 (0.62-0.85)
Background                                0.77 (0.64-0.86)            0.77 (0.64-0.86)
Objective and problem                     0.74 (0.59-0.84)            0.74 (0.59-0.84)
Source of data                            0.90 (0.80-0.95)            0.99 (0.98-0.99)
Eligibility criteria                      0.77 (0.54-0.87)            0.90 (0.84-0.94)
Ground truth                              1.00                        1.00
Data abstraction, cleaning, preparation   0.80 (0.67-0.88)            0.98 (0.97-0.99)
Data splitting                            0.75 (0.61-0.84)            1.00
Sample size calculation                   1.00                        1.00
Baseline                                  0.83 (0.72-0.89)            0.97 (0.95-0.98)
Model description                         0.77 (0.63-0.86)            0.94 (0.90-0.97)
Hyperparameter tuning                     0.76 (0.62-0.85)            0.96 (0.92-0.98)
Cohort characteristics                    0.80 (0.68-0.88)            0.98 (0.96-0.99)
Model specification                       0.81 (0.69-0.88)            0.90 (0.84-0.94)
Model evaluation                          0.79 (0.66-0.87)            0.96 (0.94-0.98)
Clinical utility assessment               0.78 (0.63-0.87)            0.95 (0.91-0.97)
Bias assessment                           0.79 (0.62-0.89)            0.96 (0.94-0.98)
Error analysis                            1.00                        1.00
Model explanation                         0.82 (0.71-0.89)            0.96 (0.94-0.98)
Critical analysis                         0.84 (0.74-0.90)            1.00
Implementation into clinical practice     0.77 (0.64-0.86)            0.92 (0.87-0.95)
Limitations                               0.79 (0.67-0.87)            1.00
Disclosures                               1.00                        1.00
Transparency                              0.95 (0.92-0.97)            0.99 (0.99-1.00)
Domain
Clinical relevance                        0.83 (0.70-0.90)            0.89 (0.80-0.94)
Data quality                              0.82 (0.70-0.89)            0.97 (0.95-0.98)
Methodological conduct                    0.85 (0.75-0.91)            0.98 (0.97-0.99)
Robustness of results                     0.81 (0.63-0.90)            0.94 (0.90-0.96)
Reporting quality                         0.86 (0.78-0.92)            0.99 (0.99-1.00)
Reproducibility                           0.92 (0.86-0.95)            0.99 (0.98-1.00)
Overall score                             0.91 (0.85-0.95)            0.98 (0.96-0.99)

Abbreviation: ICC, intraclass correlation coefficient.

a ICCs were calculated with 2-way random effects, absolute agreement, and single measurement.
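
The footnote's combination of 2-way random effects, absolute agreement, and single measurement corresponds to the form commonly written ICC(2,1). The following is a minimal sketch of how such a coefficient can be computed from a subjects-by-raters matrix using the standard mean-square formula. It is not the authors' code: the source does not state which software was used, the function name `icc_2_1` is hypothetical, and the example ratings are illustrative values, not data from the study. Confidence intervals are not computed here.

```python
import numpy as np


def icc_2_1(ratings):
    """ICC(2,1): 2-way random effects, absolute agreement, single measurement.

    `ratings` is an n x k array: n rated subjects (rows) by k raters (columns).
    Hypothetical helper for illustration; not taken from the article.
    """
    x = np.asarray(ratings, dtype=float)
    n, k = x.shape
    grand_mean = x.mean()

    # Sums of squares for subjects (rows), raters (columns), and residual error.
    ss_rows = k * ((x.mean(axis=1) - grand_mean) ** 2).sum()
    ss_cols = n * ((x.mean(axis=0) - grand_mean) ** 2).sum()
    ss_total = ((x - grand_mean) ** 2).sum()
    ss_error = ss_total - ss_rows - ss_cols

    # Mean squares from the 2-way ANOVA without replication.
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    # Rater variance (ms_cols) stays in the denominator, so a systematic
    # offset between raters lowers the coefficient (absolute agreement).
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Illustrative ratings only (6 subjects, 4 raters); not data from the study.
example_ratings = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
example_icc = icc_2_1(example_ratings)  # approximately 0.29
```

Because this form measures absolute agreement, two raters who rank every subject identically but differ by a constant offset still score below 1; for example, `icc_2_1([[1, 2], [2, 3], [3, 4]])` evaluates to 2/3 rather than 1.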