2021 Mar 24;3(5):e286–e294. doi: 10.1016/S2589-7500(21)00039-X

Table 3. Performance of progression prediction models

	C-index (95% CI)	F score (95% CI)	Sensitivity (95% CI)	Specificity (95% CI)	Binomial p value^*	Log-rank p value^†	χ² (95% CI)^†
Internal test set
Image-based model	0·737 (0·713–0·773)	0·790 (0·776–0·808)	0·696 (0·664–0·718)	0·775 (0·769–0·782)	<0·0001	<0·0001	17·33 (13·73–22·02)
Clinical-data-based model	0·769 (0·755–0·786)	0·811 (0·803–0·836)	0·656 (0·631–0·674)	0·811 (0·801–0·817)	<0·0001	<0·0001	31·77 (24·58–36·56)
Image and clinical data combined model	0·805 (0·800–0·820)	0·843 (0·836–0·863)	0·720 (0·700–0·749)	0·845 (0·840–0·850)	ref	<0·0001	26·51 (21·65–33·56)
Severity-score–based model	0·696 (0·676–0·711)	0·761 (0·752–0·775)	0·656 (0·635–0·669)	0·743 (0·736–0·752)	<0·0001	<0·0001	18·15 (9·45–23·70)
Severity score and clinical data combined model	0·781 (0·755–0·787)	0·805 (0·798–0·832)	0·678 (0·666–0·700)	0·798 (0·793–0·807)	0·0002	<0·0001	42·23 (33·63–49·59)
External test set
Image-based model	0·721 (0·700–0·727)	0·795 (0·779–0·813)	0·633 (0·606–0·662)	0·791 (0·788–0·796)	<0·0001	<0·0001	39·17 (28·62–48·58)
Clinical-data-based model	0·707 (0·695–0·729)	0·769 (0·756–0·780)	0·602 (0·583–0·621)	0·753 (0·751–0·762)	<0·0001	<0·0001	31·72 (26·41–42·94)
Image and clinical data combined model	0·752 (0·739–0·764)	0·805 (0·791–0·825)	0·667 (0·643–0·698)	0·798 (0·791–0·803)	ref	<0·0001	52·04 (46·50–66·14)
Severity-score–based model	0·606 (0·584–0·627)	0·720 (0·704–0·733)	0·528 (0·512–0·541)	0·695 (0·686–0·701)	<0·0001	<0·0001	11·65 (6·84–15·43)
Severity score and clinical data combined model	0·715 (0·704–0·721)	0·778 (0·757–0·795)	0·667 (0·649–0·677)	0·759 (0·756–0·765)	<0·0001	<0·0001	37·62 (26·68–46·95)

The C-index for right-censored data measures model performance by comparing the observed progression information (critical labels and days to progression) with the predicted risk scores; a larger C-index indicates better progression prediction. C-index=concordance index.

^*Measures the difference in performance between the image and clinical data combined model (ref) and each of the other prediction models; a smaller p value indicates stronger evidence of a difference between the combined model and the model being compared. ref=reference model.

^†Compares the risk stratification performance of the different models (log-rank test); a smaller p value and a larger χ² indicate better risk stratification.
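The C-index described in the table notes compares predicted risk scores with observed progression times. The following is a minimal Python sketch of Harrell's concordance index, assuming per-patient inputs of days to progression (or censoring), a progression indicator, and a risk score where a higher score means earlier expected progression. The function name `harrell_c_index` is hypothetical, and this is not the authors' implementation.

```python
def harrell_c_index(times, events, risk_scores):
    """Harrell's concordance index for right-censored data (illustrative sketch).

    times: days to progression or to censoring
    events: 1 if progression was observed, 0 if censored
    risk_scores: higher score = higher predicted risk (earlier progression)
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored patient cannot be the earlier member of a pair
        for j in range(n):
            # (i, j) is comparable if i progressed before j's observed time
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0  # higher risk progressed earlier
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # tied risk scores count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

For example, with times `[5, 10, 15, 20]`, events `[1, 1, 0, 1]` and risk scores `[0.9, 0.2, 0.5, 0.1]`, four of the five comparable pairs are concordant, giving a C-index of 0.8; a value of 0.5 corresponds to random ranking.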
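The log-rank p value and χ² in Table 3 summarise how well each model separates patients into risk groups. Below is a minimal two-group log-rank sketch in Python, assuming patients have already been split into a high-risk and a low-risk group by a model's risk score (the paper's grouping rule is not given here, and the helper name `logrank_test` is hypothetical). The statistic has one degree of freedom, so its p value is erfc(√(χ²/2)); the 95% CIs reported for χ² are not reproduced by this sketch.

```python
import math


def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi2, p_value) with 1 degree of freedom.

    times_*: days to progression or censoring; events_*: 1 = progression, 0 = censored.
    """
    # Pool both groups, tagging each patient with a group label (0 = A, 1 = B).
    data = [(t, e, 0) for t, e in zip(times_a, events_a)]
    data += [(t, e, 1) for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in data if e})

    o_minus_e = 0.0  # sum of (observed - expected) progressions in group A
    variance = 0.0   # hypergeometric variance of that sum
    for t in event_times:
        n = sum(1 for ti, _, _ in data if ti >= t)  # at risk overall
        n_a = sum(1 for ti, _, g in data if ti >= t and g == 0)  # at risk in A
        d = sum(1 for ti, e, _ in data if ti == t and e)  # progressions at t
        d_a = sum(1 for ti, e, g in data if ti == t and e and g == 0)
        o_minus_e += d_a - d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)

    if variance == 0:
        raise ValueError("log-rank variance is zero; groups cannot be compared")
    chi2 = o_minus_e ** 2 / variance
    # For 1 degree of freedom, P(chi2_1 > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value
```

The statistic is symmetric in the two groups, so swapping the high-risk and low-risk inputs gives the same χ² and p value.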
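The table notes do not state which binomial test produced the binomial p values. One common way to compare two classifiers evaluated on the same patients, assumed here purely for illustration, is the exact McNemar test: keep only the patients on whom exactly one of the two models is correct, then apply a two-sided exact binomial test with success probability 0.5. The function names below are hypothetical, and the authors' procedure may differ.

```python
from math import comb


def exact_binomial_two_sided(k, n, p=0.5):
    """Two-sided exact binomial test of k successes in n trials under H0: prob = p.

    Sums the probabilities of all outcomes no more likely than the observed one.
    """
    probs = [comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(n + 1)]
    cutoff = probs[k] * (1 + 1e-7)  # small tolerance for floating-point ties
    return min(1.0, sum(q for q in probs if q <= cutoff))


def mcnemar_exact(correct_ref, correct_other):
    """Exact McNemar test comparing two classifiers on the same patients (sketch).

    correct_ref / correct_other: per-patient booleans, True if that model's
    prediction matched the observed outcome.
    """
    # b: only the reference model was right; c: only the other model was right
    b = sum(1 for r, o in zip(correct_ref, correct_other) if r and not o)
    c = sum(1 for r, o in zip(correct_ref, correct_other) if o and not r)
    if b + c == 0:
        return 1.0  # no discordant patients, so no evidence of a difference
    # Under H0 each discordant patient is equally likely to favour either model.
    return exact_binomial_two_sided(b, b + c)
```

Patients on whom both models agree carry no information about which model is better, which is why only the discordant counts enter the test.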