2021 Mar 24;3(5):e286–e294. doi: 10.1016/S2589-7500(21)00039-X

Table 3.

Performance of progression prediction models

| | C-index (95% CI) | F score (95% CI) | Sensitivity (95% CI) | Specificity (95% CI) | Binomial p value* | Log-rank p value | χ2 (95% CI) |
|---|---|---|---|---|---|---|---|
| Internal test set | | | | | | | |
| Image-based model | 0·737 (0·713–0·773) | 0·790 (0·776–0·808) | 0·696 (0·664–0·718) | 0·775 (0·769–0·782) | <0·0001 | <0·0001 | 17·33 (13·73–22·02) |
| Clinical-data-based model | 0·769 (0·755–0·786) | 0·811 (0·803–0·836) | 0·656 (0·631–0·674) | 0·811 (0·801–0·817) | <0·0001 | <0·0001 | 31·77 (24·58–36·56) |
| Image and clinical data combined model | 0·805 (0·800–0·820) | 0·843 (0·836–0·863) | 0·720 (0·700–0·749) | 0·845 (0·840–0·850) | ref | <0·0001 | 26·51 (21·65–33·56) |
| Severity-score–based model | 0·696 (0·676–0·711) | 0·761 (0·752–0·775) | 0·656 (0·635–0·669) | 0·743 (0·736–0·752) | <0·0001 | <0·0001 | 18·15 (9·45–23·70) |
| Severity score and clinical data combined model | 0·781 (0·755–0·787) | 0·805 (0·798–0·832) | 0·678 (0·666–0·700) | 0·798 (0·793–0·807) | 0·0002 | <0·0001 | 42·23 (33·63–49·59) |
| External test set | | | | | | | |
| Image-based model | 0·721 (0·700–0·727) | 0·795 (0·779–0·813) | 0·633 (0·606–0·662) | 0·791 (0·788–0·796) | <0·0001 | <0·0001 | 39·17 (28·62–48·58) |
| Clinical-data-based model | 0·707 (0·695–0·729) | 0·769 (0·756–0·780) | 0·602 (0·583–0·621) | 0·753 (0·751–0·762) | <0·0001 | <0·0001 | 31·72 (26·41–42·94) |
| Image and clinical data combined model | 0·752 (0·739–0·764) | 0·805 (0·791–0·825) | 0·667 (0·643–0·698) | 0·798 (0·791–0·803) | ref | <0·0001 | 52·04 (46·50–66·14) |
| Severity-score–based model | 0·606 (0·584–0·627) | 0·720 (0·704–0·733) | 0·528 (0·512–0·541) | 0·695 (0·686–0·701) | <0·0001 | <0·0001 | 11·65 (6·84–15·43) |
| Severity score and clinical data combined model | 0·715 (0·704–0·721) | 0·778 (0·757–0·795) | 0·667 (0·649–0·677) | 0·759 (0·756–0·765) | <0·0001 | <0·0001 | 37·62 (26·68–46·95) |

The C-index for right-censored data measures model performance by comparing the progression information (critical labels and progression days) with the predicted risk scores; a larger C-index indicates better progression prediction performance. C-index=concordance index.
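As an illustration of how a concordance index can be computed from progression days, critical labels, and predicted risk scores, the following is a minimal sketch of Harrell's C-index in plain Python. The function name, argument names, and toy data are hypothetical and do not come from the study. The sketch assumes that a higher risk score means earlier expected progression, and it skips pairs with tied times for simplicity.

```python
from itertools import combinations


def concordance_index(times, events, risks):
    """Harrell's C-index for right-censored data (illustrative sketch).

    times  -- days to progression, or to last follow-up if censored
    events -- 1 if progression was observed, 0 if censored
    risks  -- predicted risk scores (assumed: higher = earlier progression)
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that `a` has the shorter time.
        a, b = (i, j) if times[i] <= times[j] else (j, i)
        # A pair is comparable only if the shorter time is an observed
        # event; tied times are skipped in this simplified version.
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risks[a] > risks[b]:
            concordant += 1.0   # earlier progression got the higher risk
        elif risks[a] == risks[b]:
            concordant += 0.5   # tied risk scores count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data (hypothetical): four patients, the third is censored at day 12.
c = concordance_index([5, 10, 12, 20], [1, 1, 0, 1], [0.9, 0.4, 0.6, 0.1])
print(round(c, 3))  # 0.8
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is why larger values in the table reflect better prediction.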

* Measures the difference in performance between the image and clinical data combined model and the other prediction models; a smaller p value represents a greater likelihood of a difference between the combined model and the other models.

The log-rank p value and χ2 show a comparison of the stratification performance of the different models; a smaller p value and a larger χ2 indicate better risk stratification performance.
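To make the relationship between the χ2 statistic and the log-rank p value concrete, here is a minimal sketch of a two-group log-rank test in plain Python. It assumes patients are split into two risk groups (for example, high and low predicted risk); the table does not state how the groups were formed, so the grouping, the function name, and the toy data are all hypothetical.

```python
import math


def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test (illustrative sketch).

    Returns (chi2, p_value), using a chi-square distribution with
    1 degree of freedom for the p value.
    """
    data = [(t, e, 0) for t, e in zip(times_a, events_a)]
    data += [(t, e, 1) for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in data if e})

    obs_minus_exp = 0.0  # observed minus expected events in group A
    variance = 0.0
    for t in event_times:
        # Numbers still at risk just before time t.
        n_a = sum(1 for u, _, g in data if u >= t and g == 0)
        n_b = sum(1 for u, _, g in data if u >= t and g == 1)
        # Events occurring exactly at time t.
        d_a = sum(1 for u, e, g in data if u == t and e and g == 0)
        d_b = sum(1 for u, e, g in data if u == t and e and g == 1)
        n, d = n_a + n_b, d_a + d_b
        obs_minus_exp += d_a - d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)

    if variance == 0:
        raise ValueError("variance is zero; groups cannot be compared")
    chi2 = obs_minus_exp ** 2 / variance
    # Survival function of chi-square with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Toy data (hypothetical): group A progresses early, group B late.
chi2, p = logrank_test([1, 2, 3], [1, 1, 1], [10, 11, 12], [1, 1, 1])
print(chi2, p)
```

The further apart the progression curves of the two groups are, the larger χ2 becomes and the smaller the p value, which matches the reading of these columns given in the footnote.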