Table 2.
| Models | Dataset | AUC (95% CI) | ACC (95% CI) | Specificity (95% CI) | Sensitivity (95% CI) |
|---|---|---|---|---|---|
| Clinical model | Training | 0.942 (0.916 to 0.969) | 0.844 (0.794 to 0.883) | 0.87 (0.762 to 0.935) | 0.835 (0.776 to 0.882) |
| | Validation | 0.881 (0.835 to 0.927) | 0.793 (0.739 to 0.838) | 0.826 (0.712 to 0.903) | 0.782 (0.718 to 0.835) |
| | Testing | 0.739 (0.58 to 0.898)ᵃ | 0.647 (0.5 to 0.772) | 0.769 (0.46 to 0.938) | 0.605 (0.435 to 0.755) |
| Radiological model | Training | 0.922 (0.89 to 0.955) | 0.804 (0.751 to 0.848) | 0.957 (0.87 to 0.989) | 0.752 (0.687 to 0.809) |
| | Validation | 0.869 (0.82 to 0.918) | 0.775 (0.72 to 0.822) | 0.899 (0.796 to 0.955) | 0.733 (0.666 to 0.791) |
| | Testing | 0.818 (0.698 to 0.938)ᵃ | 0.588 (0.442 to 0.721) | 1 (0.717 to 1) | 0.447 (0.29 to 0.615) |
| Radiomic model | Training | 0.962 (0.939 to 0.986) | 0.909 (0.867 to 0.939) | 0.884 (0.779 to 0.945) | 0.917 (0.869 to 0.95) |
| | Validation | 0.828 (0.767 to 0.889) | 0.825 (0.774 to 0.867) | 0.797 (0.68 to 0.881) | 0.835 (0.776 to 0.882) |
| | Testing | 0.765 (0.585 to 0.946) | 0.667 (0.52 to 0.789) | 0.692 (0.389 to 0.896) | 0.658 (0.486 to 0.799) |
| Quantifying model | Training | 0.899 (0.863 to 0.935) | 0.815 (0.762 to 0.858) | 0.812 (0.696 to 0.892) | 0.816 (0.754 to 0.865) |
| | Validation | 0.803 (0.742 to 0.863) | 0.778 (0.724 to 0.825) | 0.725 (0.602 to 0.822) | 0.796 (0.733 to 0.848) |
| | Testing | 0.607 (0.414 to 0.8)ᵃ | 0.608 (0.461 to 0.738) | 0.615 (0.323 to 0.849) | 0.605 (0.435 to 0.755) |
| Integrated model | Training | 0.984 (0.971 to 0.997) | 0.956 (0.923 to 0.976) | 0.899 (0.796 to 0.955) | 0.976 (0.941 to 0.991) |
| | Validation | 0.893 (0.841 to 0.946) | 0.88 (0.834 to 0.915) | 0.754 (0.633 to 0.846) | 0.922 (0.875 to 0.954) |
| | Testing | 0.925 (0.856 to 0.994) | 0.843 (0.709 to 0.925) | 0.923 (0.621 to 0.996) | 0.816 (0.651 to 0.917) |
ᵃDeLong test showed a significant difference (p < 0.05) between this model and the integrated model in the testing cohort. AUC, area under the receiver operating characteristic curve; ACC, accuracy; CI, confidence interval.
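The table does not say how the 95% CIs for accuracy, specificity, and sensitivity were obtained. The testing-cohort values appear consistent with the Wilson score interval with continuity correction (Newcombe's method 4, the interval R's `prop.test` reports with its default continuity correction), applied to counts inferred from the proportions: 51 testing patients, 38 positive and 13 negative (e.g. the clinical model's specificity 0.769 = 10/13). Both the method and the counts are assumptions, not stated in the paper, and the function name below is hypothetical. A minimal sketch:

```python
import math


def wilson_cc_interval(x, n, z=1.959964):
    """Wilson score interval with continuity correction for the proportion x/n.

    Hypothetical helper: assumes the table's CIs for ACC, specificity and
    sensitivity were computed this way (not stated in the source).
    """
    p = x / n
    denom = 2 * (n + z * z)
    if x == 0:
        lower = 0.0  # interval is pinned at 0 when no successes are observed
    else:
        lower = (2 * n * p + z * z - 1
                 - z * math.sqrt(z * z - 2 - 1 / n + 4 * p * (n * (1 - p) + 1))) / denom
    if x == n:
        upper = 1.0  # interval is pinned at 1 when every case is a success
    else:
        upper = (2 * n * p + z * z + 1
                 + z * math.sqrt(z * z + 2 - 1 / n + 4 * p * (n * (1 - p) - 1))) / denom
    return max(0.0, lower), min(1.0, upper)


# Counts inferred (assumption) from the testing-cohort proportions in Table 2.
examples = [
    ("clinical specificity", 10, 13),      # prints 0.769 (0.460 to 0.938)
    ("clinical accuracy", 33, 51),         # prints 0.647 (0.500 to 0.772)
    ("radiological specificity", 13, 13),  # prints 1.000 (0.717 to 1.000)
]
for label, x, n in examples:
    lo, hi = wilson_cc_interval(x, n)
    print(f"{label} {x}/{n} = {x / n:.3f} ({lo:.3f} to {hi:.3f})")
```

The continuity correction widens the interval slightly compared with the plain Wilson interval, which matters for small denominators such as the 13 negative testing cases.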
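The DeLong test in the footnote compares two correlated AUCs measured on the same patients, such as one single-source model against the integrated model. The paper does not state which software it used (R's pROC `roc.test` is a common choice), so the NumPy sketch below is only an illustration of the standard paired DeLong variance estimate. The function name and the toy scores are hypothetical, not study data.

```python
import math

import numpy as np


def delong_paired_test(y_true, scores_a, scores_b):
    """Two-sided paired DeLong test for AUC(scores_a) vs AUC(scores_b) on the same cases."""
    y = np.asarray(y_true).astype(bool)
    s = np.vstack([np.asarray(scores_a, float), np.asarray(scores_b, float)])  # (2, n)
    pos, neg = s[:, y], s[:, ~y]  # scores of positive (2, n1) and negative (2, n0) cases
    n1, n0 = pos.shape[1], neg.shape[1]
    # psi[k, i, j] = 1 if positive i outranks negative j under model k, 0.5 if tied, else 0
    psi = (pos[:, :, None] > neg[:, None, :]) + 0.5 * (pos[:, :, None] == neg[:, None, :])
    aucs = psi.mean(axis=(1, 2))  # Mann-Whitney estimate of each AUC
    v10 = psi.mean(axis=2)        # structural components over positives, (2, n1)
    v01 = psi.mean(axis=1)        # structural components over negatives, (2, n0)
    cov = np.cov(v10) / n1 + np.cov(v01) / n0  # 2x2 covariance matrix of the two AUCs
    var_diff = cov[0, 0] + cov[1, 1] - 2 * cov[0, 1]
    if var_diff <= 0:
        raise ValueError("variance of the AUC difference is zero; test undefined")
    z = (aucs[0] - aucs[1]) / math.sqrt(var_diff)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value under the standard normal
    return float(aucs[0]), float(aucs[1]), p


# Toy example (made-up scores): 1 = positive case, 0 = negative case.
labels = [1, 1, 1, 0, 0]
model_a = [0.9, 0.8, 0.4, 0.3, 0.5]
model_b = [0.2, 0.7, 0.6, 0.1, 0.65]
auc_a, auc_b, p = delong_paired_test(labels, model_a, model_b)
print(round(auc_a, 3), round(auc_b, 3), round(p, 3))  # prints 0.833 0.667 0.48
```

For these toy scores the AUCs are 5/6 and 2/3, and the variance of their difference works out to 1/18, so z = 1/√2 and p ≈ 0.48. With only five cases this is far from significant; the study's footnoted comparisons use the full testing cohort.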