Table 3. Image-level and patient-level performance evaluation for predicting clinical heart failure or severe tissue pathology from H&E-stained whole-slide images for validation folds of the training data set.
Metric | Random Forest | Deep Learning | p-value |
---|---|---|---|
Image-level results | | | |
Accuracy | 0.869 ± 0.05 | 0.954 ± 0.03 | 0.05 |
Sensitivity | 0.866 ± 0.07 | 0.968 ± 0.03 | n.s. |
Specificity | 0.872 ± 0.04 | 0.943 ± 0.05 | n.s. |
Positive predictive value | 0.848 ± 0.05 | 0.935 ± 0.05 | 0.05 |
AUC | 0.944 ± 0.04 | 0.977 ± 0.02 | 0.05 |
Patient-level results | | | |
Accuracy | 0.923 ± 0.03 | 0.962 ± 0.02 | n.s. |
Sensitivity | 0.917 ± 0.07 | 0.979 ± 0.04 | n.s. |
Specificity | 0.930 ± 0.06 | 0.947 ± 0.05 | n.s. |
Positive predictive value | 0.919 ± 0.07 | 0.942 ± 0.06 | n.s. |
AUC | 0.963 ± 0.05 | 0.960 ± 0.05 | n.s. |
Results are presented as the mean ± SD of three models. Each model was trained on ~770 images from ~70 patients and evaluated on a validation fold of ~35 patients. The patient-level diagnosis is the majority vote over all images from a single patient. p-values were determined by an unpaired two-sample t-test with N = 3 folds; n.s., not significant.
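The two procedures named in the table note, majority-vote aggregation from images to patients and the unpaired two-sample t-test across folds, can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the patient identifiers, and the per-fold accuracy values are hypothetical, ties in the vote are resolved toward the positive class as an assumption, and the pooled-variance (Student) form of the t statistic is assumed because the source does not say which variant was used.

```python
from collections import defaultdict
from math import sqrt
from statistics import mean, stdev


def patient_level_votes(image_predictions):
    """Collapse per-image labels (0 or 1) into one label per patient by majority vote.

    image_predictions: iterable of (patient_id, label) pairs.
    Assumption: a tie counts as positive; the source does not specify tie handling.
    """
    by_patient = defaultdict(list)
    for patient_id, label in image_predictions:
        by_patient[patient_id].append(label)
    return {
        patient_id: 1 if 2 * sum(labels) >= len(labels) else 0
        for patient_id, labels in by_patient.items()
    }


def unpaired_t_statistic(a, b):
    """Return (t, degrees of freedom) for a pooled-variance two-sample t-test."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * stdev(a) ** 2 + (nb - 1) * stdev(b) ** 2) / (na + nb - 2)
    t = (mean(a) - mean(b)) / sqrt(pooled_var * (1 / na + 1 / nb))
    return t, na + nb - 2


# Hypothetical image-level predictions for two patients.
preds = [("p1", 1), ("p1", 1), ("p1", 0), ("p2", 0), ("p2", 0), ("p2", 1)]
votes = patient_level_votes(preds)

# Hypothetical per-fold accuracies (three folds per method), not values from the table.
rf_folds = [0.80, 0.85, 0.90]
dl_folds = [0.93, 0.95, 0.97]
t_stat, dof = unpaired_t_statistic(dl_folds, rf_folds)
```

With three folds per method the test has only 4 degrees of freedom, so the two-sided 5% critical value of |t| is about 2.776. This small N is one reason several of the differences in the table are reported as not significant.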