PLoS One. 2018 Apr 3;13(4):e0192726. doi: 10.1371/journal.pone.0192726

Table 4. Patient-level performance evaluation for predicting clinical heart failure or severe tissue pathology from H&E stained whole-slide images for the held-out test set.

Metric                       Random Forest    Deep Learning    p-value
Image-level results
Accuracy                     0.871 ± 0.01     0.946 ± 0.01     < 0.001
Sensitivity                  0.883 ± 0.02     0.968 ± 0.02     0.01
Specificity                  0.860 ± 0.01     0.927 ± 0.01     0.01
Positive predictive value    0.847 ± 0.01     0.921 ± 0.01     < 0.001
AUC                          0.935 ± 0.001    0.977 ± 0.01     < 0.001
Patient-level results
Accuracy                     0.917 ± 0.01     0.962 ± 0.01     0.002
Sensitivity                  0.932 ± 0.03     0.993 ± 0.01     0.033
Specificity                  0.905 ± 0.03     0.935 ± 0.01     n.s.
Positive predictive value    0.896 ± 0.02     0.930 ± 0.01     n.s.
AUC                          0.960 ± 0.01     0.989 ± 0.01     0.002

Results are presented as the mean ± SD of three models. Each model was trained on ~770 images from ~70 patients and evaluated on the held-out test set of 105 patients. The patient-level diagnosis is the majority vote over all images from a single patient. P-values were determined by an unpaired two-sample t-test with N = 3 folds; n.s., not significant.
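The two procedures named in the footnote, majority voting over a patient's images and an unpaired two-sample t-test across folds, can be sketched roughly as follows. This is an illustrative sketch, not the authors' code. The function names, the tie-breaking rule (ties count as positive), the pooled-variance form of the t statistic, and all sample values are assumptions; the source does not specify any of them.

```python
from collections import defaultdict
from math import sqrt
from statistics import mean, stdev


def patient_level_votes(image_predictions):
    """Collapse per-image binary predictions into one label per patient.

    image_predictions: iterable of (patient_id, label) pairs, label in {0, 1}.
    Assumption: a tie is resolved as positive (1); the source does not say
    how ties were handled.
    """
    by_patient = defaultdict(list)
    for patient_id, label in image_predictions:
        by_patient[patient_id].append(label)
    # A patient is positive when at least half of their images are positive.
    return {pid: int(2 * sum(labels) >= len(labels))
            for pid, labels in by_patient.items()}


def unpaired_t_statistic(a, b):
    """Return (t, degrees of freedom) for Student's two-sample t-test.

    Assumption: equal variances (pooled form). The source only says
    "unpaired two-sample t-test".
    """
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * stdev(a) ** 2 + (nb - 1) * stdev(b) ** 2) / (na + nb - 2)
    t = (mean(a) - mean(b)) / sqrt(pooled_var * (1 / na + 1 / nb))
    return t, na + nb - 2


# Hypothetical per-image predictions for three patients (not data from the paper).
preds = [("p1", 1), ("p1", 1), ("p1", 0),
         ("p2", 0), ("p2", 0), ("p2", 1),
         ("p3", 1), ("p3", 0)]
votes = patient_level_votes(preds)

# Hypothetical per-fold accuracies for two methods, three folds each.
t, dof = unpaired_t_statistic([0.95, 0.96, 0.97], [0.90, 0.92, 0.91])
```

With only three folds per method the test has four degrees of freedom, so a p-value read from the t distribution is sensitive to small changes in the per-fold scores.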