Table 8.
Performance comparison when using the vanilla AE on two held-out test sets (Malaria and Wisconsin Breast Cancer)
| Phase | Top Layers (AEs) | Accuracy (%) | MCC | Precision (%) | Recall (%) | F1 score (%) |
|---|---|---|---|---|---|---|
| Train | M: Encoding Layers | 91.21% ±1.56% | 0.81 ±0.03 | 86.61% ±3.77% | 90.84% ±2.91% | 88.59% ±1.86% |
| Train | M: Complete Autoencoder | 90.19% ±2.08% | 0.80 ±0.04 | 85.29% ±4.69% | 89.69% ±2.83% | 87.33% ±2.33% |
| Train | WBC: Encoding Layers | 98.69% ±1.38% | 0.97 ±0.03 | 99.37% ±1.98% | 97.14% ±3.69% | 98.20% ±1.91% |
| Train | WBC: Complete Autoencoder | 97.90% ±2.07% | 0.96 ±0.04 | 95.54% ±5.18% | 99.29% ±2.26% | 97.28% ±2.65% |
| Test | M: Encoding Layers | 89.64% | 0.78 | 90.02% | 81.40% | 85.49% |
| Test | M: Complete Autoencoder | 86.10% | 0.70 | 82.86% | 79.34% | 81.06% |
| Test | WBC: Encoding Layers | 97.34% | 0.95 | 99.99% | 92.86% | 96.30% |
| Test | WBC: Complete Autoencoder | 95.74% | 0.91 | 98.44% | 90.00% | 94.03% |
The Train results are 10-fold cross-validation means, measured on the validation folds, for the best-performing model selected by F1 score. The Test results come from evaluating the models on the held-out test set. For both datasets, two thirds of the data were used for training and one third was held out for testing. M denotes the Malaria dataset and WBC the Wisconsin Breast Cancer dataset.
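
As a quick sanity check on the Test results, the F1 scores can be recomputed from the reported precision and recall. The sketch below assumes the standard definition of F1 as the harmonic mean of precision and recall, which the source does not state explicitly. The function and variable names are illustrative and do not come from the original work.

```python
# Consistency check for the Test results of Table 8.
# Assumption: F1 is the harmonic mean of precision and recall
# (the standard definition; the source does not spell out its formula).


def f1_from_precision_recall(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall (same units as inputs)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Test results copied from Table 8: (precision %, recall %, reported F1 %).
TEST_ROWS = {
    "M: Encoding Layers": (90.02, 81.40, 85.49),
    "M: Complete Autoencoder": (82.86, 79.34, 81.06),
    "WBC: Encoding Layers": (99.99, 92.86, 96.30),
    "WBC: Complete Autoencoder": (98.44, 90.00, 94.03),
}


def max_f1_gap(rows: dict) -> float:
    """Largest absolute gap between recomputed and reported F1, in percentage points."""
    return max(
        abs(f1_from_precision_recall(p, r) - reported)
        for p, r, reported in rows.values()
    )


if __name__ == "__main__":
    for name, (p, r, reported) in TEST_ROWS.items():
        recomputed = f1_from_precision_recall(p, r)
        print(f"{name}: recomputed {recomputed:.2f}, reported {reported:.2f}")
    print(f"largest gap: {max_f1_gap(TEST_ROWS):.3f} percentage points")
```

Under this assumption, the recomputed Test values agree with the reported ones to within rounding. The same check is not meaningful for the Train results: those are averages over folds, and the mean of per-fold F1 scores need not equal the F1 computed from mean precision and mean recall.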