. 2020 Aug 20;20(Suppl 5):141. doi: 10.1186/s12911-020-01150-w

Table 8.

Performance comparison of the vanilla autoencoder (AE) on the Malaria and Wisconsin Breast Cancer datasets: 10-fold cross-validation results (Train) and results on the held-out test set of each dataset (Test)

Phase	Top Layers (AEs)	Accuracy (%)	MCC	Precision (%)	Recall (%)	F₁ score (%)
Train	M: Encoding Layers	91.21 ± 1.56	0.81 ± 0.03	86.61 ± 3.77	90.84 ± 2.91	88.59 ± 1.86
	M: Complete Autoencoder	90.19 ± 2.08	0.80 ± 0.04	85.29 ± 4.69	89.69 ± 2.83	87.33 ± 2.33
	WBC: Encoding Layers	98.69 ± 1.38	0.97 ± 0.03	99.37 ± 1.98	97.14 ± 3.69	98.20 ± 1.91
	WBC: Complete Autoencoder	97.90 ± 2.07	0.96 ± 0.04	95.54 ± 5.18	99.29 ± 2.26	97.28 ± 2.65
Test	M: Encoding Layers	89.64	0.78	90.02	81.40	85.49
	M: Complete Autoencoder	86.10	0.70	82.86	79.34	81.06
	WBC: Encoding Layers	97.34	0.95	99.99	92.86	96.30
	WBC: Complete Autoencoder	95.74	0.91	98.44	90.00	94.03

The results in the first block of rows (Train) are the 10-fold cross-validation mean values on the validation sets, obtained by selecting the best-performing model according to its F₁ score. The second block (Test) reports the results of evaluating the models on the held-out test set. For both datasets, two thirds of the data were used in the training phase and the remaining third was held out for the test phase. M denotes the Malaria dataset and WBC the Wisconsin Breast Cancer dataset.
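The evaluation protocol in the note above (a two-thirds/one-third hold-out split, 10-fold cross-validation on the training portion, model selection by validation F₁, then a single evaluation on the held-out third) can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' code: it assumes that "best-performing model" means the fold model with the highest validation F₁, it uses a random (non-stratified) split, and it replaces the vanilla AE with a hypothetical threshold classifier (`fit_threshold`) trained on synthetic data. The helper names `binary_metrics`, `holdout_split` and `cross_validate` are also illustrative. The metrics use their standard definitions: F₁ = 2PR/(P + R) and MCC = (TP·TN − FP·FN)/√((TP+FP)(TP+FN)(TN+FP)(TN+FN)).

```python
import math
import random
from statistics import mean, stdev
from typing import Callable, Dict, List, Sequence, Tuple

# A fitted model maps a batch of inputs to 0/1 predictions.
Predictor = Callable[[Sequence[float]], List[int]]


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, MCC, precision, recall and F1 for binary labels in {0, 1}."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"accuracy": (tp + tn) / len(pairs), "mcc": mcc,
            "precision": precision, "recall": recall, "f1": f1}


def holdout_split(n: int, test_fraction: float = 1 / 3,
                  seed: int = 0) -> Tuple[List[int], List[int]]:
    """Shuffle indices 0..n-1; return (train, test) with test_fraction held out."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_test = round(n * test_fraction)
    return idx[n_test:], idx[:n_test]


def cross_validate(fit: Callable[[List[float], List[int]], Predictor],
                   x: Sequence[float], y: Sequence[int],
                   train_idx: List[int], k: int = 10):
    """k-fold CV over train_idx; return per-fold validation metrics and the best-F1 model."""
    folds = [train_idx[i::k] for i in range(k)]
    fold_metrics, best_model, best_f1 = [], None, -1.0
    for i, val_idx in enumerate(folds):
        fit_idx = [j for m, fold in enumerate(folds) if m != i for j in fold]
        model = fit([x[j] for j in fit_idx], [y[j] for j in fit_idx])
        scores = binary_metrics([y[j] for j in val_idx], model([x[j] for j in val_idx]))
        fold_metrics.append(scores)
        if scores["f1"] > best_f1:  # assumption: keep the fold model with the highest validation F1
            best_model, best_f1 = model, scores["f1"]
    return fold_metrics, best_model


def fit_threshold(xs: List[float], ys: List[int]) -> Predictor:
    """Hypothetical stand-in for the AE-based classifier: threshold at the midpoint of the class means."""
    pos = [v for v, t in zip(xs, ys) if t == 1]
    neg = [v for v, t in zip(xs, ys) if t == 0]
    cut = (mean(pos) + mean(neg)) / 2
    return lambda batch: [1 if v > cut else 0 for v in batch]


# Synthetic two-class data, only to exercise the pipeline.
rng = random.Random(1)
y = [i % 2 for i in range(300)]
x = [rng.gauss(2.0 if t else 0.0, 1.0) for t in y]

train_idx, test_idx = holdout_split(len(y))  # two thirds train, one third test
fold_metrics, best = cross_validate(fit_threshold, x, y, train_idx, k=10)
cv_f1 = [m["f1"] for m in fold_metrics]
print(f"CV F1: {mean(cv_f1):.3f} ± {stdev(cv_f1):.3f}")
test = binary_metrics([y[j] for j in test_idx], best([x[j] for j in test_idx]))
print(f"Test F1: {test['f1']:.3f}")

# Consistency check against Table 8 (M: Encoding Layers, Test row).
p, r = 0.9002, 0.8140
print(f"Recomputed F1: {2 * p * r / (p + r):.4f}")  # table reports 85.49%
```

Replacing `fit_threshold` with a function that trains an AE-based classifier and returns a prediction callable would leave the rest of the sketch unchanged. Summarizing the per-fold metrics by their mean and standard deviation is one way to produce "mean ± spread" entries like those in the Train rows; the table itself does not state which spread statistic was used.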