Table 2.
Comparison of the experimental results for the five different MNIST cases described in the Methods.[a]
Experiments | AUROC[b] (95% CI) | F1-score (95% CI) | Precision (95% CI) | Recall (95% CI) |
CML[c] | 0.999 (0.999-0.999) | 0.981 (0.978-0.983) | 0.981 (0.972-0.989) | 0.981 (0.971-0.989) |
Basic FL[d] | 0.997 (0.996-0.998) | 0.946 (0.941-0.950) | 0.945 (0.929-0.959) | 0.945 (0.930-0.959) |
Imbalanced FL | 0.995 (0.994-0.995) | 0.921 (0.917-0.927) | 0.920 (0.904-0.937) | 0.920 (0.903-0.937) |
Skewed FL | 0.992 (0.991-0.993) | 0.905 (0.899-0.911) | 0.905 (0.885-0.922) | 0.904 (0.885-0.920) |
Imbalanced and skewed FL | 0.990 (0.989-0.991) | 0.891 (0.884-0.896) | 0.890 (0.869-0.909) | 0.889 (0.868-0.908) |
[a] All experiments used the same model and hyperparameters. All results are reported with 95% CIs obtained by resampling the validation task 100 times.
[b] AUROC: area under the receiver operating characteristic curve.
[c] CML: centralized traditional machine-learning method.
[d] FL: federated learning.
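The footnote states that the 95% CIs come from resampling the validation task 100 times, but does not give the exact procedure. The following is a minimal sketch of one common way to do this, a percentile bootstrap over validation examples. The function names `macro_recall` and `bootstrap_ci`, the sampling-with-replacement scheme, the percentile rule, and the choice of macro-averaged recall as the example metric are all assumptions made for illustration, not the authors' implementation.

```python
import random


def macro_recall(y_true, y_pred):
    """Unweighted mean of per-class recall over the classes present in y_true.

    Hypothetical stand-in for the Recall column; the averaging scheme used
    in the table is not stated in the source.
    """
    recalls = []
    for c in sorted(set(y_true)):
        idx = [i for i, t in enumerate(y_true) if t == c]
        hits = sum(1 for i in idx if y_pred[i] == c)
        recalls.append(hits / len(idx))
    return sum(recalls) / len(recalls)


def bootstrap_ci(y_true, y_pred, metric, n_resamples=100, alpha=0.05, seed=0):
    """Return (point_estimate, lower, upper) using a percentile bootstrap.

    Assumption: each resample draws len(y_true) validation examples with
    replacement; the source only says the validation task was resampled
    100 times.
    """
    rng = random.Random(seed)
    n = len(y_true)
    scores = []
    for _ in range(n_resamples):
        sample = [rng.randrange(n) for _ in range(n)]
        scores.append(
            metric([y_true[i] for i in sample], [y_pred[i] for i in sample])
        )
    scores.sort()
    # Approximate percentile positions in the sorted resample scores.
    lower = scores[int((alpha / 2) * (n_resamples - 1))]
    upper = scores[int((1 - alpha / 2) * (n_resamples - 1))]
    return metric(y_true, y_pred), lower, upper
```

With only 100 resamples the interval endpoints are themselves noisy, so the third decimal place of a CI computed this way should be read with some caution.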