2020 Oct 26;22(10):e20891. doi: 10.2196/20891

Table 2.

Comparison of the experimental results for the five different MNIST cases described in the Methods.[a]

| Experiment | AUROC[b] (95% CI) | F1-score (95% CI) | Precision (95% CI) | Recall (95% CI) |
| --- | --- | --- | --- | --- |
| CML[c] | 0.999 (0.999-0.999) | 0.981 (0.978-0.983) | 0.981 (0.972-0.989) | 0.981 (0.971-0.989) |
| Basic FL[d] | 0.997 (0.996-0.998) | 0.946 (0.941-0.950) | 0.945 (0.929-0.959) | 0.945 (0.930-0.959) |
| Imbalanced FL | 0.995 (0.994-0.995) | 0.921 (0.917-0.927) | 0.920 (0.904-0.937) | 0.920 (0.903-0.937) |
| Skewed FL | 0.992 (0.991-0.993) | 0.905 (0.899-0.911) | 0.905 (0.885-0.922) | 0.904 (0.885-0.920) |
| Imbalanced and skewed FL | 0.990 (0.989-0.991) | 0.891 (0.884-0.896) | 0.890 (0.869-0.909) | 0.889 (0.868-0.908) |

[a] All experiments used the same model and hyperparameters. All results are reported with 95% CIs obtained by resampling the validation task 100 times.

[b] AUROC: area under the receiver operating characteristic curve.

[c] CML: centralized traditional machine-learning method.

[d] FL: federated learning.
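The table footnote says the 95% CIs come from resampling the validation task 100 times but does not spell out the scheme. One common reading is a percentile bootstrap: draw validation examples with replacement, recompute the metric on each draw, and take the 2.5th and 97.5th percentiles. The sketch below illustrates that reading only. The function names `macro_f1` and `bootstrap_ci`, the macro-averaging of the F1-score, the nearest-rank percentile rule, and the toy labels are all assumptions for illustration, not the authors' code or data.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple


def macro_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Unweighted mean of per-class F1 (assumed averaging; not stated in the table)."""
    labels = sorted(set(y_true) | set(y_pred))
    scores: List[float] = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def bootstrap_ci(
    y_true: Sequence[int],
    y_pred: Sequence[int],
    metric: Callable[[Sequence[int], Sequence[int]], float],
    n_resamples: int = 100,  # matches the "100 times" in the footnote
    alpha: float = 0.05,     # 95% CI
    seed: int = 0,
) -> Tuple[float, float]:
    """Percentile bootstrap CI: resample example indices with replacement."""
    rng = random.Random(seed)
    n = len(y_true)
    stats: List[float] = []
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]
        stats.append(metric([y_true[i] for i in idx], [y_pred[i] for i in idx]))
    stats.sort()
    # Nearest-rank percentiles, widened outward (floor for lower, ceil for upper).
    lo = stats[math.floor((alpha / 2) * (n_resamples - 1))]
    hi = stats[math.ceil((1 - alpha / 2) * (n_resamples - 1))]
    return lo, hi


if __name__ == "__main__":
    # Toy labels for illustration only; these are not MNIST predictions.
    y_true = [0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2]
    y_pred = [0, 1, 2, 0, 1, 1, 0, 2, 2, 0, 1, 2]
    point = macro_f1(y_true, y_pred)
    low, high = bootstrap_ci(y_true, y_pred, macro_f1)
    print(f"macro F1 = {point:.3f} (95% CI {low:.3f}-{high:.3f})")
```

With only 100 resamples the interval endpoints are themselves noisy, which may explain small asymmetries in reported intervals; a larger `n_resamples` would stabilize them at the cost of more computation.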