2020 Jul 20;8(7):e18910. doi: 10.2196/18910

Table 2.

Comparison of accuracy scores of five supervised machine learning models trained on real data and on synthetic data across 19 datasets. The increase or decrease in accuracy compared with the model trained on real data is shown in parentheses.

Machine learning algorithm accuracy, by dataset and training set^a

Training set SGD^b DT^c KNN^d RF^e SVM^f
Dataset A

Real 0.962 1.000 (W^g) 0.975 0.997 0.974

CART^h 0.966 (+0.004) 0.950 (–0.050) 0.967 (–0.008) 0.965 (–0.032) 0.969 (W) (–0.005)

Parametric 0.932 (–0.030) 0.907 (–0.093) 0.931 (–0.044) 0.927 (–0.070) 0.946 (W) (–0.028)

Bayesian 0.954 (–0.011) 0.924 (–0.076) 0.963 (–0.012) 0.947 (–0.050) 0.967 (W) (–0.007)
Dataset B

Real 0.668 0.931 (W) 0.758 0.924 0.830

CART 0.652 (–0.016) 0.698 (–0.233) 0.765 (+0.007) 0.749 (–0.175) 0.784 (W) (–0.046)

Parametric 0.706 (+0.048) 0.700 (–0.231) 0.748 (–0.010) 0.726 (–0.198) 0.753 (W) (–0.077)

Bayesian 0.674 (+0.006) 0.712 (–0.219) 0.744 (–0.014) 0.741 (–0.183) 0.770 (W) (–0.060)
Dataset C

Real 0.629 1.000 (W) 0.784 0.983 0.905

CART 0.603 (–0.026) 0.652 (–0.348) 0.662 (–0.122) 0.676 (–0.307) 0.729 (W) (–0.176)

Parametric 0.707 (+0.078) 0.702 (–0.298) 0.652 (–0.132) 0.709 (W) (–0.272) 0.700 (–0.205)

Bayesian 0.662 (+0.033) 0.709 (–0.291) 0.664 (–0.144) 0.747 (W) (–0.236) 0.710 (–0.195)
Dataset D

Real 0.632 1.000 (W) 0.726 0.962 0.660

CART 0.502 (–0.130) 0.664 (–0.336) 0.542 (–0.184) 0.706 (W) (–0.254) 0.536 (–0.124)

Parametric 0.472 (–0.160) 0.666 (W) (–0.334) 0.508 (–0.218) 0.628 (–0.334) 0.545 (–0.115)

Bayesian 0.438 (–0.194) 0.592 (–0.408) 0.511 (–0.215) 0.649 (W) (–0.313) 0.557 (–0.103)
Dataset E

Real 0.995 1.000 (W) 0.981 1.000 (W) 0.995

CART 0.972 (–0.023) 0.944 (–0.056) 0.967 (–0.014) 0.995 (W) (–0.005) 0.994 (–0.001)

Parametric 0.964 (–0.031) 0.981 (–0.019) 0.965 (–0.016) 0.988 (W) (–0.012) 0.988 (W) (–0.007)

Bayesian 0.986 (–0.009) 0.957 (–0.043) 0.974 (–0.007) 0.992 (–0.008) 0.993 (W) (–0.002)
Dataset F

Real 0.890 0.985 (W) 0.912 0.982 0.913

CART 0.869 (–0.021) 0.922 (W) (–0.063) 0.883 (–0.029) 0.921 (–0.061) 0.889 (–0.024)

Parametric 0.873 (–0.017) 0.907 (–0.078) 0.886 (–0.026) 0.914 (W) (–0.068) 0.894 (–0.019)

Bayesian 0.880 (–0.010) 0.918 (–0.067) 0.885 (–0.027) 0.924 (W) (–0.058) 0.893 (–0.020)
Dataset G

Real 0.746 0.959 0.780 0.971 (W) 0.820

CART 0.667 (–0.079) 0.848 (W) (–0.111) 0.678 (–0.102) 0.841 (–0.070) 0.748 (–0.072)

Parametric 0.669 (–0.077) 0.805 (W) (–0.154) 0.676 (–0.104) 0.801 (–0.107) 0.737 (–0.083)

Bayesian 0.676 (–0.070) 0.835 (W) (–0.124) 0.676 (–0.104) 0.822 (–0.149) 0.739 (–0.081)
Dataset H

Real 1.000 (W) 0.997 0.980 0.994 0.992

CART 0.940 (–0.060) 0.941 (–0.056) 0.891 (–0.089) 0.958 (W) (–0.036) 0.955 (–0.037)

Parametric 0.935 (–0.065) 0.951 (–0.046) 0.898 (–0.082) 0.959 (W) (–0.135) 0.959 (W) (–0.032)

Bayesian 0.940 (–0.060) 0.952 (–0.045) 0.899 (–0.081) 0.955 (–0.139) 0.959 (W) (–0.032)
Dataset I

Real 0.706 0.845 0.711 0.896 (W) 0.676

CART 0.594 (–0.112) 0.643 (–0.202) 0.634 (–0.077) 0.671 (W) (–0.225) 0.609 (–0.067)

Parametric 0.570 (–0.136) 0.638 (–0.207) 0.624 (–0.087) 0.663 (W) (–0.233) 0.608 (–0.068)

Bayesian 0.609 (–0.097) 0.648 (–0.197) 0.629 (–0.082) 0.667 (W) (–0.229) 0.622 (–0.054)
Dataset J

Real 0.453 0.981 (W) 0.642 0.981 (W) 0.651

CART 0.526 (+0.073) 0.655 (W) (–0.326) 0.579 (–0.063) 0.649 (–0.332) 0.551 (–0.100)

Parametric 0.555 (+0.102) 0.689 (W) (–0.292) 0.606 (–0.036) 0.628 (–0.354) 0.549 (–0.102)

Bayesian 0.545 (+0.092) 0.585 (–0.396) 0.585 (–0.057) 0.602 (W) (–0.379) 0.551 (–0.100)
Dataset K

Real 0.551 0.845 0.864 0.885 (W) 0.551

CART 0.510 (–0.041) 0.531 (W) (–0.314) 0.531 (W) (–0.333) 0.512 (–0.373) 0.531 (W) (–0.020)

Parametric 0.514 (–0.037) 0.545 (W) (–0.300) 0.510 (–0.354) 0.519 (–0.366) 0.531 (–0.020)

Bayesian 0.490 (–0.061) 0.538 (W) (–0.307) 0.510 (–0.354) 0.531 (–0.354) 0.510 (–0.041)
Dataset L

Real 0.851 1.000 (W) 0.861 0.977 0.865

CART 0.791 (–0.060) 0.781 (–0.219) 0.758 (–0.103) 0.803 (W) (–0.174) 0.785 (–0.080)

Parametric 0.822 (W) (–0.029) 0.758 (–0.242) 0.786 (–0.075) 0.809 (–0.168) 0.793 (–0.072)

Bayesian 0.785 (–0.066) 0.738 (–0.262) 0.818 (–0.043) 0.799 (–0.178) 0.834 (W) (–0.031)
Dataset M

Real 0.899 1.000 (W) 0.838 0.986 0.939

CART 0.726 (–0.173) 0.762 (–0.238) 0.762 (–0.076) 0.780 (–0.206) 0.782 (W) (–0.157)

Parametric 0.739 (–0.160) 0.765 (–0.235) 0.757 (–0.081) 0.772 (–0.214) 0.796 (W) (–0.143)

Bayesian 0.681 (–0.218) 0.662 (–0.338) 0.703 (–0.135) 0.746 (–0.240) 0.780 (W) (–0.159)
Dataset N

Real 0.713 0.908 (W) 0.713 0.908 (W) 0.713

CART 0.706 (–0.007) 0.667 (–0.241) 0.715 (+0.002) 0.680 (–0.228) 0.720 (W) (+0.007)

Parametric 0.644 (–0.067) 0.614 (–0.294) 0.706 (–0.007) 0.646 (–0.262) 0.708 (W) (–0.005)

Bayesian 0.559 (–0.154) 0.591 (–0.317) 0.706 (W) (–0.007) 0.630 (–0.278) 0.694 (–0.019)
Dataset O

Real 0.449 0.732 0.458 0.762 (W) 0.560

CART 0.338 (–0.111) 0.401 (–0.331) 0.411 (–0.047) 0.410 (–0.352) 0.425 (W) (–0.135)

Parametric 0.317 (–0.192) 0.377 (–0.355) 0.413 (–0.045) 0.397 (–0.365) 0.433 (W) (–0.127)

Bayesian 0.293 (–0.156) 0.336 (–0.396) 0.375 (–0.083) 0.361 (–0.401) 0.419 (W) (–0.141)
Dataset P

Real 0.981 0.985 (W) 0.981 0.982 0.981

CART 0.981 (W) (0.000) 0.977 (–0.008) 0.981 (W) (0.000) 0.981 (W) (–0.001) 0.981 (W) (0.000)

Parametric 0.981 (W) (0.000) 0.976 (–0.009) 0.981 (W) (0.000) 0.981 (W) (–0.001) 0.981 (W) (0.000)

Bayesian 0.981 (W) (0.000) 0.977 (–0.008) 0.981 (W) (0.000) 0.981 (W) (–0.001) 0.981 (W) (0.000)
Dataset Q

Real 0.840 0.932 (W) 0.853 0.928 0.851

CART 0.834 (–0.006) 0.795 (–0.137) 0.850 (–0.003) 0.835 (–0.093) 0.851 (W) (0.000)

Parametric 0.798 (–0.042) 0.811 (–0.121) 0.848 (–0.005) 0.838 (–0.090) 0.849 (W) (–0.002)

Bayesian 0.823 (–0.017) 0.794 (–0.138) 0.846 (–0.007) 0.837 (–0.091) 0.851 (W) (0.000)
Dataset R

Real 0.755 0.989 (W) 0.795 0.961 0.738

CART 0.742 (–0.013) 0.819 (–0.170) 0.761 (–0.034) 0.825 (W) (–0.136) 0.733 (–0.005)

Parametric 0.749 (–0.006) 0.786 (–0.203) 0.764 (–0.031) 0.798 (W) (–0.163) 0.734 (–0.004)

Bayesian 0.748 (–0.007) 0.835 (W) (–0.154) 0.762 (–0.033) 0.832 (–0.129) 0.734 (–0.004)
Dataset S

Real 0.958 1.000 (W) 0.921 1.000 (W) 0.953

CART 0.903 (–0.055) 0.901 (–0.099) 0.899 (–0.022) 0.935 (W) (–0.065) 0.913 (–0.040)

Parametric 0.890 (–0.068) 0.913 (–0.087) 0.912 (–0.009) 0.930 (W) (–0.060) 0.926 (–0.027)

Bayesian 0.905 (–0.053) 0.914 (–0.086) 0.908 (–0.013) 0.936 (W) (–0.064) 0.930 (–0.023)

^a The training set name indicates whether real or synthetic data were used to train the model and, for synthetic data, which synthetic data generator was used (ie, CART, parametric, or Bayesian).

^b SGD: stochastic gradient descent.

^c DT: decision tree.

^d KNN: k-nearest neighbors.

^e RF: random forest.

^f SVM: support vector machine.

^g (W) highlights the winning classifier for each training set.

^h CART: classification and regression trees.
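
The two conventions used in this table, the difference shown in parentheses and the (W) marker, can be reproduced with a few lines of Python. This is only an illustrative sketch and not the authors' code: the helper names `deltas` and `winners` are hypothetical, the values are the dataset A rows for real and CART-synthesized training data, and the sketch assumes that "winning" means the highest accuracy in a row, with ties sharing the marker.

```python
from decimal import Decimal

# Column order used throughout the table.
ALGORITHMS = ["SGD", "DT", "KNN", "RF", "SVM"]

# Dataset A accuracies copied from the table (markers and deltas stripped).
# Decimal avoids binary floating-point noise when subtracting 3-decimal values.
real = [Decimal(x) for x in ("0.962", "1.000", "0.975", "0.997", "0.974")]
cart = [Decimal(x) for x in ("0.966", "0.950", "0.967", "0.965", "0.969")]


def deltas(synthetic, baseline):
    """Accuracy change of each synthetic-trained model versus the real-trained one."""
    return [s - b for s, b in zip(synthetic, baseline)]


def winners(row):
    """Algorithms with the highest accuracy in one training-set row (ties included)."""
    best = max(row)
    return [name for name, acc in zip(ALGORITHMS, row) if acc == best]


cart_deltas = deltas(cart, real)

for name, acc, delta in zip(ALGORITHMS, cart, cart_deltas):
    mark = " (W)" if name in winners(cart) else ""
    # "+.3f" prints an explicit sign, mirroring the parenthesized differences.
    print(f"{name}: {acc}{mark} ({delta:+.3f})")
```

Running the same comparison over every row is one way to cross-check each printed difference against the two accuracies it is derived from.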