Table 2.
Comparison of accuracy scores of five supervised machine learning models trained on real data and synthetic data across 19 datasets. The increase or decrease in accuracy relative to the model trained on real data is shown in parentheses.
| Dataset and training set^a | Machine learning algorithm accuracy | | | | |
| --- | --- | --- | --- | --- | --- |
| | SGD^b | DT^c | KNN^d | RF^e | SVM^f |
| A | | | | | |
| Real | 0.962 | 1.000 (W^g) | 0.975 | 0.997 | 0.974 |
| CART^h | 0.966 (+0.004) | 0.950 (–0.050) | 0.967 (–0.008) | 0.965 (–0.032) | 0.969 (W) (–0.005) |
| Parametric | 0.932 (–0.030) | 0.907 (–0.093) | 0.931 (–0.044) | 0.927 (–0.070) | 0.946 (W) (–0.028) |
| Bayesian | 0.954 (–0.011) | 0.924 (–0.076) | 0.963 (–0.012) | 0.947 (–0.050) | 0.967 (W) (–0.007) |
| B | | | | | |
| Real | 0.668 | 0.931 (W) | 0.758 | 0.924 | 0.83 |
| CART | 0.652 (–0.016) | 0.698 (–0.233) | 0.765 (+0.007) | 0.749 (–0.175) | 0.784 (W) (–0.046) |
| Parametric | 0.706 (+0.048) | 0.700 (–0.231) | 0.748 (–0.010) | 0.726 (–0.198) | 0.753 (W) (–0.077) |
| Bayesian | 0.674 (+0.006) | 0.712 (–0.219) | 0.744 (–0.014) | 0.741 (–0.183) | 0.770 (W) (–0.060) |
| C | | | | | |
| Real | 0.629 | 1.000 (W) | 0.784 | 0.983 | 0.905 |
| CART | 0.603 (–0.026) | 0.652 (–0.348) | 0.662 (–0.122) | 0.676 (–0.307) | 0.729 (W) (–0.176) |
| Parametric | 0.707 (+0.078) | 0.702 (–0.298) | 0.652 (–0.132) | 0.709 (W) (–0.272) | 0.700 (–0.205) |
| Bayesian | 0.662 (+0.033) | 0.709 (–0.291) | 0.664 (–0.144) | 0.747 (W) (–0.236) | 0.710 (–0.195) |
| D | | | | | |
| Real | 0.632 | 1.000 (W) | 0.726 | 0.962 | 0.66 |
| CART | 0.502 (–0.130) | 0.664 (–0.336) | 0.542 (–0.184) | 0.706 (W) (–0.254) | 0.536 (–0.124) |
| Parametric | 0.472 (–0.160) | 0.666 (W) (–0.334) | 0.508 (–0.218) | 0.628 (–0.334) | 0.545 (–0.115) |
| Bayesian | 0.438 (–0.194) | 0.592 (–0.408) | 0.511 (–0.215) | 0.649 (W) (–0.313) | 0.557 (–0.103) |
| E | | | | | |
| Real | 0.995 | 1.000 (W) | 0.981 | 1.000 (W) | 0.995 |
| CART | 0.972 (–0.023) | 0.944 (–0.056) | 0.967 (–0.014) | 0.995 (W) (–0.005) | 0.994 (–0.001) |
| Parametric | 0.964 (–0.031) | 0.981 (–0.019) | 0.965 (–0.016) | 0.988 (W) (–0.012) | 0.988 (W) (–0.007) |
| Bayesian | 0.986 (–0.009) | 0.957 (–0.043) | 0.974 (–0.007) | 0.992 (–0.008) | 0.993 (W) (–0.002) |
| F | | | | | |
| Real | 0.89 | 0.985 (W) | 0.912 | 0.982 | 0.913 |
| CART | 0.869 (–0.021) | 0.922 (W) (–0.063) | 0.883 (–0.029) | 0.921 (–0.061) | 0.889 (–0.024) |
| Parametric | 0.873 (–0.017) | 0.907 (–0.078) | 0.886 (–0.026) | 0.914 (W) (–0.068) | 0.894 (–0.019) |
| Bayesian | 0.880 (–0.010) | 0.918 (–0.067) | 0.885 (–0.027) | 0.924 (W) (–0.058) | 0.893 (–0.020) |
| G | | | | | |
| Real | 0.746 | 0.959 | 0.78 | 0.971 (W) | 0.82 |
| CART | 0.667 (–0.079) | 0.848 (W) (–0.111) | 0.678 (–0.102) | 0.841 (–0.070) | 0.748 (–0.072) |
| Parametric | 0.669 (–0.077) | 0.805 (W) (–0.154) | 0.676 (–0.104) | 0.801 (–0.107) | 0.737 (–0.083) |
| Bayesian | 0.676 (–0.070) | 0.835 (W) (–0.124) | 0.676 (–0.104) | 0.822 (–0.149) | 0.739 (–0.081) |
| H | | | | | |
| Real | 1.000 (W) | 0.997 | 0.98 | 0.994 | 0.992 |
| CART | 0.940 (–0.060) | 0.941 (–0.056) | 0.891 (–0.089) | 0.958 (W) (–0.036) | 0.955 (–0.037) |
| Parametric | 0.935 (–0.065) | 0.951 (–0.046) | 0.898 (–0.082) | 0.959 (W) (–0.135) | 0.959 (W) (–0.032) |
| Bayesian | 0.940 (–0.060) | 0.952 (–0.045) | 0.899 (–0.081) | 0.955 (–0.139) | 0.959 (W) (–0.032) |
| I | | | | | |
| Real | 0.706 | 0.845 | 0.711 | 0.896 (W) | 0.676 |
| CART | 0.594 (–0.112) | 0.643 (–0.202) | 0.634 (–0.077) | 0.671 (W) (–0.225) | 0.609 (–0.067) |
| Parametric | 0.570 (–0.136) | 0.638 (–0.207) | 0.624 (–0.087) | 0.663 (W) (–0.233) | 0.608 (–0.068) |
| Bayesian | 0.609 (–0.097) | 0.648 (–0.197) | 0.629 (–0.082) | 0.667 (W) (–0.229) | 0.622 (–0.054) |
| J | | | | | |
| Real | 0.453 | 0.981 (W) | 0.642 | 0.981 (W) | 0.651 |
| CART | 0.526 (+0.073) | 0.655 (W) (–0.326) | 0.579 (–0.063) | 0.649 (–0.332) | 0.551 (–0.100) |
| Parametric | 0.555 (+0.102) | 0.689 (W) (–0.292) | 0.606 (–0.036) | 0.628 (–0.354) | 0.549 (–0.102) |
| Bayesian | 0.545 (+0.092) | 0.585 (–0.396) | 0.585 (–0.057) | 0.602 (W) (–0.379) | 0.551 (–0.100) |
| K | | | | | |
| Real | 0.551 | 0.845 | 0.864 | 0.885 (W) | 0.551 |
| CART | 0.510 (–0.041) | 0.531 (W) (–0.314) | 0.531 (W) (–0.333) | 0.512 (–0.373) | 0.531 (W) (–0.020) |
| Parametric | 0.514 (–0.037) | 0.545 (W) (–0.300) | 0.510 (–0.354) | 0.519 (–0.366) | 0.531 (–0.020) |
| Bayesian | 0.490 (–0.061) | 0.538 (W) (–0.307) | 0.510 (–0.354) | 0.531 (–0.354) | 0.510 (–0.041) |
| L | | | | | |
| Real | 0.851 | 1.000 (W) | 0.861 | 0.977 | 0.865 |
| CART | 0.791 (–0.060) | 0.781 (–0.219) | 0.758 (–0.103) | 0.803 (W) (–0.174) | 0.785 (–0.080) |
| Parametric | 0.822 (W) (–0.029) | 0.758 (–0.242) | 0.786 (–0.075) | 0.809 (–0.168) | 0.793 (–0.072) |
| Bayesian | 0.785 (–0.066) | 0.738 (–0.262) | 0.818 (–0.043) | 0.799 (–0.178) | 0.834 (W) (–0.031) |
| M | | | | | |
| Real | 0.899 | 1.000 (W) | 0.838 | 0.986 | 0.939 |
| CART | 0.726 (–0.173) | 0.762 (–0.238) | 0.762 (–0.076) | 0.780 (–0.206) | 0.782 (W) (–0.157) |
| Parametric | 0.739 (–0.160) | 0.765 (–0.235) | 0.757 (–0.081) | 0.772 (–0.214) | 0.796 (W) (–0.143) |
| Bayesian | 0.681 (–0.218) | 0.662 (–0.338) | 0.703 (–0.135) | 0.746 (–0.240) | 0.780 (W) (–0.159) |
| N | | | | | |
| Real | 0.713 | 0.908 (W) | 0.713 | 0.908 (W) | 0.713 |
| CART | 0.706 (–0.007) | 0.667 (–0.241) | 0.715 (+0.002) | 0.680 (–0.228) | 0.720 (W) (+0.007) |
| Parametric | 0.644 (–0.067) | 0.614 (–0.294) | 0.706 (–0.007) | 0.646 (–0.262) | 0.708 (W) (–0.005) |
| Bayesian | 0.559 (–0.154) | 0.591 (–0.317) | 0.706 (W) (–0.007) | 0.630 (–0.278) | 0.694 (–0.019) |
| O | | | | | |
| Real | 0.449 | 0.732 | 0.458 | 0.762 (W) | 0.56 |
| CART | 0.338 (–0.111) | 0.401 (–0.331) | 0.411 (–0.047) | 0.410 (–0.352) | 0.425 (W) (–0.135) |
| Parametric | 0.317 (–0.192) | 0.377 (–0.355) | 0.413 (–0.045) | 0.397 (–0.365) | 0.433 (W) (–0.127) |
| Bayesian | 0.293 (–0.156) | 0.336 (–0.396) | 0.375 (–0.083) | 0.361 (–0.401) | 0.419 (W) (–0.141) |
| P | | | | | |
| Real | 0.981 | 0.985 (W) | 0.981 | 0.982 | 0.981 |
| CART | 0.981 (W) (0.000) | 0.977 (–0.008) | 0.981 (W) (0.000) | 0.981 (W) (–0.001) | 0.981 (W) (0.000) |
| Parametric | 0.981 (W) (0.000) | 0.976 (–0.009) | 0.981 (W) (0.000) | 0.981 (W) (–0.001) | 0.981 (W) (0.000) |
| Bayesian | 0.981 (W) (0.000) | 0.977 (–0.008) | 0.981 (W) (0.000) | 0.981 (W) (–0.001) | 0.981 (W) (0.000) |
| Q | | | | | |
| Real | 0.84 | 0.932 (W) | 0.853 | 0.928 | 0.851 |
| CART | 0.834 (–0.006) | 0.795 (–0.137) | 0.850 (–0.003) | 0.835 (–0.093) | 0.851 (W) (0.000) |
| Parametric | 0.798 (–0.042) | 0.811 (–0.121) | 0.848 (–0.005) | 0.838 (–0.090) | 0.849 (W) (–0.002) |
| Bayesian | 0.823 (–0.017) | 0.794 (–0.138) | 0.846 (–0.007) | 0.837 (–0.091) | 0.851 (W) (0.000) |
| R | | | | | |
| Real | 0.755 | 0.989 (W) | 0.795 | 0.961 | 0.738 |
| CART | 0.742 (–0.013) | 0.819 (–0.170) | 0.761 (–0.034) | 0.825 (W) (–0.136) | 0.733 (–0.005) |
| Parametric | 0.749 (–0.006) | 0.786 (–0.203) | 0.764 (–0.031) | 0.798 (W) (–0.163) | 0.734 (–0.004) |
| Bayesian | 0.748 (–0.007) | 0.835 (W) (–0.154) | 0.762 (–0.033) | 0.832 (–0.129) | 0.734 (–0.004) |
| S | | | | | |
| Real | 0.958 | 1.000 (W) | 0.921 | 1.000 (W) | 0.953 |
| CART | 0.903 (–0.055) | 0.901 (–0.099) | 0.899 (–0.022) | 0.935 (W) (–0.065) | 0.913 (–0.040) |
| Parametric | 0.890 (–0.068) | 0.913 (–0.087) | 0.912 (–0.009) | 0.930 (W) (–0.060) | 0.926 (–0.027) |
| Bayesian | 0.905 (–0.053) | 0.914 (–0.086) | 0.908 (–0.013) | 0.936 (W) (–0.064) | 0.930 (–0.023) |
^a The training set name indicates whether real or synthetic data were used to train the model and, for synthetic datasets, which synthetic data generator was used (ie, CART, parametric, or Bayesian).
^b SGD: stochastic gradient descent.
^c DT: decision tree.
^d KNN: k-nearest neighbors.
^e RF: random forest.
^f SVM: support vector machine.
^g (W) highlights the winning classifier for each training set.
^h CART: classification and regression trees.
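
The two annotations in each table cell can be sketched in a few lines of Python. This is only an illustration, not the authors' code: it assumes the parenthesized value is the synthetic-data accuracy minus the real-data accuracy for the same algorithm, rounded to three decimals, and that (W) marks the highest accuracy within a training set row, with ties all marked. The names `ALGORITHMS`, `dataset_a`, `accuracy_deltas`, and `winners` are hypothetical; the numbers are the Real and CART rows of dataset A.

```python
from typing import Dict, List

# Column order used in Table 2.
ALGORITHMS: List[str] = ["SGD", "DT", "KNN", "RF", "SVM"]

# Accuracy values copied from the dataset A rows of Table 2.
dataset_a: Dict[str, List[float]] = {
    "Real": [0.962, 1.000, 0.975, 0.997, 0.974],
    "CART": [0.966, 0.950, 0.967, 0.965, 0.969],
}


def accuracy_deltas(real: List[float], synthetic: List[float]) -> List[float]:
    """Per-algorithm change in accuracy: synthetic minus real (assumed definition)."""
    return [round(s - r, 3) for r, s in zip(real, synthetic)]


def winners(scores: List[float]) -> List[str]:
    """Algorithms with the highest accuracy in one row (assumed meaning of W)."""
    best = max(scores)
    return [alg for alg, score in zip(ALGORITHMS, scores) if score == best]


if __name__ == "__main__":
    deltas = accuracy_deltas(dataset_a["Real"], dataset_a["CART"])
    for alg, delta in zip(ALGORITHMS, deltas):
        print(f"{alg}: {delta:+.3f}")
    print("Winner on real data:", winners(dataset_a["Real"]))
    print("Winner on CART data:", winners(dataset_a["CART"]))
```

For dataset A, this reproduces the CART row's parenthesized values and selects DT as the winner on real data and SVM on CART-generated data, matching the table.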