Table 2.
Model | MMD↓ | Gauss. MMD↓ | MRR↑ | MRR_B↑ | ΔEntropy | ΔDistance |
---|---|---|---|---|---|---|
Positive Control | 0.011 ± 0.000 | 0.010 ± 0.000 | 0.893 ± 0.016 | 0.966 ± 0.018 | 0.002 ± 0.006 | −0.000 ± 0.001 |
Negative Control | 1.016 ± 0.000 | 0.935 ± 0.000 | 0.090 ± 0.000 | 0.099 ± 0.001 | 0.728 ± 0.006 | 1.843 ± 0.001 |
ProteoGAN | 0.043 ± 0.001 | 0.027 ± 0.001 | 0.554 ± 0.031 | 0.709 ± 0.034 | −0.010 ± 0.010 | 0.012 ± 0.004 |
Predictor-Guided | 0.026 ± 0.001 | 0.018 ± 0.000 | 0.114 ± 0.007 | 0.136 ± 0.016 | 0.014 ± 0.009 | 0.001 ± 0.003 |
Non-Hierarchical | 0.337 ± 0.118 | 0.242 ± 0.096 | 0.306 ± 0.034 | 0.406 ± 0.039 | −0.352 ± 0.178 | 0.290 ± 0.171 |
ProGen | 0.048 | 0.030 | 0.394 | 0.556 | −0.156 | 0.037 |
CVAE | 0.232 ± 0.078 | 0.148 ± 0.058 | 0.301 ± 0.053 | 0.424 ± 0.083 | 0.247 ± 0.027 | 0.145 ± 0.085 |
OpC-ngram | 0.056 ± 0.001 | 0.034 ± 0.001 | 0.402 ± 0.018 | 0.505 ± 0.034 | 0.208 ± 0.006 | −0.050 ± 0.002 |
OpC-HMM | 0.170 ± 0.003 | 0.108 ± 0.002 | 0.095 ± 0.001 | 0.143 ± 0.002 | −0.579 ± 0.014 | 0.199 ± 0.004 |
OpL-GAN | 0.036 | 0.023 | 0.597 | 0.747 | −0.062 | 0.022 |
OpL-ngram | 0.060 ± 0.001 | 0.037 ± 0.001 | 0.329 ± 0.009 | 0.396 ± 0.009 | 0.232 ± 0.007 | −0.053 ± 0.002 |
OpL-HMM | 0.195 ± 0.002 | 0.126 ± 0.002 | 0.100 ± 0.003 | 0.147 ± 0.002 | −0.654 ± 0.015 | 0.244 ± 0.004 |
ProteoGAN (100 labels) | 0.036 | 0.024 | 0.585 | 0.736 | −0.026 | 0.019 |
ProteoGAN (200 labels) | 0.162 | 0.112 | 0.374 | 0.524 | 0.104 | 0.051 |
Note: Arrows indicate whether lower (↓) or higher (↑) is better. The positive control is a sample of real sequences and simulates a perfect model; the negative control is a sample that simulates the worst possible model for each metric (constant sequence for MMD, randomized labels for MRR, repeated sequences for the diversity measures). Best results are in bold, second best underlined. Values are mean ± standard deviation over five data splits. Because of the computational effort, OpL-GAN and ProGen were trained on only one split.
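
To illustrate how the controls bracket the MMD column, the following is a minimal sketch, not the evaluation code behind the table. It assumes a unit-normalised 3-mer count embedding and a linear kernel, under which the (biased) MMD reduces to the distance between mean embeddings. The helper names (`kmer_embedding`, `linear_mmd`, `random_sequences`) and the uniformly random toy "real" sequences are hypothetical stand-ins chosen for the example.

```python
import math
import random
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def kmer_embedding(seq, k=3):
    """Unit-normalised k-mer count vector, stored sparsely as a dict.
    Assumes len(seq) >= k (true for the toy data below)."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {kmer: c / norm for kmer, c in counts.items()}

def mean_embedding(seqs, k=3):
    """Average of the per-sequence embeddings."""
    total = Counter()
    for s in seqs:
        for kmer, v in kmer_embedding(s, k).items():
            total[kmer] += v
    return {kmer: v / len(seqs) for kmer, v in total.items()}

def linear_mmd(sample_a, sample_b, k=3):
    """Biased linear-kernel MMD: Euclidean distance between mean embeddings."""
    mu_a, mu_b = mean_embedding(sample_a, k), mean_embedding(sample_b, k)
    keys = set(mu_a) | set(mu_b)
    return math.sqrt(sum((mu_a.get(x, 0.0) - mu_b.get(x, 0.0)) ** 2 for x in keys))

def random_sequences(n, length, rng):
    """Toy stand-in for real protein sequences (uniform over amino acids)."""
    return ["".join(rng.choice(AMINO_ACIDS) for _ in range(length)) for _ in range(n)]

rng = random.Random(0)
reference = random_sequences(200, 100, rng)

# Positive control: a second sample from the same distribution as the reference.
positive_control = random_sequences(200, 100, rng)
# Negative control for MMD: one constant sequence repeated.
negative_control = ["A" * 100] * 200

mmd_pos = linear_mmd(reference, positive_control)
mmd_neg = linear_mmd(reference, negative_control)
print(f"positive control MMD: {mmd_pos:.3f}")
print(f"negative control MMD: {mmd_neg:.3f}")
```

Under these assumptions the positive control yields a small MMD and the constant-sequence control a much larger one, which is the role the two control rows play in the table. The absolute values depend on the embedding and sample size and are not comparable to the reported numbers.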