Table 1. EC classification accuracy on the custom EC40 and EC50 datasets
| Method | Configuration | EC40, level 0 | EC40, level 1 | EC40, level 2 | EC50, level 0 | EC50, level 1 | EC50, level 2 |
|---|---|---|---|---|---|---|---|
| Baseline | Seq; non-red. | 0.83 | 0.38 | 0.25 | 0.88 | 0.71 | 0.70 |
| | Seq | 0.84 | 0.61 | 0.47 | 0.92 | 0.80 | 0.79 |
| | Seq+PSSM; non-red.; clean | 0.91 | 0.84 | 0.72 | 0.95 | 0.94 | 0.91 |
| | Seq+PSSM; non-red.; leak. | 0.92 | 0.85 | 0.71 | 0.95 | 0.95 | 0.92 |
| UDSMProt | Fwd; pretr.; non-red. | 0.82 | 0.79 | 0.71 | 0.93 | 0.94 | 0.92 |
| | Fwd; from scratch | 0.87 | 0.79 | 0.74 | 0.94 | 0.94 | 0.92 |
| | Fwd; pretr. | 0.89 | 0.84 | 0.83 | 0.95 | 0.96 | 0.94 |
| | Bwd; pretr. | 0.90 | 0.85 | 0.81 | 0.95 | 0.96 | 0.94 |
| | Fwd+bwd; pretr. | 0.91 | 0.87 | 0.84 | 0.96 | 0.97 | 0.95 |

Note: The best-performing classifiers are marked in bold face.
Fwd/bwd, training in forward/backward direction; seq, raw sequence as input; non-red., training on non-redundant sequences, i.e. cluster representatives only; pretr., using language model pre-training; leak., PSSM features computed on the full dataset, which introduces leakage.