Bioinformatics. 2020 Jan 8;36(8):2401–2409. doi: 10.1093/bioinformatics/btaa003

Table 1.

EC classification accuracy on the custom EC40 and EC50 datasets

                                     EC40, level          EC50, level
Model     Variant                    0      1      2      0      1      2
Baseline  Seq; non-red.              0.83   0.38   0.25   0.88   0.71   0.70
Baseline  Seq                        0.84   0.61   0.47   0.92   0.80   0.79
Baseline  Seq+PSSM; non-red.; clean  0.91   0.84   0.72   0.95   0.94   0.91
Baseline  Seq+PSSM; non-red.; leak.  0.92*  0.85   0.71   0.95   0.95   0.92
UDSMProt  Fwd; pretr.; non-red.      0.82   0.79   0.71   0.93   0.94   0.92
UDSMProt  Fwd; from scratch          0.87   0.79   0.74   0.94   0.94   0.92
UDSMProt  Fwd; pretr.                0.89   0.84   0.83   0.95   0.96   0.94
UDSMProt  Bwd; pretr.                0.90   0.85   0.81   0.95   0.96   0.94
UDSMProt  Fwd+bwd; pretr.            0.91   0.87*  0.84*  0.96*  0.97*  0.95*

Note: The highest accuracy in each column is marked with an asterisk (*).

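Fwd/bwd, training in the forward/backward direction; Seq, raw sequence as input; non-red., training on non-redundant sequences, i.e. representatives only; pretr., with language-model pre-training; from scratch, without pre-training; clean, PSSM features computed on the training set only; leak., PSSM features computed on the full dataset, which leaks test-set information into the features. Levels 0, 1 and 2 denote enzyme vs. non-enzyme discrimination, prediction of the main EC class and prediction of the EC subclass, respectively.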
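To make the comparisons in Table 1 easy to re-check, the accuracies can be transcribed into a small Python structure. The sketch below is illustrative only: the names `RESULTS`, `COLUMNS`, `best_per_column` and `delta` are hypothetical and not part of UDSMProt, and the code does nothing beyond arithmetic on the published numbers.

```python
# Accuracies transcribed from Table 1 (hypothetical helper structure).
# Column order: EC40 level 0, 1, 2, then EC50 level 0, 1, 2.
COLUMNS = ("EC40 L0", "EC40 L1", "EC40 L2", "EC50 L0", "EC50 L1", "EC50 L2")

RESULTS = {
    "Baseline: Seq; non-red.": (0.83, 0.38, 0.25, 0.88, 0.71, 0.70),
    "Baseline: Seq": (0.84, 0.61, 0.47, 0.92, 0.80, 0.79),
    "Baseline: Seq+PSSM; non-red.; clean": (0.91, 0.84, 0.72, 0.95, 0.94, 0.91),
    "Baseline: Seq+PSSM; non-red.; leak.": (0.92, 0.85, 0.71, 0.95, 0.95, 0.92),
    "UDSMProt: Fwd; pretr.; non-red.": (0.82, 0.79, 0.71, 0.93, 0.94, 0.92),
    "UDSMProt: Fwd; from scratch": (0.87, 0.79, 0.74, 0.94, 0.94, 0.92),
    "UDSMProt: Fwd; pretr.": (0.89, 0.84, 0.83, 0.95, 0.96, 0.94),
    "UDSMProt: Bwd; pretr.": (0.90, 0.85, 0.81, 0.95, 0.96, 0.94),
    "UDSMProt: Fwd+bwd; pretr.": (0.91, 0.87, 0.84, 0.96, 0.97, 0.95),
}


def best_per_column(results):
    """Return {column: (best accuracy, [models reaching it])}."""
    best = {}
    for i, col in enumerate(COLUMNS):
        top = max(scores[i] for scores in results.values())
        # Exact comparison is safe: both sides come from the same literals.
        winners = [model for model, scores in results.items() if scores[i] == top]
        best[col] = (top, winners)
    return best


def delta(results, a, b):
    """Per-column accuracy difference (a - b), rounded to two decimals."""
    return tuple(round(x - y, 2) for x, y in zip(results[a], results[b]))


if __name__ == "__main__":
    for col, (acc, models) in best_per_column(RESULTS).items():
        print(f"{col}: {acc:.2f} {models}")
    # Effect of language-model pre-training for the forward model.
    print(delta(RESULTS, "UDSMProt: Fwd; pretr.", "UDSMProt: Fwd; from scratch"))
```

Comparing "Fwd; pretr." with "Fwd; from scratch" this way shows a positive difference in every column, largest at EC40 level 2 (0.83 vs. 0.74). The same `delta` call on the clean and leak. baseline rows shows differences of at most 0.01 in either direction.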