. 2020 Jan 8;36(8):2401–2409. doi: 10.1093/bioinformatics/btaa003

Table 1. EC classification accuracy on the custom EC40 and EC50 datasets.

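| Model | Configuration | EC40 level 0 | EC40 level 1 | EC40 level 2 | EC50 level 0 | EC50 level 1 | EC50 level 2 |
|---|---|---|---|---|---|---|---|
| Baseline | Seq; non-red. | 0.83 | 0.38 | 0.25 | 0.88 | 0.71 | 0.70 |
| | Seq | 0.84 | 0.61 | 0.47 | 0.92 | 0.80 | 0.79 |
| | Seq+PSSM; non-red.; clean | 0.91 | 0.84 | 0.72 | 0.95 | 0.94 | 0.91 |
| | Seq+PSSM; non-red.; leak. | 0.92 | 0.85 | 0.71 | 0.95 | 0.95 | 0.92 |
| UDSMProt | Fwd; pretr.; non-red. | 0.82 | 0.79 | 0.71 | 0.93 | 0.94 | 0.92 |
| | Fwd; from scratch | 0.87 | 0.79 | 0.74 | 0.94 | 0.94 | 0.92 |
| | Fwd; pretr. | 0.89 | 0.84 | 0.83 | 0.95 | 0.96 | 0.94 |
| | Bwd; pretr. | 0.90 | 0.85 | 0.81 | 0.95 | 0.96 | 0.94 |
| | Fwd+bwd; pretr. | 0.91 | 0.87 | 0.84 | 0.96 | 0.97 | 0.95 |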

Note: The best-performing classifiers are marked in boldface.

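Fwd/bwd, training in forward/backward direction; fwd+bwd, combination of the forward and backward models; seq, raw sequence as input; PSSM, position-specific scoring matrix features; non-red., training on non-redundant sequences, i.e. representatives only; pretr., using language-model pre-training; from scratch, training without language-model pre-training; clean, PSSM features computed without access to the test data; leak., PSSM features computed on the full dataset, including the test sequences (data leakage).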
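The comparisons the table invites can be computed directly from the reported numbers. Two examples are the gain from language-model pre-training (Fwd; pretr. vs. Fwd; from scratch) and the margin of the Fwd+bwd model over the clean PSSM baseline. The Python sketch below only transcribes Table 1 and does arithmetic on it. The row labels, the column abbreviations (`EC40 L0`, ...) and the helper names `difference` and `best_rows` are our own, hypothetical shorthand. Excluding the rows tagged `leak.` when picking the best result is also our choice, because those features were computed with access to the test sequences.

```python
# Accuracies transcribed from Table 1, in the column order of COLUMNS.
# Row labels are hypothetical shorthand of the form "<model>; <configuration>".
COLUMNS = ("EC40 L0", "EC40 L1", "EC40 L2", "EC50 L0", "EC50 L1", "EC50 L2")

TABLE_1 = {
    "Baseline; Seq; non-red.": (0.83, 0.38, 0.25, 0.88, 0.71, 0.70),
    "Baseline; Seq": (0.84, 0.61, 0.47, 0.92, 0.80, 0.79),
    "Baseline; Seq+PSSM; non-red.; clean": (0.91, 0.84, 0.72, 0.95, 0.94, 0.91),
    "Baseline; Seq+PSSM; non-red.; leak.": (0.92, 0.85, 0.71, 0.95, 0.95, 0.92),
    "UDSMProt; Fwd; pretr.; non-red.": (0.82, 0.79, 0.71, 0.93, 0.94, 0.92),
    "UDSMProt; Fwd; from scratch": (0.87, 0.79, 0.74, 0.94, 0.94, 0.92),
    "UDSMProt; Fwd; pretr.": (0.89, 0.84, 0.83, 0.95, 0.96, 0.94),
    "UDSMProt; Bwd; pretr.": (0.90, 0.85, 0.81, 0.95, 0.96, 0.94),
    "UDSMProt; Fwd+bwd; pretr.": (0.91, 0.87, 0.84, 0.96, 0.97, 0.95),
}


def difference(row_a: str, row_b: str) -> tuple:
    """Per-column accuracy of row_a minus row_b, rounded to two decimals."""
    return tuple(round(a - b, 2) for a, b in zip(TABLE_1[row_a], TABLE_1[row_b]))


def best_rows(column: str, exclude_tag: str = "leak.") -> list:
    """All rows reaching the highest accuracy in `column`, skipping rows whose
    label contains `exclude_tag` (by default the leaky PSSM baseline)."""
    idx = COLUMNS.index(column)
    kept = {label: acc[idx] for label, acc in TABLE_1.items() if exclude_tag not in label}
    top = max(kept.values())
    # Return every tied row, in table order, rather than an arbitrary single winner.
    return [label for label, value in kept.items() if value == top]


if __name__ == "__main__":
    # Effect of language-model pre-training for the forward model.
    print(difference("UDSMProt; Fwd; pretr.", "UDSMProt; Fwd; from scratch"))
    # Margin of the forward+backward model over the clean PSSM baseline.
    print(difference("UDSMProt; Fwd+bwd; pretr.", "Baseline; Seq+PSSM; non-red.; clean"))
    for col in COLUMNS:
        print(col, best_rows(col))
```

The first call yields `(0.02, 0.05, 0.09, 0.01, 0.02, 0.02)`, so pre-training helps most at EC40 level 2. The second yields `(0.0, 0.03, 0.12, 0.01, 0.03, 0.04)`. With the leaky row excluded, EC40 level 0 is a tie at 0.91 between the clean PSSM baseline and the Fwd+bwd model, which is why `best_rows` returns all tied rows.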