2023 May 31;6:0153. doi: 10.34133/research.0153
Key points:
  • The HDMLF framework is proposed to predict EC numbers from protein sequence data.
  • A protein language model and an extreme multilabel classifier are adopted to reduce heavy hand-crafted feature engineering and improve prediction performance.
  • The proposed framework remarkably outperforms the existing state-of-the-art method in terms of accuracy and mF1 score by 70% and 20%, respectively.
  • An online service and an offline bundle are provided for end users to annotate EC numbers in high throughput easily and efficiently.
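To make the multilabel framing in the key points concrete, here is a minimal sketch of how EC annotations could be turned into training targets. It is not the HDMLF implementation. It assumes only that an EC number is a dot-separated, four-level code and that one sequence may carry several EC numbers or none. The helper names (`ec_prefixes`, `build_label_index`, `multi_hot`) and the toy annotations are hypothetical and chosen for illustration.

```python
from typing import Dict, List


def ec_prefixes(ec: str) -> List[str]:
    """Expand an EC number into its hierarchy levels, coarsest first."""
    parts = ec.split(".")
    return [".".join(parts[: i + 1]) for i in range(len(parts))]


def build_label_index(annotations: List[List[str]]) -> Dict[str, int]:
    """Map every distinct EC number in the data to a column index."""
    labels = sorted({ec for ecs in annotations for ec in ecs})
    return {ec: i for i, ec in enumerate(labels)}


def multi_hot(ecs: List[str], index: Dict[str, int]) -> List[int]:
    """Encode one sequence's EC numbers as a 0/1 vector over all labels."""
    vec = [0] * len(index)
    for ec in ecs:
        vec[index[ec]] = 1
    return vec


# Toy annotations (illustrative only): one EC number, two EC numbers,
# and a sequence with no EC annotation.
annotations = [["1.1.1.1"], ["2.7.1.1", "3.1.3.9"], []]

index = build_label_index(annotations)
vectors = [multi_hot(ecs, index) for ecs in annotations]

print(ec_prefixes("1.1.1.1"))
print(vectors)
```

In a setup like the one the key points describe, vectors of this kind would serve as targets for a classifier fed with sequence embeddings from a protein language model. The embedding step is omitted here because it depends on the specific model used.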