2021 Apr 12;38(11):1994–2023. doi: 10.1039/d1np00006c

Selected pros and cons of different computational methods for enzyme discovery covered in this review.

In silico methods for enzyme discovery

Phylogenetics
Pros
• Longstanding, well-established methods to investigate functional relationships between proteins
• Insights into evolution of protein families, e.g., through ancestral sequence reconstruction
Cons
• Heavily influenced by the quality of the underlying sequence alignment
• Not all biosynthetic domains have a consistent or strong phylogenetic signal

Sequence similarity networking
Pros
• Intuitive graphical representation of thousands of protein sequences simultaneously
• Allows users to quickly identify clusters without known representatives in sequence space
Cons
• Pruning of SSNs by BLAST e-value can be subjective
• Unclear how to handle or gain functional insights from ‘singletons’

Genome neighborhoods and protein interaction networks
Pros
• Guilt-by-association methods can reveal new functional relationships for proteins independent of primary sequence
• Unusual co-occurring domains or interacting proteins are new targets for enzyme discovery
Cons
• Analysis of gene neighborhoods from metagenomes requires assembly → introduces errors, and it is not always possible to recover flanking genes for low-abundance organisms

3D-structural methods, motifs, and active site residues
Pros
• Variations in active site architecture can have large consequences for biocatalysis → handles for discovery
• Structural motifs are useful for searches independent of full-length primary sequence
Cons
• Similar structural folds catalyze a wide range of different reactions
• Relatively few structures solved from metagenomic sources

Machine learning
Pros
• Deep learning, transfer learning, and autoencoding methods useful to learn complex or hidden relationships for functional inference
• Capable of recognizing patterns in big metagenomic datasets
Cons
• Requires a large quantity of ‘labeled’ (e.g., experimentally verified) training data
• Classification systems limited in their ability to predict entirely new enzyme functions
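
The sequence similarity networking cons listed in the table (subjective e-value pruning, and what to do with ‘singletons’) can be illustrated with a small sketch. It is not taken from the review: the function name `ssn_clusters`, the sequence labels, and the e-values are all hypothetical, and the clustering is a plain connected-components pass rather than the procedure of any particular SSN tool.

```python
from collections import defaultdict


def ssn_clusters(nodes, hits, evalue_cutoff):
    """Cluster sequences by keeping only edges with e-value <= evalue_cutoff.

    nodes: iterable of sequence identifiers (hypothetical labels here).
    hits:  iterable of (seq_a, seq_b, evalue) pairwise alignment hits.
    Returns a list of clusters (sorted lists), largest first.
    """
    parent = {n: n for n in nodes}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b, evalue in hits:
        if evalue <= evalue_cutoff:  # prune edges weaker than the cutoff
            parent[find(a)] = find(b)

    groups = defaultdict(set)
    for n in nodes:
        groups[find(n)].add(n)
    return sorted((sorted(g) for g in groups.values()),
                  key=lambda g: (-len(g), g))


# Hypothetical toy data: six sequences, four pairwise hits.
nodes = ["A", "B", "C", "D", "E", "F"]
hits = [
    ("A", "B", 1e-80),
    ("B", "C", 1e-45),
    ("C", "D", 1e-22),
    ("D", "E", 1e-60),
]

loose = ssn_clusters(nodes, hits, 1e-20)   # permissive cutoff
strict = ssn_clusters(nodes, hits, 1e-50)  # stringent cutoff

print(loose)
print(strict)
```

With the permissive cutoff, five of the six toy sequences merge into one cluster and only F remains a singleton. With the stringent cutoff, the same data split into two pairs plus two singletons. Neither cutoff is objectively correct for this toy input, which is the subjectivity the table points to.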