2021 Apr 12;38(11):1994–2023. doi: 10.1039/d1np00006c

Selected pros and cons of different computational methods for enzyme discovery covered in this review.

In silico methods for enzyme discovery	Phylogenetics	Sequence similarity networking	Genome neighborhoods and protein interaction networks	3D-structural methods, motifs, and active site residues	Machine learning
Pros	• Longstanding, well-established methods to investigate functional relationships between proteins	• Intuitive graphical representation of thousands of protein sequences simultaneously	• Guilt-by-association methods can reveal new functional relationships for proteins independent of primary sequence	• Variations in active site architecture can have large consequences for biocatalysis → handles for discovery	• Deep learning, transfer learning, and autoencoding methods useful to learn complex or hidden relationships for functional inference
Pros	• Insights into evolution of protein families, e.g., through ancestral sequence reconstruction	• Allows users to quickly identify clusters without known representatives in sequence space	• Unusual co-occurring domains or interacting proteins are new targets for enzyme discovery	• Structural motifs are useful for searches independent of full-length primary sequence	• Capable of recognizing patterns in big metagenomic datasets
Cons	• Heavily influenced by the quality of the underlying sequence alignment	• Pruning of SSNs by BLAST E-value can be subjective	• Analysis of gene neighborhoods from metagenomes requires assembly → introduces errors, and it is not always possible to recover flanking genes for low-abundance organisms	• Similar structural folds catalyze a wide range of different reactions	• Requires a large quantity of ‘labeled’ (e.g., experimentally verified) training data
Cons	• Not all biosynthetic domains have a consistent or strong phylogenetic signal	• Unclear how to handle or gain functional insights from ‘singletons’		• Relatively few structures solved from metagenomic sources	• Classification systems limited in their ability to predict entirely new enzyme functions
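
The two cons listed for sequence similarity networking (the subjectivity of the E-value cutoff and the appearance of ‘singletons’) can be illustrated with a minimal sketch. It is not the workflow of any particular SSN tool. The sequence names, the pairwise E-values, and the function name `ssn_clusters` are hypothetical and chosen only for illustration. The sketch assumes pairwise hits are already available as (query, subject, E-value) tuples and groups sequences into connected components with a simple union-find.

```python
from collections import defaultdict


def ssn_clusters(nodes, hits, evalue_cutoff):
    """Return connected components of a similarity network.

    An edge is kept only if its E-value is <= evalue_cutoff.
    Clusters are returned largest first, each as a sorted list of names.
    """
    parent = {n: n for n in nodes}

    def find(x):
        # Follow parent pointers to the root, compressing the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b, evalue in hits:
        if evalue <= evalue_cutoff:
            parent[find(a)] = find(b)  # merge the two components

    groups = defaultdict(set)
    for n in nodes:
        groups[find(n)].add(n)
    return sorted((sorted(g) for g in groups.values()),
                  key=lambda g: (-len(g), g))


# Hypothetical toy data: five sequences and three pairwise hits.
nodes = ["seqA", "seqB", "seqC", "seqD", "seqE"]
hits = [
    ("seqA", "seqB", 1e-80),
    ("seqB", "seqC", 1e-30),
    ("seqC", "seqD", 1e-12),
]

for cutoff in (1e-10, 1e-20, 1e-50):
    clusters = ssn_clusters(nodes, hits, cutoff)
    singletons = [c[0] for c in clusters if len(c) == 1]
    print(f"cutoff={cutoff:g}: clusters={clusters}, singletons={singletons}")
```

In this toy input, tightening the cutoff from 1e-10 to 1e-50 breaks one four-member cluster into a pair plus additional singletons, so the number of apparent ‘families’ depends directly on a threshold the user picks.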