Author manuscript; available in PMC: 2023 Sep 1.
Published in final edited form as: Trends Biochem Sci. 2022 Apr 13;47(9):785–794. doi: 10.1016/j.tibs.2022.03.017

Activity-based annotation: The emergence of systems biochemistry

Kyu Y Rhee 1,*, Robert S Jansen 2,*, Christoph Grundner 3,4,5,*
PMCID: PMC9378515  NIHMSID: NIHMS1798674  PMID: 35430135

Abstract

Current tools to annotate protein function have failed to keep pace with the speed of DNA sequencing and exponentially growing number of proteins of unknown function (PUFs). A major contributing factor to this mismatch is the historical lack of high throughput methods to experimentally determine biochemical activity. Activity-based methods, such as activity-based metabolite and -protein profiling, are emerging as new approaches for unbiased, global, biochemical annotation of protein function. In this review, we highlight recent experimental, activity-based approaches that offer new opportunities to determine protein function in a biologically agnostic and systems-level manner.

Keywords: Annotation, proteins of unknown function, activity-based metabolite profiling, activity-based protein profiling

The growing challenge of protein annotation

Nucleic acid sequencing technologies have permeated virtually every field of biological research, producing huge amounts of genomic data every day. The advent of metagenomic technologies has more recently made it possible to access the genomes of unculturable organisms and complex communities and generate hundreds to thousands of different genome sequences from a single analysis. This explosion of genomic data has been accompanied by an expansion of protein sequences from model organisms to ever more diverse members of the tree of life. The Global Ocean Sampling expedition, for example, almost doubled the number of known protein sequences, accessing a deep cache of proteins from aquatic microorganisms that were vastly underrepresented in the collective genome data [1]. This expansion of DNA sequence data to non-model organisms has been accompanied by a growing number of genes that lie beyond the reach of standard similarity-based bioinformatic annotation methods. While indispensable for annotation of newly sequenced genomes, existing bioinformatic methods have failed to annotate as much as 50% of putative protein coding sequences and left many known and essential predicted protein activities unannotated [2–4]. Accordingly, it has been estimated that over 30% of the proteins in well-characterized organisms such as Saccharomyces cerevisiae and humans are uncharacterized enzymes [5]. Plant genomes are even more sparsely annotated [6–8].

The scope of similarity-based annotations (see Glossary) for all but the most conserved proteins is further limited by drift in protein function and has inadvertently given rise to potentially significant rates of misannotation [9]. Indeed, some estimates indicate that nearly 50% of current similarity-based annotations could be inaccurate at the gene level [10], while more than 80% provide only general class or protein superfamily level annotations [3,11]. Misannotation unfortunately propagates alongside correct annotation and erodes the overall annotation quality. Existing similarity-based annotations have thus proven insufficient to meet the challenge of protein annotation.

Protein function is a product of multiple factors, such as biochemical activity, cellular localization, pathway context, protein-protein interactions, and other binding partners such as ligands and nucleic acids, and manifests at levels ranging from isolated molecular activity to higher-level physiologic phenotype. Distinct functions can reside in different domains of a protein, as in many signaling proteins with modular architecture, or may even be housed in the same domain, such as the phosphatase activity of bacterial histidine kinases [12]. Annotation of protein function thus requires knowledge beyond the isolated coding sequence, and full annotation may require several independent annotations.

While a number of tools ranging from functional genomics to structure prediction have helped to fill these gaps on a more global and high-throughput scale, biochemical methods have lagged in their ability to achieve a similar systems-level scale. Here, we highlight two notable exceptions that enable high-throughput, systems biochemistry: Activity-based metabolite (ABMP) and -protein profiling (ABPP). We argue that these emerging tools have the potential to accelerate experimental and unbiased protein annotation and, in conjunction with existing bioinformatic tools, close the gap between protein sequence and biochemical function.

Improvements in existing annotation tools

Sequence-based annotation is becoming increasingly sophisticated and, in addition to homology-based methods, now incorporates context features that can increase the accuracy of prediction [13]. One such example is gene proximity, or synteny, either within or across genomes, which can indicate shared function, a concept that has long been known for bacteria but that also holds for eukaryotes [14]. As such, gene locus features are now part of most automatic annotation pipelines. Additional gene context-related features include co-expression and conserved, sequence-driven protein-protein interactions.

A mainstay of high-throughput functional annotation has been genome-scale mutagenesis coupled to whole genome sequencing. By screening a given library of mutants under a given condition, it has become possible to identify groups of genes whose loss of function is associated with a similar fitness cost, indicating potentially shared or related functions [15]. In cases where such gene sets include already annotated functions, this approach has proven especially informative by enabling direct experimental validation of the putative function [16–19]. The predictive specificity of this approach was recently expanded further through the compilation of datasets from different experimental conditions into a single repository. Doing so allowed for the identification of potential functionally related genes not only through the presence of conserved genes across different gene sets but also through the identification of genes with similar or shared patterns of associated gene sets across experiments (i.e., gene set fingerprints) [20]. The advent of temporally and quantitatively tunable clustered regularly interspaced short palindromic repeats (CRISPR) and CRISPR interference (CRISPRi) [21] technologies has further expanded the experimental scope of such approaches, including for genes required for growth and viability [22,23].
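The idea of comparing fitness patterns across conditions can be illustrated with a minimal sketch. It assumes each gene has a vector of mutant fitness scores, one per condition, and uses Pearson correlation as a stand-in similarity measure; the gene names, scores, and the 0.8 cutoff are hypothetical and are not taken from the cited studies.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length lists of numbers."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0  # a flat profile carries no pattern to compare
    return cov / (sd_x * sd_y)


def cofit_partners(fitness, query, min_r=0.8):
    """Return (gene, r) pairs whose fitness profile correlates with the query."""
    query_profile = fitness[query]
    hits = []
    for gene, profile in fitness.items():
        if gene == query:
            continue
        r = pearson(query_profile, profile)
        if r >= min_r:
            hits.append((gene, r))
    # Strongest correlation first
    return sorted(hits, key=lambda item: -item[1])


# Hypothetical fitness scores (negative = fitness cost) in four conditions
fitness = {
    "puf_1": [-2.0, -0.1, -1.5, 0.0],
    "annotated_A": [-1.8, 0.0, -1.6, 0.1],
    "annotated_B": [0.1, -2.0, 0.0, -0.1],
}
partners = cofit_partners(fitness, "puf_1")
```

Here the uncharacterized gene shares its pattern with `annotated_A` only, which would make that gene's known function a starting hypothesis rather than an annotation.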

Protein structure is significantly more conserved than nucleic acid and amino acid sequence [24], making structure the better predictor of function, especially for divergent proteins. However, the proportion of protein sequences with known structures remains small. For example, structure for only ~17% of amino acid residues in the human proteome has been determined experimentally to date [25,26]. That said, recent advances in protein structure prediction provide a viable way to scale structure prediction for annotation purposes for all organisms. One particularly groundbreaking advance was the very recent introduction of Google’s DeepMind AlphaFold2 algorithm [27], which makes use of neural network-based deep learning methods, and together with a similar approach from academia, RoseTTAfold [28], achieves near experimental accuracy. Already, AlphaFold2, in collaboration with EMBL’s European Bioinformatics Institute, predicted the structures of >150,000 proteins from high-interest organisms such as human, Plasmodium falciparum, and Mycobacterium tuberculosis (https://alphafold.ebi.ac.uk). With the source codes openly available, many new sequences can now be confidently predicted, and 58% of residues in the human proteome have already been predicted by AlphaFold2 with high confidence [29], providing a wealth of new starting points for functional annotation.

The emergence of systems biochemistry

From a biological perspective, biochemistry is the operational effector of cellular and organismal physiology. However, biochemistry’s breadth exceeds the experimental scope of available technologies. This is because the chemical diversity of biochemistry within a given cell or organism vastly exceeds that of nucleic acid-encoded genetics [30]. As such, experimental studies of biochemistry required focusing on specific, individual proteins that often began with an activity of interest and ideally led to the identification and study of an individual protein in isolation, dissociated from its native physiologic context [31,32]. The advent of molecular biology made it possible to identify proteins of interest on the basis of their genetic phenotype and sequence but then quickly returned to the need for biochemistry and its dependence on empirical candidate, trial-and-error-based experiments [33,34].

Conceptually, biochemical experiments can be placed along two axes: one that spans the biological diversity of proteins studied and another spanning the chemical diversity of their substrates and/or enzymatic activities (Fig. 1). Classical biochemistry historically began with a single orphan enzyme activity detected in a complex cellular protein extract (Fig 1A). Only cumbersome enzyme activity-guided fractionation to purity, when successful, enabled identifying unique enzyme-substrate pairs. Success depended on empirical, trial-and-error based purification approaches. Molecular biology enabled starting immediately with potential enzyme-substrate pairs, though often with outside information too sparse and biased to enable predictably meaningful progress.

Figure 1: Classical and activity-based biochemical analytical landscape.

Schematics illustrating classical (A) and activity-based (B) biochemical experiments. A. In classical biochemistry, assays were limited to a single substrate and only cumbersome protein fractionation allowed identification of unique enzyme-substrate pairs. B. Recent activity-based biochemical approaches leverage the analytical power of mass spectrometry to cover the full metabolite and protein space, ABPP using activity-based probes and ABMP using purified protein. The axes on the top and left indicate the number of sampled proteins and metabolites that can be assayed by each method; the arrows indicate the experimental trajectory of proteomics and metabolomics from global to targeted assays. The ideal experimental space for high-throughput functional protein annotation is indicated in the bottom right quadrant. This prospective, true systems biochemistry approach will likely comprise elements of both ABPP and ABMP and additional approaches such as structural proteomics that can sample many proteins against many metabolites simultaneously.

A major barrier to achieving biologically unbiased systems level studies of biochemistry, in contrast to genetics, has been the analytical inability to systematically traverse the full chemical range of protein and biochemical activity space. While assays with relaxed substrate specificity allowed assigning class-level activities to enzymes (e.g., phosphatase, dehydrogenase), finding physiological substrates remained challenging [35]. Mass spectrometry (MS) has emerged as an analytical technique with high sensitivity towards broad chemical classes that is unbiased towards biology. Thankfully, recent advances in high resolution MS (coupled to high performance liquid chromatography) now allow the detection of thousands of proteins and metabolites in a single run and have made it possible to perform activity-based biochemical experiments covering the entire proteome and metabolome (Fig 1B). In particular, two recent examples of these technologies have been applied to enable biologically agnostic systems-level studies of physiologic biochemistry and new protein annotation.

Annotation by activity-based metabolite profiling

Genetic ablation of genes of unknown function in combination with metabolite profiling is becoming an effective technique to discover gene function [36]. However, the inference of actual catalytic activity from cellular metabolomes remains greatly hampered by metabolic plasticity, leading to pleiotropic effects on the metabolome of genetic mutants. ABMP has helped to overcome this challenge. ABMP involves incubating a purified, usually recombinant, protein of interest with a complex orthologous cellular metabolite extract that functions as substrate library (Fig. 1B) [37]. Over the course of incubation, metabolite levels are monitored in an unbiased fashion to identify protein-dependent substrate consumption and product formation [37,38]. Unlike metabolite profiling of intact cells, ABMP thus provides direct, substrate-level enzyme annotation. Since its first reported use in 2006 [37], ABMP has been instrumental in the specific annotation of enzymes from a wide variety of enzyme classes, such as phosphatases [37,39], aminotransferases [16,40], lyases [41], monooxygenases [42–44], and transketolases [45].
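The core readout of an ABMP experiment, protein-dependent depletion or accumulation of metabolite features, can be sketched in a few lines. This is an illustrative simplification, assuming one start and one end intensity per feature for an incubation with enzyme and for a no-enzyme control; the feature names, intensities, pseudocount, and two-fold cutoff are hypothetical, and real analyses would use replicates and statistics.

```python
def classify_features(with_enzyme, no_enzyme, min_fold=2.0, pseudo=1.0):
    """Label each feature by its enzyme-dependent change over the incubation.

    with_enzyme / no_enzyme: dict mapping feature -> (start, end) intensity.
    The pseudocount avoids division by zero for features absent at the start.
    """
    calls = {}
    for feature, (enz_start, enz_end) in with_enzyme.items():
        ctl_start, ctl_end = no_enzyme[feature]
        enz_change = (enz_end + pseudo) / (enz_start + pseudo)
        ctl_change = (ctl_end + pseudo) / (ctl_start + pseudo)
        # Change with enzyme relative to enzyme-independent drift
        ratio = enz_change / ctl_change
        if ratio >= min_fold:
            calls[feature] = "candidate product"
        elif ratio <= 1.0 / min_fold:
            calls[feature] = "candidate substrate"
        else:
            calls[feature] = "unchanged"
    return calls


# Hypothetical ion intensities at the start and end of incubation
with_enzyme = {
    "feat_A": (1000.0, 100.0),
    "feat_B": (10.0, 900.0),
    "feat_C": (500.0, 480.0),
}
no_enzyme = {
    "feat_A": (1000.0, 950.0),
    "feat_B": (10.0, 12.0),
    "feat_C": (500.0, 490.0),
}
calls = classify_features(with_enzyme, no_enzyme)
```

Normalizing to the no-enzyme control is the design choice that separates catalysis from chemical instability of the extract.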

In its most basic and exploratory form, ABMP only involves a purified enzyme and a metabolite extract [39,41,45,46]. For example, Shen et al. demonstrated that CLYBL, a conserved mitochondrial enzyme of unknown function, has citramalyl-CoA lyase activity using this approach [41]. Similarly, de Carvalho et al. used basic ABMP to reannotate Rv1248, a protein from M. tuberculosis that was annotated as thiamine diphosphate-dependent alpha-ketoglutarate decarboxylase based on its sequence, as 2-hydroxy-3-oxoadipate synthase [45]. Metabolite extracts are expected to contain enzyme cofactors and substrates, but their concentrations can be at levels that prevent substrate or product detection. To increase the chance of annotation by ABMP, mixtures containing general cofactors such as NADH, ATP, and SAM have been used [37,47]. Similarly, low substrate concentration in metabolite extracts can be overcome by using metabolite extracts derived from gene knockouts that are expected to accumulate substrates immediately downstream of inactivated genes [48].

In cases where enzymes are annotated at the class-level, metabolite extracts can be altered in a more focused manner. In search for the substrates of UDP-glucosyltransferases from grapes, Bönisch et al. used a metabolite extract that was first treated with hydrolases to convert grape glycoconjugates into aglycone substrates for glucosyltransferases [49]. Using this approach, they were able to identify two putative UDP-glucosyltransferases from grape cv Pinot Noir as UDP-glucose:monoterpenol β-D-glucosyltransferases. Class-level knowledge on enzyme function additionally allows the use of labeled co-substrates. Jansen et al., for example, supplemented ABMP reactions of uncharacterized aminotransferases with 15N-labeled amino acid substrates which allowed the detection of 15N-amino acid products that remained undetectable using unlabeled substrates [40]. This approach led to the substrate-level annotation of three aminotransferases, a group of enzymes that are notoriously hard to functionally annotate based on sequence. The most extensive use of labeled co-substrates has been in the characterization of orphan cytochrome P450 enzymes. Although the addition of labeled oxygen is not essential [50], the addition of 18O2 in combination with isotope detection software allows specific filtering for oxygenation products [51] and has led to the characterization of several cytochrome P450 families [42–44,50].
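Isotope-based filtering can be illustrated with a small sketch that searches a list of neutral monoisotopic masses for pairs separated by the mass difference of one incorporated label. It assumes that both the unlabeled and the labeled form of a product are present in the data (for example from parallel incubations); the function name, the example masses, and the 5 ppm tolerance are hypothetical and do not reproduce the software used in the cited work.

```python
# Mass differences from standard monoisotopic atomic masses (Da)
O18_SHIFT = 17.999160 - 15.994915  # one 16O replaced by 18O
N15_SHIFT = 15.000109 - 14.003074  # one 14N replaced by 15N


def find_label_pairs(masses, shift=O18_SHIFT, ppm=5.0):
    """Return (light, heavy) mass pairs separated by one label shift."""
    ordered = sorted(masses)
    pairs = []
    for light in ordered:
        target = light + shift
        tolerance = target * ppm * 1e-6  # absolute window in Da
        for heavy in ordered:
            if abs(heavy - target) <= tolerance:
                pairs.append((light, heavy))
    return pairs


# Hypothetical neutral masses detected after incubation
masses = [150.0, 300.1, 302.104245, 302.2]
pairs = find_label_pairs(masses)
```

Only the 300.1 / 302.104245 pair matches the 18O spacing, so only that feature would be kept as a candidate oxygenation product. Passing `shift=N15_SHIFT` applies the same filter to 15N-labeled amino acid products.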

Finally, ABMP can be applied to confirm the specificity of functional annotation made by other techniques. Black et al., for example, used ABMP to confirm that a putative D-amino acid transaminase only accepted the expected D-amino acids [16]. Liscombe et al. similarly used a metabolite extract from Madagascar periwinkle to confirm the expected activity of an S-adenosyl-L-methionine-dependent N-methyltransferase that catalyzes a nitrogen methylation involved in vindoline biosynthesis [52].

Together, the above examples demonstrate the versatility and power of ABMP in activity-based protein annotation. Importantly, the accessible enzyme chemistry is not limited by chemical probes and requires little to no prior knowledge of substrates. The rate of annotation is, however, severely hampered by the need for protein expression and purification. In a technical tour de force, Sévin et al. overcame this limitation by using an automated approach to express 1,275 orphan E. coli proteins and performing high-throughput ABMP [53]. This high-throughput but technically challenging approach led to the prediction of 241 protein functions, of which 12 were validated. Other limitations relate to the fact that metabolite extracts only represent a single metabolic state of the organism and are often limited to polar metabolites, thus underrepresenting the full biological metabolomic complexity. To overcome this limitation, different extractions can be performed on cells grown under various conditions – preferably linked to expression levels of the gene of interest.

At the same time, highly abundant but poor substrates can outcompete low abundant but preferred substrates [54]. ABMP experiments with three different aminotransferases, for example, all led to dominant formation of ketoglutaramate, the keto acid of glutamine. More controlled single-substrate assays later revealed that glutamine is actually a poor substrate and that the dominance of the observed ketoglutaramate was, in fact, an in vitro artefact [40].
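The competition effect follows from standard Michaelis-Menten kinetics for substrates sharing one active site: the ratio of two turnover rates equals the ratio of (kcat/Km) times concentration for each substrate. The sketch below illustrates this with entirely hypothetical kinetic constants and concentrations; it is not a model of the aminotransferases discussed above.

```python
def competing_rates(substrates, enzyme_conc=1.0):
    """Rates for substrates competing for one active site.

    substrates: dict mapping name -> (kcat, Km, concentration).
    Uses v_i = E * kcat_i * (S_i / Km_i) / (1 + sum_j S_j / Km_j).
    """
    denominator = 1.0 + sum(conc / km for _, km, conc in substrates.values())
    return {
        name: enzyme_conc * kcat * (conc / km) / denominator
        for name, (kcat, km, conc) in substrates.items()
    }


# Hypothetical values: the preferred substrate has a 100-fold higher kcat/Km
# but is 1000-fold less abundant in the extract than the poor substrate.
substrates = {
    "preferred": (10.0, 0.01, 0.001),  # kcat (1/s), Km (mM), conc (mM)
    "poor": (10.0, 1.0, 1.0),
}
rates = competing_rates(substrates)
rate_ratio = rates["poor"] / rates["preferred"]
```

With these numbers the poor substrate is turned over about ten times faster than the preferred one, which is why single-substrate follow-up assays remain necessary.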

Annotation by activity-based protein profiling

Similar to ABMP, the challenge of unbiased, experimental function finding is elegantly solved by an emerging chemoproteomic approach that directly detects biochemical activity through chemical probes: ABPP [55]. In ABPP, a selective chemical probe identifies a specific catalytic mechanism (Fig. 1B). Such probes require three elements: the reactive moiety that recognizes the catalytic mechanism, a selectivity scaffold, and a tag for detection and/or purification. The probe binds the catalytic site of the target protein in a mechanism-dependent way and thus assigns a biochemical function to target proteins proteome-wide. The labeling of the target protein, which is typically covalent, occurs by approaches such as photocrosslinking or in a way reminiscent of suicide substrates. For example, a well-characterized activity-based probe (ABP) modeled on ATP developed initially for the labeling of kinases (ATP-ABP) was designed by appending a reactive acylphosphate in the γ-phosphate position of ATP [56]. The resulting probe is a close mimic of ATP that binds to most ATP binding sites. Instead of transferring phosphates to a substrate, the probe transfers its acyl group to one of two nucleophilic lysines that are typically found in kinases’ ATP binding sites to coordinate the β- and γ-phosphates of ATP, resulting in C-O bond cleavage and a stable acetamide-protein adduct. Through this mechanism, the probe identifies ATPases in a mechanism-dependent way with only small modifications to the natural substrate. By changing the adenosine scaffold of the probe, the selectivity can be switched to recognize other nucleotide binding proteins [57].

Importantly, ABPP by itself is not a high-throughput approach. ABPP only unfolds its full annotation potential in combination with MS, which can translate biochemical activity into peptide abundance and identifies the labeled proteins against the background of a complex proteome. The potential of combining ABPP with MS for annotation purposes was realized early [58,59]. For example, in a series of studies, the Cravatt lab identified ~80% of predicted human serine hydrolases using a fluorophosphonate probe [59]. Expanding this approach to microbial pathogens, two studies using the ATP-ABP in combination with quantitative shotgun proteomics identified 72 proteins of unknown function (PUFs) in M. tuberculosis [60] and 37 PUFs in P. falciparum [61] as putative ATPases. The same studies also confirmed the existing ATPase annotation of 240 and 141 proteins, respectively. As ABPP is agnostic of sequence, even the most distant enzyme outliers of a family can be detected. In a striking example of this independence from sequence, a serine hydrolase probe identified an enzyme without a conserved Ser that instead used a Thr-based nucleophile for catalyzing lipid hydrolysis [62], defining a distant hydrolase family. This example illustrates how activity-based biochemistry can unearth the most divergent (and convergent) members of a functional class that are invisible to sequence-based methods.

Detail of mechanistic insight and breadth of proteome coverage are tradeoffs in ABPP. For broad coverage, more probes for major classes of enzymes (i.e., hydrolase, oxidase) would be desirable. Even probes that only identify reactive amino acid residues can be of use and identify candidate enzymes. For example, reactive cysteines are characteristic identifiers for many enzyme classes, and their identification can be a first step towards those enzymes’ identification. The utility of such broad labeling was illustrated in studies that used iodoacetamide to identify and rank the reactivity of cellular cysteines [63] and sulfotetrafluorophenyl to detect reactive lysines [64].

Further, for annotation purposes, ABPP should ideally be broad and specific at the same time. One possible solution to this conundrum can perhaps be found in competitive ABPP. The idea of competitive ABPP is that the readout of binding of a more generic probe to a proteome can be refined by competition with ligands characteristic of smaller subfamilies of enzymes. For example, a widely used and very broadly reactive ABP detects serine hydrolases based on the reactivity of fluorophosphonate with serine nucleophiles. Serine hydrolases comprise many large subfamilies of enzymes such that the serine hydrolase annotation itself is of limited use. However, by competing for probe binding with substrates, for example with peptides, the serine hydrolases with protein or peptide substrates such as aminopeptidases might be identified. Similarly, probes with slightly different binding selectivity can parse activities more finely. This approach has been employed for the characterization of metalloproteases [65] and seems particularly useful for characterizing protease activities in general.
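The readout of a competitive ABPP experiment can be sketched as a comparison of probe labeling with and without a competing ligand. This is an illustrative sketch only, assuming one labeling intensity per protein in each sample; the protein names, intensities, and the 50% residual-labeling cutoff are hypothetical.

```python
def competed_targets(probe_only, probe_plus_competitor, max_residual=0.5):
    """Return proteins whose probe labeling drops when a competitor is added.

    Both inputs map protein -> labeling intensity. The result maps each
    competed protein to its residual labeling fraction (0 = fully competed).
    """
    hits = {}
    for protein, baseline in probe_only.items():
        if baseline <= 0:
            continue  # not labeled by the broad probe, nothing to compete
        residual = probe_plus_competitor.get(protein, 0.0) / baseline
        if residual <= max_residual:
            hits[protein] = residual
    return hits


# Hypothetical labeling intensities for a broad probe, with and without
# preincubation with a peptide competitor
probe_only = {"hydrolase_1": 1000.0, "hydrolase_2": 800.0, "hydrolase_3": 600.0}
probe_plus_peptide = {"hydrolase_1": 100.0, "hydrolase_2": 760.0, "hydrolase_3": 0.0}
hits = competed_targets(probe_only, probe_plus_peptide)
```

In this toy case `hydrolase_1` and `hydrolase_3` lose most of their labeling and would be candidates for the peptide-accepting subfamily, while `hydrolase_2` keeps only the class-level annotation.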

Most biochemical assays are targeted, and biochemical function-finding without any a priori information is often a lost cause. ABPP can provide such a priori information to point towards the relevant biochemistry. Once a general enzyme activity is established, the step to more targeted biochemical assays is usually short. For example, Ortega et al. followed a serine hydrolase screen in M. tuberculosis by a simple assay using a generic protease substrate, which identified several likely proteases among the hydrolases [66]. In this way, especially for initial annotation, breadth of probe binding, even at the expense of detail, can be desirable, and while the general assignment of an enzyme family can still be a long way from a full understanding of the protein’s function, it does inform and focus further studies. Every global ABPP-MS experiment is inherently an exercise in annotation: Any PUFs that are detected by the probe can be tentatively assigned to the protein family defined by the probe, and any annotation that differs from that suggested by the probe may warrant another look into the strength of the conflicting annotation.
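The annotation logic in the last sentence above can be written down as a simple triage of probe-detected proteins against existing annotations. The sketch assumes annotations are reduced to a single family label per protein; all identifiers and labels are hypothetical.

```python
def triage(detected, annotations, probe_family):
    """Sort probe-detected proteins by agreement with existing annotation.

    detected: iterable of protein IDs labeled by the probe.
    annotations: dict mapping protein ID -> family label, or None for PUFs.
    """
    result = {"confirmed": [], "tentative_new": [], "conflict": []}
    for protein in sorted(detected):
        existing = annotations.get(protein)
        if existing is None:
            # PUF: tentatively assign the family defined by the probe
            result["tentative_new"].append(protein)
        elif existing == probe_family:
            result["confirmed"].append(protein)
        else:
            # Disagreement: re-examine the strength of the prior annotation
            result["conflict"].append(protein)
    return result


# Hypothetical proteins detected with an ATP-based probe
annotations = {"p1": "ATPase", "p2": None, "p3": "methyltransferase"}
report = triage(["p3", "p1", "p2"], annotations, "ATPase")
```

The three output lists map onto the three outcomes described in the text: confirmation, tentative family-level annotation of a PUF, and a conflict worth a second look.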

Applying structural proteomics to annotation

ABPP is bottlenecked by the availability of suitable probes, and ABMP by the availability of catalytically active recombinant protein. Another central limitation of ABPP and ABMP is that non-catalytic protein families such as structural proteins, transcription factors, or transporters are not tractable by these activity-based approaches. What other approaches might access PUFs more comprehensively? Another set of promising systems biochemistry approaches that probe ligand binding to proteins proteome-wide has recently been developed and could address several limitations of the activity-based approaches. A protein’s function is defined to a large part by its ligands, and for an enzyme, by its substrate(s). Ligand binding often affects the structure and/or the stability of a protein, and such changes have now been measured proteome-wide by structural proteomics approaches such as thermal proteome profiling (TPP) [67] and limited proteolysis-MS (LiP-MS) [68,69]. A protein that binds a ligand is more (sometimes less) stable towards heat denaturation, altering the protein’s abundance in solution after heating and precipitation. In the case of LiP-MS, ligand binding translates into altered susceptibility to proteolysis, which produces different types and quantities of peptides in the apo- and ligand-bound forms. Conveniently, proteolytic peptides are also the currency of MS, allowing for seamless analysis. This type of global ligand binding analysis appears particularly useful for ligands that cannot readily be turned into probes such as metal ions. While metal binding, for example, may not offer much in terms of annotation, the binding of ligands such as a metabolite, on the other hand, might give substantial clues to function. In fact, in several cases, ligand identification has served as the first step for the full characterization of a PUF’s function [70–72]. Structural proteomics is already beginning to be used to map the full matrix of cellular protein-metabolite interactions [73] and together with the activity-based approaches ABMP and ABPP adds another potentially powerful new systems biochemistry approach to the annotation toolkit.
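The thermal stability readout can be illustrated with a minimal sketch that estimates a melting temperature as the point where the soluble fraction crosses 0.5 and compares the apo and ligand-treated curves. Linear interpolation is used here as a simplification in place of the sigmoidal curve fitting typically applied to such data; the temperatures and soluble fractions are hypothetical.

```python
def melting_temp(temps, soluble_fraction):
    """Temperature at which the soluble fraction drops below 0.5.

    Uses linear interpolation between the two flanking measurements and
    returns None if the curve never crosses 0.5.
    """
    points = list(zip(temps, soluble_fraction))
    for (t_low, f_low), (t_high, f_high) in zip(points, points[1:]):
        if f_low >= 0.5 > f_high:
            return t_low + (f_low - 0.5) * (t_high - t_low) / (f_low - f_high)
    return None


# Hypothetical soluble fractions of one protein after heating (degrees C)
temps = [40.0, 45.0, 50.0, 55.0, 60.0]
apo = [1.0, 0.8, 0.4, 0.1, 0.0]
with_ligand = [1.0, 1.0, 0.8, 0.4, 0.1]

tm_apo = melting_temp(temps, apo)
tm_ligand = melting_temp(temps, with_ligand)
delta_tm = tm_ligand - tm_apo  # positive = stabilized by the ligand
```

A positive shift, as in this toy example, would flag the protein as a candidate binder of the added ligand; a negative shift would indicate destabilization.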

Concluding remarks

PUFs make up more than one third of proteins even in model organisms, and many more in non-model organisms. Initial annotation is currently based on inferring function by detecting sequence similarity to proteins with known functions. This approach, while indispensable, is inherently biased toward the annotation of the evolutionarily most conserved functions. New approaches for experimental and high-throughput annotation are needed to understand the vast diversity of protein function, the biology it enables, and to access its potential benefits. Several recent biochemical approaches are poised to make this step towards systems biochemistry. We anticipate that the impact of these new tools, in particular structural and activity-based proteomics and -metabolomics, will be particularly high for protein annotation. There are currently 200 million proteins in UniProt, a number that grows by ~30 million each year. The global function-finding capabilities of ABPP and ABMP can explore this vast unknown enzyme space with high throughput and identify biochemical functions of even non-canonical and unusual enzymes. They offer opportunities to escape the limitations of current annotation approaches, to accomplish the technical leap that is needed if we want to not only sequence genes but also understand their products’ functions, and to unlock truly new biochemistry.

Outstanding questions.

  • Biochemistry has historically centered on single enzymes. How can biochemistry transform into an experimental systems-level discipline and integrate with other systems-level technologies?

  • ABPP surveys proteins, ABMP surveys metabolites. Can we simultaneously survey and link proteins and metabolites on a systems-level scale?

  • How can we generate ample probes to cover new and/or unknown enzyme space by ABPP?

  • Structural proteomics can globally identify ligand binding. How far can ligand binding be exploited for annotation purposes?

  • What unknown biology is hidden in the millions of PUFs, and how can we access it?

Highlights.

  • The number of genes encoding proteins of unknown function is growing exponentially

  • Protein annotation is a central challenge for the molecular life sciences

  • Existing computational tools for protein annotation are invaluable but also inadequate to meet the challenge of protein annotation

  • Activity-based biochemical approaches such as activity-based protein profiling and activity-based metabolite profiling offer systems-level experimental methods for determining protein activity

  • Further development and application of such systems biochemistry approaches are needed to solve the annotation problem

Acknowledgements:

CG is supported by NIH grants R01AI158159, R21AI137571, and R21AI133388. RJ is supported by the Dutch Research Council (NWO) grant OCENW.XS21.2.028. KR is supported by NIH grants P01AI143575, R25AI140472, and P01159402.

Glossary

Activity-based protein profiling (ABPP)

a chemical proteomics approach to detect enzyme function through activity-based probes, often combined with mass spectrometry.

Activity-based metabolite profiling (ABMP)

a functional metabolomic approach that measures global changes in the metabolome upon addition of recombinant enzyme or a specific enzyme perturbation.

Activity-based probe

a chemical probe that binds to enzymes that have a specific, shared catalytic mechanism and thus identifies the sub-proteome with that activity.

Functional genomics

collective term for genome-wide or systems approaches to identify gene and protein function. Examples include genetic interaction mapping, protein-protein interaction mapping, but also ABMP and ABPP.

Homology- or similarity-based annotation

the transfer of a known protein function to a gene with similar sequence but unknown function.

Limited proteolysis mass spectrometry (LiP-MS)

a proteomic approach by which the proteolytic susceptibility of a proteome upon a perturbation, often ligand binding, is measured. The perturbation induces structural changes in affected proteins that lead to differences in proteolytic patterns, which are then detected by mass spectrometry.

Metabolomics

the global analysis of metabolites and metabolite changes in a biological system.

Neural network-based deep learning methods

a machine learning approach based on algorithms that mimic information processing by neural networks. Deep learning with these multi-layered artificial neural networks can extract patterns from large datasets and predict outcomes.

Protein of unknown function (PUF)

a protein that is expressed but has no characterized function, sometimes also called hypothetical protein.

Synteny

association of genes in a common location or pattern on a chromosome, which can be indicative of linked function.

Thermal proteome profiling (TPP)

a proteomic approach that measures structural changes in a proteome upon perturbation, often ligand binding, based on the change of a protein’s melting temperature and thus change in solubility. Related to LiP-MS, TPP involves precipitation of proteins after heating to separate protein populations with differential solubility.

