Abstract
Protein modifications modulate nearly every aspect of cell biology in organisms ranging from Archaea to Eukaryotes. The earliest evidence of covalent protein modifications was found in the early 20th century by studying the amino acid composition of proteins by chemical hydrolysis. These discoveries challenged what defined a canonical amino acid. The advent and rapid adoption of mass spectrometry-based proteomics in the latter part of the 20th century enabled a veritable explosion in the number of known protein modifications, with over 500 discrete modifications counted today. Now, new computational tools in data science, machine learning, and artificial intelligence are poised to allow researchers to make significant progress discovering new protein modifications and determining their function. In this review, we take an opportunity to revisit the historical discovery of key post-translational modifications, quantify the current landscape of covalent protein adducts, and assess the role that new computational tools will play in the future of this field.
Keywords: Post-translational modifications, Protein acylation, Protein modifications, Amino acid, Metabolism
eTOC blurb
MOLECULAR-CELL-D-20-02152R1
Covalent protein modifications discovered in the early 20th century challenged what defined a canonical amino acid. Mass spectrometry-based proteomics has increased the number of modifications to over 500 today. We revisit the history of key modifications, quantify the current landscape, and assess how new computational tools will shape this field.
Introduction
Post-translational modifications (PTMs) are covalent chemical adducts to the primary structure of proteins. These modifications most often occur on the side chains or N-terminus of a protein, thereby changing its fundamental chemical composition. Protein modifications allow a level of chemical diversity far greater than can be achieved using canonical amino acids alone (Figure 1). Many PTMs are also reversible, which enables spatial and temporal control of protein chemistry and function.
Figure 1.

Twenty proteinogenic amino acid structures depicted using the standard Corey–Pauling–Koltun coloring convention: black, carbon; red, oxygen; blue, nitrogen; yellow, sulfur; thin lines, single bond; medium lines, double bond; hydrogens omitted for clarity.
Protein modifications have broad, far-reaching functional consequences for proteins. Some modifications are highly abundant, highly specific, or exceptionally potent, whereas others have physiological consequences that are not yet known or were discovered only recently. Indeed, the catalog of protein modifications continues to grow as new ones are discovered, adding a new level of complexity to our understanding of this regulatory system. In this review, we first revisit the historical discovery of key post-translational modifications. Next, we use tools in data science to systematically quantify the past and current landscape of covalent protein adducts. Finally, we look to the future and assess the role that new data science and machine learning tools will play in this field.
Discovering the first protein modification
At the turn of the 20th century, the scientific community was particularly interested in characterizing the amino acid composition of proteins. Scientists used chemical methods, mainly acid hydrolysis, to identify the chemical make-up of individual amino acids, as well as the distribution of amino acids by percent abundance in proteins. Some proteins contained unusually high levels of phosphorus alongside the expected carbon, nitrogen, and oxygen. Through painstaking analytical chemistry, Levene and Alsberg ultimately discovered phosphorylation was present on the amino acid serine in the protein vitellin (vitellinic acid) in 1906 (Levene and Alsberg, 1906) (Figure 2).
Figure 2.

Timeline of protein modification discovery. Cumulative number of protein modifications known at each year. For methods of calculation and code, see www.github.com/matthewhirschey.
The early discovery of phosphorylation was particularly notable given that some of the 20 canonical amino acids had not yet been fully characterized. Methionine and threonine wouldn’t be fully characterized until 1922 and 1935, respectively [recounted in (Bradford Vickery, 1972; Vickery and Schmidt, 1931)]. Initially, phosphorylated serine was viewed as a novel amino acid rather than a modification. Studies to identify the location of the phosphoryl group disagreed as to whether it was present on the side-chain of serine or some other location (Posternak, 1927; Rimington, 1927). In 1932, Fritz Lipmann, then a fellow in Levene’s lab at Rockefeller, showed the phospho moiety was attached to serine [recounted in (Lipmann, 1983)]. Lipmann first demonstrated specificity of the modification on discrete amino acids, paving the way for future studies expanding the scope of known PTMs.
The ensuing two decades saw little advancement in the discovery of new PTMs (Figure 2): the structure and composition of hydroxylysine were determined in 1951 (Bergström and Lindstedt, 1951), thirty years after the report of an ‘unidentified base’ from a preparation of gelatin. Discovery and characterization of novel amino acid PTMs were slow and technically difficult. Elemental analysis and NMR spectroscopy were ill-suited to analyze most PTMs, because PTMs are generally stoichiometrically rare, elementally similar to proteins, and frequently labile. At the time, the concept of a reversible PTM was not well developed, and the idea that phospho-serine was a distinct amino acid, rather than a reversible modification of canonical serine, persisted.
Not until the mid-1950s did protein PTMs become a distinct idea in scientific literature. A major breakthrough came in 1956 with the revelation that covalent acyl-serine modifications were reversible (Dixon et al., 1956). Later, a similar reversibility was observed for serine phosphorylation (Rabinowitz and Lipmann, 1960). These findings provided support for the idea of a “high energy bond”, one that could release energy upon cleavage, which we now know seeded ideas explaining many aspects of cellular metabolism (Lipmann, 1983).
The 1970s saw an explosion in PTM discovery, with approximately 40% of currently known PTMs discovered by 1980 (Figure 2). At the time, modifications were viewed as a means of generating the increased chemical diversity needed for life, rather than as a key mechanism of protein regulation. Nonetheless, many of the PTMs identified during these early decades would later come to be appreciated as integral to the regulation of protein function.
Discovering the landscape
The scientific community continued to debate whether PTMs were modifications to canonical amino acids or constituted bona fide non-essential amino acids. As late as 1977, a seminal review of the burgeoning field of protein PTMs described “20 genetically encoded amino acids, but over 140 amino acids have been identified” (Uy and Wold, 1977). This early attempt to comprehensively catalog the chemical diversity of amino acids counted 140 discrete combinations of modified amino acids by reviewing the literature, consulting colleagues, and quite literally counting (Uy and Wold, 1977). Lysine carried the greatest number of discrete modifications, while leucine, isoleucine, and tryptophan were conspicuously absent from the list. Some modifications were rationalized to have ‘fairly obvious explanations’, including protein derivatives in which a coenzyme is covalently attached to the active site of its cognate enzyme. Other modifications, such as N-alpha-acetylation and many methylation and halogenation modifications, had no known functions at the time. Although this catalog included a number of species (such as tRNAs) that are not considered PTMs today, it represented one of the first attempts to precisely define PTMs as a class.
In the late 1970s, the concept of PTMs encompassed a far broader range of protein modifications than today’s understanding. For example, N-terminal removal of signal peptides was considered just as much a PTM as protein phosphorylation. As late as 1981, disulfide bond formation was viewed as a post-translational modification likely catalyzed by unknown agents in the cell (Wold, 1981). However, transient cysteine modifications believed to be involved in enzyme catalytic mechanisms were not considered PTMs (Wold, 1981). The field was simultaneously discovering, cataloguing, and grappling with the definition of these new modifications.
Discovering mass-spectrometry proteomics
While almost half of all currently known PTMs had been discovered by 1980 (Figure 2), rapid progress in the field awaited the advent of protein mass spectrometry (MS). Up until this time, MS was primarily used in organic chemistry to identify synthetic intermediates and end products, as well as natural compounds. But as the understanding of protein chemistry advanced, proteins and their modifications became ripe for analysis by MS. The field of mass spectrometry proteomics was revolutionized by the development of electrospray ionization (ESI). This technique, pioneered by Malcolm Dole in 1968, ‘gently’ ionizes macromolecules for analysis by mass spectrometry. Further development of soft ionization methods by John Fenn (ESI) and Koichi Tanaka (soft laser desorption), who shared the 2002 Nobel Prize in Chemistry, permitted sensitive detection of a wide range of PTMs, and represented one of the first applications of MS technology to protein biology (Biemann, 1992).
However, the rapid dominance of this new technique in PTM research initially failed to generate any noticeable uptick in the rate of novel PTM discovery. This was in part because most MS-based PTM identification protocols required enrichment of the PTM prior to sample analysis, which in turn requires knowledge of the modification being enriched. Additionally, processing MS data required selecting target masses that corresponded to the specific PTM(s) of interest. These two technical requirements prevented rapid detection of uncharacterized PTMs. Indeed, much of the signal generated from MS remains unmapped to known peptides or peptide-PTM adducts, which has been labeled in recent years the “dark proteome”.
But by the 1990s, ESI-MS led to the identification of many new PTMs (Figure 2). Technological advancements in sensitivity allowed MS-based studies to quantify both relative PTM levels and absolute stoichiometries. As the catalogue of known PTMs steadily increased, this newfound resolution also shed light on the role of many PTMs in regulating protein activity.
Discovering the function of protein modifications
Protein modifications influence proteins in every imaginable way, including activation, repression, translocation, and degradation. As such, uncovering the biological role of PTMs has proven to be more difficult than identifying them. Early studies on PTMs only allowed speculation as to the possible biological implications of the modifications described, and in many cases made no attempt to assign function at all. By the 1970s, few PTMs had any clearly documented biological role. Protein phosphorylation and acetylation are notable exceptions: each followed an important arc of functional discovery, and both have provided significant insight into human health and disease.
The earliest studies to identify cellular functions for phosphorylation and acetylation were published within a decade of one another in the mid-20th century. In both cases, these discoveries required a similar set of circumstances that built on what was known and available in the field: the investigators needed a model experimental system, a way to assay the PTM of interest, and a cellular outcome to measure. Sutherland, and Fischer and Krebs, initially uncovered a role for phosphorylation in interconverting phosphorylase a and b (Krebs and Fischer, 1956; Sutherland and Wosilait, 1955), whereas Allfrey, Faulkner, and Mirsky studied lysine/arginine acetylation of histones (Allfrey et al., 1964). Sutherland et al. enriched phosphorylase from liver tissue samples and measured radioactive phosphate, concluding that phosphorylase could ‘gain’ radioactive phosphate, that this stimulated enzyme activity, and that the process was reversible. They observed that phosphorylation was stimulated by epinephrine and inhibited by an unidentified ‘inactivating enzyme’, also from liver (Sutherland and Wosilait, 1955). To study early histone modifications, Allfrey et al. worked with calf thymus nuclei, assaying total histone 14C-acetate and 14C-methionine incorporation, from which lysine- and arginine-rich histones could also be isolated using carboxymethylcellulose chromatography (Allfrey et al., 1964). They showed that these arginine- and lysine-rich histones could be rapidly acetylated independent of protein synthesis, and that acetylated histones inhibited RNA synthesis in a dose-dependent manner. Here, the field began to glimpse the complex and essential roles of PTMs in enzymatic and non-enzymatic protein activity, the regulation of metabolism, and chromatin dynamics, as well as how PTMs can integrate systemic endocrine signals essential in physiology.
As mass spectrometry became the workhorse for identifying and mapping protein modifications, proteome-wide maps of modified sites have been generated for phosphorylation (Ochoa et al., 2020) and lysine acetylation (Choudhary et al., 2009). Identifying the complete landscape of these two important PTMs is a key prerequisite for determining the functional consequences of protein modification across the proteome. Advances in genome engineering, the discovery and characterization of protein-modifying enzymes, and sophisticated computational approaches have become mainstream tools for scientists seeking to determine PTM function, and are paving the way to a better understanding of the role of PTMs in human health and disease.
The human proteome has recently been found to contain over 119,000 phosphorylation sites (Ochoa et al., 2020). One keystone example of how phosphorylation can alter protein function and impact health came from the study of the tau protein. Disorders collectively known as tauopathies (Morris et al., 2011), the most studied of which is Alzheimer disease, center on tau, an intrinsically disordered, microtubule-binding protein that associates with the cytoskeleton in neurons and glia. The intracellular accumulation of tau was described early in the 20th century and is a shared pathologic feature of these disorders. One of the early hallmark features of tau and other proteins contained in these aggregates was a high degree of phosphorylation, which preceded neurofibrillary tangle formation and cognitive decline (Bancher et al., 1989). Moreover, modified versions of tau accumulate with 26S proteasomal subunits and post-translational modifying enzymes in neurofibrillary tangles, suggesting that the cell’s inability to degrade tau is key to understanding the neurotoxicity that occurs in certain neurodegenerative diseases. Mouse models of Alzheimer disease, developed through genetic modification of amyloid precursor protein resulting in accumulation of the pathogenic amyloid Aβ1–42 (Mucke et al., 2000), have been used to deeply interrogate endogenous and disease-related tau modifications in vivo (Morris et al., 2015). This work expanded the number of known tau modifications by one-third. Identifying these modifications has enabled systematic investigation of how specific PTMs affect tau function, degradation, and aggregation, and has paved the way for clinical trials of tau kinase inhibitors (Soeda and Takashima, 2020).
Moreover, tau can be modified at up to 35% of its amino acid residues, with modifications including small chemical groups (phosphorylation, acetylation, methylation), proteins (ubiquitination, SUMOylation), and carbohydrates (O-GlcNAcylation, N-glycosylation, glycation) [reviewed in (Alquezar et al., 2020)]. Studying this diversity of tau modifications has shed light on fundamental complexities in the effects PTMs can have on proteins, such as cooperative and competitive effects among multiple modifications (Luo et al., 2014).
The discovery that histones are modifiable and that their modifications can alter chromatin dynamics has enabled advances in the field of cancer biology. Widespread epigenetic modifications can regulate almost all of the hallmarks of cancer (Hanahan and Weinberg, 2011). Specifically, targeting the enzymes responsible for histone lysine deacetylation (HDACs or KDACs) has led to new FDA-approved chemotherapeutics (Li et al., 2020), and investigation of the functional effects of histone modifications has provided mechanistic insight into their cellular effects. For example, hyperacetylation of histones H3 and H4 is required for expression of the cell cycle inhibitor p21, and this protein modification was found to be reversed in cancer cells with high levels of HDAC expression (Gui et al., 2004). Treatment with the HDAC inhibitor suberoylanilide hydroxamic acid (SAHA, or vorinostat) reverses the loss of H3/H4 lysine acetylation, reducing Myc promoter occupancy and increasing p21 expression, which leads to reduced cancer cell proliferation. Vorinostat (amongst other HDAC inhibitors) has been FDA-approved as an anticancer agent for cutaneous T cell lymphoma since 2006, with many more HDAC inhibitors being evaluated in clinical trials for a variety of cancer types (reviewed in Li et al., 2020).
In addition to HDAC activity, acetyl-lysine homeostasis is also regulated by lysine acetyltransferases (KATs), and inhibitors of these enzymes are also under active investigation as anti-cancer agents (Baell et al., 2018). That both lysine acetylation and lysine deacetylation can be targeted for cancer therapy illustrates how complex the functional consequences of PTMs can be in cells. Indeed, proteome-wide MS-based investigation has revealed that the lysine acetylome is associated with a breadth of cellular processes, including DNA damage repair, the cell cycle, ribosomal function, and actin cytoskeleton remodeling (Choudhary et al., 2009).
Owing to their multitude of effects on cellular behavior, it is not surprising that protein modifications can contribute to the development and progression of human diseases. The study of disease-associated PTMs or protein modifying enzymes has contributed to new therapies and provided fundamental insight into PTM biology; however, in many cases our understanding of the role of PTMs in human disease is incomplete.
Discovering today’s PTM landscape
In the more than 40 years since the 1977 attempt to quantify the landscape, a veritable explosion in research on protein modifications has increased our appreciation of the richness and complexity of PTMs (Figure 2). Thus, we set out to refresh our understanding and systematically quantify the number and nature of discrete protein modifications. Using new tools in data science, we queried the UniProt Knowledgebase (UniProt Consortium, 2018), a database containing over 60 million protein sequences and associated annotations, which includes a ‘controlled vocabulary’ of post-translational protein modifications. By quantifying all annotated protein modifications, we now count more than 500 discrete modifications (Figure 3A, Supplemental Table 1). While all 20 amino acids are represented in this analysis, serine, cysteine, and lysine carry the greatest diversity of annotated modifications. Remarkably, this growing list of protein modifications ranges from large increases in mass (nearly 900 Da) to decreases in mass that reflect a chemical loss (Figure 3B). Below, we highlight some of the most and least commonly modified amino acids.
Figure 3.

Current landscape of protein modifications. A. Approximately 500 protein modifications have been described across all 20 proteinogenic amino acids; colors represent most frequent modifications by ‘keyword’; protein crosslinks were removed from analysis. Data accessed from www.uniprot.org (UniProt Consortium, 2018). For methods of calculation and code, see www.github.com/hirscheylab. B. Distribution of masses of modifications appended to proteins. Line represents frequency of masses, scaled to 1; rug-plot hashes represent individual masses. Protein crosslinks were removed from analysis. For methods of calculation and code, see www.github.com/matthewhirschey.
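The counting behind Figure 3 can be illustrated with a small parser. This is a minimal sketch, not the code in the authors' repository. It assumes a plain-text controlled-vocabulary file in the style of UniProt's ptmlist.txt, where each record is a run of lines tagged with two-letter codes (ID, FT, TG, MM, KW) and terminated by `//`. The five records below are hypothetical stand-ins rather than real UniProt entries, and `parse_records` and `summarize` are illustrative names.

```python
from collections import Counter

# Hypothetical excerpt in the style of UniProt's ptmlist.txt (illustrative values only).
SAMPLE = """\
ID   Phosphoserine
FT   MOD_RES
TG   Serine.
MM   79.97
KW   Phosphoprotein.
//
ID   O-linked (Hex) serine
FT   CARBOHYD
TG   Serine.
KW   Glycoprotein.
//
ID   N6-acetyllysine
FT   MOD_RES
TG   Lysine.
MM   42.01
KW   Acetylation.
//
ID   Pyrrolidone carboxylic acid
FT   MOD_RES
TG   Glutamine.
MM   -17.03
KW   Pyrrolidone carboxylic acid.
//
ID   Isoglutamyl lysine isopeptide (Lys-Gln)
FT   CROSSLNK
TG   Lysine-Glutamine.
MM   -17.03
KW   Isopeptide bond.
//
"""


def parse_records(text):
    """Split the file into one dict per record, keyed by two-letter line code."""
    records, current = [], {}
    for line in text.splitlines():
        if line.startswith("//"):  # end of a record
            if current:
                records.append(current)
            current = {}
        elif len(line) > 5:
            code, value = line[:2], line[5:].strip()
            # Repeated codes (e.g. several KW lines) are joined with "; ".
            current[code] = f"{current[code]}; {value}" if code in current else value
    return records


def summarize(records):
    """Count modifications per target residue and collect reported mass shifts."""
    per_residue, masses = Counter(), []
    for rec in records:
        if rec.get("FT") == "CROSSLNK":  # crosslinks were excluded from Figure 3
            continue
        per_residue[rec.get("TG", "").rstrip(".")] += 1
        if "MM" in rec:  # some entries (e.g. many glycans) report no mass
            masses.append(float(rec["MM"]))
    return per_residue, masses


per_residue, masses = summarize(parse_records(SAMPLE))
print(per_residue)               # modification counts per residue
print(min(masses), max(masses))  # range of mass shifts in Da
```

The real file may also contain header text before the first record and targets spanning more than one residue, neither of which this sketch handles.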
Serine
While most often studied for its modification by phosphorylation, serine is also heavily modified by other moieties. Our analysis counted 70 discrete serine modifications (Figure 3A). We find that 13 serine modifications contain phosphate: one is phosphate alone, and twelve contain both carbon and phosphate. Additionally, a complex range of carbohydrate modifications is described on serine. For example, serine carries O-linked glucose, galactose, glucosamine, and other complex branching carbohydrates. Masses for these chemical moieties are often not reported, given their chemical diversity and the broad range of metabolites from which they are derived, so this analysis might underestimate the true number of serine-based protein modifications. Given their similar chemistries, the amino acids threonine and tyrosine are not far behind serine, occupying positions 4 and 6, respectively, on the ranked list of total known modifications, with considerable overlap in identified modifications (Figure 3A).
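Sorting serine's phosphate-containing modifications into "phosphate alone" and "carbon plus phosphate" amounts to a simple test on each modification's elemental composition. The sketch below is minimal and rests on two assumptions. First, compositions are written as space-separated element counts, as in UniProt's CF field (for example `H1 O3 P1`). Second, the three example formulas are illustrative net changes to the residue. The function names are hypothetical.

```python
import re

# Assumed composition format: element symbol followed by an optional signed
# count, tokens separated by spaces (e.g. "H1 O3 P1" or "H-3 N-1").
TOKEN = re.compile(r"([A-Z][a-z]?)(-?\d+)?")


def parse_formula(cf):
    """Turn a composition string into {element: count}."""
    counts = {}
    for token in cf.split():
        m = TOKEN.fullmatch(token)
        if m is None:
            raise ValueError(f"unrecognized token: {token!r}")
        n = int(m.group(2)) if m.group(2) else 1
        counts[m.group(1)] = counts.get(m.group(1), 0) + n
    return counts


def phosphate_class(cf):
    """Label a modification by whether it adds phosphorus and, if so, carbon."""
    counts = parse_formula(cf)
    if counts.get("P", 0) <= 0:
        return "no phosphate"
    return "phosphate + carbon" if counts.get("C", 0) > 0 else "phosphate only"


# Illustrative serine modifications (net elemental change to the residue).
examples = {
    "phospho": "H1 O3 P1",
    "4'-phosphopantetheine": "C11 H21 N2 O6 P1 S1",
    "hexose": "C6 H10 O5",
}
for name, cf in examples.items():
    print(name, "->", phosphate_class(cf))
```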
Cysteine
The electron-rich thiolate of cysteine is the most powerful nucleophile found in an amino acid (Walsh, 2005), making cysteine amenable to many modifications. Among the most-studied cysteine modifications are S-nitrosylation, S-glutathionylation, S-palmitoylation, and S-farnesylation. We count 57 distinct cysteine modifications (Figure 3A), ranging from redox losses and gains to long, hydrophobic lipid modifications. Using a chemical biology strategy and an electrophilic probe, over 700 reactive cysteines have been mapped onto proteins that are known drug targets, as well as onto ‘undruggable’ proteins, including transcription factors, adaptor/scaffolding proteins, and uncharacterized proteins (Backus et al., 2016; Weerapana et al., 2010). These findings support the idea that cysteines can be readily targeted for endogenous biological modifications, including those cataloged here and beyond.
Lysine
The last of the three most heavily modified amino acids is lysine, with 47 unique chemical moieties documented (Figure 3A). Lysine acetylation is among the most studied post-translational modifications, with acetylation of histone lysines serving as a canonical example of a protein modification with an established functional impact, described above (Allfrey et al., 1964). Acetylation of histone lysines is widely accepted as a primary mechanism for controlling access to chromatin to regulate gene expression (for a review, see Verdin and Ott, 2015). Enabled by mass spectrometry, landmark papers showed widespread protein lysine acetylation, including on mitochondrial proteins (Choudhary et al., 2009; Kim et al., 2006). Following these discoveries, mass spectrometry-based studies continued to find that hundreds of proteins were acetylated (Wang et al., 2010; Zhao et al., 2010). As another example of lysine modification, histone methylation was first observed in the early 1960s (Murray, 1964). Subsequent work has demonstrated that lysine methylation of histones and other proteins, including mono-, di-, and trimethyllysine, is a dynamic mark in human health and disease (Greer and Shi, 2012).
Other Amino Acids
A notable difference between our current analysis of protein modifications and the original compendium is the breadth and depth of chemical moieties on amino acids. All 20 proteinogenic amino acids are represented with annotated modifications. Some amino acid modifications are quite rare. Of the five modifications recorded for phenylalanine, the least modified amino acid in this analysis, only one, 3-hydroxyphenylalanine, is not linked to the N-terminus. This exotic modification is rarely found in published databases and has no known biological function. It exemplifies the ongoing challenge in understanding the relationship between protein modifications and function. While mass spectrometry is becoming more sophisticated and more sensitive for measuring protein modifications across a proteome, understanding the functional consequences of these modifications generally still requires interrogating them one at a time.
Discovering new PTMs
Most new PTM studies begin with a prediction. Some stories of discovery begin with the idea that similar chemical features of amino acid side chains support new protein modifications; others begin with new enzymes that have PTM transferase activity; still others begin with the chemical reactivity of metabolites, which leads to spurious modification of proteins. One example is a class of cysteine modification discovered almost 15 years ago, originally called S-(2-succinyl)cysteine (2SC) (Alderson et al., 2006) but now more commonly referred to as protein succination. This modification forms by a Michael addition between cysteine thiol groups and the Krebs cycle metabolite fumarate. A similar reaction occurs between cysteine and the immunomodulatory metabolite itaconate, which, like fumarate, is a carboxylic acid with an α,β-unsaturated double bond.
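Because a Michael addition appends the entire metabolite to the cysteine thiol, the expected mass shift follows directly from the metabolite's formula. That shift is the number a targeted MS search for these adducts would look for. The worked example below is a short sketch that assumes neutral (fully protonated) fumaric and itaconic acid and standard monoisotopic element masses. The names `mono_mass` and `METABOLITES` are illustrative.

```python
# Monoisotopic masses of the most abundant isotopes (Da).
MONO = {"C": 12.0, "H": 1.00782503207, "O": 15.99491461956}


def mono_mass(formula):
    """Monoisotopic mass of a formula given as {element: count}."""
    return sum(MONO[el] * n for el, n in formula.items())


# Neutral forms of the two alpha,beta-unsaturated dicarboxylic acids (assumption).
METABOLITES = {
    "fumarate (succination, 2SC)": {"C": 4, "H": 4, "O": 4},
    "itaconate": {"C": 5, "H": 6, "O": 4},
}

# In a Michael addition the thiol adds across the C=C bond and no atoms are
# lost, so the expected mass shift equals the mass of the whole metabolite.
shifts = {name: mono_mass(f) for name, f in METABOLITES.items()}
for name, delta in shifts.items():
    print(f"Cys + {name}: +{delta:.4f} Da")
```

This gives shifts of about +116.011 Da for succination and +130.027 Da for the itaconate adduct.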
As another example, we recently discovered that a specific subset of carboxyl acyl-CoAs had increased reactivity compared to straight-chain acyl-CoAs (Wagner et al., 2017). Negatively charged dicarboxylic acyl-CoAs with a backbone of 4–5 saturated carbons undergo intramolecular catalysis (i.e. self-hydrolysis) forming an anhydride and free CoA. This class of metabolites includes succinyl-CoA, glutaryl-CoA, and CoAs with similar structures, such as HMG-CoA driving protein HMGylation (Wagner et al., 2017).
In addition to reactive acyl-CoA species, reactive acyl-phosphate species also modify proteins. 1,3-Bisphosphoglycerate (1,3-BPG) is a reactive acyl-phosphate metabolite generated in glycolysis that reacts non-enzymatically with several proteins in this pathway, producing 3-phosphoglyceryl-lysine (pgK) modifications (Moellering and Cravatt, 2013). Another recent study described the amino acids activated by aminoacyl-tRNA synthetases, aminoacyl-adenylates, as reactive species that can induce protein modifications (He et al., 2017). Remarkably, this study found widespread lysine modifications by all twenty proteinogenic aminoacyl tRNAs. The authors concluded that the acyl-phosphate linkage of activated amino acids could allow spurious protein aminoacylation.
This prediction approach has led to the discovery and validation of several new PTMs. In particular, assessing chemical similarity has strong potential for continued discovery. Recent studies have demonstrated acyltransferase enzyme promiscuity that could catalyze the addition of several acyl groups onto proteins (Han et al., 2018; Huang et al., 2018a; Huang et al., 2018b). We therefore quantified known acyl-CoA metabolites from the Human Metabolome Database [HMDB, (Wishart et al., 2018)], reasoning that acyl-CoAs are chemically similar to the sources of known enzymatically and/or non-enzymatically derived protein modifications. Our analysis counted 361 acyl-CoAs spanning 234 unique molecular weights (Figure 4, Supplemental Table 2); the difference is due to stereoisomers, which share the same molecular formula and bonding but differ in the spatial arrangement of their atoms. These acyl-CoA species have various chemical properties and are generally divided into classes based on these properties (Figure 4, legend). By comparing the masses of all acyl-CoA species in the Human Metabolome Database to the masses of known lysine modifications, we find less than 10% overlap (27/361, Figure 5A), suggesting that a wide range of new acyl-CoA-derived modifications could be found on proteins.
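The mass comparison behind Figure 5A can be illustrated with a handful of acyl groups. This minimal sketch is not the authors' analysis code, and it makes several assumptions:

- Acyl transfer from a thioester to the lysine ε-amine adds the free acid minus one water.
- Matching is done on monoisotopic mass with a fixed tolerance. The 0.005 Da value here is a hypothetical choice.
- The acid list and the "known" lysine masses are small illustrative stand-ins for the HMDB and UniProt data.

```python
MONO = {"C": 12.0, "H": 1.00782503207, "O": 15.99491461956}
WATER = {"H": 2, "O": 1}


def mono_mass(formula):
    """Monoisotopic mass of a formula given as {element: count}."""
    return sum(MONO[el] * n for el, n in formula.items())


def acyl_shift(acid):
    """Mass added to a lysine by acyl transfer: the free acid minus one water
    (the acid loses OH and the lysine amine loses H)."""
    return mono_mass(acid) - mono_mass(WATER)


# Hypothetical input: free acids corresponding to a few acyl-CoAs.
ACIDS = {
    "acetyl": {"C": 2, "H": 4, "O": 2},
    "succinyl": {"C": 4, "H": 6, "O": 4},
    "glutaryl": {"C": 5, "H": 8, "O": 4},
    "HMG": {"C": 6, "H": 10, "O": 5},
    "adipyl": {"C": 6, "H": 10, "O": 4},
}

# Hypothetical reference list of known lysine modification masses (Da).
KNOWN_LYSINE_SHIFTS = [42.0106, 100.0160, 114.0317, 144.0423]


def match_known(acids, known, tol=0.005):
    """Map each acyl name to True if its mass shift matches a known lysine mass."""
    return {name: any(abs(acyl_shift(f) - k) <= tol for k in known)
            for name, f in acids.items()}


matches = match_known(ACIDS, KNOWN_LYSINE_SHIFTS)
unmatched = [name for name, hit in matches.items() if not hit]
print(f"{len(ACIDS) - len(unmatched)}/{len(ACIDS)} match a known mass; unmatched: {unmatched}")
```

Unmatched masses, such as the adipyl shift here, are the candidates a prediction-driven search would then try to detect on proteins.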
Figure 4.

Known acyl-CoA species. Chain length and chemical properties of all known acyl-CoA metabolites in humans. Data accessed from www.hmdb.ca (Wishart et al., 2018). For methods of calculation and code, see www.github.com/matthewhirschey.
Figure 5.

Predicted protein modifications. A. Acyl-CoA species from the human metabolome database were compared to known masses matching lysine modifications; B. The human metabolome database was queried for known reactive acyl-phosphates, thioesters, or aldehydes, and plotted for number of possible carbons appended to proteins. Data accessed from www.hmdb.ca (Wishart et al., 2018). For methods of calculation and code, see www.github.com/matthewhirschey.
Extending these analyses beyond acyl-CoA species, a survey of the human metabolome database, described above, reveals over 600 metabolites with high reactivity, herein called reactive carbon species (RACS, Figure 5B, Supplemental Table 3). While not all of these have been associated with protein modifications, studies focused on these RACS may find them attached to proteins. In fact, this predictive strategy was previously used to identify lysine glutarylation as a bona fide protein modification (Tan et al., 2014).
While single-modification-at-a-time approaches are required to validate PTM predictions, one exception to this generality is a recent study that sought to characterize protein modifications in prokaryotes en masse using a multiscale workflow (Brunk et al., 2018). This analysis first identified regulatory nodes in an E. coli metabolic network, which then implicated candidates that could be regulated by protein modifications. To characterize the cellular impact of modified sites, they used multiplexed automated genome engineering (MAGE) to mutate a given amino acid to mimic a constitutively modified or unmodified amino acid. A caveat of this approach is that amino acid mutations mimicking amino acid modifications cannot faithfully recapitulate a true modification. However, they successfully assessed global changes in cellular fitness in a pooled screen, and performed follow-up studies on three prioritized candidate E. coli proteins. Technological developments like those in this study might help disentangle the complexity inherent to studying 500 discrete modifications across 20 amino acids.
Discovering the future
Despite their complexity, the essential roles that protein PTMs play in cellular behavior and human disease make their ongoing study a priority in biomedical science. The widespread adoption of MS-based proteomics beginning in the 1980s generated great interest and knowledge in the PTM field. Although the contributions MS has made to PTM research are difficult to overstate, the dynamic and labile nature of PTMs continues to present unique challenges to scientists. Promising new advances in peptide mapping algorithms are tackling the difficult task of completing the PTM landscape. These efforts have set the stage for another explosion in our understanding of PTM biology.
Improvements in mass spectrometry-based proteomics research, including the discovery of PTMs, have been tightly linked to the use of computational tools (Na and Paek, 2015; Schwammle et al., 2015). Less than a decade after MS-based proteomics began, Sequest was developed to automate protein identification against a reference protein database (Eng et al., 1994). Automated peptide assignment algorithms in Sequest and later Mascot (Perkins et al., 1999) enabled a rapid expansion in MS-based proteomics, and offered an attractive avenue for the identification of PTMs. These algorithms could be adapted to detect simple, known PTMs by incorporating the corresponding (observed – expected) mass shifts into a modified reference proteome. Despite this, a substantial proportion of peptide mass spectra remained suspiciously unassigned. To investigate this ‘dark matter’ of the proteome, search algorithms were configured to permit a wider precursor ion mass tolerance, in order to capture modified peptides, while maintaining highly specific fragment ion mass settings (Han et al., 2011). The potential of this approach to map the PTM landscape was realized in 2015, when the Sequest search space was expanded to an ‘open’ precursor mass window of ±500 Da, which permitted searching for peptides with mass modifications corresponding to >90% of known PTMs (Figure 3B) (Chick et al., 2015). Within this newly searched mass range, an additional 184,000 peaks corresponding to previously undiscovered modified peptides were found. These modifications encompassed both chemical (post-isolation) and biological changes, including rare but significant biological modifications such as glycerol phosphorylethanolamine (GPE)-modified Elongation Factor 1a2. In effect, the novel application of existing computational tools to the dark proteome fundamentally shifted the approach to PTM mapping.
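The precursor-side logic of an open search can be sketched briefly. Rather than requiring the observed precursor mass to equal a candidate peptide's mass within a narrow tolerance, the search accepts any mass difference inside a wide window and then tries to explain that difference. This minimal sketch assumes neutral monoisotopic masses, the ±500 Da window described above, and a tiny hypothetical annotation table. It omits fragment-ion matching, which engines such as Sequest use to choose the candidate peptide in the first place. The function names are hypothetical and do not reflect Sequest's implementation.

```python
# Monoisotopic residue masses (Da) for the 20 proteinogenic amino acids.
RESIDUE = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056

# Hypothetical annotation table (Unimod-style names, monoisotopic deltas in Da).
KNOWN_DELTAS = {"Phospho": 79.96633, "Acetyl": 42.01057, "Oxidation": 15.99491}


def peptide_mass(seq):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE[aa] for aa in seq) + WATER


def open_search_delta(observed, seq, window=500.0, tol=0.01):
    """Compare an observed precursor mass with an unmodified candidate peptide.

    Returns None if the difference falls outside the open window; otherwise
    (delta, label), where label is a known modification or "unannotated".
    """
    delta = observed - peptide_mass(seq)
    if abs(delta) > window:
        return None
    label = next((name for name, m in KNOWN_DELTAS.items()
                  if abs(delta - m) <= tol), "unannotated")
    return round(delta, 4), label


base = peptide_mass("PEPTIDE")
print(open_search_delta(base + 79.96633, "PEPTIDE"))  # explained by phosphorylation
print(open_search_delta(base + 123.4, "PEPTIDE"))     # in window but unexplained
print(open_search_delta(base + 650.0, "PEPTIDE"))     # outside the +/-500 Da window
```

The "unannotated" case corresponds to the previously unassigned spectra: a real mass difference with no matching entry in the reference table.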
While open searching revolutionized PTM discovery and mapping from tandem MS data, several obstacles have kept the field from fully deciphering the PTM landscape. The most notable is the problem of ‘search space’ when mapping known PTMs to peptides in complex protein mixtures. A single, relatively simple modification, such as serine phosphorylation, can be investigated by searching Ser-containing peptides with and without the added phosphoryl mass. Mapping multiple PTMs simultaneously, however, causes the search space to grow exponentially, because every combination of mass shifts on every candidate peptide must be considered. This combinatorial expansion reduces search sensitivity, raises the false discovery rate at which modified peptides are identified, and greatly prolongs the time required to analyze a given dataset.
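The exponential growth can be made concrete with simple counting. Assuming, for illustration, that each of n modifiable residues in a peptide is either unmodified or carries one of k variable modifications, the number of distinct modified forms is (k+1)^n. The sketch below ignores residue specificity, site-localization scoring, and limits on modifications per peptide, all of which real search engines handle.

```python
from itertools import product


def count_variants(n_sites, n_mods):
    """Modified forms when each site is unmodified or carries one of n_mods modifications."""
    return (n_mods + 1) ** n_sites


def enumerate_variants(sites, mods):
    """Yield every assignment of {unmodified} plus the given modifications to the sites."""
    choices = [None] + list(mods)
    for combo in product(choices, repeat=len(sites)):
        yield dict(zip(sites, combo))


if __name__ == "__main__":
    # Three candidate serines, phosphorylation only versus four modifications at once.
    print(count_variants(3, 1))    # 8
    print(count_variants(3, 4))    # 125
    print(count_variants(10, 4))   # 9765625
    print(len(list(enumerate_variants(["S3", "S7", "S9"], ["Phospho"]))))  # 8
```

Every extra modification type multiplies, rather than adds to, the number of candidates each spectrum must be compared against, which is why both run time and the chance of a spurious high-scoring match rise so quickly.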
A second major barrier is discovering previously unannotated modifications. PTM mapping algorithms assign modifications on the basis of a reference database, typically UniMod, which catalogs the discrete masses of modifications to be queried against each identified peptide using the ‘controlled vocabulary’ described above. As a result, most algorithms do not automatically detect previously unannotated modifications, such as some of the more recently discovered acyl modifications; these must instead be manually annotated with their exact masses and searched. Moreover, a peptide carrying two or more different modifications may fail to match its unmodified counterpart, or its combined mass shift may be erroneously assigned to a single, larger modification.
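The last point can be illustrated by annotating an observed mass shift against a small table of known modifications, checking both single modifications and pairs. The masses below are monoisotopic delta masses as listed in UniMod, to our knowledge; the table is a tiny hypothetical subset, and `annotate_shift` and its default tolerance are illustrative assumptions rather than any search engine's API.

```python
from itertools import combinations_with_replacement

# Monoisotopic delta masses (Da) for a few modifications, as listed in UniMod.
KNOWN_MODS = {
    "Methyl": 14.015650,
    "Oxidation": 15.994915,
    "Acetyl": 42.010565,
    "Trimethyl": 42.046950,
    "Phospho": 79.966331,
    "Succinyl": 100.016044,
    "Glutaryl": 114.031694,
}


def annotate_shift(delta, tol=0.005, max_mods=2):
    """Return every combination of up to max_mods known modifications within tol of delta."""
    hits = []
    for n in range(1, max_mods + 1):
        for combo in combinations_with_replacement(sorted(KNOWN_MODS), n):
            total = sum(KNOWN_MODS[name] for name in combo)
            if abs(total - delta) <= tol:
                hits.append(combo)
    return hits


if __name__ == "__main__":
    # Methyl + succinyl has exactly the elemental composition of glutaryl (C5H6O3).
    print(annotate_shift(114.0317))  # [('Glutaryl',), ('Methyl', 'Succinyl')]
    # Acetyl and trimethyl differ by ~0.036 Da: ambiguous at a loose tolerance only.
    print(annotate_shift(42.03, tol=0.05))  # [('Acetyl',), ('Trimethyl',)]
    print(annotate_shift(42.03))            # []
```

An isobaric pair such as methyl plus succinyl versus glutaryl cannot be resolved from the precursor mass alone; fragment ions that localize each shift are required.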
Computational tools are now being developed that address these limitations while preserving the broad search space of open approaches. A recent example is TagGraph (Devabhaktuni et al., 2019). To overcome the obstacles described above, TagGraph was built with several distinctive features, and its application to a draft map of the human proteome (Kim et al., 2014) expanded the known PTM landscape to nearly 40,000 protein modifications. First, it uses a combined de novo and string-based peptide-matching approach: it finds short peptide sequences de novo (‘substrings’) that exactly match an unmodified protein sequence, and then searches the remainder of the peptide for potential modifications on or near the indexed substring. This approach reduces the ‘peptide pool’ that the algorithm initially considers, allowing TagGraph to deeply interrogate 25 million tandem mass spectra from 30 human tissues far faster than contemporary search algorithms. The nearly 40,000 PTMs it described were found on just over 1 million uniquely identified peptides, a 3-fold expansion of total peptide counts and a 10-fold expansion of modified peptides relative to previous analyses of the same human proteome dataset.
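The core idea of anchoring on an exact substring and then explaining the leftover mass can be sketched in a few lines. This is a toy illustration under stated assumptions, not TagGraph's graph-based algorithm: it assumes tryptic peptides (cleavage after K/R, ignoring the proline rule), a hypothetical one-protein database, and a hypothetical tag; `match_tag` and `tryptic_span` are illustrative names.

```python
# Monoisotopic residue masses (Da); one water is added per peptide.
RESIDUE = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565


def peptide_mass(seq):
    return sum(RESIDUE[aa] for aa in seq) + WATER


def tryptic_span(protein, start, end):
    """Widen [start, end) to the enclosing tryptic peptide (cleave after K/R; proline rule ignored)."""
    left = start
    while left > 0 and protein[left - 1] not in "KR":
        left -= 1
    right = end
    while right < len(protein) and protein[right - 1] not in "KR":
        right += 1
    return left, right


def match_tag(tag, proteome, observed_mass):
    """Locate exact matches of a de novo tag and report each enclosing peptide's unexplained mass."""
    hits = []
    for name, protein in proteome.items():
        pos = protein.find(tag)
        while pos != -1:
            left, right = tryptic_span(protein, pos, pos + len(tag))
            peptide = protein[left:right]
            # A non-zero gap points to one or more modifications somewhere on the peptide.
            hits.append((name, peptide, observed_mass - peptide_mass(peptide)))
            pos = protein.find(tag, pos + 1)
    return hits


if __name__ == "__main__":
    proteome = {"PROT1": "MKAGLSPEPTIDERVVK"}  # hypothetical protein
    observed = peptide_mass("AGLSPEPTIDER") + 79.966331  # hypothetical phosphopeptide
    for name, peptide, gap in match_tag("PEPT", proteome, observed):
        print(name, peptide, f"{gap:+.4f}")  # PROT1 AGLSPEPTIDER +79.9663
```

Because only peptides containing the exact tag are examined, the candidate pool is far smaller than in an open precursor-window search, which is the intuition behind the speed-up described above.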
To mitigate the loss of sensitivity typical of de novo search platforms, TagGraph abandoned the target-decoy false discovery rate (FDR) model of statistical analysis, instead applying a validated machine learning approach, a hierarchical Bayesian model fit by expectation-maximization, that uses 14 characteristics of each candidate peptide to estimate FDR. As a result, TagGraph’s de novo approach could handle a wider search space, and in doing so was able to discover previously unidentified peptide modifications and greatly expand the number of modifications assigned to highly modified proteins. For example, the number of modifications identified across the five major histone proteins increased 2-fold, and prolyl hydroxylation 10-fold, relative to previous analyses. By continuing to overcome the computational barriers to analyzing large sets of peptide mass spectra, TagGraph and similar tools have provided a major step forward in our knowledge of the PTM landscape.
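The general principle of estimating FDR without decoys, by modeling scores as a mixture of incorrect and correct matches, can be sketched with a much simpler technique than TagGraph's. The block below swaps in a two-component Gaussian mixture over a single score, fit by expectation-maximization (EM), and estimates FDR among accepted matches as their mean posterior probability of being incorrect. The synthetic scores, component shapes, and function names are all assumptions for illustration; TagGraph's hierarchical model over 14 features is considerably richer.

```python
import math
import random


def normal_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def posterior_correct(x, params):
    """Posterior probability that a score comes from the 'correct' component."""
    pi1, mu0, sd0, mu1, sd1 = params
    a = (1.0 - pi1) * normal_pdf(x, mu0, sd0)
    b = pi1 * normal_pdf(x, mu1, sd1)
    return b / (a + b) if a + b > 0 else 0.5


def fit_mixture(scores, n_iter=200):
    """Fit a two-component Gaussian mixture (incorrect vs. correct matches) by EM."""
    s = sorted(scores)
    half = len(s) // 2
    # Initialise the components from the lower and upper halves of the scores.
    mu0 = sum(s[:half]) / half
    mu1 = sum(s[half:]) / (len(s) - half)
    sd0 = sd1 = (s[-1] - s[0]) / 4.0 or 1.0
    params = (0.5, mu0, sd0, mu1, sd1)
    for _ in range(n_iter):
        # E-step: responsibility of the 'correct' component for each score.
        resp = [posterior_correct(x, params) for x in scores]
        # M-step: update the mixing weight, means, and standard deviations.
        w1 = sum(resp)
        w0 = len(scores) - w1
        mu1 = sum(r * x for r, x in zip(resp, scores)) / w1
        mu0 = sum((1 - r) * x for r, x in zip(resp, scores)) / w0
        sd1 = math.sqrt(sum(r * (x - mu1) ** 2 for r, x in zip(resp, scores)) / w1) or 1e-6
        sd0 = math.sqrt(sum((1 - r) * (x - mu0) ** 2 for r, x in zip(resp, scores)) / w0) or 1e-6
        params = (w1 / len(scores), mu0, sd0, mu1, sd1)
    return params


def estimated_fdr(scores, threshold, params):
    """Expected fraction of incorrect matches among scores >= threshold."""
    accepted = [x for x in scores if x >= threshold]
    if not accepted:
        return 0.0
    return sum(1.0 - posterior_correct(x, params) for x in accepted) / len(accepted)


if __name__ == "__main__":
    random.seed(0)
    # Synthetic scores: 700 incorrect matches around 0 and 300 correct matches around 5.
    scores = [random.gauss(0, 1) for _ in range(700)] + [random.gauss(5, 1) for _ in range(300)]
    params = fit_mixture(scores)
    print("fitted (pi1, mu0, sd0, mu1, sd1):", [round(v, 2) for v in params])
    for t in (2.0, 3.0, 4.0):
        print(f"score >= {t}: estimated FDR = {estimated_fdr(scores, t, params):.4f}")
```

Raising the score threshold lowers the estimated FDR at the cost of accepting fewer matches, the same trade-off that target-decoy methods expose, but here without searching a decoy database.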
The future of PTM research will consider not only what programs like TagGraph enable for proteomics, but also which aspects of MS instrumentation helped enable such success. For example, the performance of TagGraph’s statistical analysis rapidly decayed when searching low-resolution mass spectra (Shishkova et al., 2016). This limitation highlights how complementary improvements in instrumentation are also necessary for further advances (Devabhaktuni et al., 2019). Indeed, improvements in chromatographic separation (Shishkova et al., 2016) and the development of high-performance mass spectrometers (Aebersold and Mann, 2003) have provided the very high-resolution spectra that TagGraph’s accuracy requires, illustrating a key link between technology and discovery.
Discovering questions
Over the last 50 years, the field of PTM proteomics has advanced at an unimaginable pace. Tandem, high-resolution mass spectrometry is now commonplace, and provides spectra from non-digested protein mixtures after separation with multidimensional chromatography. TagGraph and other peptide identification algorithms (Devabhaktuni et al., 2019; Na et al., 2012) can now reliably annotate peptides bearing multiple PTMs without sample enrichment, and are capable of quantifying PTM stoichiometry from complex protein mixtures. Thus, the PTM field is well-poised for its next leap forward, which centers around two major questions: Has the PTM landscape been completely illuminated? And, perhaps more importantly: How will the functions of individual PTMs be determined across such a vast landscape?
To what degree has the PTM landscape been illuminated? When 184,000 modified peptides were discovered using an open Sequest search in HEK293 cells, the authors noted post hoc that closed searches for specific PTMs within the modified spectra returned nearly twice as many hits (Chick et al., 2015). This suggests that a highly sensitive search of the dark proteome may yield at least twice as many modified peptides. Reanalyzing this and other datasets with TagGraph could help test this possibility. However, even if modern algorithms can identify nearly all of the modified peptides across the spectra, we may be far from determining either the identity or the biological significance of each modification. For example, in the human proteome TagGraph identified over 1,700 distinct mass shifts (a hit being defined as having ≥20 spectral counts), ranging from −148 to +999.4 Da, that could not be attributed to known modifications. Although these account for a small fraction of the total modified spectra identified [5.5%; Supplementary Data Set 9 in (Devabhaktuni et al., 2019)], they illustrate how difficult it may be to fully understand the chemical composition of the modified human proteome.
How will scientists determine the important functions of individual PTMs in such a vast landscape? Although the PTM landscape may never be unambiguously complete, the field is also poised to make important steps forward in understanding PTM function. As discussed in earlier sections, PTMs influence nearly every aspect of cellular function, ranging from short-lived changes in enzyme activity to heritable changes in the epigenome. Whereas PTMs of interest can be studied closely at the bench in a variety of ways, data science approaches are increasingly important for nominating candidate PTMs and for beginning to understand the biologic functions of newly described PTMs. In an analysis of the human proteome, tissue-specific modifications were identified, including histone H4 R56 dimethylation in fetal tissues (Devabhaktuni et al., 2019), suggesting that this modification may have a specific function in development. In another example, the same study correlated PTM abundance with GO Biological Process and Cellular Component terms on the basis of modifications on the proteins within each GO category. This approach confirmed that arginine methylation was enriched on proteins involved in RNA splicing, and that lysine modifications were abundant on proteins within GO categories related to chromosome organization.
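The exact statistical procedure used in that study is not described here. A common way to ask whether a modification is over-represented on proteins annotated to a GO term is a one-sided hypergeometric test, sketched below; the protein counts in the example are hypothetical and the function names are illustrative.

```python
from math import comb


def enrichment_p_value(n_total, n_term, n_modified, n_overlap):
    """One-sided hypergeometric P(X >= n_overlap).

    n_total:    proteins in the background (e.g. all quantified proteins)
    n_term:     background proteins annotated to the GO term
    n_modified: proteins carrying the modification
    n_overlap:  modified proteins annotated to the GO term
    """
    upper = min(n_term, n_modified)
    tail = sum(comb(n_term, i) * comb(n_total - n_term, n_modified - i)
               for i in range(n_overlap, upper + 1))
    return tail / comb(n_total, n_modified)


def fold_enrichment(n_total, n_term, n_modified, n_overlap):
    """Observed overlap divided by the overlap expected by chance."""
    expected = n_term * n_modified / n_total
    return n_overlap / expected


if __name__ == "__main__":
    # Hypothetical counts: 5,000 proteins, 200 annotated to "RNA splicing",
    # 300 carrying arginine methylation, 40 of which are splicing proteins.
    print(f"fold enrichment = {fold_enrichment(5000, 200, 300, 40):.2f}")  # 3.33
    print(f"p = {enrichment_p_value(5000, 200, 300, 40):.3g}")
```

When many GO terms are tested, the resulting p-values would normally be corrected for multiple testing before any term is reported as enriched.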
Data science and machine learning (ML) are increasingly used in PTM analysis, including to predict functional PTMs from primary protein sequences and more complex protein characteristics (Blom et al., 2004; He et al., 2018). For example, primary-sequence PTM data were combined with 3D protein structural information in a neural network called SAPH-ire NN to predict the clustering and interaction of functional protein PTMs (Torres et al., 2016). Because PTMs exert such widespread effects, ML algorithms may also be useful for discovering patterns within the proteomic, metabolomic, epigenomic, and transcriptomic datasets generated after manipulating a candidate PTM. This approach has already been used to decipher the modes of action of certain small-molecule inhibitors from multi-omics data (Patel-Murray et al., 2020). ML-based data integration may therefore provide valuable information on the functional effects of PTMs under investigation and could highlight modules downstream of specific modifications. As one possible example, to further the discovery of metabolite-derived PTMs, ML training sets could be derived from metabolite data such as acyl-CoA species, an increasingly recognized source of PTMs that regulate widespread metabolic and transcriptional processes (James et al., 2018; Trub and Hirschey, 2018).
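Sequence-based predictors generally learn from a window of residues surrounding each candidate modification site. The sketch below shows only that featurization step, which could feed any classifier; the choice of S/T/Y as candidate residues, the window half-width of 7, the "X" padding character, and the function names are assumptions for illustration, not details of the cited tools.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def site_windows(sequence, residues="STY", flank=7, pad="X"):
    """Yield (index, window) for every candidate residue, padding the termini with pad."""
    padded = pad * flank + sequence + pad * flank
    for i, aa in enumerate(sequence):
        if aa in residues:
            # The window is centred on the candidate residue (padded index i + flank).
            yield i, padded[i:i + 2 * flank + 1]


def one_hot(window):
    """Flatten a window into 20 binary features per position; padding stays all-zero."""
    features = []
    for aa in window:
        features.extend(1 if aa == ref else 0 for ref in AMINO_ACIDS)
    return features


if __name__ == "__main__":
    for idx, window in site_windows("MKSPEY", flank=2):
        print(idx, window, sum(one_hot(window)))
    # 2 MKSPE 5
    # 5 PEYXX 3
```

Structure-aware models such as SAPH-ire NN add further features (for example, from 3D structures) on top of sequence-derived ones like these.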
The factors discussed above all focus on improving our study of the PTM landscape at a single point in time and averaged across many cells. Ultimately, the dynamic, spatiotemporal study of protein PTMs will be required to fully understand their complex roles in physiology and disease. A time when PTMs can be resolved between or within cells at the timescales that determine cell behavior may seem like an unimaginable future. However, recent work shows exciting steps forward achieved by leveraging the complementary strengths brought by each subdiscipline. Specifically, current MS imaging modalities are capable of deconstructing spatial metabolite distribution within tissues (Buchberger et al., 2018; Pareek et al., 2020), single cell proteomics has recently become a reality (Dou et al., 2019; Marx, 2019), and temporal resolution of cysteine oxidation has been achieved in vitro (Behring et al., 2020) and in vivo over time (Xiao et al., 2020). Perhaps this future is not far away.
Conclusion
What began as a search for the chemical constituents of the protein vitellin led to the unexpected discovery of phosphoserine and an unrealized world of protein modifications. Our catalogue of primary amino acid modifications has grown to more than 500 known today (Figure 2). Despite this progress, mapping the breadth and depth of PTMs is far from complete. Our understanding of the functional consequences of each modification remains quite limited, as does our understanding of whether modifications contribute to physiological pathway feedback, are pathophysiological, or both. Technological advances in mass spectrometry led to the rapid discovery of new PTMs, and recent advances in data science and machine learning are paving the way for a second inflection point in PTM discovery (Figure 2). The search for protein modifications has now shifted from chemical to computational, from the composition of egg yolk proteins to peptide masses in silico. Continued investigation into this emerging area of biology may help answer the age-old question: “is it a PTM?”
Supplementary Material
Table S1. Raw data of protein modifications, derived from UniProt.
Table S2. Raw data of metabolites, derived from HMDB.
Table S3. Summary table of reactive metabolites.
Acknowledgements
We would like to thank Harsha Srijay and Matthew Huang (Duke) for assistance with data collection; Andreas Madsen (Copenhagen), Deb Muoio (Duke), Chris Newgard (Duke), and Christian A. Olsen (Copenhagen) for reviewing an early draft of the manuscript; and the unnamed reviewers of manuscript drafts for thoughtful and constructive feedback. Work on protein modifications in the Hirschey lab is supported by The Glenn Foundation, the National Institutes of Health/NIA grant R01AG045351, and the National Institutes of Health/NIDDK grant R01DK115568. EKK is supported by 3R01DK11556803S1. DKZ is supported through the R38 SCI-StARR program in the Department of Pediatrics. We apologize to colleagues whose work was not cited due to space limitations or oversight. Please bring errors and egregious omissions to our attention.
Footnotes
Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.
References
- Aebersold R, and Mann M (2003). Mass spectrometry-based proteomics. Nature 422, 198–207.
- Alderson NL, Wang Y, Blatnik M, Frizzell N, Walla MD, Lyons TJ, Alt N, Carson JA, Nagai R, Thorpe SR, et al. (2006). S-(2-Succinyl)cysteine: a novel chemical modification of tissue proteins by a Krebs cycle intermediate. Arch Biochem Biophys 450, 1–8.
- Allfrey VG, Faulkner R, and Mirsky AE (1964). Acetylation and Methylation of Histones and Their Possible Role in the Regulation of RNA Synthesis. Proc Natl Acad Sci U S A 51, 786–794.
- Alquezar C, Arya S, and Kao AW (2020). Tau Post-translational Modifications: Dynamic Transformers of Tau Function, Degradation, and Aggregation. Front Neurol 11, 595532.
- Backus KM, Correia BE, Lum KM, Forli S, Horning BD, Gonzalez-Paez GE, Chatterjee S, Lanning BR, Teijaro JR, Olson AJ, et al. (2016). Proteome-wide covalent ligand discovery in native biological systems. Nature 534, 570–574.
- Baell JB, Leaver DJ, Hermans SJ, Kelly GL, Brennan MS, Downer NL, Nguyen N, Wichmann J, McRae HM, Yang Y, et al. (2018). Inhibitors of histone acetyltransferases KAT6A/B induce senescence and arrest tumour growth. Nature 560, 253–257.
- Bancher C, Brunner C, Lassmann H, Budka H, Jellinger K, Wiche G, Seitelberger F, Grundke-Iqbal I, Iqbal K, and Wisniewski HM (1989). Accumulation of abnormally phosphorylated tau precedes the formation of neurofibrillary tangles in Alzheimer’s disease. Brain Res 477, 90–99.
- Behring JB, van der Post S, Mooradian AD, Egan MJ, Zimmerman MI, Clements JL, Bowman GR, and Held JM (2020). Spatial and temporal alterations in protein structure by EGF regulate cryptic cysteine oxidation. Sci Signal 13.
- Bergström S, and Lindstedt S (1951). On the Isolation and Structure of Hydroxylysine. Acta Chemica Scandinavica 5, 157–167.
- Biemann K (1992). Mass spectrometry of peptides and proteins. Annu Rev Biochem 61, 977–1010.
- Blom N, Sicheritz-Ponten T, Gupta R, Gammeltoft S, and Brunak S (2004). Prediction of post-translational glycosylation and phosphorylation of proteins from the amino acid sequence. Proteomics 4, 1633–1649.
- Bradford Vickery H (1972). The History of the Discovery of the Amino Acids II. A Review of Amino Acids Described Since 1931 as Components of Native Proteins. In Advances in Protein Chemistry, Anfinsen CB, Edsall JT, and Richards FM, eds. (Academic Press), pp. 81–171.
- Brunk E, Chang RL, Xia J, Hefzi H, Yurkovich JT, Kim D, Buckmiller E, Wang HH, Cho BK, Yang C, et al. (2018). Characterizing posttranslational modifications in prokaryotic metabolism using a multiscale workflow. Proc Natl Acad Sci U S A 115, 11096–11101.
- Buchberger AR, DeLaney K, Johnson J, and Li L (2018). Mass Spectrometry Imaging: A Review of Emerging Advancements and Future Insights. Anal Chem 90, 240–265.
- Chick JM, Kolippakkam D, Nusinow DP, Zhai B, Rad R, Huttlin EL, and Gygi SP (2015). A mass-tolerant database search identifies a large proportion of unassigned spectra in shotgun proteomics as modified peptides. Nat Biotechnol 33, 743–749.
- Choudhary C, Kumar C, Gnad F, Nielsen ML, Rehman M, Walther TC, Olsen JV, and Mann M (2009). Lysine acetylation targets protein complexes and co-regulates major cellular functions. Science 325, 834–840.
- Devabhaktuni A, Lin S, Zhang L, Swaminathan K, Gonzalez CG, Olsson N, Pearlman SM, Rawson K, and Elias JE (2019). TagGraph reveals vast protein modification landscapes from large tandem mass spectrometry datasets. Nat Biotechnol 37, 469–479.
- Dixon GH, Dreyer WJ, and Neurath H (1956). The reaction of p-nitrophenyl acetate with chymotrypsin. Journal of the American Chemical Society 78, 4810.
- Dou M, Clair G, Tsai CF, Xu K, Chrisler WB, Sontag RL, Zhao R, Moore RJ, Liu T, Pasa-Tolic L, et al. (2019). High-Throughput Single Cell Proteomics Enabled by Multiplex Isobaric Labeling in a Nanodroplet Sample Preparation Platform. Anal Chem 91, 13119–13127.
- Eng JK, McCormack AL, and Yates JR (1994). An approach to correlate tandem mass spectral data of peptides with amino acid sequences in a protein database. J Am Soc Mass Spectrom 5, 976–989.
- Greer EL, and Shi Y (2012). Histone methylation: a dynamic mark in health, disease and inheritance. Nat Rev Genet 13, 343–357.
- Gui CY, Ngo L, Xu WS, Richon VM, and Marks PA (2004). Histone deacetylase (HDAC) inhibitor activation of p21WAF1 involves changes in promoter-associated proteins, including HDAC1. Proc Natl Acad Sci U S A 101, 1241–1246.
- Han X, He L, Xin L, Shan B, and Ma B (2011). PeaksPTM: Mass spectrometry-based identification of peptides with unspecified modifications. J Proteome Res 10, 2930–2936.
- Han Z, Wu H, Kim S, Yang X, Li Q, Huang H, Cai H, Bartlett MG, Dong A, Zeng H, et al. (2018). Revealing the protein propionylation activity of the histone acetyltransferase MOF (males absent on the first). J Biol Chem 293, 3410–3420.
- Hanahan D, and Weinberg RA (2011). Hallmarks of cancer: the next generation. Cell 144, 646–674.
- He W, Wei L, and Zou Q (2018). Research progress in protein posttranslational modification site prediction. Brief Funct Genomics 18, 220–229.
- He XD, Gong W, Zhang JN, Nie J, Yao CF, Guo FS, Lin Y, Wu XH, Li F, Li J, et al. (2017). Sensing and Transmitting Intracellular Amino Acid Signals through Reversible Lysine Aminoacylations. Cell Metab.
- Huang H, Tang S, Ji M, Tang Z, Shimada M, Liu X, Qi S, Locasale JW, Roeder RG, Zhao Y, et al. (2018a). p300-Mediated Lysine 2-Hydroxyisobutyrylation Regulates Glycolysis. Mol Cell 70, 984.
- Huang H, Wang DL, and Zhao Y (2018b). Quantitative Crotonylome Analysis Expands the Roles of p300 in the Regulation of Lysine Crotonylation Pathway. Proteomics 18, e1700230.
- James AM, Smith CL, Smith AC, Robinson AJ, Hoogewijs K, and Murphy MP (2018). The Causes and Consequences of Nonenzymatic Protein Acylation. Trends Biochem Sci 43, 921–932.
- Kim MS, Pinto SM, Getnet D, Nirujogi RS, Manda SS, Chaerkady R, Madugundu AK, Kelkar DS, Isserlin R, Jain S, et al. (2014). A draft map of the human proteome. Nature 509, 575–581.
- Kim SC, Sprung R, Chen Y, Xu Y, Ball H, Pei J, Cheng T, Kho Y, Xiao H, and Xiao L (2006). Substrate and Functional Diversity of Lysine Acetylation Revealed by a Proteomics Survey. Mol Cell 23, 607–618.
- Krebs EG, and Fischer EH (1956). The phosphorylase b to a converting enzyme of rabbit skeletal muscle. Biochim Biophys Acta 20, 150–157.
- Levene PA, and Alsberg CL (1906). The cleavage products of vitellin. Journal of Biological Chemistry 2, 127–133.
- Li G, Tian Y, and Zhu WG (2020). The Roles of Histone Deacetylases and Their Inhibitors in Cancer Therapy. Front Cell Dev Biol 8, 576946.
- Lipmann F (1983). Analysis of phosphoproteins and developments in protein phosphorylation. Trends in Biochemical Sciences 8, 334–336.
- Luo HB, Xia YY, Shu XJ, Liu ZC, Feng Y, Liu XH, Yu G, Yin G, Xiong YS, Zeng K, et al. (2014). SUMOylation at K340 inhibits tau degradation through deregulating its phosphorylation and ubiquitination. Proc Natl Acad Sci U S A 111, 16586–16591.
- Marx V (2019). A dream of single-cell proteomics. Nat Methods 16, 809–812.
- Moellering RE, and Cravatt BF (2013). Functional lysine modification by an intrinsically reactive primary glycolytic metabolite. Science 341, 549–553.
- Morris M, Knudsen GM, Maeda S, Trinidad JC, Ioanoviciu A, Burlingame AL, and Mucke L (2015). Tau post-translational modifications in wild-type and human amyloid precursor protein transgenic mice. Nat Neurosci 18, 1183–1189.
- Morris M, Maeda S, Vossel K, and Mucke L (2011). The many faces of tau. Neuron 70, 410–426.
- Mucke L, Masliah E, Yu GQ, Mallory M, Rockenstein EM, Tatsuno G, Hu K, Kholodenko D, Johnson-Wood K, and McConlogue L (2000). High-level neuronal expression of abeta 1–42 in wild-type human amyloid protein precursor transgenic mice: synaptotoxicity without plaque formation. J Neurosci 20, 4050–4058.
- Murray K (1964). The Occurrence of Epsilon-N-Methyl Lysine in Histones. Biochemistry 3, 10–15.
- Na S, Bandeira N, and Paek E (2012). Fast multi-blind modification search through tandem mass spectrometry. Mol Cell Proteomics 11, M111.010199.
- Na S, and Paek E (2015). Software eyes for protein post-translational modifications. Mass Spectrom Rev 34, 133–147.
- Ochoa D, Jarnuczak AF, Vieitez C, Gehre M, Soucheray M, Mateus A, Kleefeldt AA, Hill A, Garcia-Alonso L, Stein F, et al. (2020). The functional landscape of the human phosphoproteome. Nat Biotechnol 38, 365–373.
- Pareek V, Tian H, Winograd N, and Benkovic SJ (2020). Metabolomics and mass spectrometry imaging reveal channeled de novo purine synthesis in cells. Science 368, 283–290.
- Patel-Murray NL, Adam M, Huynh N, Wassie BT, Milani P, and Fraenkel E (2020). A Multi-Omics Interpretable Machine Learning Model Reveals Modes of Action of Small Molecules. Sci Rep 10, 954.
- Perkins DN, Pappin DJ, Creasy DM, and Cottrell JS (1999). Probability-based protein identification by searching sequence databases using mass spectrometry data. Electrophoresis 20, 3551–3567.
- Posternak S (1927). The Phosphorus Nucleus of Caseinogen. Biochem J 21, 289.
- Rabinowitz M, and Lipmann F (1960). Reversible phosphate transfer between yolk phosphoprotein and adenosine triphosphate. J Biol Chem 235, 1043–1050.
- Rimington C (1927). The Phosphorus of Caseinogen: Isolation of a Phosphorus-containing Peptone from Tryptic Digests of Caseinogen. Biochem J 21, 1179–1186.
- Schwammle V, Verano-Braga T, and Roepstorff P (2015). Computational and statistical methods for high-throughput analysis of post-translational modifications of proteins. J Proteomics 129, 3–15.
- Shishkova E, Hebert AS, and Coon JJ (2016). Now, More Than Ever, Proteomics Needs Better Chromatography. Cell Syst 3, 321–324.
- Soeda Y, and Takashima A (2020). New Insights Into Drug Discovery Targeting Tau Protein. Front Mol Neurosci 13, 590896.
- Sutherland EW Jr., and Wosilait WD (1955). Inactivation and activation of liver phosphorylase. Nature 175, 169–170.
- Tan M, Peng C, Anderson KA, Chhoy P, Xie Z, Dai L, Park JS, Chen Y, Huang H, Zhang Y, et al. (2014). Lysine Glutarylation Is a Protein Post-translational Modification Regulated by SIRT5. Cell Metab 19, 605–617.
- Torres MP, Dewhurst H, and Sundararaman N (2016). Proteome-wide Structural Analysis of PTM Hotspots Reveals Regulatory Elements Predicted to Impact Biological Function and Disease. Mol Cell Proteomics 15, 3513–3528.
- Trub AG, and Hirschey MD (2018). Reactive Acyl-CoA Species Modify Proteins and Induce Carbon Stress. Trends Biochem Sci 43, 369–379.
- UniProt Consortium (2018). UniProt: the universal protein knowledgebase. Nucleic Acids Res 46, 2699.
- Uy R, and Wold F (1977). Posttranslational covalent modification of proteins. Science 198, 890–896.
- Verdin E, and Ott M (2015). 50 years of protein acetylation: from gene regulation to epigenetics, metabolism and beyond. Nat Rev Mol Cell Biol 16, 258–264.
- Vickery HB, and Schmidt CLA (1931). The History of the Discovery of the Amino Acids. Chemical Reviews 9, 169–318.
- Wagner GR, Bhatt DP, O’Connell TM, Thompson JW, Dubois LG, Backos DS, Yang H, Mitchell GA, Ilkayeva OR, Stevens RD, et al. (2017). A class of reactive acyl-CoA species reveals the nonenzymatic origins of protein acylation. Cell Metab 25, 823–837.
- Walsh CT (2005). Posttranslational Modification of Proteins: Expanding Nature’s Inventory, Vol 45 (Roberts & Co.).
- Wang Q, Zhang Y, Yang C, Xiong H, Lin Y, Yao J, Li H, Xie L, Zhao W, Yao Y, et al. (2010). Acetylation of metabolic enzymes coordinates carbon source utilization and metabolic flux. Science 327, 1004–1007.
- Weerapana E, Wang C, Simon GM, Richter F, Khare S, Dillon MB, Bachovchin DA, Mowen K, Baker D, and Cravatt BF (2010). Quantitative reactivity profiling predicts functional cysteines in proteomes. Nature 468, 790–795.
- Wishart DS, Feunang YD, Marcu A, Guo AC, Liang K, Vazquez-Fresno R, Sajed T, Johnson D, Li C, Karu N, et al. (2018). HMDB 4.0: the human metabolome database for 2018. Nucleic Acids Res 46, D608–D617.
- Wold F (1981). In vivo chemical modification of proteins (post-translational modification). Annu Rev Biochem 50, 783–814.
- Xiao H, Jedrychowski MP, Schweppe DK, Huttlin EL, Yu Q, Heppner DE, Li J, Long J, Mills EL, Szpyt J, et al. (2020). A Quantitative Tissue-Specific Landscape of Protein Redox Regulation during Aging. Cell 180, 968–983.e24.
- Zhao S, Xu W, Jiang W, Yu W, Lin Y, Zhang T, Yao J, Zhou L, Zeng Y, Li H, et al. (2010). Regulation of cellular metabolism by protein lysine acetylation. Science 327, 1000–1004.
