Abstract
Covalent drugs constitute cornerstones of modern medicine. The past decade has witnessed growing enthusiasm for the development of covalent inhibitors, fueled by clinical successes as well as advances in analytical techniques associated with the drug discovery pipeline. Among these techniques, mass spectrometry-based chemoproteomic methods stand out for their broad applicability, ranging from focused analysis of electrophile-containing compounds to proteome-wide surveys of inhibitor targets. Here, we review applications of both foundational and cutting-edge chemoproteomic techniques across target identification, hit discovery, and lead characterization/optimization in covalent drug discovery. We focus on the practical aspects necessary for the general drug discovery scientist to design, interpret, and evaluate chemoproteomic experiments. We also present three case studies on clinical-stage molecules to further showcase the real-world significance and future opportunities of these methodologies.
Graphical Abstract
Introduction
Covalent inhibitors can achieve exquisite potency and durable target occupancy through a combination of covalent and noncovalent interactions. Drugs with a covalent mechanism of action include recent blockbusters (clopidogrel) as well as early cornerstones of modern medicine (aspirin and penicillin). Despite this well-documented history and the advantages of sustained target engagement, pharmaceutical companies have traditionally shied away from covalent drug programs due to concerns about potential idiosyncratic toxicity1. In fact, for many covalent drugs the covalent mechanism of action was discovered serendipitously after they were approved for clinical use. However, clinical successes over the past decade (e.g., ibrutinib), along with improvements in technologies used throughout the drug discovery pipeline, have reignited broad interest in this class of therapeutics. [Fig. 1]
Over the past two decades, mass spectrometry has become the technique of choice to characterize complex proteomes, including post-translational modifications (PTMs). Since covalent drugs modify amino acid side chains in ways similar to endogenous PTMs, mass spectrometry methods provide particularly powerful tools for covalent drug discovery. The field of chemoproteomics seeks to characterize interactions between small molecules and their protein targets and includes a growing set of techniques. For the purposes of this review, chemoproteomic techniques are split into two categories: (1) methods for in vitro analysis of isolated protein-ligand conjugates, and (2) methods that map interactions between small molecules and proteins in complex proteomes such as cells or lysates. The diversity of discrete methods within these broad categories creates opportunities to use chemoproteomics across the drug discovery pipeline, from the early stages of target and hit identification through lead optimization and in vivo studies. While this review focuses on covalent inhibitors, the methods reviewed here are also widely applied in noncovalent drug discovery.
Our goal with this review is to provide a timely discussion of chemoproteomic techniques that are likely to benefit scientists working in drug discovery, as well as those driving mass spectrometry technology development. We provide an in-depth discussion of chemoproteomic method development from the perspective of covalent drug discovery. We discuss methods for characterizing interactions between electrophile-containing compounds and purified proteins, as well as those that explore interactions between covalent small molecules and cellular proteomes. Additionally, we elaborate on the strengths, weaknesses, and complementary nature of mass spectrometry and other biophysical and biochemical techniques used in drug discovery. We illustrate the use of chemoproteomics through several case studies of clinical-stage molecules, which showcase these methods directly in a drug discovery context.
Purified protein-ligand conjugates
In this section, we discuss methods designed for the analysis of electrophile-bearing compounds incubated with individual, recombinant proteins. These strategies can report on the site(s) of covalent attachment, the stoichiometry, and the specificity of the potential covalent interaction. We describe how MS facilitates key steps in the drug discovery pipeline, namely lead characterization and hit discovery.
In general, most strategies for analyzing proteins by mass spectrometry fall broadly into two groups: “intact protein mass spectrometry”, in which proteins are analyzed without digestion (top-down proteomics), and methods in which the main analytes are peptides derived from proteolytic digestion (bottom-up proteomics). In intact protein MS, purified protein(s) are reacted with electrophile-containing compounds of interest and then typically analyzed by liquid chromatography (LC) or capillary electrophoresis (CE) coupled directly to a mass spectrometer with electrospray ionization (LC-MS or CE-MS) [Fig. 2a]. The protein of interest is detected as a series of multiply charged ions, which can be computationally deconvoluted to yield the neutral (zero-charge) mass of the protein. Alternatively, proteins can be proteolytically digested into peptides prior to chromatographic separation and MS analysis. These methods can yield more granular information than intact mass alone (for example, the location of any covalent modifications) and are therefore complementary to intact protein MS. Below we discuss the use of these approaches in different aspects of covalent drug discovery.
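To make the deconvolution step concrete, the sketch below recovers a neutral mass from an idealized charge-state envelope. It is a minimal sketch, assuming a single protein species, centroided peaks, and charges that increase by exactly one between adjacent peaks; production deconvolution software (for example, maximum-entropy approaches) must also handle noise, overlapping species, and isotope distributions that this toy ignores. The function names and the 25,000 Da test protein are hypothetical.

```python
PROTON = 1.007276  # mass of a proton (Da)


def charge_from_adjacent_peaks(mz_z, mz_z_plus_1):
    """Infer the charge z of the higher-m/z peak from two adjacent peaks.

    For a neutral mass M: mz_z = (M + z*PROTON)/z and
    mz_z_plus_1 = (M + (z+1)*PROTON)/(z+1). Solving for z gives the formula below.
    """
    z = (mz_z_plus_1 - PROTON) / (mz_z - mz_z_plus_1)
    return round(z)


def neutral_mass(mz, z):
    """Convert an [M + zH]z+ ion back to its neutral (zero-charge) mass."""
    return z * (mz - PROTON)


def deconvolute_envelope(mzs):
    """Average neutral mass from one envelope.

    mzs must be sorted from high to low m/z, i.e. consecutive charges z, z+1, ...
    """
    z0 = charge_from_adjacent_peaks(mzs[0], mzs[1])
    masses = [neutral_mass(mz, z0 + i) for i, mz in enumerate(mzs)]
    return sum(masses) / len(masses)


# Simulated envelope of a hypothetical 25,000 Da protein, charges 20+ to 25+
true_mass = 25000.0
envelope = [(true_mass + z * PROTON) / z for z in range(20, 26)]
print(round(deconvolute_envelope(envelope), 3))  # prints 25000.0
```

Every charge state reports the same neutral mass, which is why averaging across the envelope is a reasonable (if simplistic) estimator here.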
Lead Characterization
Several critical steps in covalent lead characterization benefit greatly from MS-based analysis. In this section we highlight how MS is used to confirm the covalent mode of compound attachment to the target, to identify the specific site of attachment and the binding stoichiometry, to flag compounds with broad reactivity, and to discover new leads. Although targeting residues beyond cysteine is feasible, as exemplified by the number of approved drugs targeting active-site serines and threonines, cysteine remains the most frequently exploited residue in this context. Therefore, most of the examples we discuss in this review involve cysteine-directed covalent compounds.
Protein-level analysis
In the context of covalent inhibitor development, intact protein mass spectrometry is commonly used early in the drug discovery process to validate and characterize new candidate electrophilic compounds. A shift in protein mass between compound-treated and control samples corresponding to the mass of the inhibitor minus any leaving group indicates covalent labelling of the protein [Fig. 2a]. This direct observation of the inhibitor-protein adduct provides evidence for a covalent mechanism of action. Additionally, requiring one-to-one binding stoichiometry can be used to triage hyperreactive compounds that modify multiple residues [Fig. 2b]. For example, Fry et al. validated the first-generation covalent EGFR/ErbB2 inhibitor PD168393 in this manner: a shift of 370 Da after compound incubation indicated 1:1 covalent binding. The same inhibition mechanism was later leveraged in afatinib2, which was approved for the treatment of metastatic non-small cell lung cancer in 2013.
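The stoichiometry readout described above amounts to asking how many adduct masses separate each deconvoluted peak from the apo protein. The sketch below is a minimal illustration, assuming deconvoluted masses are already available, a hypothetical 30,000 Da protein, a 370 Da adduct (matching the shift mentioned above), and an arbitrary 2 Da matching tolerance; real tolerances depend on instrument resolution and protein size. The function name is hypothetical.

```python
def labelling_stoichiometry(observed_mass, apo_mass, compound_mass,
                            leaving_group_mass=0.0, max_adducts=5, tol=2.0):
    """Return n if observed_mass matches apo + n adducts within tol (Da), else None."""
    adduct_shift = compound_mass - leaving_group_mass
    for n in range(max_adducts + 1):
        expected = apo_mass + n * adduct_shift
        if abs(observed_mass - expected) <= tol:
            return n
    return None  # mass shift not explained by simple adduct formation


# Hypothetical deconvoluted peaks (Da) for a 30,000 Da protein and a 370 Da adduct
apo, inhibitor = 30000.0, 370.0
for peak in (30000.4, 30370.6, 30740.1, 29966.0):
    n = labelling_stoichiometry(peak, apo, inhibitor)
    if n is None:
        status = "unexplained shift: investigate further"
    elif n > 1:
        status = f"{n} adducts: check for hyperreactivity"
    else:
        status = f"{n} adduct(s)"
    print(peak, status)
```

Returning `None` rather than forcing a match keeps unexpected mass shifts visible, since these can carry mechanistic information of their own.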
In some cases, several reactive residues may be present near the compound binding pocket, and multiple labelling events might be acceptable for an initial lead. Garland et al. used intact MS to evaluate the organoselenium compound ebselen as a potential covalent inhibitor of the botulinum neurotoxin serotype A light chain protease. The authors found that two compound molecules labelled each protein molecule (a 1:2 protein:compound stoichiometry), and subsequent peptide-level analysis revealed the targeted residues to be two cysteines, C134 and C165, both at the active site3. Here, double labelling resulted from two reactive cysteine residues in the same pocket rather than from compound hyperreactivity, and the authors continued with this hit for further optimization.
In principle, intact protein MS can also reveal unexpected mass shifts upon compound treatment, which suggest that chemical changes other than simple protein-inhibitor adduct formation have occurred. Analysis of these changes can yield mechanistic insights, as recently shown by Bashore et al., who observed an unexpected -34 Da peak upon treatment of USP7 with a cyanopyrrolidine inhibitor. They demonstrated that the inhibition mechanism involved beta-elimination of the enzyme-inhibitor adduct, converting the active-site cysteine into dehydroalanine [Fig. 2c]4. In this case, intact protein mass spectrometry provided insight into the inhibition mechanism of this covalent inhibitor.
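A quick way to rationalize a shift like the -34 Da peak described above is to compare elemental compositions. The worked check below, a small sketch rather than any published workflow, computes the monoisotopic mass difference between an in-chain cysteine residue and dehydroalanine, which corresponds to loss of H2S. Intact protein masses are often reported as average masses, in which case the same loss is slightly above 34 Da; either way it matches the observed peak.

```python
# Monoisotopic atomic masses (Da)
MONO = {"C": 12.0, "H": 1.00782503, "N": 14.00307401,
        "O": 15.99491462, "S": 31.97207100}


def mono_mass(formula):
    """Monoisotopic mass of an elemental composition given as {element: count}."""
    return sum(MONO[element] * count for element, count in formula.items())


# In-chain residue compositions
cysteine = {"C": 3, "H": 5, "N": 1, "O": 1, "S": 1}
dehydroalanine = {"C": 3, "H": 3, "N": 1, "O": 1}

shift = mono_mass(dehydroalanine) - mono_mass(cysteine)
print(f"Cys -> Dha mass shift: {shift:.3f} Da")  # Cys -> Dha mass shift: -33.988 Da
```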
Despite its usefulness, intact protein MS has several important limitations related to the size and purity of the input protein. During electrospray ionization, each protein molecule acquires a variable number of protons, resulting in a distribution of mass-to-charge (m/z) peaks in the spectrum. Larger proteins tend to acquire more protons, yielding more numerous and more closely spaced multiply charged peaks within the same m/z range of the spectrum [Fig. 2d]. Moreover, larger proteins are often more difficult to purify to homogeneity. As a result, the high-charge-state peaks of larger proteins may overlap in m/z because of heterogeneous sequence variants and spurious PTMs. In these cases deconvolution algorithms may fail, making it difficult to confidently assign inhibitor-induced mass shifts. Ultra-high-resolution mass spectrometers can compensate for the deleterious effects of protein heterogeneity, but they typically require a greater upfront financial investment and more complex instrument acquisition methods.
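Why crowding worsens with protein size follows from simple arithmetic: for a species of neutral mass M, the proton terms cancel and the m/z gap between the z+ and (z+1)+ peaks is M/(z(z+1)), while a mass difference ΔM between two proteoforms appears as an offset of only ΔM/z. The sketch below assumes, as a simplification, that the dominant charge state places each protein near m/z 1000; the +370 Da adduct and the protein masses are hypothetical.

```python
PROTON = 1.007276  # Da


def charge_near(mass, target_mz=1000.0):
    """Charge state that places a protein of this mass near target_mz (simplification)."""
    return max(1, round(mass / (target_mz - PROTON)))


def adjacent_spacing(mass, z):
    """m/z gap between the z+ and (z+1)+ peaks of one species: M / (z(z+1))."""
    return mass / (z * (z + 1))


for mass in (20_000.0, 60_000.0, 150_000.0):
    z = charge_near(mass)
    gap = adjacent_spacing(mass, z)
    adduct_offset = 370.0 / z  # m/z offset of a hypothetical +370 Da adduct
    print(f"{mass:>9.0f} Da  z={z:>3}  gap={gap:6.2f}  adduct offset={adduct_offset:5.2f}")
```

As mass grows, both the gap between charge states and the offset of any modified proteoform shrink, so heterogeneous species increasingly fall on top of one another.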
Peptide-level analysis
In many cases the same inhibitor-protein reaction mixture used for intact protein mass spectrometry can be processed for peptide-level analysis5. Intact protein mass spectrometry and peptide-level analysis generally go hand in hand and provide complementary information. While intact protein mass spectrometry confirms covalent bond formation and protein-level stoichiometry, peptide-level analysis reveals the precise site(s) of modification as well as inhibitor site occupancy.
First reported in the late 1980s, MS identification of cysteine residues targeted by covalent small molecules is now a routine procedure5. The protein of interest is treated with dithiothreitol (DTT) and iodoacetamide to reduce and alkylate cysteine residues, respectively, followed by trypsin digestion to yield peptides [Fig. 3a]. The resulting peptide mixtures are desalted and typically analyzed by liquid chromatography coupled to electrospray ionization for MS and MS/MS acquisition. As peptides are temporally separated on the LC column and introduced into the mass spectrometer, the MS scan records m/z values for each peptide. In MS/MS, individual peptides (or, more precisely, narrow m/z windows) are isolated and fragmented to provide primary amino acid sequence information. A number of commercial and open-source software packages are available to automatically assign MS/MS spectra to peptide sequences and assist in identifying the modified residue [Fig. 3b]. Users should be mindful that covalent inhibitors may alter the dissociation of peptides during MS/MS, potentially reducing the effectiveness of these algorithms. However, our recent work demonstrated that knowledge of inhibitor-specific fragmentation behavior can be leveraged to improve peptide sequence identification [Fig. 3c]6. This procedure of identifying proteins from peptides using a combination of liquid chromatography and mass spectrometry is termed shotgun or bottom-up proteomics [Fig. 3c].
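Because free cysteines are capped by iodoacetamide (carbamidomethylation, +57.021 Da) while the inhibitor-modified cysteine carries the adduct instead, the two forms of the same peptide appear at different precursor m/z values. Search engines typically handle this by declaring the adduct as a variable modification. The sketch below shows the underlying arithmetic, assuming a hypothetical tryptic peptide and a hypothetical 370 Da adduct; the helper names are illustrative only.

```python
# Monoisotopic amino acid residue masses (Da)
RESIDUE = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056             # added once per peptide (termini)
PROTON = 1.007276
CARBAMIDOMETHYL = 57.02146   # iodoacetamide cap on a free cysteine


def peptide_mass(peptide, mods=None):
    """Neutral monoisotopic mass; mods maps 0-based residue index -> added mass."""
    mods = mods or {}
    return sum(RESIDUE[aa] for aa in peptide) + WATER + sum(mods.values())


def precursor_mz(mass, z):
    """m/z of the [M + zH]z+ precursor ion."""
    return (mass + z * PROTON) / z


peptide = "LTCAEGR"   # hypothetical tryptic peptide
site = peptide.index("C")
adduct = 370.0        # hypothetical inhibitor adduct mass (Da)

capped = peptide_mass(peptide, {site: CARBAMIDOMETHYL})
labelled = peptide_mass(peptide, {site: adduct})
for name, mass in (("IAA-capped", capped), ("inhibitor-labelled", labelled)):
    print(f"{name}: M = {mass:.4f}, [M+2H]2+ m/z = {precursor_mz(mass, 2):.4f}")
```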
The strength of peptide-level LC-MS/MS analysis lies in its unbiased identification of the targeted residue, together with site occupancy information. This is especially important because some warheads can react with multiple types of amino acid side chains. In one example, Pettinger et al. designed a lead intended to target a cysteine on HSP72, but peptide-level LC-MS/MS revealed that the molecule reacted with a lysine residue instead7. This result highlights the ability of peptide-level LC-MS/MS to detect unexpected side-chain targeting by a new electrophilic lead compound. Importantly, MS-based inferences about mechanism of action should always be confirmed with orthogonal methods, such as mutating the key residue to an amino acid that does not react with the compound.
Peptide-level analysis of in vitro protein reactions can also further triage compounds after intact protein MS [Fig. 3d]. Compounds that label target proteins with apparent one-to-one stoichiometry may in fact be reacting nonspecifically with different residues on each protein molecule. Ward et al.’s investigation of the failed clinical compound VLX1570 exemplifies this scenario8. Even though intact protein mass spectrometry indicated one-to-one binding to the protein CIAPIN1, peptide-level analysis revealed that VLX1570 modified CIAPIN1 nonspecifically, reacting with any one of seven cysteine residues on the protein. This hyperreactive behavior provided a plausible explanation for the dose-limiting toxicity of VLX1570 that led to termination of its clinical trial.
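The contrast between a site-specific compound and one that spreads across many cysteines can be expressed with two simple quantities: per-site occupancy (modified signal over total signal for that site) and the share of total labelling carried by the top site. The sketch below uses hypothetical intensities and assumes, for simplicity, that modified and unmodified peptides ionize equally well, which rarely holds exactly in practice. Note that both hypothetical compounds sum to roughly one adduct per protein, so intact MS alone would not distinguish them.

```python
def site_occupancy(modified_intensity, unmodified_intensity):
    """Fraction of a site carrying the adduct, from peptide precursor intensities."""
    total = modified_intensity + unmodified_intensity
    return modified_intensity / total if total > 0 else float("nan")


def dominant_site_share(occupancy_by_site):
    """Return (site, share) for the site carrying the largest fraction of labelling."""
    total = sum(occupancy_by_site.values())
    site = max(occupancy_by_site, key=occupancy_by_site.get)
    return site, occupancy_by_site[site] / total


# Hypothetical (modified, unmodified) intensities for two compounds
selective = {"C1": site_occupancy(9.0e6, 1.0e6), "C2": site_occupancy(1.0e5, 9.9e6)}
promiscuous = {f"C{i}": site_occupancy(1.4e6, 8.6e6) for i in range(1, 8)}

for name, sites in (("selective", selective), ("promiscuous", promiscuous)):
    site, share = dominant_site_share(sites)
    print(f"{name}: top site {site} carries {share:.0%} of labelling")
```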
Some reversible covalent inhibitor-protein complexes are known to dissociate under the denaturing conditions of protein digestion. For example, in investigating a cyanopyrrolidine-based covalent inhibitor of UCHL1, Kooij et al. reported that the isothiourea bond between UCHL1 and the inhibitor dissociates upon reduction with β-mercaptoethanol9. Hence, researchers must be cognizant of experimental artifacts that may arise from the specific chemistries of different warheads. In addition, the identity and cleavage specificity of the digestion enzyme should be kept in mind: with standard trypsin digestion alone, the targeted residue may not always fall within a peptide amenable to standard LC-MS/MS analysis (generally 6-30 amino acids in length).
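Whether a target residue lands in a usable peptide can be checked in silico before running the experiment. The sketch below applies simplified cleavage rules (trypsin after K/R unless followed by P; a high-specificity chymotrypsin rule after F/W/Y unless followed by P) to a hypothetical sequence and reports the peptide containing the target cysteine. It assumes full cleavage with no missed cleavages, ignores lower-efficiency chymotrypsin sites such as Leu and Met, uses the 6-30 residue window from the text, and requires Python 3.7+ because it splits on zero-width regex matches.

```python
import re

# Simplified cleavage rules expressed as zero-width regex split points
TRYPSIN = r"(?<=[KR])(?!P)"        # after K/R, not before P
CHYMOTRYPSIN = r"(?<=[FWY])(?!P)"  # simplified high-specificity rule


def digest(sequence, rule):
    """Fully cleaved in silico digest (no missed cleavages)."""
    return [p for p in re.split(rule, sequence) if p]


def peptide_with_residue(sequence, position, rule):
    """Return the peptide covering a 0-based residue position."""
    start = 0
    for peptide in digest(sequence, rule):
        end = start + len(peptide)
        if start <= position < end:
            return peptide
        start = end
    raise IndexError("position outside sequence")


def is_detectable(peptide, min_len=6, max_len=30):
    return min_len <= len(peptide) <= max_len


sequence = "MSKCRLLDAEQGTWAVKPLGNAGLIEDSQTSLVNQEAGTDK"  # hypothetical
target = sequence.index("C")
for name, rule in (("trypsin", TRYPSIN), ("chymotrypsin", CHYMOTRYPSIN)):
    pep = peptide_with_residue(sequence, target, rule)
    print(f"{name}: {pep} (length {len(pep)}) detectable={is_detectable(pep)}")
```

In this example the tryptic peptide carrying the cysteine is only two residues long, while the alternative protease places it in a 14-residue peptide, illustrating why a second enzyme is sometimes needed.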
In primary screening/hit discovery
In addition to characterizing covalent leads, intact protein MS can be leveraged as a primary screening assay for hit discovery. Libraries for MS primary screening are typically much smaller than the diversity collections used for biochemical or phenotypic screening, with fewer than 1000 electrophile-bearing compounds custom-synthesized for each screening effort. These are most often “fragments”, compounds of approximately 250 Da. The purified protein of interest is incubated with pools of compounds and then analyzed by MS. As in lead characterization, a mass shift between compound-treated and control samples corresponding to the mass of the inhibitor minus any leaving group indicates a positive hit. Each fragment in a hit pool is then incubated individually with the protein to identify and confirm the hit compound.
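Pool deconvolution relies on each pool member producing a distinguishable adduct mass. The sketch below matches observed intact-mass shifts against the expected adduct masses of a hypothetical five-member pool and estimates the percentage of labelled protein from deconvoluted peak intensities. The 1 Da tolerance and all masses are assumptions, and the labelling estimate assumes apo and adducted protein ionize similarly.

```python
def assign_pool_hits(observed_shifts, pool, tol=1.0):
    """Map each observed intact-mass shift to pool members whose adduct matches.

    pool maps compound id -> expected adduct mass (compound minus leaving group).
    """
    return {shift: [cid for cid, delta in pool.items() if abs(shift - delta) <= tol]
            for shift in observed_shifts}


def percent_labelled(adduct_intensity, apo_intensity):
    """Percentage of protein labelled, from deconvoluted peak intensities."""
    return 100.0 * adduct_intensity / (adduct_intensity + apo_intensity)


# Hypothetical pool of five electrophilic fragments (adduct masses in Da)
pool = {"F01": 231.1, "F02": 248.2, "F03": 263.0, "F04": 251.3, "F05": 219.9}
hits = assign_pool_hits([248.4], pool)
print(hits)                            # {248.4: ['F02']}
print(percent_labelled(3.0e5, 7.0e5))  # 30.0
```

An empty list flags a shift that no pool member explains, and more than one match indicates a pool whose members were not mass-separated well enough; either case would be resolved by the individual re-incubations described above.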
Compared to the more frequently encountered biochemical or phenotypic screens, intact protein MS screens offer several advantages. First, MS is relatively agnostic to the protein of interest: whereas biochemical assays must be reoptimized for each new target protein, intact protein mass spectrometry requires minimal adjustment across proteins from completely different families. Second, because the readout of intact MS is based on a physical mass shift, it is relatively resistant to artifacts often encountered in biochemical assays, such as autofluorescence. Third, information about a covalent mechanism of action is obtained already during primary screening, streamlining characterization and optimization. When combined with peptide-level analysis, intact protein MS primary screening is also well suited to identifying novel compound binding pockets, a concept we illustrate further with the KRAS G12C inhibitor in the case studies section.
The primary limitation of screening compounds by intact protein MS is throughput. Whereas biochemical methods can screen a hundred thousand compounds per day, intact protein MS primary screens can analyze only hundreds of compounds per day. At present, this is not a practical limitation given the modest size and limited availability of current electrophilic small molecule libraries. However, as interest in covalent drug development expands, the limited throughput of intact protein MS is likely to place an operational boundary on library size, and thus on the chemical space that can be explored in a primary screen. In addition, requirements on the size, purity, and amount of the analyte protein are more stringent than for biochemical assays. Microgram-scale amounts of protein may be required per analysis, depending on the molecular weight of the protein of interest, which might be prohibitive for certain targets. Furthermore, the assay formally measures compound binding and not enzyme inhibition; therefore, additional biochemical assays are required to validate the inhibitory activity of hit compounds.
Disulfide tethering
As mentioned in the introduction, drug discovery efforts have traditionally avoided covalent inhibitors due to concerns about their reactivity and off-target toxicities. Accordingly, MS-based approaches for screening covalent libraries evolved from analogous efforts in noncovalent inhibitor development. The classical screening technique based on intact protein MS is disulfide tethering, which was initially developed to identify weak noncovalent ligands. In this method, molecules in the screening library contain disulfide moieties that tether weakly interacting noncovalent structures to nearby cysteines, enabling a mass spectrometry readout10. The reducing environment facilitates rapid thiol-disulfide exchange, which allows even weakly binding reversible ligands to be identified, because fragments that form specific interactions with the protein are less likely to dissociate [Fig. 2e]. Once ligands are identified, the disulfide and linker are removed, and the hits are validated and optimized as noncovalent inhibitors. This method was first demonstrated by Erlanson et al. in 2000, who screened 1200 fragments against thymidylate synthase using intact protein MS10. They identified a 1 mM hit and then leveraged structural insight to optimize it into a 300 nM noncovalent inhibitor.
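For a tethering readout, the expected intact-mass shift follows from the disulfide chemistry: forming a mixed disulfide between the protein cysteine and a fragment thiol R-SH removes one hydrogen from each thiol, so the protein gains M(R-SH) minus two hydrogens. The sketch below assumes the unreacted cysteine is observed as a free thiol (in practice it may carry a reductant-derived cap, which would change the reference mass) and uses a hypothetical 250 Da fragment thiol.

```python
H = 1.00782503  # monoisotopic hydrogen mass (Da)


def tethered_shift(fragment_thiol_mass):
    """Intact-mass shift when a fragment thiol R-SH ends up as Protein-S-S-R.

    Forming the mixed disulfide removes one H from the protein thiol and one
    from the fragment thiol, so the shift is M(R-SH) - 2H.
    """
    return fragment_thiol_mass - 2 * H


print(f"{tethered_shift(250.0):.3f}")  # 247.984
```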
Since then, applications of disulfide tethering have expanded to the identification of covalent inhibitors. Because formation of a disulfide bond requires a reactive cysteine thiol near the ligand binding site, this same thiol can be leveraged for covalent inhibitor development. After initial screening, the disulfide and linker are replaced by a cysteine-reactive electrophile such as an acrylamide to yield candidate covalent inhibitors. The Shokat Lab used this strategy to discover a covalent inhibitor targeting KRAS G12C, a mutant form of one of the most frequently mutated oncogenes, eventually leading to Amgen’s drug candidate AMG 510, currently in clinical trials.11,12
Electrophilic fragment screening
In recent years, electrophilic fragment screening has emerged as a strategy to identify new covalent inhibitors. Instead of the disulfides used in tethering, compounds in electrophilic fragment libraries contain irreversible electrophilic warheads, most commonly acrylamides and chloroacetamides. The compounds are screened against the purified protein of interest by intact protein MS. Initial studies focused on determining the extent to which the potential hyperreactivity of irreversible modifiers would hinder the development of selective binders. In 2014, Kathman et al. screened 100 electrophilic compounds bearing an acrylamide warhead, in pools of 10, against papain as a prototypical cysteine protease13. The authors noted that although each compound was incubated with papain at a ten-fold excess, most compounds did not label papain. This served as proof of concept that well-designed libraries of irreversible covalent small molecules can support productive screening efforts. The same covalent fragment library and screening strategy have since been used to discover inhibitors of other enzymes.
In a recent study, Resnick et al. built on this approach to concurrently screen a library of covalent fragments against 10 human, bacterial, and viral proteins across multiple protein families14. Each purified protein was incubated with pools of five electrophilic fragments and then analyzed by LC-MS to examine covalent binding. Concurrent screening enabled triage of hits that displayed multitargeted behavior. Ultimately, the screen identified promising chemical starting points for 8 of the 10 screened proteins. These results support the broad applicability of electrophilic fragment screening across a diverse range of targets.
As noted above, the throughput of MS-based hit discovery is relatively modest. In the aforementioned studies, pooling multiple compounds (usually 5-10) per well was an effective way to increase the number of compounds screened in a given run, but data acquisition and analysis time per compound remain bottlenecks. In recent years, innovation efforts have focused on these two aspects to improve the throughput of assays built around intact protein MS. A group at Amgen built on the Agilent RapidFire MS system and applied an automated data analytics platform to enhance throughput15. The solid phase extraction (SPE) approach of the RapidFire MS system reduced sample processing time to approximately 10-15 s per sample, while automated deconvolution kept spectral processing at a rate equivalent to data acquisition. Using this method, a 96-well plate can be screened in under 12 minutes. Another innovation of interest is acoustic mist ionization (AMI), developed by AstraZeneca and Waters, which increases throughput via acoustic ejection sample loading. While compatibility with protein samples has been demonstrated, its application to characterizing covalent inhibitor-protein adducts has not yet been reported16.
Collectively, MS analysis of intact protein-inhibitor conjugates combined with bottom-up LC-MS/MS analysis of corresponding proteolytic digests represents a powerful toolbox across hit discovery and characterization. These methods provide key information about site(s), stoichiometry, and specificity of the potential covalent interaction.
Proteome-wide methods
This section focuses on methods that examine interactions between small molecules and proteins in a cellular proteome. The defining strength of mass spectrometry in chemoproteomic experiments is its ability to identify on- and off-targets of covalent probes in a physiological, cellular context without prior knowledge of protein function, localization, or activation state. However, while MS is agnostic with respect to the underlying target biology, it is inherently biased toward abundant species. Combined with the enormous complexity and dynamic range of protein expression in human cells and tissues, this bias against low-abundance species has important implications for chemoproteomics and for other mass spectrometry-based approaches to studying complex proteomes.
A brief comparison with mass spectrometry analysis of cellular phosphorylation helps illustrate some of the challenges for proteome-wide chemoproteomic methods. Using common shotgun acquisition methods, current state-of-the-art mass spectrometers typically identify 8,000-10,000 proteins from human cells in a few hours of analysis time. Critically, little to no information on protein phosphorylation or other PTMs is available in these datasets, because most phosphorylated peptides occur at low abundance against a vast background of protein expression. The variable stoichiometry of protein phosphorylation and the abundance-biased nature of mass spectrometry acquisition mean that, in practice, unmodified tryptic peptides overwhelm detection of phosphopeptides. Research over the past two decades has established selective enrichment of phosphopeptides from the background proteome as a robust foundation for phosphoproteomic studies17. By isolating phosphopeptides from the proteome, the enrichment step focuses the entire mass spectrometry acquisition bandwidth on phosphopeptides, thus achieving reasonable coverage and sensitivity of the phosphoproteome.
Viewing inhibitor modification as a ‘synthetic’ PTM, a similar approach can in principle be taken with inhibitor-modified peptides, but the identification of cellular protein targets faces further complications. First, selective covalent inhibitors and clinical drugs modify a much smaller number of residues in the proteome than a PTM such as phosphorylation. Second, by their very nature, selective inhibitors are chemically distinct from one another. Therefore, unlike phosphoproteomics, where a universal enrichment reagent can capture the entire class of phosphorylated peptides, the hypothetical ‘class’ of covalent inhibitor-modified peptides or proteins cannot be generically enriched across different inhibitors. In the case of selective covalent inhibitors, researchers thus face the daunting task of finding covalently modified peptides that may be present at very low stoichiometry (e.g., ~10% of off-target protein molecules labeled) on protein targets that may be expressed at low levels in the cell, all within an enormous sea of tryptic peptides spanning some 12 orders of magnitude in absolute abundance. This is the veritable problem of finding one (primary-target) or a few (off-target) peptide needles in a very large haystack without a magnet, that is, without a general tool to enrich inhibitor-modified peptides.
To circumvent these hurdles, reporter probes are often used as surrogates to read out the binding profile of covalent inhibitors. These reagents may be activity-based or designed to react with a specific amino acid side chain (e.g., cysteine thiols). They also carry either an affinity handle or a bioorthogonal moiety that enables installation of an affinity group later in the workflow. Each enrichment reagent biochemically captures, most often covalently, a defined set of cellular proteins, constituting a distinct “addressable chemical space” for that reagent. Importantly, it is the addressable chemical space (ACS) of the reporter probe, and not that of the inhibitor of interest, that dictates the parameters and overall scope of the chemoproteomic experiment.
Following this guiding principle, we have organized the chemoproteomic methods in this section into three major categories according to the ACS of each probe: (1) amino acid side chains (residue-based probes, RBPs); (2) mechanistically related enzyme classes (classic activity-based probes, ABPs); and (3) protein targets of selective inhibitors (inhibitor-based probes, IBPs) [Fig. 4, 5]. The reader should note that the term ABP is often used in the literature to refer generically to any of these probes. We propose RBP, ABP, and IBP as more descriptive terms that better capture the distinct biochemical reactivity and analytical requirements associated with each reagent.
Here, it is important to note that while a wider ACS might initially appear more comprehensive and hence preferable to more focused techniques, the choice is more nuanced in practice. A wider ACS yields a more complex sample, which complicates MS analysis. In contrast, probes with a narrower ACS may achieve superior sensitivity and reproducibility in terms of the number of on- and off-target proteins quantified across separate LC-MS runs. Researchers must therefore have a clear understanding of the ACS of each enrichment reagent to select the most appropriate methods for each campaign and, mindful of the strengths and weaknesses of each approach, may find it useful to leverage the complementarity of different methods.
In most chemoproteomic methods, these probes are used in competition-format assays with the covalent test compounds. Samples are pre-incubated with vehicle or varying concentrations of the inhibitor, followed by co-incubation with the probe (RBP, ABP, or IBP). Relative quantification across the vehicle- and inhibitor-treated conditions provides an indirect readout of covalent compound binding at each of the sites within the ACS captured by the probe. Comparison across two or more treatment conditions (e.g., DMSO vs. inhibitor-treated) uses one of several MS strategies for relative quantification. Most rely on labelling proteins or peptides with isotope tags that produce distinct mass-separated signals whose intensities are used for relative quantification. ‘Label-free’ quantification strategies instead directly correlate either the number of MS/MS spectra acquired for each peptide (‘spectral counts’, roughly equivalent to DNA sequencing ‘reads’) or the intact peptide signal intensity with protein quantity.
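The competitive readout reduces, for each probe-labelled site, to comparing probe signal between vehicle- and inhibitor-treated samples. A common way to express this is the competition ratio R = vehicle / treated, from which the apparent engagement is 1 - 1/R (so R = 4 corresponds to 75% blockade). The sketch below applies this to hypothetical, already-normalized intensities for hypothetical sites; it assumes equal sample loading and, as the text notes, treats loss of probe signal as a proxy for inhibitor binding rather than direct proof of a covalent bond.

```python
def competition_ratio(vehicle_intensity, treated_intensity):
    """R = vehicle / inhibitor-treated probe signal for one site."""
    return vehicle_intensity / treated_intensity


def percent_engagement(vehicle_intensity, treated_intensity):
    """Apparent occupancy of a site by the inhibitor, inferred from probe blockade."""
    return 100.0 * (1.0 - treated_intensity / vehicle_intensity)


# Hypothetical normalized probe-labelled peptide intensities (vehicle, inhibitor-treated)
sites = {
    "PROT1_C145": (8.0e6, 2.0e6),
    "PROT2_C32": (5.0e6, 4.6e6),
    "PROT3_C88": (1.2e6, 1.0e5),
}
for site, (veh, inh) in sites.items():
    r = competition_ratio(veh, inh)
    print(f"{site}: R = {r:.1f}, engagement = {percent_engagement(veh, inh):.0f}%")
```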
Residue-based probes (RBPs)
Recent years have seen significant effort invested in the development of RBPs, which carry broadly reactive covalent warheads designed to modify the side chain of a targeted amino acid [Fig. 4b, 5a]. Since the initial report of cysteine-targeted probes in 2010 by Weerapana et al., probes targeting lysine, tyrosine, redox-modified cysteines, and selenocysteine have been developed18-21. Chemoproteomic methods using RBPs can identify targets of covalent compounds as well as illuminate amino acid function, reactivity, and pharmacological accessibility, making them a powerful addition to the drug discovery toolbox.
Identifying protein targets for covalent compounds
RBPs are typically used to identify protein targets for covalent compounds in a cellular proteome. Competing covalent compounds against the probe allows relative quantification of covalent compound binding against each of the amino acid sites captured by the probe. First introduced in 2014 by Wang et al. to quantify inhibitor binding to cellular cysteine residues, competitive RBPs and similar reagents are now commonly applied in target profiling for covalent inhibitors, as well as examining the mechanism and target space of natural products22.
However, the strength of RBPs, namely monitoring covalent inhibitor modification of specific amino acid side chains across the proteome, is also their primary weakness. Consider the case of cysteine-targeting probes. There are predicted to be more than 200,000 cysteines in the human proteome. The sheer size and complexity of the cellular cysteinome present a significant challenge for robust detection and quantification of inhibitor targets by residue-based methods. For example, common cysteine-reactive probes routinely capture only 3000-4000 of these ~200,000 cysteine sites (roughly 2%)14. These problems are further exacerbated for residues that occur more frequently in the proteome, such as lysine [Fig. 5b]. Hence, readers are advised to interpret residue-based chemoproteomic data as covering only a subset of the peptides targeted by the compound under study, with the acknowledgement that additional targets may exist.
Another weakness of RBP-based competitive methods is that they cannot directly confirm covalent bond formation between the target and the compound, because they formally report on covalent attachment of the RBP. Blockade of probe binding thus serves as a proxy for binding of the covalent inhibitor. In principle, a noncovalent interaction between the target and the small molecule could yield the same competitive binding profile. As a result, follow-up experiments are needed to formally confirm 1:1 covalent complex formation.
Identifying pharmacologically accessible residues and protein targets
Residue-based chemoproteomics was originally developed in the context of target identification to map reactive cysteines in the cellular proteome. In 2010, Weerapana et al. introduced isoTOP-ABPP (isotopic Tandem Orthogonal Proteolysis – Activity-Based Protein Profiling) as a method to enrich and identify accessible (reactive) cysteine residues in the proteome23. isoTOP-ABPP uses an iodoacetamide probe as a broad cysteine-alkylating agent that labels cysteine residues with sufficient nucleophilic reactivity. The probe also includes an alkyne functional group for click chemistry incorporation of a biotin enrichment handle, and a cleavable, isotopically tagged peptide sequence to enable relative quantification of target peptides from multiple samples by mass spectrometry. Proteins with one or more cysteines annotated as ‘reactive’ represent potential opportunities for covalent inhibitor development. The strategy also allows identification of hyperreactive cysteine residues. Labeling of these residues saturates at low probe concentration and therefore changes little when the probe concentration is increased. Weerapana et al. interpreted this hyperreactivity as indicative of a specific functional role, such as catalysis23.
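The concentration-dependence criterion for hyperreactivity can be expressed as a simple ratio. The Python sketch below assumes that each cysteine site has been quantified after labeling with a high (10x) and a low (1x) probe concentration. This gives R_10:1 = high-probe signal / low-probe signal. A site that is already saturated at the low concentration gives a ratio near 1, whereas a site labeled in proportion to probe concentration approaches 10. The cutoff of 2 and the site values are illustrative assumptions, not values taken from the cited study.

```python
# Minimal sketch (hypothetical data and cutoff): flag candidate hyperreactive
# cysteines from a two-concentration probe-labeling experiment.

HYPERREACTIVE_CUTOFF = 2.0  # illustrative assumption; set per study


def classify_reactivity(r_10_1: float, cutoff: float = HYPERREACTIVE_CUTOFF) -> str:
    """Classify a site by R_10:1 = signal at 10x probe / signal at 1x probe.

    Saturated (hyperreactive) sites give R_10:1 near 1; sites labeled in
    proportion to probe concentration approach 10.
    """
    if r_10_1 <= 0:
        raise ValueError("ratio must be positive")
    return "hyperreactive" if r_10_1 < cutoff else "concentration-dependent"


# Hypothetical R_10:1 values for three cysteine sites
ratios = {"ENZ1_C145": 1.3, "ENZ2_C88": 4.7, "PROT3_C12": 9.1}
labels = {site: classify_reactivity(r) for site, r in ratios.items()}
for site, label in labels.items():
    print(f"{site}: {label}")
```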
More recently, similar studies were carried out to profile the reactivity of other amino acids. Ward et al. employed N-hydroxysuccinimide (NHS) ester compounds bearing alkyne click handles to profile reactive “hotspots” across different amino acids19. The analysis covered lysines (~3000), serines (>1500), threonines (>1500), and, to a lesser extent, tyrosines, arginines, and cysteines. This study highlights potential opportunities in targeting residues other than cysteine for next-generation covalent therapeutics.
When performed in a competitive format against covalent fragment libraries, isoTOP-ABPP can be used to profile pharmacologically accessible sites across the proteome. This was first demonstrated by Backus et al., who used competitive isoTOP-ABPP to profile a 60-member covalent fragment library, identifying >700 pharmacologically accessible cysteine residues across the proteome24. In an analogous study, Hacker et al. identified several hundred hyperreactive lysine sites using a sulfotetrafluorophenyl ester probe25. Fragments profiled in these studies offer ready-made chemical starting points for targeting proteins of interest. One caveat is that the analytical challenge of profiling the large ACS of RBPs may lead to underestimation of the proteome-wide reactivity of a given fragment.
In recent years, the Nomura group has utilized chemoproteomics-enabled covalent ligand screening as a discovery platform for potential cancer targets. In this approach, covalent small molecule libraries are screened against cancer cell lines for cell death or other therapeutically relevant phenotypes. This is followed by isoTOP-ABPP to identify the protein targets of hit molecules from the phenotypic screen. These targets are then confirmed by genetic studies. To date, this methodology has identified promising targets in pancreatic, lung, and breast cancer26. That said, hit compounds identified in this manner represent chemical starting points and may require significant medicinal chemistry optimization to achieve suitable potency and selectivity.
Activity-based probes (ABPs)
Since its introduction in the late 1990s, activity-based protein profiling (ABPP) has been used to characterize the function of enzyme families in complex proteomes. ABPP relies on covalent probes (ABPs) that react in a mechanism-based manner with active-site residues of related enzymes [Fig. 4a]. Thus, the ACS of a classical ABP is a family of mechanistically related enzymes [Fig. 5a]. ABPs consist of three parts:
(1) a reactive group (warhead) that selectively modifies active-site residues of the targeted enzyme class;
(2) a reporter group, such as a fluorophore or an affinity tag, for identification and enrichment;
(3) a target-class recognition moiety (e.g., ATP or ubiquitin) connected to the warhead or reporter group.
Alternatively, a bioorthogonal chemical handle such as an alkyne can be installed on the probe, allowing attachment of reporter or affinity tags after the protein labeling step. This two-step strategy may enable the use of larger ABPs in live cells. Overall, ABPs offer the most comprehensive profile of inhibitor activity against a given enzyme family of interest. They are often used to establish family-wide selectivity profiles of covalent inhibitors or fragments. However, ABPP does not capture compound binding outside the targeted protein class, and ABP labeling is not always indicative of an enzyme's activation state in cells. In addition, it is often difficult to identify the ABP-modified amino acid, so site-specific binding data are often unavailable with ABP-based chemoproteomic methods.
Early reports of ABPP focused on serine hydrolases and cysteine proteases27,28. The serine hydrolase (SH) family is a large and functionally diverse group of enzymes, comprising roughly 1% of the human proteome. Early generations of SH ABPs used a fluorophosphonate (FP) warhead. This warhead showed broad reactivity across the enzyme family, enabling simultaneous investigation of, and inhibitor discovery against, many SHs28. ABPP played a central role in the development of JZL184, a selective covalent inhibitor of monoacylglycerol lipase (MAGL)29. In this work, mouse brain samples were treated with inhibitor or vehicle, followed by a biotinylated FP probe. Tissues were processed for streptavidin-based enrichment of probe-labeled proteins, digestion, and finally multidimensional LC-MS/MS. These ABPP data showed that JZL184 completely blocked probe labeling of MAGL and partially blocked labeling of FAAH29. These early studies demonstrated the value of ABPs for profiling large and diverse enzyme families.
The utility of the ABPP approach was further demonstrated in studies of cysteine proteases (CPs). This is another large and diverse enzyme family whose members are involved in essential processes such as cell division, cell death, and antigen presentation. An interesting example in this context is the set of ABPs targeting deubiquitinating enzymes (DUBs), a gene family that includes six subfamilies of cysteine hydrolases. DUB ABPs consist of ubiquitin derivatives bearing electrophiles such as Michael acceptors and alkyl halides. We used DUB ABPP to profile the in-family selectivity of XL177A, a covalent inhibitor of USP7, one of the best-studied DUBs. To maximize DUB family coverage, we employed a cocktail of ABPs: one with a DUB-specific propargyl amide warhead and another with a vinyl methyl ester warhead. We observed exquisite selectivity of the inhibitor for USP7, which was confirmed by proteome-wide studies30. Note that because these ABPs contain a ubiquitin moiety, other ubiquitin-binding proteins such as E1, E2, and E3 enzymes are also enriched by the probe.
In addition to profiling SHs and CPs, ABPP has emerged as a powerful strategy for rapid selectivity profiling of one of the most clinically relevant protein families, the protein kinases (PKs). As originally described, an ABP for kinases (or, more accurately, for the broader family of ATP-binding proteins) consisted of an acylphosphate derivative of ATP or ADP elaborated with a biotin affinity handle31. The probe was designed to engage the nucleotide-binding site. This results in transfer of the biotin tag to a nearby lysine residue that is conserved across kinases and many other ATP-binding proteins. This method formed the basis of a commercial platform for selectivity profiling of small molecule kinase inhibitors (KiNativ™). The platform has been broadly applied to profile both noncovalent and covalent kinase inhibitors.
Kinobeads
As described above, capture by an ABP is based on covalent modification of members of its addressable chemical space. An exception to this format is the Kinobead technology, in which noncovalent kinase inhibitors are attached to a solid support, effectively forming an immobilized ABP32. In contrast to the ABPs described above, kinobeads do not covalently bind their kinase targets. However, their binding affinity is strong enough that they function as ABPs to assess selectivity within the kinase target class for both reversible and covalent compounds. The general protocol involves treatment of live cells or cell extracts with a test compound, followed by incubation with kinobeads. This provides a competitive binding assay in which proteins not bound by the test compound are preferentially enriched on the beads. Subsequent isolation of the beads, followed by elution, tryptic digestion, and LC-MS/MS, provides a quantitative readout of compound selectivity across the kinome.
Inhibitor-based probes (IBPs)
Inhibitors can be fashioned into enrichment probes in three ways: by immobilization, by elaboration with an affinity enrichment handle, or by installation of a bioorthogonal chemical moiety for subsequent coupling to an affinity handle. To the extent that these modifications only modestly alter selectivity, the binding behavior of the IBP will closely mimic that of the native inhibitor. In this way, competing the parent inhibitor against the IBP for enrichment provides a high-fidelity readout of the pharmacologic binding activity of the native inhibitor.
In principle, repurposing a selective inhibitor as a reporter probe offers several advantages. The ACS of the IBP is dictated by the parent inhibitor and is expected to be modest in size [Fig. 4c, 5a]. From an analytical perspective, the LC-MS/MS analysis can therefore be more straightforward. These advantages come with specific challenges. In the two previous categories, a generic probe can be used across different inhibitors. Here, by contrast, each native inhibitor must be chemically modified to create a paired IBP. Chemical elaboration of the native compound can be challenging. It depends on the availability of structural data to identify a suitable exit vector, and it requires validation assays to confirm that the IBP retains suitable binding activity. Finally, the size and complexity of IBPs often complicate MS/MS identification of probe-modified peptides.
IBPs can be either biotinylated or alkynylated inhibitor probes, and the main difference between the two lies in cell permeability. Because of the large size of the biotin tag and linker, probes made by direct installation of biotin usually cannot cross mammalian cell membranes. In contrast, the minimal size of the alkyne handle often allows live-cell treatment. Owing to its small, bioinert chemical footprint, the alkyne handle is also less likely than a biotin modification to perturb on- and off-target binding. Alkynylated inhibitor probes were first reported by Wright et al. in 2007, in the form of an alkyne-based probe derived from the cytochrome P450 inhibitor 2-ethynylnaphthalene33. Since then, these advantages have led to widespread use of alkyne functionalization, followed by click chemistry and pulldown, in off-target profiling of covalent small molecules.
Here again, note the importance of competition-format assays for IBPs. Pulling down proteins with the derivatized probe alone cannot confidently identify inhibitor targets, because of nonspecific binding to the pulldown matrix. Hundreds of proteins may be pulled down, of which only a small subset is specifically competed by the parent compound. In our recent work to identify a potent and selective covalent inhibitor of the DUB USP7, our biotinylated IBP captured some 560 proteins in pulldown followed by LC-MS/MS. However, only USP7 was competed in a dose-dependent manner across a 1-10 μM concentration range of the parent inhibitor30.
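The triage step described above separates proteins that are competed in a dose-dependent way from matrix background. It can be sketched as a simple filter. The Python example below assumes that pulldown intensities have been normalized to the vehicle control (1.0 = no competition) at increasing parent-inhibitor concentrations. The protein names and values are hypothetical, as are both thresholds: the minimum signal loss at the top dose, and the tolerance for noise between doses. A real analysis would typically add replicate statistics.

```python
from typing import Sequence


def is_dose_dependent(normalized: Sequence[float], min_drop: float = 0.5,
                      tolerance: float = 0.1) -> bool:
    """True if the signal never rises by more than `tolerance` between successive
    doses and the top dose removes at least `min_drop` of the vehicle-level signal.

    `normalized` lists pulldown intensity / vehicle intensity at ascending doses,
    starting with the vehicle itself (1.0). Thresholds are hypothetical.
    """
    non_increasing = all(
        later <= earlier + tolerance
        for earlier, later in zip(normalized, normalized[1:])
    )
    return non_increasing and (1.0 - normalized[-1]) >= min_drop


doses_um = [0.0, 1.0, 3.0, 10.0]  # hypothetical parent-inhibitor concentrations (µM)
pulldown = {
    "TARGET_A": [1.00, 0.55, 0.20, 0.05],  # competed, dose-dependent
    "STICKY_B": [1.00, 0.95, 1.08, 0.97],  # matrix background, not competed
    "NOISY_C": [1.00, 0.40, 0.90, 0.45],   # non-monotonic, rejected by this filter
}
assert all(len(values) == len(doses_um) for values in pulldown.values())

hits = [name for name, values in pulldown.items() if is_dose_dependent(values)]
print(hits)
```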
Affinity chromatography
Although presented last in this sequence, affinity chromatography represents one of the earliest methods with which researchers investigated interactions between a small molecule and the cellular proteome. An affinity column consists of a crosslinked polymer or gel, such as agarose or sepharose, to which a specific inhibitor has been covalently attached. Cellular extract is then passed through the column, allowing protein targets to be covalently bound and retained by the immobilized inhibitor. After extensive rinsing to remove nonspecifically bound components, in situ proteolytic digestion is performed to release peptides from the captured targets [Fig. 4d].
Classical drug affinity chromatography methods have been used for many decades to purify and investigate the cellular targets of drugs. For example, penicillin-binding proteins (PBPs), the bacterial targets of β-lactam antibiotics such as penicillin, have been isolated and purified using immobilized β-lactam matrices34. To generate the immobilized inhibitor column, a variety of commercial activated resins enable attachment via specific functional groups such as hydroxyl, sulfhydryl, amino, or carboxyl groups. As noted above, the bioactivity of the elaborated or immobilized inhibitor should be independently verified as standard practice in affinity chromatography. Finally, because the covalent inhibitor is immobilized, the peptide bearing the modified amino acid remains permanently attached to the column resin after digestion. It is therefore not identified in the mass spectrometry analysis.
CITe-Id
We recently developed Covalent Inhibitor Target-site Identification (CITe-Id). This platform leverages desthiobiotin (DTB) affinity-tagged IBPs for enrichment and direct quantification of inhibitor-bound cysteine-containing peptides35. CITe-Id analysis is driven exclusively by detection and quantification of IBP-modified peptides, and competition against the parent (native) inhibitor reveals the set of competitively bound targets. Selective identification of IBP-modified cysteine sites is enabled by the predictable fragmentation behavior of modified peptides during MS/MS6. In addition, we incorporated tunable online three-dimensional peptide fractionation to accommodate analysis of inhibitors spanning a wide range of selectivity. The strength of CITe-Id lies in the breadth of information it provides in a single platform: direct confirmation of covalent bond formation, the specific site of inhibitor modification, and competitive dose response. In this way, CITe-Id can potentially accelerate downstream validation studies. However, CITe-Id shares the standard limitations of inhibitor-derived chemoproteomic methods. It requires synthesis and validation of a desthiobiotin analog of each profiled compound, which may limit its accessibility where synthetic chemistry resources are constrained.
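A competitive dose response for a single probe-modified peptide can be summarized by an apparent half-maximal competition concentration. The Python sketch below estimates this value by linear interpolation on a log10 concentration axis. This is a deliberately simple stand-in for the nonlinear curve fitting (for example, a four-parameter logistic model) that a real analysis would more likely use. It assumes that signals are already normalized to the vehicle control. The concentrations and values are hypothetical and are not CITe-Id data.

```python
import math
from typing import Optional, Sequence


def interpolate_ic50(conc: Sequence[float],
                     fraction_remaining: Sequence[float]) -> Optional[float]:
    """Concentration at which the normalized signal first crosses 0.5.

    `conc` must be positive and ascending; `fraction_remaining` is the
    probe-modified peptide signal divided by the vehicle signal. Returns None
    if the series never crosses 0.5 within the tested range.
    """
    points = list(zip(conc, fraction_remaining))
    for (c1, f1), (c2, f2) in zip(points, points[1:]):
        if f1 >= 0.5 >= f2 and f1 != f2:
            x1, x2 = math.log10(c1), math.log10(c2)
            return 10 ** (x1 + (f1 - 0.5) * (x2 - x1) / (f1 - f2))
    return None


# Hypothetical competition series for one probe-modified cysteine peptide
conc_um = [0.01, 0.1, 1.0, 10.0]
peptide_signal = [0.98, 0.80, 0.20, 0.03]
ic50 = interpolate_ic50(conc_um, peptide_signal)
print(f"apparent IC50 ≈ {ic50:.2f} µM")
```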
We used CITe-Id to characterize the target landscape of the CDK7 inhibitor THZ1 and identified multiple previously unrecognized off-targets, including the understudied kinase PKN335. This information led to re-optimization of the parent series into a first-in-class covalent inhibitor of this “dark” kinase. When applied to a single multitargeted acrylamide-based covalent kinase inhibitor, CITe-Id identified pharmacologically addressable cysteine sites on fifteen kinases. Six of these had no previous report of covalent targetability.
Methods leveraging native inhibitors
As noted at the outset of this section, the use of ABPs, RBPs, or IBPs is motivated in large part by a specific difficulty. Peptides modified by covalent inhibitors (i.e., rare ‘synthetic PTMs’) are hard to identify directly within the vast complexity of the cellular proteome. While each of these strategies has strengths, they share one weakness: their binding selectivity differs, to varying degrees, from that of the native inhibitor. Notwithstanding the challenges posed by the complexity of the proteome, it is fair to ask whether other physicochemical properties of native inhibitors may be leveraged in chemoproteomic methods.
Several related techniques seek to leverage the phenomenon that small molecule binding often increases the cellular or physical stability of a protein target [Fig. 6a]. Complex proteomes, either live cells or cell extracts, are treated with an inhibitor and then subjected to a destabilizing force, followed by quantitative LC-MS/MS. Proteins detected with higher signal than in the control (e.g., vehicle-treated) condition are considered to be bound and stabilized by the test compound. Specific destabilizing forces and methods include heat in Thermal Proteome Profiling (TPP), oxidation in Stability of Proteins from Rates of Oxidation (SPROX), and limited proteolysis in Drug Affinity Responsive Target Stability (DARTS)36-38.
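For thermal proteome profiling, the readout is often summarized as a compound-induced shift in apparent melting temperature (ΔTm). The Python sketch below estimates Tm as the temperature at which the soluble fraction of a protein falls to 0.5, after normalizing to the lowest temperature, using linear interpolation. Published TPP analyses typically fit sigmoidal melting curves instead, so treat this as a simplified illustration. The temperatures and signals are hypothetical. A positive ΔTm is consistent with binding but is not proof of it.

```python
from typing import Sequence


def melting_temperature(temps: Sequence[float], soluble: Sequence[float]) -> float:
    """Temperature at which the normalized soluble fraction crosses 0.5.

    Signals are normalized to the lowest-temperature sample, and the crossing is
    found by linear interpolation between adjacent temperatures.
    """
    fractions = [s / soluble[0] for s in soluble]
    points = list(zip(temps, fractions))
    for (t1, f1), (t2, f2) in zip(points, points[1:]):
        if f1 >= 0.5 >= f2 and f1 != f2:
            return t1 + (f1 - 0.5) * (t2 - t1) / (f1 - f2)
    raise ValueError("soluble fraction never crosses 0.5 in the measured range")


# Hypothetical soluble-fraction data for one protein (°C, relative signal)
temps_c = [37, 41, 44, 47, 50, 53, 56, 59, 63, 67]
vehicle = [1.00, 0.98, 0.93, 0.80, 0.60, 0.40, 0.20, 0.10, 0.05, 0.02]
treated = [1.00, 0.99, 0.97, 0.92, 0.82, 0.66, 0.45, 0.25, 0.10, 0.04]

tm_vehicle = melting_temperature(temps_c, vehicle)
tm_treated = melting_temperature(temps_c, treated)
print(f"Tm vehicle = {tm_vehicle:.1f} °C, Tm treated = {tm_treated:.1f} °C, "
      f"ΔTm = {tm_treated - tm_vehicle:+.1f} °C")
```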
Advantages of these approaches include their use of native inhibitors and their potential proteome-wide scope. One key limitation, however, is the very high compound concentration required to impart appreciable target stabilization in proteome-scale measurements. In the original reports, compound concentrations were often in the millimolar range, risking compound aggregation and nonspecific activity. Another challenge results from the lack of enrichment. The ACS of these approaches is essentially the entire proteome, which makes it analytically difficult to achieve reproducible, deep coverage of all potential on- and off-target proteins. Furthermore, for covalent compounds specifically, the assumption that small molecule binding leads to protein stabilization may be subject to many exceptions36. In addition, in experiments that use live-cell treatment, feedback mechanisms triggered by the compound may alter the abundances of unrelated proteins. Thus, caution is advised when considering stability-based methods to characterize covalent inhibitors.
As an alternative to solution phase stability, one can ask whether covalent inhibitors alter the gas-phase fragmentation behavior of modified peptides during MS/MS in a way that may aid identification. Indeed, there is a rich history of using specific fragment ions as ‘signatures’ to confirm peptide sequence assignment or facilitate identification of PTMs. In the case of PTMs, researchers have leveraged targeted mass spectrometry schemes such as precursor ion scanning to improve the identification of modified peptides in complex proteomes. In precursor ion scanning, peptides of increasing m/z are sequentially selected for fragmentation in the mass spectrometer while the detector monitors only the common or ‘diagnostic’ fragment(s) generated by the modification of interest. By focusing the acquisition bandwidth on PTM-specific ions, precursor ion scanning provides selective detection, or ‘gas phase enrichment’, for modified peptides above the large background of unmodified tryptic peptides derived from complex proteomes.
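Precursor ion scanning performs diagnostic-ion selection on the instrument during acquisition. The same selection logic can be illustrated offline. The Python sketch below keeps only MS/MS spectra that contain at least one diagnostic fragment ion within a ppm tolerance. Spectra are represented as plain (m/z, intensity) lists. The diagnostic m/z values, scan names, and 10 ppm tolerance are hypothetical placeholders, not values for any specific modification.

```python
from typing import Dict, List, Sequence, Tuple

Peak = Tuple[float, float]  # (m/z, intensity)


def within_ppm(observed: float, expected: float, ppm: float) -> bool:
    """True if observed m/z lies within `ppm` parts per million of expected."""
    return abs(observed - expected) / expected * 1e6 <= ppm


def has_diagnostic_ion(peaks: Sequence[Peak], diagnostic_mz: Sequence[float],
                       ppm: float = 10.0, min_intensity: float = 0.0) -> bool:
    """True if any peak at or above `min_intensity` matches any diagnostic m/z."""
    return any(
        intensity >= min_intensity and within_ppm(mz, target, ppm)
        for mz, intensity in peaks
        for target in diagnostic_mz
    )


DIAGNOSTIC_MZ = [216.0870, 335.1502]  # hypothetical modification-specific fragment ions

spectra: Dict[str, List[Peak]] = {  # hypothetical centroided MS/MS spectra
    "scan_1001": [(175.1190, 3.0e4), (216.0871, 1.2e5), (502.2711, 8.0e4)],
    "scan_1002": [(147.1128, 5.0e4), (263.1390, 2.0e4), (688.3502, 6.0e4)],
}

selected = [scan for scan, peaks in spectra.items()
            if has_diagnostic_ion(peaks, DIAGNOSTIC_MZ)]
print(selected)
```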
By applying the same precursor ion scanning framework, we interrogated MS/MS spectra of various peptide-ligand covalent conjugates. We showed that the intact inhibitor dissociated from the peptide but retained the cysteine sulfur, forming ‘thiolated’ inhibitor fragment ions. These ions underwent further dissociation to generate a series of inhibitor-specific fragments. Importantly, several of these primary and secondary fragments are predictable from the inhibitor structure. In addition, the relative yield of inhibitor-specific fragment ions could be ‘tuned’ by adjusting the MS/MS collision energy. Collectively, these observations suggest that inhibitor-specific fragments can be used, as with PTMs, to build targeted mass spectrometry acquisition methods. Such methods would selectively identify peptides modified by native covalent inhibitors in complex proteomes [Fig. 6b]. In our initial report, we provided proof of principle for this concept through selective detection of ibrutinib-modified peptides in digests of cellular protein lysates6. To be sure, the techniques described in this section are at an early stage of development. These techniques are based on solution-phase stabilization or gas-phase enrichment of targets modified by native inhibitors. However, their potential to characterize native covalent inhibitors directly, bypassing binding surrogates, motivates further investigation and optimization.
In conclusion, chemoproteomic methodology has advanced rapidly over the last two decades. It now enables the discovery and development of new covalent inhibitors, as well as the discovery of new targets. However, because chemoproteomics is still in active development, the field has not settled on a consistent methodology for determining thresholds for probe competition and inhibitor selectivity. As a result, the criteria used to report probe competition vary widely across studies. The metric for quantifying competition itself is straightforward: the fold change in signal between DMSO and inhibitor-competed conditions. However, definitions of significance range from various statistical tests to simple fold-change thresholds14,24,25. There also appears to be little consensus on the number of targets that distinguishes a selective probe from a promiscuous compound. Going forward, the rapid growth in the number and scale of chemoproteomic studies will provide the foundational data needed to establish a robust statistical framework for these selectivity metrics.
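The Python sketch below illustrates how a fold-change threshold can be combined with a replicate-based statistic. It computes a log2 fold change (vehicle over inhibitor) and Welch's t statistic on log2-transformed intensities for one probe-labeled peptide. Both cutoffs are illustrative assumptions rather than community standards, and the intensities are hypothetical. Converting t to a p-value requires the t distribution (for example, scipy.stats.ttest_ind with equal_var=False). That step is omitted here to keep the example dependency-free.

```python
import math
from statistics import mean, variance
from typing import Sequence, Tuple


def welch_t(a: Sequence[float], b: Sequence[float]) -> float:
    """Welch's t statistic for two independent samples with unequal variances."""
    standard_error = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / standard_error


def call_competed(vehicle: Sequence[float], treated: Sequence[float],
                  min_log2_fc: float = 1.0,
                  min_t: float = 4.0) -> Tuple[float, float, bool]:
    """Return (log2 fold change, Welch t, competed?) for one peptide.

    Intensities are log2-transformed first, so the fold change is the ratio of
    geometric means. Both thresholds are hypothetical.
    """
    log_vehicle = [math.log2(x) for x in vehicle]
    log_treated = [math.log2(x) for x in treated]
    log2_fc = mean(log_vehicle) - mean(log_treated)
    t_stat = welch_t(log_vehicle, log_treated)
    return log2_fc, t_stat, (log2_fc >= min_log2_fc and t_stat >= min_t)


# Hypothetical triplicate intensities for two probe-labeled peptides
competed_site = call_competed([1.0e6, 1.1e6, 0.95e6], [2.0e5, 2.4e5, 2.2e5])
unchanged_site = call_competed([1.0e6, 1.1e6, 0.9e6], [0.95e6, 1.05e6, 1.0e6])
for name, (fc, t, hit) in [("competed_site", competed_site),
                           ("unchanged_site", unchanged_site)]:
    print(f"{name}: log2FC = {fc:.2f}, t = {t:.1f}, competed = {hit}")
```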
Case Studies
In this section, we will discuss three case studies that focus on compounds that have progressed to the clinic. The first example describes the evolution of EGFR inhibitors across multiple generations. The second example involves the discovery, characterization, and development of KRASG12C inhibitors. The third focuses on the post-hoc characterization of the fatty acid amide hydrolase inhibitor BIA 10-2474 which failed in clinical trials after the death of a subject. In all cases, chemoproteomic strategies featured prominently in the drug discovery pipeline.
EGFR Inhibitors
EGFR is a member of the ErbB family of receptor tyrosine kinases, which regulate cellular proliferation, and it has been associated with a variety of cancers. At present, three generations of EGFR-targeting drugs have been clinically approved for various indications. Chemoproteomics has been indispensable in driving the evolution of these important inhibitors.
First-generation EGFR-targeting drugs (e.g., gefitinib) consisted of reversible, noncovalent binders of the ATP pocket of the tyrosine kinase domain. Their efficacy was undermined by the T790M resistance mutation, which increases the affinity of EGFR for ATP relative to the inhibitor [Fig. 7a]. In 1998, Fry et al. reasoned that nonconserved cysteines in the vicinity of the drug-binding pocket (C751, C773, C797) could be targeted covalently to enhance inhibitor binding2. They synthesized analogs with electrophilic acrylamides appended to the basic 4-anilinoquinazoline scaffold. They also made a negative control analog in which an inert ethylamide replaced the acrylamide. Using intact protein mass spectrometry, the authors demonstrated 1:1 covalent complex formation after incubating purified EGFR kinase domain with the active analog PD168393. Tryptic digestion of the resulting complex followed by LC-MS/MS identified covalent modification at C773 of EGFR. The authors further confirmed this proof of concept using the negative control ethylamide compound and the C773S mutant of EGFR.
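The intact-mass readout used to establish 1:1 complex formation reduces to comparing the observed mass shift with the ligand mass. The Python sketch below uses hypothetical masses to infer the number of adducts consistent with a deconvoluted intact mass. It assumes a warhead that adds the whole ligand, as for Michael addition to an acrylamide. For warheads with a leaving group, such as chloroacetamides, which lose HCl on alkylation, the leaving-group mass must be subtracted. The ±2 Da tolerance is an assumption that depends on the instrument and on deconvolution quality.

```python
from typing import Optional


def adduct_count(apo_mass: float, observed_mass: float, ligand_mass: float,
                 leaving_group_mass: float = 0.0,
                 tol_da: float = 2.0) -> Optional[int]:
    """Number of covalent adducts consistent with the observed mass shift.

    Returns None if the shift matches no non-negative integer stoichiometry
    within tol_da (e.g., an unexpected modification).
    """
    per_adduct = ligand_mass - leaving_group_mass
    shift = observed_mass - apo_mass
    n = round(shift / per_adduct)
    if n < 0 or abs(shift - n * per_adduct) > tol_da:
        return None
    return n


# Hypothetical average masses (Da): apo kinase domain and an acrylamide inhibitor
APO, LIGAND = 36000.0, 450.5
for observed in (36000.4, 36450.8, 36901.3, 36210.0):
    print(observed, adduct_count(APO, observed, LIGAND))
```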
The covalent mode of action enables these inhibitors to overcome the enhanced ATP affinity conferred by the T790M mutation, leading to therapeutic benefit44. The second-generation EGFR-targeting drug afatinib targets C797 adjacent to the ATP pocket; this residue is C773 in the numbering used by Fry et al. Afatinib was first approved for metastatic non-small cell lung cancer in 2013.
Beginning in the last decade, EGFR covalent inhibitors served as an early test case for emerging inhibitor-based chemoproteomic methods. In 2014, Lanning et al. sought to profile off-targets of covalent kinase inhibitors across the cellular cysteinome45. They appended an alkyne handle to the afatinib analog PF-6274484 to yield an IBP and performed click chemistry-enabled competitive pulldowns to identify inhibitor targets. While numerous proteins were bound by the probe, only EGFR, ErbB2, BLK, and DUS2L were competed by the parent inhibitor. ErbB2 and BLK were known off-targets from in vitro kinase panels, whereas DUS2L was nominated as a novel off-target of afatinib and similar EGFR covalent inhibitors.
After the approval of afatinib, third-generation covalent EGFR inhibitors were developed to minimize toxicity. They were designed to be selective for pathogenic (L858R, exon 19 deletion) and drug-resistant (T790M) mutants over wild-type EGFR. Niessen et al. profiled the proteome-wide reactivity of the third-generation EGFR inhibitors osimertinib, PF-06747775, and rociletinib, using methodology similar to that described by Lanning et al.44 Even though the three inhibitors were engineered for similar EGFR inhibition profiles, they displayed distinct target profiles. EGFR, ErbB2, and TEC were the only targets shared by all three inhibitors, and two of the inhibitors had a handful of additional unique targets. In particular, osimertinib strongly competed probe labeling of multiple lysosomal cathepsin proteases, with potential consequences for immune dysregulation.
This series of studies on EGFR inhibitors shows the coevolution of MS chemoproteomic techniques and covalent inhibitor discovery. The work of Fry et al. embodies a workflow for developing covalent inhibitors from noncovalent starting points that has become a classic in the two decades since its publication. First, targetable cysteine residues near the binding pocket are identified with structural insight. This is followed by rational design and synthesis of electrophile-bearing analogs, which are then tested for covalent binding, target engagement, and therapeutic benefit. The emergence of ABP-, IBP-, and RBP-based chemoproteomic methods enabled more comprehensive identification of off-targets, as in Lanning et al. Enhanced mechanistic understanding then drives development of next-generation inhibitors, as with the retention of beneficial covalency in third-generation EGFR inhibitors. As demonstrated by Niessen et al., off-target profiles can vary dramatically across analogs. This emphasizes the importance of chemoproteomic characterization in this inhibitor development cycle.
KRASG12C Inhibitors
KRASG12C driver mutations are found in numerous malignancies. Inhibitors targeting the mutant cysteine residue first emerged in 2013. Two related drug candidates, MRTX849 and AMG 510, are now in phase 2 clinical trials11,12. These late-stage KRASG12C inhibitors provide an illustrative case study, because mass spectrometry techniques were used throughout the drug discovery pipeline, from hit discovery to lead optimization. AMG 510 was inspired by the first KRASG12C covalent inhibitor, discovered by Shokat and colleagues11 [Fig. 7b]. The initial covalent starting point was identified by disulfide tethering. Subsequent optimization was enabled by intact protein-ligand mass spectrometry and crystallography. Structural analysis revealed that the compound binds in a novel Switch-II pocket and extends to react with cysteine 12, which is introduced by the G12C mutation. This locks KRAS in its inactive GDP-bound state. After this initial discovery, Patricelli et al. developed a targeted peptide LC-MS/MS assay to quantify cellular target engagement of KRASG12C. Data from this assay were used to guide optimization of cell penetrance. These studies ultimately led to the compound ARS-853, which exhibited potent target engagement in live cells46. Next, competitive isoTOP-ABPP identified two off-target cysteine thiols among 2740 cysteine residues detected across 1584 proteins in two cell lines. Another group used (1) a desthiobiotin-based global cysteine enrichment probe and (2) an alkynylated IBP to profile proteome-wide off-target cysteines47. Their results generally agreed with those of Patricelli et al. and confirmed that this scaffold is selective for KRASG12C. In another independent report, Fell et al. discovered the tetrahydropyridopyrimidine scaffold on which MRTX849 is based through covalent fragment screening12.
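A targeted peptide assay of this kind typically infers engagement from loss of the unmodified target-site peptide. The Python sketch below shows one common normalization, which is assumed here rather than taken from the cited study. The unmodified Cys12-containing peptide is normalized to a reference peptide from the same protein that the inhibitor cannot modify, and that ratio is compared between treated and vehicle samples. Peak areas and doses are hypothetical.

```python
def fraction_engaged(unmod_treated: float, ref_treated: float,
                     unmod_vehicle: float, ref_vehicle: float) -> float:
    """1 - (normalized unmodified target peptide, treated / same, vehicle).

    Normalizing to a reference peptide from the same protein corrects for
    differences in protein abundance and sample loading between conditions.
    Measurement noise can push the value slightly outside [0, 1].
    """
    treated_ratio = unmod_treated / ref_treated
    vehicle_ratio = unmod_vehicle / ref_vehicle
    return 1.0 - treated_ratio / vehicle_ratio


# Hypothetical peak areas: (unmodified Cys12 peptide, reference peptide)
vehicle = (8.0e5, 4.0e5)
doses_um = {0.3: (5.2e5, 4.1e5), 1.0: (2.4e5, 3.9e5), 3.0: (1.0e5, 4.0e5)}
for dose, (unmod, ref) in doses_um.items():
    print(f"{dose} µM: {fraction_engaged(unmod, ref, *vehicle):.0%} engaged")
```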
This example illustrates the power of MS-based primary screening to discover chemical starting points for covalent binders that target previously unrecognized pockets in important targets. In this case, a single G to C substitution provided an opportunity to selectively target this KRAS mutant in the inactive state. This example also illustrates the versatility of chemoproteomic methods across hit identification, validation, and lead optimization during rounds of iterative synthesis, performing functions such as validation of 1:1 covalent modification, identification of the targeted cysteine residue, confirmation of in-cell target engagement, and profiling of off-targets throughout the proteome.
BIA 10-2474
BIA 10-2474 is a fatty acid amide hydrolase (FAAH) inhibitor designed to modulate the endocannabinoid system48. BIA 10-2474 contains an electrophilic imidazole urea, which likely covalently modifies the catalytic serine residue of its target, FAAH. The drug was in development for a variety of indications ranging from obesity to cancer. In early 2016, Phase I studies were halted after acute neurological adverse events led to the death of one subject and hospitalization of two from a total of four participants. Other FAAH inhibitors, such as PF04457845, had not exhibited such safety liabilities, so one hypothesis was that off-target binding led to unexpected toxicity. At the time of the fatality, very little information existed on the proteome-wide activity landscape of BIA 10-2474.
In 2017, van Esbroeck et al. set out to identify off-targets of BIA 10-2474 and its redox metabolite BIA 10-2639 within the serine hydrolase superfamily49 [Fig. 7c]. The authors carried out mass spectrometry-based ABPP with a serine hydrolase-directed fluorophosphonate probe. SW620 cells were treated with BIA 10-2474, BIA 10-2639, or the nontoxic FAAH inhibitor PF04457845 across both a time course and a concentration gradient. This was followed by co-incubation with the ABP and further processing for mass spectrometry analysis. Proteins that showed more than a two-fold difference in pulldown between BIA 10-2474 and negative control conditions were flagged for follow-up. Representative off-targets were overexpressed and validated by gel-based ABPP. The authors found that both BIA 10-2474 and BIA 10-2639 cross-reacted with the active-site serine residues of several lipid hydrolases that were not targeted by PF04457845. The study identified disruption of neuronal lipid metabolism through PNPLA6 inhibition as a potential mechanism of toxicity.
In parallel, Huang et al. sought to investigate potential off-targets outside the serine hydrolase class50. In this study, the authors focused on the parent BIA 10-2474 and three of its metabolites. Alkynylated analogs were synthesized for all three compounds, and equipotent inhibition of FAAH was confirmed. These probe analogs were used in competitive binding assays against the parent compound, followed by protein pulldown. Subsequent LC-MS/MS analysis identified a group of cysteine-dependent aldehyde dehydrogenase (ALDH) enzymes as off-targets bound exclusively by two of the desmethyl metabolites. The authors then used isoTOP-ABPP with an iodoacetamide probe to validate binding to ALDH family members and to further explore off-targets across the cysteinome. Of the approximately 1700 cysteine sites identified, the catalytic cysteines of two of the three ALDH enzymes were competitively bound, validating them as likely off-targets. The third ALDH was not detected in the isoTOP-ABPP experiment. Finally, gel-based ABPP with wild-type ALDH2 and its C319A mutant confirmed covalent binding to the catalytic cysteine. ALDH2 has been reported to protect the brain from toxic aldehydes generated by reactive oxygen species. On that basis, the authors nominated ALDH2 inhibition as another potential mechanism of toxicity.
These studies highlight the value of complementary mass spectrometry chemoproteomic techniques. The two studies used probes with distinct, non-overlapping ACS: van Esbroeck et al.'s fluorophosphonate ABP focused the chemoproteomic analysis on serine hydrolases, while Huang et al. combined IBPs with cysteine-directed RBPs. Together, the two studies identified in-family and proteome-wide off-targets of BIA 10-2474 and its metabolites, providing possible explanations for the compound's neurotoxicity. The example is particularly striking because both serine-directed and cysteine-directed covalent off-targets were identified for a single molecule and its metabolites.
Methodologically, these examples illustrate important practices for experiments performed in complex proteomes. Enrichment probes were competed against a time course and/or concentration gradient of the compound under investigation, so that candidate off-targets could be further triaged by whether their competition was dose- or time-dependent. PF04457845, which did not display the toxic phenotype under investigation, was profiled side-by-side as a negative control. Gel-based experiments were used before mass spectrometry-based analysis to inform the design of an alkynylated probe, and afterwards to provide orthogonal confirmation of targets identified by chemoproteomics.
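The two triage criteria described above (dose- or time-dependent competition, and absence of competition by a benign comparator such as PF04457845) can be combined into a simple filter. The Python sketch below is illustrative only: it assumes probe labeling has already been normalized to the vehicle control (1.0 = no competition) and ordered from lowest to highest dose or shortest to longest time. The noise tolerance (0.1) and minimum loss at the final point (50%) are arbitrary placeholder values, not criteria from either study, and the function names, protein names, and values are hypothetical.

```python
from typing import Dict, List, Sequence


def is_dose_responsive(
    remaining: Sequence[float], tolerance: float = 0.1, min_loss: float = 0.5
) -> bool:
    """True if probe labeling falls (within noise) as dose or time increases.

    `remaining` is probe labeling relative to vehicle (1.0 = no competition),
    ordered from lowest to highest dose or shortest to longest time.
    `tolerance` allows small upward steps between adjacent points;
    `min_loss` is the fractional loss required at the final point.
    """
    if len(remaining) < 2:
        return False  # a single point cannot show a trend
    no_large_rises = all(
        later <= earlier + tolerance for earlier, later in zip(remaining, remaining[1:])
    )
    return no_large_rises and (1.0 - remaining[-1]) >= min_loss


def triage_off_targets(
    test_compound: Dict[str, Sequence[float]],
    negative_control: Dict[str, Sequence[float]],
) -> List[str]:
    """Proteins competed dose-dependently by the test compound but not the comparator.

    Proteins missing from `negative_control` are treated as not engaged by it.
    """
    prioritized = []
    for protein, series in test_compound.items():
        if not is_dose_responsive(series):
            continue
        comparator = negative_control.get(protein)
        if comparator is not None and is_dose_responsive(comparator):
            continue  # also engaged by the benign comparator: deprioritize
        prioritized.append(protein)
    return sorted(prioritized)


# Hypothetical fractions of probe labeling remaining at three increasing doses.
by_test_compound = {
    "PROT_A": [0.95, 0.60, 0.20],
    "PROT_B": [1.00, 0.70, 0.30],
    "PROT_C": [0.90, 1.05, 0.80],
}
by_negative_control = {
    "PROT_A": [1.00, 0.98, 0.97],
    "PROT_B": [0.90, 0.50, 0.10],
}
print(triage_off_targets(by_test_compound, by_negative_control))  # ['PROT_A']
```

A protein competed by both compounds, like PROT_B here, is not necessarily irrelevant, but it is a weaker candidate for explaining a phenotype that the negative control does not produce.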
Conclusions
Mass spectrometry chemoproteomic techniques have coevolved with covalent drug discovery over the past twenty years. Since the early days of analyzing protein-inhibitor adducts in the 1990s, MS chemoproteomic methods have matured into a versatile suite of techniques that can be leveraged in early-stage drug discovery, in addition to their more common use in late-stage characterization. From purified protein to complex proteomes, and from target identification to lead optimization, mass spectrometry-based methods can be used to: (1) identify covalently accessible residues; (2) identify protein targets and/or chemical matter of interest in a primary screen; (3) determine the stoichiometry of adduct formation; (4) identify the site(s) of covalent modification; and (5) profile off-targets within the enzyme class or across the broader proteome. The methods described in this review have supported the development of multiple successful drugs over the past two decades, primarily kinase inhibitors.
Going forward, we anticipate that growing enthusiasm for covalent drugs and developments in chemoproteomic methods will go hand-in-hand. Innovations in instrumentation, acquisition methods, and data processing should improve the throughput, quantification, and depth of coverage of these workflows. These advances, coupled with the continued growth of covalent inhibitor and electrophilic fragment libraries, will drive increased use of chemoproteomic platforms. Given the capability of mass spectrometry-based methods to quantitatively interrogate proteome-wide binding, we see significant opportunity to use chemoproteomic assays to accelerate characterization of early-stage compounds in the drug discovery pipeline. In particular, proteome-wide methods hold significant potential for hit discovery. The near-term challenges in moving chemoproteomic methods earlier in the pipeline center on scalability, automation, and cost.
Key learning points.
Mass spectrometry analysis of purified protein-ligand conjugates can reveal the site(s) of covalent attachment, the stoichiometry, and the specificity of the potential covalent interaction.
Mass spectrometry is inherently biased against low-abundance species. This bias dictates the sensitivity and reliability of detection in complex mixtures, and it necessitates enrichment reagents that focus analytical bandwidth on species of interest.
Proteome-wide chemoproteomic methods use enrichment reagents to capture a defined set of cellular proteins, constituting a distinct “addressable chemical space” for each reagent. This dictates the parameters and the overall scope of the chemoproteomic experiment.
A broader proteome-wide scope is not necessarily better: it comes with analytical challenges that threaten reproducibility and sensitivity.
Mass spectrometry chemoproteomics is not confined to late-stage characterization of advanced compounds; it can also be leveraged to discover active compounds and novel druggable proteins.
Acknowledgements
The authors acknowledge generous support from Fujifilm and the Chleck Family Foundation (to W.C. Chan), The Mark Foundation for Cancer Research (to S.J.B. and J.A.M.), and the National Institutes of Health (CA233800 and CA247671 to S.J.B. and J.A.M.).
Footnotes
Conflicts of interest
J.A.M. serves on the SAB of 908 Devices and receives sponsored research funding from Vertex and AstraZeneca.
Notes and references
1. Singh J, Petter RC, Baillie TA and Whitty A, Nat. Rev. Drug Discov, 2011, 10, 307–317.
2. Fry DW, Bridges AJ, Denny WA, Doherty A, Greis KD, Hicks JL, Hook KE, Keller PR, Leopold WR, Loo JA, McNamara DJ, Nelson JM, Sherwood V, Smaill JB, Trumpp-Kallmeyer S and Dobrusin EM, Proc. Natl. Acad. Sci. U. S. A, 1998, 95, 12022–12027.
3. Garland M, Babin BM, Miyashita S-I, Loscher S, Shen Y, Dong M and Bogyo M, ACS Chem. Biol, 2019, 14, 76–87.
4. Bashore C, Jaishankar P, Skelton NJ, Fuhrmann J, Hearn BR, Liu PS, Renslo AR and Dueber EC, ACS Chem. Biol, 2020, 15, 1392–1400.
5. Amico V, Foti S, Saletti R, Cambria A and Petrone G, Biomed. Environ. Mass Spectrom, 1988, 16, 431–437.
6. Ficarro SB, Browne CM, Card JD, Alexander WM, Zhang T, Park E, McNally R, Dhe-Paganon S, Seo H-S, Lamberto I, Eck MJ, Buhrlage SJ, Gray NS and Marto JA, Anal. Chem, 2016, 88, 12248–12254.
7. Pettinger J, Le Bihan Y, Widya M, van Montfort RLM, Jones K and Cheeseman MD, Angew. Chem. Int. Ed. Engl, 2017, 56, 3536–3540.
8. Ward JA, Pinto-Fernandez A, Cornelissen L, Bonham S, Díaz-Sáez L, Riant O, Huber KVM, Kessler BM, Feron O and Tate EW, J. Med. Chem, 2020, 63, 3756–3762.
9. Kooij R, Liu S, Sapmaz A, Xin B-T, Janssen GMC, van Veelen PA, Ovaa H, ten Dijke P and Geurink PP, J. Am. Chem. Soc, 2020, 142, 16825–16841.
10. Erlanson DA, Braisted AC, Raphael DR, Randal M, Stroud RM, Gordon EM and Wells JA, Proc. Natl. Acad. Sci. U. S. A, 2000, 97, 9367–9372.
11. Ostrem JM, Peters U, Sos ML, Wells JA and Shokat KM, Nature, 2013, 503, 548–551.
12. Fell JB, Fischer JP, Baer BR, Blake JF, Bouhana K, Briere DM, Brown KD, Burgess LE, Burns AC, Burkard MR, Chiang H, Chicarelli MJ, Cook AW, Gaudino JJ, Hallin J, Hanson L, Hartley DP, Hicken EJ, Hingorani GP, Hinklin RJ, Mejia MJ, Olson P, Otten JN, Rhodes SP, Rodriguez ME, Savechenkov P, Smith DJ, Sudhakar N, Sullivan FX, Tang TP, Vigers GP, Wollenberg L, Christensen JG and Marx MA, J. Med. Chem, 2020, 63, 6679–6693.
13. Kathman SG, Xu Z and Statsyuk AV, J. Med. Chem, 2014, 57, 4969–4974.
14. Resnick E, Bradley A, Gan J, Douangamath A, Krojer T, Sethi R, Geurink PP, Aimon A, Amitai G, Bellini D, Bennett J, Fairhead M, Fedorov O, Gabizon R, Gan J, Guo J, Plotnikov A, Reznik N, Ruda GF, Díaz-Sáez L, Straub VM, Szommer T, Velupillai S, Zaidman D, Zhang Y, Coker AR, Dowson CG, Barr HM, Wang C, Huber KVM, Brennan PE, Ovaa H, von Delft F and London N, J. Am. Chem. Soc, 2019, 141, 8951–8968.
15. Campuzano IDG, San Miguel T, Rowe T, Onea D, Cee VJ, Arvedson T and McCarter JD, J. Biomol. Screen, 2016, 21, 136–144.
16. Sinclair I, Stearns R, Pringle S, Wingfield J, Datwani S, Hall E, Ghislain L, Majlof L and Bachman M, J. Lab. Autom, 2016, 21, 19–26.
17. Ficarro SB, McCleland ML, Stukenberg PT, Burke DJ, Ross MM, Shabanowitz J, Hunt DF and White FM, Nat. Biotechnol, 2002, 20, 301–305.
18. Bak DW, Gao J, Wang C and Weerapana E, Cell Chem. Biol, 2018, 25, 1157–1167.e4.
19. Ward CC, Kleinman JI and Nomura DK, ACS Chem. Biol, 2017, 12, 1478–1483.
20. Shannon DA, Banerjee R, Webster ER, Bak DW, Wang C and Weerapana E, J. Am. Chem. Soc, 2014, 136, 3330–3333.
21. Hahm HS, Toroitich EK, Borne AL, Brulet JW, Libby AH, Yuan K, Ware TB, McCloud RL, Ciancone AM and Hsu K-L, Nat. Chem. Biol, 2020, 16, 150–159.
22. Wang C, Weerapana E, Blewett MM and Cravatt BF, Nat. Methods, 2014, 11, 79–85.
23. Weerapana E, Wang C, Simon GM, Richter F, Khare S, Dillon MBD, Bachovchin DA, Mowen K, Baker D and Cravatt BF, Nature, 2010, 468, 790–795.
24. Backus KM, Correia BE, Lum KM, Forli S, Horning BD, González-Páez GE, Chatterjee S, Lanning BR, Teijaro JR, Olson AJ, Wolan DW and Cravatt BF, Nature, 2016, 534, 570–574.
25. Hacker SM, Backus KM, Lazear MR, Forli S, Correia BE and Cravatt BF, Nat. Chem, 2017, 9, 1181–1190.
26. Roberts AM, Miyamoto DK, Huffman TR, Bateman LA, Ives AN, Akopian D, Heslin MJ, Contreras CM, Rape M, Skibola CF and Nomura DK, ACS Chem. Biol, 2017, 12, 899–904.
27. Greenbaum D, Medzihradszky KF, Burlingame A and Bogyo M, Chem. Biol, 2000, 7, 569–581.
28. Liu Y, Patricelli MP and Cravatt BF, Proc. Natl. Acad. Sci. U. S. A, 1999, 96, 14694–14699.
29. Long JZ, Li W, Booker L, Burston JJ, Kinsey SG, Schlosburg JE, Pavón FJ, Serrano AM, Selley DE, Parsons LH, Lichtman AH and Cravatt BF, Nat. Chem. Biol, 2009, 5, 37–44.
30. Schauer NJ, Liu X, Magin RS, Doherty LM, Chan WC, Ficarro SB, Hu W, Roberts RM, Iacob RE, Stolte B, Giacomelli AO, Perera S, McKay K, Boswell SA, Weisberg EL, Ray A, Chauhan D, Dhe-Paganon S, Anderson KC, Griffin JD, Li J, Hahn WC, Sorger PK, Engen JR, Stegmaier K, Marto JA and Buhrlage SJ, Sci. Rep, 2020, 10, 1–15.
31. Patricelli MP, Szardenings AK, Liyanage M, Nomanbhoy TK, Wu M, Weissig H, Aban A, Chun D, Tanner S and Kozarich JW, Biochemistry, 2007, 46, 350–358.
32. Bantscheff M, Eberhard D, Abraham Y, Bastuck S, Boesche M, Hobson S, Mathieson T, Perrin J, Raida M, Rau C, Reader V, Sweetman G, Bauer A, Bouwmeester T, Hopf C, Kruse U, Neubauer G, Ramsden N, Rick J, Kuster B and Drewes G, Nat. Biotechnol, 2007, 25, 1035–1044.
33. Wright AT and Cravatt BF, Chem. Biol, 2007, 14, 1043–1051.
34. Nakagawa J-I, Matsuzawa H and Matsuhashi M, J. Bacteriol, 1979, 138, 1029–1032.
35. Browne CM, Jiang B, Ficarro SB, Doctor ZM, Johnson JL, Card JD, Sivakumaren SC, Alexander WM, Yaron TM, Murphy CJ, Kwiatkowski NP, Zhang T, Cantley LC, Gray NS and Marto JA, J. Am. Chem. Soc, 2019, 141, 191–203.
36. Savitski MM, Reinhard FBM, Franken H, Werner T, Savitski MF, Eberhard D, Molina DM, Jafari R, Dovega RB, Klaeger S, Kuster B, Nordlund P, Bantscheff M and Drewes G, Science, 2014, 346, 1255784.
37. Lomenick B, Hao R, Jonai N, Chin RM, Aghajan M, Warburton S, Wang J, Wu RP, Gomez F, Loo JA, Wohlschlegel JA, Vondriska TM, Pelletier J, Herschman HR, Clardy J, Clarke CF and Huang J, Proc. Natl. Acad. Sci. U. S. A, 2009, 106, 21984–21989.
38. Strickland EC, Geer MA, Tran DT, Adhikari J, West GM, DeArmond PD, Xu Y and Fitzgerald MC, Nat. Protoc, 2013, 8, 148–161.
39. Nguyen C, West GM and Geoghegan KF, in Cancer Gene Networks, eds. Kasid U and Clarke R, Springer, New York, NY, 2017, pp. 11–22.
40. Lomenick B, Olsen RW and Huang J, ACS Chem. Biol, 2011, 6, 34–46.
41. Liu P-F, Kihara D and Park C, J. Mol. Biol, 2011, 408, 147–162.
42. West GM, Tang L and Fitzgerald MC, Anal. Chem, 2008, 80, 4175–4185.
43. Franken H, Mathieson T, Childs D, Sweetman GMA, Werner T, Tögel I, Doce C, Gade S, Bantscheff M, Drewes G, Reinhard FBM, Huber W and Savitski MM, Nat. Protoc, 2015, 10, 1567–1593.
44. Niessen S, Dix MM, Barbas S, Potter ZE, Lu S, Brodsky O, Planken S, Behenna D, Almaden C, Gajiwala KS, Ryan K, Ferre R, Lazear MR, Hayward MM, Kath JC and Cravatt BF, Cell Chem. Biol, 2017, 24, 1388–1400.e7.
45. Lanning BR, Whitby LR, Dix MM, Douhan J, Gilbert AM, Hett EC, Johnson TO, Joslyn C, Kath JC, Niessen S, Roberts LR, Schnute ME, Wang C, Hulce JJ, Wei B, Whiteley LO, Hayward MM and Cravatt BF, Nat. Chem. Biol, 2014, 10, 760–767.
46. Patricelli MP, Janes MR, Li L-S, Hansen R, Peters U, Kessler LV, Chen Y, Kucharski JM, Feng J, Ely T, Chen JH, Firdaus SJ, Babbar A, Ren P and Liu Y, Cancer Discov, 2016, 6, 316–329.
47. Wijeratne A, Xiao J, Reutter C, Furness KW, Leon R, Zia-Ebrahimi M, Cavitt RN, Strelow JM, Van Horn RD, Peng S-B, Barda DA, Engler TA and Chalmers MJ, ACS Med. Chem. Lett, 2018, 9, 557–562.
48. Kerbrat A, Ferré J-C, Fillatre P, Ronzière T, Vannier S, Carsin-Nicol B, Lavoué S, Vérin M, Gauvrit J-Y, Le Tulzo Y and Edan G, N. Engl. J. Med, 2016, 375, 1717–1725.
49. van Esbroeck ACM, Janssen APA, Cognetta AB, Ogasawara D, Shpak G, van der Kroeg M, Kantae V, Baggelaar MP, de Vrij FMS, Deng H, Allarà M, Fezza F, Lin Z, van der Wel T, Soethoudt M, Mock ED, den Dulk H, Baak IL, Florea BI, Hendriks G, De Petrocellis L, Overkleeft HS, Hankemeier T, De Zeeuw CI, Di Marzo V, Maccarrone M, Cravatt BF, Kushner SA and van der Stelt M, Science, 2017, 356, 1084–1087.
50. Huang Z, Ogasawara D, Seneviratne UI, Cognetta AB, am Ende CW, Nason DM, Lapham K, Litchfield J, Johnson DS and Cravatt BF, ACS Chem. Biol, 2019, 14, 192–197.