Author manuscript; available in PMC: 2025 Mar 21.
Published in final edited form as: Cell Chem Biol. 2023 Oct 9;31(3):534–549.e8. doi: 10.1016/j.chembiol.2023.09.009

An N terminomics toolbox combining 2-pyridinecarboxaldehyde (2PCA) probes and click chemistry for profiling protease specificity

Haley N Bridge 1, William Leiter 2, Clara L Frazier 1, Amy M Weeks 1,2,*
PMCID: PMC10960722  NIHMSID: NIHMS1934332  PMID: 37816350

Summary

Proteomic profiling of protease-generated N termini provides key insights into protease function and specificity. However, current technologies have sequence limitations or require specialized synthetic reagents for N-terminal peptide isolation. Here, we introduce an N terminomics toolbox that combines selective N-terminal biotinylation using 2-pyridinecarboxaldehyde (2PCA) reagents with chemically cleavable linkers to enable efficient enrichment of protein N termini. By incorporating a commercially available alkyne-modified 2PCA in combination with Cu(I)-catalyzed azide-alkyne cycloaddition (CuAAC), our strategy eliminates the need for chemical synthesis of N-terminal probes. Using these reagents, we developed PICS2 (Proteomic Identification of Cleavage Sites with 2PCA) to profile the specificity of subtilisin/kexin-type proprotein convertases (PCSKs). We also implemented CHOPPER (Chemical enrichment Of Protease substrates with Purchasable, Elutable Reagents) for global sequencing of apoptotic proteolytic cleavage sites. Based on their broad applicability and ease of implementation, PICS2 and CHOPPER are useful tools that will advance our understanding of protease biology.

eTOC Blurb

Bridge et al. introduce an N terminomics strategy that combines 2-pyridinecarboxaldehyde (2PCA) probes, click chemistry, and cleavable linkers for enrichment of protein N termini. This technology enables protease sequence and substrate specificity profiling on a proteome scale using commercially available reagents.

Introduction

Proteolytic cleavage regulates the activity, localization, and lifetime of nearly all human proteins. Although proteases are encoded by ~2% of human genes1 and are an important class of drug targets2, the substrate specificity of many proteases remains unknown or incompletely understood. Proteolytic cleavage of proteins results in formation of new N and C termini. Chemoproteomics methods that combine selective isolation of protein N termini with tandem mass spectrometry (LC-MS/MS) are powerful tools for defining exact sites of proteolytic cleavage, enabling protease sequence specificity profiling3 and identification of cellular protease substrates4–8. However, current N terminomics methods are limited by sequence specificity, low efficiency, and/or challenges to their implementation that include the need to synthesize specialized reagents. New and complementary reagents are therefore needed to enable more comprehensive study of protease biology.

Positive-enrichment N terminomics methods isolate the prime-side peptide product (P’, C-terminal to the scissile bond) that results from protease cleavage, enabling identification of the cleaved protein and cleavage site by LC-MS/MS4,7,9. A necessary feature of positive enrichment-based N terminomics workflows is selective biotinylation of the N terminus over other primary amines including Lys side chains3,4,7. Because of their high N-terminal specificity and their low sequence specificity, probes based on 2-pyridinecarboxaldehyde (2PCA) (1) are attractive as N terminomics tools7,10. The specificity of 2PCA reagents for the N terminus over Lys is enforced by the chemical mechanism of modification, which involves initial formation of an N-terminal imine followed by nucleophilic attack of the neighboring amide in the peptide backbone to form a stable cyclic imidazolidinone (Fig. 1A)10. Because Lys side chains lack a neighboring amide group positioned for cyclization, they do not form stable adducts with 2PCA. Biotin-2PCA has been previously applied to enrich substrates of dipeptidyl peptidases in a workflow termed CHOPS (Chemical enrichment Of Protease Substrates), providing a foundation for unbiased identification of protease substrates in complex samples7. However, adoption of 2PCA reagents as protease substrate probes is limited by the need to synthesize the required biotinylated probe and by the absence of a cleavable linker for efficient elution of N-terminal peptides prior to LC-MS/MS. Additionally, the sequence specificity of 2PCA modification in complex samples has not been fully characterized.

Figure 1. Deep profiling of 2PCA specificity using proteome-derived peptide libraries.

(A) 2-pyridinecarboxaldehyde selectively modifies the N terminus of peptides. (B) Workflow for 2PCA specificity profiling. Sequence logos for input libraries are shown in Fig. S1. (C) Open database search to determine the number of 2PCA modifications that occur per peptide. (D) Offset database search to determine the residue specificity of 2PCA modification. (E) Closed database search to determine the sequence specificity of 2PCA modification. Results for individual libraries are shown in Fig. S2 and MS/MS spectra for putative modified P2’ Pro peptides are shown in Fig. S3. (F) Modification efficiency of 2PCA by residue in the P1’ (top) or P2’ position (bottom). See also Fig. S4 and Fig. S5. (G) Impact of P1’-P2’ pairwise residue interactions on the efficiency of 2PCA modification. Heatmaps for other pairwise interactions are shown in Fig. S6.

Here, we report an expanded toolbox of N terminomics probes based on 2PCA reagents. We fully defined the sequence biases of 2PCA modification and optimized modification conditions using proteome-derived peptide libraries. We designed a biotin-2PCA with a cleavable linker to enhance the efficiency of peptide elution and incorporated a commercially available alkyne-modified 2PCA in combination with Cu(I)-catalyzed azide-alkyne cycloaddition (CuAAC) to enable implementation of 2PCA-based N terminomics with off-the-shelf reagents. We applied these probes to develop a positive-enrichment strategy to define protease sequence specificity in proteome-derived peptide libraries that we term Proteomic Identification of Cleavage Sites with 2PCA reagents (PICS2). We deployed PICS2 to profile the sequence specificity of the subtilisin/kexin-type proprotein convertases (PCSKs) Kex2, furin, and PCSK2, biologically important proteases whose sequence specificity cannot be accurately characterized with existing positive enrichment methods. We also introduce Chemical enrichment Of Protease substrates with Purchasable, Elutable Reagents (CHOPPER), a method for enrichment of protease substrates from cell lysates that uses commercially available, selectively elutable probes. We applied CHOPPER to profile proteolytic cleavage events in etoposide-induced apoptosis, leading to the identification of 112 previously unknown putative caspase cleavage sites in 95 proteins. Because they are easy to implement and broadly applicable, PICS2 and CHOPPER greatly expand the toolbox for proteome-wide study of protease cleavage sites to advance our understanding of proteolytic signaling pathways.

Design

An ideal N terminomics method would enable enrichment of all N-terminal sequences in an unbiased manner; would incorporate features that allow efficient recovery of enriched N termini; and would be based on easy-to-access, commercially available reagents. Existing positive enrichment N terminomics strategies, although powerful, have several limitations to their general applicability and implementation3,4,7. These methods rely on selective N-terminal biotinylation, which can be achieved by three main approaches. In the first, all primary amines, including both N termini and Lys side chains, are blocked3. After protease cleavage, the sample is treated with a biotinylated amine-reactive reagent, such as an NHS ester, to enable enrichment of proteolytic neo-N termini. Because Lys side chains are modified, this workflow precludes characterization of proteases that recognize Lys in their cleavage motifs. The second method takes advantage of the N-terminal selectivity of the peptide ligase subtiligase for N-terminal biotinylation using a peptide ester substrate4. However, subtiligase is not commercially available and the method requires synthesis of a non-standard peptide ester substrate. The third method uses N-terminally selective 2PCA reagents for biotinylation7. This method does not require modification of Lys side chains and is predicted to disfavor only sequences with Pro in the second position based on the mechanism of modification. However, 2PCA sequence specificity has not been extensively characterized, biotinylated 2PCA reagents must be chemically synthesized, and selectively elutable biotinylated 2PCA probes have not been developed.

Based on 2PCA’s inherent N-terminal specificity and proposed broad sequence compatibility, we sought to develop improved 2PCA-based N terminomics methods with well-characterized specificity that use commercially available and selectively elutable reagents. We hypothesized that non-biotinylated 2PCA could be applied for selective N-terminal blocking, precluding the need for Lys side chain modification in protease specificity profiling workflows. Protease-generated N termini could then be modified by a second biotinylated 2PCA reagent. Similarly, a commercially available 2PCA reagent could be combined with click chemistry to cleavable biotin azides for protease substrate enrichment. We envisioned deploying these methods for enrichment of protease substrates from both proteome-derived peptide libraries and from the cellular proteome.

Results

Proteome-derived peptide libraries for deep profiling of 2PCA specificity

Previous characterization of 2PCA specificity focused on varying the N-terminal amino acid of the sequence XADSWAG, where X is a variable amino acid10. These studies found that most peptides were modified with similar efficiencies except for those with X = G or X = P, which were modified to a lower extent. These studies did not address the effect of positions beyond the N-terminal amino acid on 2PCA modification efficiency, including the second amino acid, whose backbone amide is proposed to participate in the reaction. To characterize 2PCA specificity in depth, we used a mass spectrometry-based assay inspired by proteomic identification of ligation sites (PILS), which was previously developed for comprehensive and quantitative characterization of peptide ligase N-terminal specificity (Fig. 1B)11. We generated three N-terminally diverse proteome-derived peptide libraries by digesting E. coli protein extracts with three different proteases with distinct P1 specificities: trypsin (P1 = K or R), GluC (P1 = E or D), and chymotrypsin (P1 = F, L, W, or Y). In combination, these libraries represent every possible amino acid in the first six N-terminal positions (P1’-P6’) and nearly all 400 possible amino acid combinations at P1’-P2’ (Fig. S1). We incubated each peptide library with 10 mM 2PCA at pH 7.5 for 4 h at 37°C and then analyzed the 2PCA-treated peptide libraries by LC-MS/MS to determine (1) the number of 2PCA modifications on each modified peptide; (2) the site specificity of the 2PCA modification; (3) the N-terminal sequence specificity of the 2PCA modification; (4) the positional cooperativity of the 2PCA modification; and (5) the efficiency of modification of each N-terminal residue.
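
The library design described above can be illustrated with a minimal in silico digestion sketch. It assumes idealized cleavage after every listed P1 residue, with no missed cleavages and no neighboring-residue effects. The names `P1_RULES`, `digest`, and `n_terminal_pairs`, and the toy sequence, are hypothetical and are not part of the published workflow.

```python
# Idealized P1 specificities quoted in the text (assumption: cleavage is
# complete and depends only on the P1 residue).
P1_RULES = {
    "trypsin": set("KR"),
    "GluC": set("ED"),
    "chymotrypsin": set("FLWY"),
}


def digest(sequence, protease):
    """Split a protein sequence after every residue in the protease's P1 set."""
    p1 = P1_RULES[protease]
    peptides, start = [], 0
    for i, residue in enumerate(sequence):
        if residue in p1:
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])  # C-terminal remainder
    return peptides


def n_terminal_pairs(peptides):
    """Collect the P1'-P2' dipeptides that a library exposes as N termini."""
    return {p[:2] for p in peptides if len(p) >= 2}


toy_protein = "MKTAYIAKQRELGDSWFGPA"  # hypothetical sequence, illustration only
libraries = {name: digest(toy_protein, name) for name in P1_RULES}
pairs = set().union(*(n_terminal_pairs(p) for p in libraries.values()))
```

Because each protease leaves a different residue class at P1, the three digests expose different sets of N-terminal dipeptides, and pooling them widens P1’-P2’ coverage.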

The initial report of N-terminal modification by 2PCA suggested that some N termini might harbor more than one 2PCA modification10. To empirically determine the mass modification(s) attributable to 2PCA treatment, we used the open search workflow of the FragPipe proteomics pipeline12,13 for unbiased analysis of modifications across the 2PCA-treated peptide libraries (Fig. 1C). The open search workflow allows a wide precursor mass tolerance (−150 Da to 500 Da), enabling identification of modifications based on the empirical data without the need to specify them during database search. Using this method, we found that Δm = 89.0262 Da was the most abundant mass modification in all three peptide libraries. This finding supports the hypothesis that a single 2PCA modification (calculated Δm = 89.0265 Da) occurs on each modified peptide. No mass modification corresponding to double 2PCA modification was observed.
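
The calculated mass shift can be reproduced with a short sketch. It assumes that stable modification adds one molecule of 2PCA (C6H5NO) with net loss of one water, as expected for imine formation followed by cyclization. The doubled value further assumes that a second adduct would form by the same condensation, which the text does not specify.

```python
# Monoisotopic masses (Da) of the most abundant isotopes.
MONOISOTOPIC = {
    "C": 12.0,
    "H": 1.00782503223,
    "N": 14.00307400443,
    "O": 15.99491461957,
}


def monoisotopic_mass(formula):
    """Sum monoisotopic masses for a formula given as {element: count}."""
    return sum(MONOISOTOPIC[el] * n for el, n in formula.items())


two_pca = {"C": 6, "H": 5, "N": 1, "O": 1}  # 2-pyridinecarboxaldehyde
water = {"H": 2, "O": 1}                    # assumed lost on condensation

delta_calc = monoisotopic_mass(two_pca) - monoisotopic_mass(water)
delta_obs = 89.0262                         # most abundant open-search shift
ppm_error = (delta_obs - delta_calc) / delta_calc * 1e6
double_mod = 2 * delta_calc                 # assumed shift for two adducts

print(f"calculated shift: {delta_calc:.4f} Da")
print(f"observed vs calculated: {ppm_error:.1f} ppm")
print(f"double modification would appear near: {double_mod:.4f} Da")
```

Under these assumptions the calculated shift agrees with the reported 89.0265 Da, and the observed value differs from it by a few parts per million.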

Although 2PCA possesses an aldehyde that may react reversibly with both N-terminal α amines and lysine side chains, stable 2PCA modification is proposed to be restricted to N termini based on the need for an adjacent amide bond to participate in cyclic imidazolidinone formation (Fig. 1A)10. To test this proposal in the context of a wide variety of N-terminal peptide sequences, we globally analyzed 2PCA selectivity for all amino acid side chains and N and C termini using the offset search strategy of the FragPipe proteomics pipeline (Fig. 1D)12,13. This approach allows the user to specify a mass modification without the need to restrict the residues on which the modification may occur. Offset search of the three 2PCA-modified peptide libraries with a mass modification of 89.0265 Da revealed that 2PCA exhibits absolute chemoselectivity for peptide N termini, with no modifications observed on lysine or any other side chain.

We next evaluated whether 2PCA exhibits N-terminal sequence specificity (Fig. 1E, Fig. S2). Based on the mechanism of cyclic imidazolidinone formation, 2PCA is predicted to be unable to modify peptides with Pro in the second position as they lack the amide NH group required for cyclization10. Beyond this mechanism-based specificity, the efficiency of 2PCA modification of different peptide sequences has not been investigated in depth. To evaluate 2PCA sequence specificity empirically, we used a closed search strategy in which the 2PCA modification was restricted to the N terminus. In total, we identified 28,167 peptides across the tryptic, GluC, and chymotryptic peptide libraries, of which 14,735 (52%) were 2PCA-modified. We then compared the position-specific frequencies of each amino acid in the 2PCA-modified peptides and in the input peptide libraries (Fig. 1E). Consistent with the proposed mechanism for 2PCA modification, we observed a strong bias against 2PCA modification of peptides with Pro in the second position. Of 900 peptides with Pro in position 2 identified by database searching across the three libraries, only two (0.2%) were identified as 2PCA-modified (Fig. 1F). Each of these peptides had a relatively low cross-correlation (XCorr) score (0.88 and 1.06, respectively) and either zero or one fragment ion matches that are specific to the 2PCA-modified peptide (Fig. S3). Consistent with previous studies10, we also observed a bias against 2PCA modification of peptides with glycine in the first position (Fig. 1E), with 356 of 1,669 peptides (21%) with glycine at position 1 bearing a 2PCA modification compared to 52% of peptides in the population (Fig. 1F). However, in contrast to previous work that showed that X = P inhibited 2PCA modification of the peptide XADSWAG, we observed no bias against modification of peptides with Pro in position 1. We also observed an apparent bias against labeling of peptides with Trp at position 1. However, based on results from 2PCA modification of a panel of 32 synthetic WXX peptides (67±2% modification on average), we conclude that the apparent low efficiency of N-terminal Trp modification arises from the low abundance of these peptides in proteome-derived peptide libraries rather than an inherent bias against 2PCA modification of Trp (Fig. 1F, Fig. S4, Fig. S5).
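
The modification frequencies quoted in this paragraph can be recomputed directly from the reported counts. The sketch below also expresses each subset relative to the population rate. That ratio is an illustrative summary introduced here, not a statistic reported by the authors.

```python
# Counts reported in the text: (2PCA-modified, total identified).
counts = {
    "all peptides": (14_735, 28_167),
    "Pro at position 2": (2, 900),
    "Gly at position 1": (356, 1_669),
}

overall = counts["all peptides"][0] / counts["all peptides"][1]

summary = {}
for subset, (modified, total) in counts.items():
    fraction = modified / total
    # A ratio below 1 means the subset is modified less often than the
    # population as a whole.
    summary[subset] = (fraction, fraction / overall)

for subset, (fraction, relative) in summary.items():
    print(f"{subset}: {fraction:.1%} modified, {relative:.2f}x population rate")
```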

In enzymes including proteases and peptide ligases, substrate amino acid subsites often exhibit cooperativity, with the identity of the amino acid in one position of the substrate influencing the enzyme’s ability to act on substrates containing another amino acid in a different position11,14. However, the potential for such interactions in the context of 2PCA has not been considered. We therefore examined how 2PCA modification efficiency is influenced by positional cooperativity (Fig. 1G, Fig. S6). We analyzed enrichment or de-enrichment of each pairwise sequence combination at positions 1 through 6 of modified peptides compared to the input peptide libraries. We found that the identity of the amino acid at position 2 has a strong influence on whether peptides with Gly at position 1 are efficiently modified (Fig. 1G). Although the total population of peptides with N-terminal Gly is modified with low efficiency (21%), for the subset of peptides that also have Gly in the second position, the efficiency of labeling is much higher. In total, 72 out of 111 peptides (65%) beginning with Gly-Gly were 2PCA modified. Similar but smaller effects were observed for Gly-Ala peptides (58 of 185 peptides modified, 31.4%), Gly-Lys peptides (27 of 85 peptides modified, 31.8%) and Gly-Arg peptides (27 of 74 peptides modified, 36.5%). We also observed that modification of peptides with any amino acid at position 1 is suppressed by the presence of proline at position 2. Consistent with the lack of sequence specificity observed beyond position 2 in position-specific analysis of 2PCA-modified peptides, we did not observe any pairwise effects beyond the second amino acid (Fig. S6).
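
A pairwise analysis of this kind can be sketched as a per-dipeptide tally of modified versus total peptides. The function `pair_efficiency` and the toy peptide lists are hypothetical. The `reported` dictionary holds counts quoted in the paragraph above.

```python
from collections import Counter


def pair_efficiency(all_peptides, modified_peptides, min_count=1):
    """Fraction of peptides modified for each P1'-P2' dipeptide."""
    total = Counter(p[:2] for p in all_peptides if len(p) >= 2)
    modified = Counter(p[:2] for p in modified_peptides if len(p) >= 2)
    return {
        pair: modified[pair] / n
        for pair, n in total.items()
        if n >= min_count  # skip pairs too rare to estimate reliably
    }


# Hypothetical toy library; real input would be the identified peptide lists.
library = ["GGSLK", "GGTVR", "GASLK", "GAEVR", "APSLK", "AKTVR"]
labeled = ["GGSLK", "GGTVR", "GASLK", "AKTVR"]
efficiency = pair_efficiency(library, labeled)

# Counts quoted in the text for Gly-initiated pairs: (modified, total).
reported = {"GA": (58, 185), "GK": (27, 85), "GR": (27, 74)}
reported_pct = {pair: 100 * m / n for pair, (m, n) in reported.items()}
```

In practice a `min_count` threshold matters because rare dipeptides give noisy estimates, which is one reason low-abundance N termini can appear disfavored.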

Optimizing 2PCA modification of proteomic samples

Encouraged by the broad sequence compatibility and N-terminal selectivity of 2PCA, we set out to optimize the efficiency of 2PCA N-terminal modification for proteomics applications (Fig. 2A). We first examined the impact of time, temperature, and buffer concentration on 2PCA modification of a proteome-derived peptide library (Fig. 2BC). We performed the 2PCA modification reaction for either 4 h or 20 h over a range of temperatures between 37°C and 75°C with either 10 mM or 50 mM sodium phosphate, pH 7.5. No substantial effects of buffer concentration were observed. For the 4 h reactions, no substantial difference in labeling efficiency was observed over the range of temperatures (Fig. 2B). However, for the 20 h reaction time, the fraction of peptides modified began to decrease at 65°C and 75°C, a change which may be attributable to reversal of the 2PCA modification (Fig. 2C). Our results demonstrate that 2PCA labeling for 4 h at physiological temperature (37°C) produces a similar extent of modification to labeling at higher temperatures for longer times. Additionally, these experiments show that prolonged reaction times may be disadvantageous for maximizing the extent of 2PCA modification.

Figure 2. Optimization of 2PCA N-terminal modification.

(A) Workflow for E. coli peptide library modification with variable reaction time, temperature, or 2PCA concentration. (B and C) Fraction of peptide libraries labeled with 2PCA at various times and temperatures. Samples that contained 10 mM phosphate buffer are shown in light colors and samples with 50 mM phosphate buffer are shown in dark colors. Squares represent GluC libraries, circles represent chymotrypsin libraries, and triangles represent trypsin libraries. (B) 2PCA labeling at 4 h. (C) 2PCA labeling at 20 h. (D) 2PCA concentration dependence of peptide library modification. (E) Residue-specific 2PCA concentration dependence of peptide library modification. Data for individual libraries are shown in Fig. S7. Concentration-dependent specificity data are shown in Fig. S8.

We next sought to optimize the concentration of 2PCA for N-terminal modification (Fig. 2D). We measured an EC50 of 0.3 ± 0.1 mM for 2PCA modification of peptide libraries. We also analyzed the concentration dependence of modification of peptides with a specific amino acid at position 1 or position 2 (Fig. 2E, Fig. S7). At position 1, we found that most N-terminal amino acids exhibit similar levels of modification at a given concentration of 2PCA. However, for peptides with Gly at their N termini, the fraction of peptides modified was lower than was observed for other N-terminal sequences at every 2PCA concentration. At position 2, peptides with any amino acid except for Pro and Gly showed similar levels of modification at a given 2PCA concentration. However, peptides with Pro at position 2 could not be modified at any concentration of 2PCA, while peptides with Gly at position 2 were modified somewhat more efficiently than other sequences. N-terminal specificity did not vary substantially across different concentrations of 2PCA (Fig. S8). Together, these experiments illustrate the parameters that affect the efficiency of 2PCA modification of complex samples. Our data show that maximizing the extent of 2PCA modification of a sample may require the use of high millimolar 2PCA concentrations. At the same time, much lower concentrations of 2PCA used at physiological temperature on the timescale of hours are likely to be suitable for applications like enrichment N terminomics, in which quantitative modification of N termini is not required.
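
To make the concentration dependence concrete, the sketch below evaluates a Hill-type saturation curve using the reported EC50 of 0.3 mM. The functional form, the Hill coefficient of 1, and the normalization to a plateau of 1 are assumptions made for illustration, since the text reports only the EC50.

```python
def fraction_modified(conc_mM, ec50_mM=0.3, plateau=1.0, hill=1.0):
    """Hill-type saturation curve; plateau and hill are illustrative assumptions."""
    return plateau * conc_mM**hill / (ec50_mM**hill + conc_mM**hill)


for conc in (0.1, 0.3, 1.0, 10.0):
    print(f"{conc:>5.1f} mM 2PCA -> {fraction_modified(conc):.0%} of plateau")
```

Under this assumed model, modification is half-maximal at 0.3 mM by definition and approaches the plateau only at concentrations well above the EC50.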

Applying 2PCA reagents for ‘catch-and-release’ N terminomics

Biotin-2PCA (2) has been used previously for proteome-wide N-terminal modification, enrichment and identification of protease cleavage sites in a method termed Chemical enrichment Of Protease Substrates (CHOPS) (Fig. 3A)7. However, the high affinity between biotin and avidin limits efficient recovery of biotinylated peptides, and biotin-2PCA is not commercially available and must be synthesized by the user. We sought to develop a chemically cleavable biotin-2PCA probe for one-step N-terminal biotinylation to enhance recovery of N-terminal peptides (Fig. 3A, B). We also sought to eliminate the need for probe synthesis by integrating a commercially available, modular alkyne-2PCA probe with clickable, selectively cleavable biotin reagents for enrichment proteomics (Fig. 3A, C).

Figure 3. One-step or modular biotinylation with 2PCA reagents for ‘catch-and-release’ enrichment proteomics.

(A) Workflow for enrichment proteomics using biotinylated or clickable 2PCA reagents. Data showing the efficiency and specificity of alkyne-2PCA library modification are shown in Fig. S9. (B) One-step N-terminal biotinylation of proteome-derived peptide libraries with biotin-2PCA or biotin-SS-2PCA for enrichment. (C) Modular N-terminal biotinylation of proteome-derived peptide libraries using alkyne-2PCA followed by CuAAC with chemically cleavable biotin azides for enrichment. Data showing CuAAC efficiency are shown in Fig. S10.

We first designed a biotin-disulfide-2PCA (biotin-SS-2PCA, 3) probe to test whether use of a chemically cleavable linker would improve recovery of biotinylated peptides (Fig. 3B). We then treated proteome-derived peptide libraries with either biotin-2PCA or biotin-SS-2PCA, enriched the peptides on immobilized neutravidin, and eluted the peptides by either heating the resin to 70°C in 80% acetonitrile/0.1% formic acid for 10 min (for biotin-2PCA) or by treating the resin with 5 mM TCEP in 20 mM HEPES, pH 7.5 at room temperature for 1 h (for biotin-SS-2PCA) to cleave the disulfide linker. After analyzing the eluted peptides by LC-MS/MS, we found that although the fraction of recovered peptides that were 2PCA modified was similar across samples (84% for biotin-2PCA vs 83% for biotin-SS-2PCA), the number of 2PCA-modified peptides was more than twofold higher in the samples enriched with the chemically cleavable disulfide linker (5,359 peptides for biotin-2PCA vs. 11,927 for biotin-SS-2PCA). Chemically cleavable linkers can therefore be used to improve the efficiency of elution of biotinylated N-terminal peptides, as has been previously noted for other types of chemoproteomic probes15–18.

We next sought to develop a catch-and-release 2PCA probe based on commercially available reagents (Fig. 3C). We envisioned a strategy in which 2PCA with an alkyne substituent would be used to introduce a handle for click chemistry with cleavable biotin azides. We tested the ability of 5-ethynylpicolinaldehyde (4, “alkyne-2PCA”) to modify N termini in proteome-derived peptide libraries. We found that alkyne-2PCA modified proteome-derived peptide libraries with similar efficiency and specificity to 2PCA (Fig. S9). We next tested whether alkyne-2PCA-modified peptides are substrates for copper (I)-catalyzed alkyne-azide cycloaddition (CuAAC)19 with biotin azide. We found that 95% of alkyne-2PCA peptides could be biotinylated in this manner (Fig. S10). Alkyne-2PCA modification followed by click chemistry with biotin azide therefore represents an efficient strategy to biotinylate peptide N termini using only commercially available reagents.

Numerous chemically cleavable biotin azides are commercially available. We compared biotin azide (5), biotin-SS-azide (6), biotin-Dde-azide (7), biotin-DADPS-azide (8), and biotin-Diazo-azide (9) to determine which produced the highest recovery of N-terminally biotinylated peptides (Fig. 3C). We performed click chemistry under identical conditions, enriched the biotinylated peptides, and selectively eluted the captured peptides under conditions specific to each linker (as described in STAR Methods). We found that all four chemically cleavable linkers outperformed biotin azide in terms of the number of recovered peptides that were identified in LC-MS/MS experiments. The largest number of peptides were identified in samples enriched with biotin-Dde-azide (7,950 peptides), followed by biotin-DADPS-azide (5,064 peptides), biotin-disulfide-azide (3,890 peptides), biotin-Diazo-azide (2,670 peptides), and biotin azide (2,241 peptides). Alkyne-2PCA modification combined with clickable, cleavable biotin azides is therefore an off-the-shelf system for N-terminal modification and enrichment.
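
The linker comparison can be summarized by ranking the reported peptide counts and expressing each as a fold change over non-cleavable biotin azide. The fold-change summary is an illustration added here. The sketch assumes that the "biotin-disulfide-azide" count in the text refers to biotin-SS-azide (6).

```python
# Identified peptides per biotin azide, as reported in the text.
recovered = {
    "biotin azide": 2_241,
    "biotin-SS-azide": 3_890,
    "biotin-Dde-azide": 7_950,
    "biotin-DADPS-azide": 5_064,
    "biotin-Diazo-azide": 2_670,
}

baseline = recovered["biotin azide"]
ranking = sorted(recovered.items(), key=lambda item: item[1], reverse=True)
fold_over_baseline = {name: n / baseline for name, n in recovered.items()}

for name, n in ranking:
    print(f"{name:<20} {n:>6,}  {fold_over_baseline[name]:.2f}x")
```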

Proteomic Identification of Cleavage Sites with 2PCA reagents (PICS2)

We next applied 2PCA-based catch-and-release reagents for protease sequence specificity profiling (Fig. 4). Based on their diversity and ease of preparation, proteome-derived peptide libraries are useful as pools of substrates for profiling protease specificity3. However, most existing methods for isolating proteolytic neo-N termini from proteome-derived peptide libraries do not provide a straightforward means of characterizing proteases that recognize Lys in their consensus cleavage sequences, including those involved in cancer20–22, inflammation22, and viral infection23–25. For example, the widely adopted PICS (Proteomic Identification of Cleavage Sites) method requires blocking of N termini and Lys side chains prior to protease digestion to enable biotinylation of proteolytic neo-N termini with an amine-reactive reagent3. We sought to leverage the N-terminally selective reaction of 2PCA reagents to circumvent these limitations. We envisioned a workflow in which N termini would be selectively blocked, leaving lysine ε-amines unblocked. Following treatment with a test protease of interest, proteolytic neo-N termini would be selectively modified with biotin-SS-2PCA for enrichment on neutravidin resin and identification using LC-MS/MS. We termed this strategy Proteomic Identification of Cleavage Sites with 2PCA reagents (PICS2) (Fig. 4A).

Figure 4. Proteomic identification of protease cleavage sites with 2PCA reagents (PICS2).

(A) PICS2 workflow. IceLogos for human proteome-derived peptide libraries are shown in Fig. S11. Data showing the efficiency of N-terminal blocking with 2PCA are shown in Fig. S12. (B) Comparison of PICS and PICS2 libraries and expected trypsin cleavage specificity. (C) IceLogos for trypsin cleavage specificity determined by PICS (top) or PICS2 (bottom). (D) Heatmaps showing the percent amino acid occurrence in each of the peptide subsites P4-P4’ for trypsin PICS (left) and PICS2 (right). Results for individual libraries are shown in Fig. S13. (E) Heatmap showing the percent amino acid occurrence in each of the peptide subsites P4-P4’ for LysargiNase PICS2. Results for individual libraries are shown in Fig. S14. Additional PICS2 data for GluC and chymotrypsin are shown in Fig. S15 and Fig. S16. (F) Heatmap showing the percent amino acid occurrence in each of the peptide subsites P4-P4’ for Kex2 PICS2. Data showing the efficiency and specificity of N-terminal blocking by dimethylation are shown in Fig. S17. Results for individual libraries are shown in Fig. S18. Known Kex2 substrates are shown in Fig. S19. (G) IceLogo for Kex2 cleavage specificity determined by PICS2. (H) Subcellular localization of furin and PCSK2. (I) IceLogo for furin cleavage specificity determined by PICS2. Results for individual libraries are shown in Fig. S20. (J) IceLogo for PCSK2 cleavage specificity determined by PICS2. Results for individual libraries are shown in Fig. S21.

To test the PICS2 strategy, we generated N-terminally blocked proteome-derived peptide libraries by digesting E. coli or human lysates with either chymotrypsin (P1 = F, L, W, or Y) or GluC (P1 = D or E) (Fig. S1, Fig. S11) and treating them with 2PCA. LC-MS/MS analysis demonstrated that 89±0.2% of peptides in the human library and 81±5% of peptides in the E. coli library were N-terminally modified by 2PCA under these conditions (Fig. S12). We first compared the PICS2 workflow to PICS using trypsin, a well characterized protease that cleaves after Lys and Arg (Fig. 4BCD, Fig. S13)3. In total we identified 2,984 neo-N termini from the PICS workflow and 1,011 neo-N termini from the PICS2 workflow. Although both methods enabled enrichment of >1,000 tryptic neo-N termini, the tryptic peptides identified in the PICS experiment were limited to those with Arg at the P1 position based on the inability of trypsin to cleave after dimethylated lysine residues (Fig. 4B). In contrast, PICS2 enabled identification of neo-N termini derived from sequences containing both Lys and Arg in the P1 position, more accurately reflecting trypsin’s well characterized specificity (Fig. 4CD). We found that PICS2 was generalizable and also produced accurate specificity profiles of the commercial proteases LysargiNase26 (P1’ = K or R) (Fig. 4E, Fig. S14), GluC (P1 = D or E) (Fig. S15), and chymotrypsin (P1 = F, L, W, or Y) (Fig. S16).
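
Positive enrichment recovers only the prime-side peptide, so the non-prime residues (P4-P1) have to be inferred by mapping each neo-N-terminal peptide back to its parent sequence. The sketch below shows one way to do this and to tabulate percent occurrence at P4-P4’. The function names, the toy protein, and the use of the first match only are assumptions for illustration and do not describe the authors' analysis pipeline.

```python
from collections import Counter

POSITIONS = ["P4", "P3", "P2", "P1", "P1'", "P2'", "P3'", "P4'"]


def cleavage_window(protein, neo_n_peptide, flank=4):
    """Return the P4-P4' window around the site that produced neo_n_peptide.

    Returns None if the peptide is absent or too close to a protein end.
    Only the first occurrence of the peptide is considered.
    """
    start = protein.find(neo_n_peptide)  # index of the P1' residue, or -1
    if start < flank or start + flank > len(protein):
        return None
    return protein[start - flank:start + flank]


def percent_occurrence(windows):
    """Percent amino acid occurrence at each subsite across all windows."""
    table = {}
    for i, position in enumerate(POSITIONS):
        counts = Counter(w[i] for w in windows)
        table[position] = {aa: 100 * n / len(windows) for aa, n in counts.items()}
    return table


# Hypothetical protein and enriched peptides, for illustration only.
protein = "MSTAVLKRDEAGHWPQLVSARTEELNK"
enriched = ["DEAGHW", "TEELNK"]
windows = [w for w in (cleavage_window(protein, p) for p in enriched) if w]
profile = percent_occurrence(windows)
```

In this toy case both inferred sites have Arg at P1, so `profile["P1"]` assigns 100% to Arg, which is the kind of signal that the heatmaps in Fig. 4 summarize across thousands of sites.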

We also tested a previously reported method for blocking peptide N termini prior to protease digestion, reductive dimethylation under acidic conditions27. We found that this strategy blocked N termini to a similar extent as 2PCA (89±9% for dimethylation vs 89±0.2% for 2PCA) (Fig. S17), but also resulted in modification of Lys side chains on 14±4% of peptides. This strategy is not ideal for biotinylating proteolytic neo-N termini for enrichment because it is expected to result in substantial Lys labeling, limiting characterization of specificity for proteases that recognize Lys. However, reductive dimethylation has the advantage that it requires only 10 min of reaction time while leaving most Lys residues unmodified. Therefore, combining N-terminally selective reductive dimethylation for N-terminal blocking and 2PCA-based biotinylation for N-terminal capture represents an alternative strategy for protease specificity profiling with proteome-derived peptide libraries.

Profiling the sequence specificity of proprotein convertases with PICS2

Subtilisin/kexin-type proprotein convertases (PCSKs) are an important class of eukaryotic serine proteases that are involved in numerous (patho)physiological processes28,29. PCSKs are expressed in the secretory pathway and cleave protein and peptide precursors, including those of peptide hormones, enzymes, and cell surface receptors, to generate their mature, active forms. In humans, there are nine PCSK family members: PCSK1, PCSK2, furin, PCSK4, PCSK5, PCSK6 (also known as PACE4), PCSK7, SKI-1, and PCSK9. The first seven of these cleave proprotein substrates on the C-terminal side of single or paired basic amino acid residues, while SKI-1 cleaves following non-basic residues and PCSK9 cleaves itself following a Gln residue and is its own sole proteolytic substrate28. Based on their roles in human diseases, including hypercholesterolemia, cancer, diabetes, osteoarthritis, and viral and bacterial infection, PCSKs represent attractive drug targets28,29. However, because of their crucial physiological functions and overlapping substrate specificity, development of small-molecule inhibitors of individual PCSKs remains challenging. A detailed view of PCSK sequence specificity could reveal differences in molecular recognition between PCSK family members that could inform inhibitor design. We therefore sought to apply PICS2 for PCSK sequence specificity profiling.

We initially examined S. cerevisiae Kex2, a well-studied PCSK that typifies the family (Fig. 4FG, Fig. S18). Kex2 has two well-characterized physiological substrates: the yeast mating pheromone pro-α-factor and the secreted pore-forming protein K1 killer toxin30–32. Pro-α-factor is cleaved at Lys-Arg85, Lys-Arg104, Lys-Arg125, and Lys-Arg146, while K1 killer toxin is cleaved at Pro-Arg44, Arg-Arg149, Lys-Arg188, and Lys-Arg233. Although active Kex2 is required for all four killer toxin cleavage events, it has remained unclear whether Kex2 or a second Kex2-dependent protease cleaves at the Pro-Arg site32. PICS2 analysis of Kex2 revealed that Kex2 has a stringent preference for Arg at P1, but accepts Lys, Arg, or Pro at P2 (Fig. 4FG), supporting the hypothesis that Kex2 directly cleaves the Pro-Arg site in K1 killer toxin. PICS2 analysis also revealed that Kex2 prefers acidic residues on the prime side of the cleavage site, particularly at the P1’, P3’, and P4’ positions. This result is consistent with the prime side sequences of the cleavage sites in pro-α-factor (P1’-EAEA-P4’ or P1’-EADA-P4’) and K1 killer toxin (P1’-EAPW-P4’, P1’-DIST-P4’, P1’-SDTA-P4’, and P1’-YVYP-P4’) (Fig. S19).

We next turned our attention to the human PCSKs. Human PCSK substrate specificity is largely determined by subcellular localization. While PCSK1 and PCSK2 are restricted to acidic regulated secretory granules28,33, furin and the other PCSKs are localized to the trans-Golgi network (TGN), cell surface, extracellular matrix, and/or endosomes (Fig. 4H)28,34. We used human proteome-derived peptide libraries (Fig. S11) to profile the specificity of PCSK2, a secretory granule-localized proprotein convertase, and furin, which resides in the TGN, at the cell surface, and in endosomes. We found that furin has a stringent preference for Arg at P1, but will accept Lys, Arg, or Tyr at P2 (Fig. 4I, Fig. S20). At P4, furin prefers Arg or Gly. In contrast, PCSK2 accepts both Lys and Arg at P1, but strongly prefers Lys at P2 (Fig. 4J, Fig. S21). At P4, PCSK2 prefers Arg, Ile, or Val. These results reveal that despite their generally similar substrate sequence specificity, furin and PCSK2 each have unique sequence recognition capacities that could be exploited for selective inhibitor design.
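
The positional preferences described above amount to per-position amino acid frequencies across aligned cleavage windows. The following Python sketch illustrates that tally under stated assumptions: the eight-residue windows (P4 to P4’) are invented for illustration and are not data from this study, and the function name `positional_frequencies` is hypothetical. It does not reproduce the background correction or statistics used to generate Fig. 4.

```python
from collections import Counter

# Position labels for an aligned window spanning P4 to P4'.
POSITIONS = ["P4", "P3", "P2", "P1", "P1'", "P2'", "P3'", "P4'"]


def positional_frequencies(windows):
    """Return {position: {residue: fraction}} for aligned 8-residue windows."""
    if not windows:
        raise ValueError("at least one window is required")
    for window in windows:
        if len(window) != len(POSITIONS):
            raise ValueError(f"window {window!r} must have {len(POSITIONS)} residues")
    freqs = {}
    for i, pos in enumerate(POSITIONS):
        counts = Counter(window[i] for window in windows)
        freqs[pos] = {aa: n / len(windows) for aa, n in counts.items()}
    return freqs


# Hypothetical cleavage windows (illustrative only, not from the paper).
example_windows = ["RAKRSVGE", "GLKRDAEA", "RTRRAPLD", "RSKRELSV"]
profile = positional_frequencies(example_windows)
print(profile["P1"])
print(profile["P2"])
```

In this toy input every window has Arg at P1, while P2 is split between Lys and Arg, which is the kind of pattern the specificity profiles summarize.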

Mapping the apoptotic proteome with 2PCA catch-and-release reagents

Global sequencing of cellular N termini using mass spectrometry-based proteomics, or ‘N terminomics’, can provide key insights into how proteins are modified by proteolytic cleavage to alter their biological functions4,5,7,35. Positive enrichment methods for N terminomics rely on specific modification of protein N termini with an affinity handle to directly isolate them from complex samples such as cell lysates4,7, while other methods depend on depletion of internal peptides following trypsin digestion5,36. Existing methods for positive enrichment of protein N termini include subtiligase N terminomics4,11,37,38, which employs the designed peptide ligase subtiligase for selective N-terminal biotinylation, and CHOPS7, which uses biotin-2PCA for N-terminal modification. Subtiligase N terminomics has the advantage that it uses a catch-and-release strategy in which biotinylated N-terminal peptides can be selectively eluted from immobilized avidin using a TEV protease cleavage site built into the biotinylated subtiligase substrates4. However, the method is limited by sequence specificity inherent to subtiligase11 and the need to synthesize a specialized biotinylated peptide ester probe39. In contrast, CHOPS has little sequence specificity, but existing workflows require chemical synthesis of the biotin-2PCA reagent and do not offer an approach for N-terminal catch-and-release7.

We applied commercially available 2PCA reagents to implement a catch-and-release N terminomics strategy that simultaneously overcomes limitations of both subtiligase N terminomics and CHOPS. We refer to this approach as CHOPPER (Chemical enrichment Of Protease substrates with Purchasable, Elutable Reagents) (Fig. 5). We sought to apply CHOPPER to map proteolysis during apoptosis, a programmed cell death pathway executed by the caspases, a family of cysteine proteases that cleave C-terminal to aspartate residues (P1 = D)40,41. During apoptosis, specific caspase cleavages in hundreds of proteins across the proteome lead to changes in protein function that program membrane blebbing, cytoskeletal rearrangements, chromatin condensation, DNA fragmentation, and ultimately, destruction of the dying cell42. Apoptosis has been studied extensively with N terminomics, providing an opportunity to benchmark our approach. However, the limitations of existing N terminomics methods also offer an opportunity for biological discovery with CHOPPER.

Figure 5. Chemical enrichment of protease substrates with purchasable, elutable reagents (CHOPPER).

(A) Analysis of apoptotic proteolysis with CHOPPER. Images of etoposide- and DMSO-treated cells are shown in Fig. S22. (B) Frequency of each amino acid in the P1’ position for etoposide-treated and DMSO-treated cells. Label-free quantification of peptides is shown in Fig. S23. (C) Frequency of each amino acid at the inferred P1 position for etoposide-treated and DMSO-treated cells. (D) Frequency of each amino acid at the P1’ position of unacetylated methionine aminopeptidase substrates. (E) Caspase proteolysis is characterized by P1 = D. (F) Heatmap showing positional enrichment or de-enrichment of each amino acid among N termini enriched from etoposide-treated cells vs. DMSO-treated cells. (G) STRING analysis of putative caspase cleavages identified by CHOPPER and subtiligase N terminomics. Light grey circles represent proteins that were previously known to be caspase substrates, dark grey circles represent proteins that were previously known to be caspase substrates and were also identified by CHOPPER, blue circles represent proteins in which caspase cleavage sites were previously known but a new cleavage site was identified by CHOPPER, and magenta circles represent proteins that were not previously known to be caspase substrates in etoposide-treated Jurkat cells. Full STRING analysis is shown in Fig. S24. A comparison between CHOPPER and subtiligase N terminomics is shown in Fig. S25. A comparison between CHOPPER and HUNTER is shown in Fig. S26.

To test the CHOPPER strategy, we generated lysates from Jurkat cells treated with the apoptosis inducer etoposide or treated with a vehicle control (Fig. 5A). After 8 h, etoposide-treated cells exhibited extensive blebbing characteristic of apoptosis and were harvested for analysis (Fig. S22). Lysates were treated with 1 mM alkyne-2PCA and then subjected to click chemistry with biotin-disulfide-azide. Proteins with biotinylated N termini were enriched on neutravidin resin, digested with trypsin, selectively eluted by reduction of the disulfide bond, and analyzed by LC-MS/MS. In total, we identified 1587 unique N-terminal peptides in the apoptotic samples and 654 unique N-terminal peptides in the vehicle control samples. No unexpected biases at the 2PCA-modified P1’ position were observed, reflecting the broad sequence compatibility of 2PCA reagents (Fig. 5B). However, consistent with induction of caspase activity, we observed a substantial increase in the fraction of peptides with P1 = D in etoposide-treated samples (Fig. 5C).
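
Because the 2PCA tag marks the P1’ residue, the P1 residue is inferred as the residue immediately preceding the labeled peptide in the parent protein sequence. A minimal Python sketch of that inference follows. The protein sequence, the start positions, and the function names are hypothetical and chosen only for illustration; this is not the analysis code deposited with the study.

```python
def cleavage_site_residues(protein_seq, peptide_start):
    """Return (P1, P1') for a peptide whose first residue sits at the
    1-based position `peptide_start` of `protein_seq`.

    P1 is None when the peptide begins at residue 1, because no residue
    precedes it.
    """
    if not 1 <= peptide_start <= len(protein_seq):
        raise ValueError("peptide_start is outside the protein sequence")
    p1_prime = protein_seq[peptide_start - 1]
    p1 = protein_seq[peptide_start - 2] if peptide_start > 1 else None
    return p1, p1_prime


def fraction_with_p1(sites, residue="D"):
    """Fraction of (P1, P1') pairs whose inferred P1 equals `residue`."""
    if not sites:
        raise ValueError("no sites supplied")
    return sum(1 for p1, _ in sites if p1 == residue) / len(sites)


# Hypothetical protein; a peptide starting at residue 8 follows an Asp.
protein = "MSTDEVDGAKLR"
sites = [cleavage_site_residues(protein, start) for start in (1, 8)]
print(sites)
print(fraction_with_p1(sites))
```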

To benchmark CHOPPER, we first analyzed substrates of the methionine aminopeptidases (MetAPs), whose activity is not known to be affected by apoptosis, in both apoptotic and nonapoptotic samples (Fig. 5D). In human cells, two MetAPs, MetAP1 and MetAP2, are involved in co-translational cleavage of Met1 from proteins. Previous biochemical studies have shown that human MetAPs favor Ala, Cys, Gly, Pro, Val, and Ser at the P1’ position43. Following Met excision, substrates with N-terminal Ala, Cys, Gly, Val, or Ser can be further modified by the N-acetyltransferase (NAT) NatA, resulting in a blocked N terminus inaccessible to 2PCA modification. Six other human NATs modify other N-terminal sequences, and it is estimated that >85% of cytosolic proteins are N-terminally acetylated. However, no human NAT is known that acetylates proteins with N-terminal Pro44. Across our apoptotic and non-apoptotic samples, we identified a total of 73 putative MetAP substrates based on the presence of the 2PCA tag at residue 2 (Fig. 5D). We did not observe a substantial difference in the distribution of P1’ residues captured in apoptotic versus non-apoptotic samples. Interestingly, >50% of putative MetAP substrates in all samples had P1’ = P. While this distribution is likely consistent with the in vivo balance of MetAP activity and NAT activity44, this result differs from a previous examination of MetAP substrates using subtiligase N terminomics in which 0-2% of identified MetAP substrates had P1’ = P11. This difference likely arises from subtiligase substrate specificity, which is strongly biased against P1’ = P, highlighting the complementary utility of CHOPPER.
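
The criterion used here for a putative MetAP substrate (a 2PCA tag at residue 2 of a protein that begins with Met) can be written as a simple rule. The Python sketch below is an illustration only: the function names and example sequences are hypothetical, and the NatA check merely encodes the residue list given in the text rather than predicting the acetylation state of any real protein.

```python
# Residues that, per the text, NatA can acetylate after Met excision.
NATA_TARGET_RESIDUES = frozenset("ACGVS")


def is_putative_metap_substrate(protein_seq, tagged_position):
    """True when the tagged N terminus is residue 2 of a Met-initiated protein."""
    return (
        tagged_position == 2
        and len(protein_seq) >= 2
        and protein_seq[0] == "M"
    )


def may_be_nata_acetylated(residue):
    """True when the new N-terminal residue is on the NatA list from the text."""
    return residue in NATA_TARGET_RESIDUES


# Hypothetical sequences: one exposes Pro after Met removal, one exposes Ala.
for seq in ("MPEKLT", "MAEKLT"):
    exposed = seq[1]
    print(seq, is_putative_metap_substrate(seq, 2), may_be_nata_acetylated(exposed))
```

Under this rule an exposed Pro is never flagged as a NatA target, which fits the observation that P1’ = P dominates the unacetylated MetAP substrates captured by CHOPPER.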

We next analyzed putative caspase substrates with P1 = D in our datasets (Fig. 5E). While nonapoptotic samples had 21/654 (3%) P1 = D peptides, 229/1587 (14%) N-terminal peptides derived from etoposide-treated cells had P1 = D, a clear enrichment consistent with induction of caspase activity (Fig. 5C,F). These 229 peptides correspond to 223 unique P1 = D N termini. To evaluate whether individual P1 = D N termini were upregulated in response to etoposide treatment, we performed label-free quantification on our LC-MS/MS data (Fig. S23). We found that 69 P1 = D N termini were identified in the etoposide-treated samples but had zero abundance in DMSO-treated control samples. The remaining P1 = D peptides could be quantified in both samples and had a mean log2 (etoposide/DMSO) ratio of 3.2 and a median ratio of 3.4. A subset of 35 P1 = D N termini did not have substantial increases in abundance (log2 (etoposide/DMSO) < 1) in etoposide-treated samples. This result raises two hypotheses. The first is that these N termini result from proteolytic processes not related to apoptosis. The second is that these N termini are associated with apoptosis, but that the resulting proteolytic fragment is degraded following caspase cleavage, as has been demonstrated for certain caspase substrates. We also observed 490 non-P1 = D peptides with log2 (etoposide/DMSO) > 1. These peptides may arise from apoptosis-induced protein turnover; cleavage by other proteases activated during apoptosis; or cleavage of caspase-cleaved products by other proteases, such as aminopeptidases. No sequence motif is apparent in these etoposide-induced N termini (Fig. S23), making it difficult to assign their origin.
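
The grouping of P1 = D N termini described above can be illustrated with a short Python sketch. It is hedged in several ways: the abundance values are invented, the function name `classify_n_terminus` is hypothetical, and treating a log2 ratio of exactly 1 as induced is an assumption, because the text specifies only the < 1 and > 1 cases. The last lines recompute the reported P1 = D percentages from the counts given in the text.

```python
import math


def classify_n_terminus(etoposide_abundance, dmso_abundance, threshold=1.0):
    """Assign an N terminus to one of three groups based on label-free abundance."""
    if etoposide_abundance <= 0:
        raise ValueError("N terminus must be detected in the etoposide sample")
    if dmso_abundance == 0:
        return "etoposide_only"
    log2_ratio = math.log2(etoposide_abundance / dmso_abundance)
    # Assumption: a ratio exactly at the threshold counts as induced.
    return "induced" if log2_ratio >= threshold else "not_induced"


# Invented abundances (arbitrary units) for illustration.
print(classify_n_terminus(5.0e6, 0.0))
print(classify_n_terminus(8.0e6, 1.0e6))
print(classify_n_terminus(1.5e6, 1.0e6))

# Counts reported in the text: P1 = D peptides out of all N-terminal peptides.
pct_control = 100 * 21 / 654
pct_etoposide = 100 * 229 / 1587
print(round(pct_control), round(pct_etoposide))
```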

We compared our datasets to two previously published subtiligase N terminomics datasets that examined etoposide-treated Jurkat cells: one published in Mahrus et al.4 and deposited in the Degrabase (298 unique P1 = D sites)45 and one published in Weeks and Wells, 2018 (1,152 P1 = D sites)11. We found that 111/223 cleavage events in 95 proteins had not been previously reported, including cleavages in 44 proteins that were not previously identified as substrates of etoposide-induced caspase activity in Jurkat cells (Table S1). STRING analysis of cleavages newly identified by CHOPPER and previously reported etoposide-induced caspase cleavages demonstrated that the new cleavage events largely affect the same pathways that are known to be targeted by caspases during apoptosis (Fig. 5G, Fig. S24). These include RNA splicing, DNA repair, transcription, translation, and intracellular transport, among others4. Many of the newly identified cleavages occur between domain boundaries of multidomain proteins (Fig. S24). Many of these substrate proteins contain domains that modulate protein-protein interactions (e.g., SH3, PDZ, BRCT, FHA, and ADD domains), protein-DNA interactions (e.g., Myb-like and bHLH domains), or protein-RNA interactions (e.g., KH and RRM domains)46, suggesting that apoptotic proteolysis leads to disruption or reprogramming of the spatial organization of macromolecular complexes. This observation is consistent with numerous previous studies of specific caspase cleavages4,47–50. We attribute the overall larger number of substrates identified with subtiligase N terminomics to the longer etoposide treatment that was used (12 h for subtiligase experiments versus 8 h for CHOPPER experiments).

To understand the features of the CHOPPER workflow that enabled identification of additional caspase cleavage sites, we examined the distribution of amino acids at the P1’ and P2’ positions of P1 = D peptides (Fig. S25). Compared to subtiligase or a cocktail of subtiligase specificity mutants11, CHOPPER performed better on peptides with branched-chain amino acids or Pro at the P1’ position. This difference is likely based on the substrate specificity of subtiligase, for which peptides with branched-chain amino acids or Pro in the first position are poor substrates. CHOPPER also identified a higher fraction of peptides with Lys or Arg at P1’. While subtiligase can efficiently modify substrates with Lys or Arg at P1’, subsequent trypsin digestion is expected to cleave the subtiligase biotin modification from the peptide. We hypothesize that in the CHOPPER workflow, cyclization of the N-terminal Lys or Arg inhibits cleavage by trypsin, enabling enrichment of P1’ = Lys or Arg peptides. In contrast, subtiligase performs better for substrates with P1’ = Gly, consistent with the low modification efficiency for P1’ = Gly observed for 2PCA. CHOPPER outperforms subtiligase for peptides with Gly or Ser at P2’, while subtiligase performs better for peptides with branched-chain amino acids at P2’. These results are consistent with subtiligase’s preference for aromatic or large hydrophobic amino acids at P2’. CHOPPER also captures a larger percentage of P2’ = Lys or Arg substrates, suggesting that cyclization of the N-terminal amino acid may also inhibit trypsin cleavage of nearby Lys and Arg.

We next sought to compare CHOPPER to N terminomics strategies based on depletion of internal tryptic peptides. A widely adopted class of N terminomics methods involves modification of protein N termini and Lys side chains by reductive dimethylation or acetylation followed by tryptic digestion and modification of tryptic N termini with distinct chemical functional groups, such as hydrophobic groups, that enable their removal from the sample5,36,51. We first compared our CHOPPER dataset to a previously published HYdrophobic Tagging-Assisted N termini Enrichment (HYTANE) dataset that examined the N terminome of Jurkat cells treated with etoposide for 12 h51. The dataset contained 312 unique P1 = D neo-N termini. We found that 43 of 223 putative caspase cleavage sites observed in the CHOPPER dataset were also observed by HYTANE, a surprisingly low overlap. To eliminate potential differences arising from the longer etoposide treatment used in the HYTANE experiment, we generated our own dataset using a next-generation hydrophobic tagging method, High-efficiency Undecanal-based N Termini EnRichment (HUNTER)36, to analyze Jurkat cells treated under identical conditions to those used in the CHOPPER experiment (Fig. S26). In total, we identified 227 P1 = D N termini in the HUNTER experiment compared to 223 P1 = D N termini in the CHOPPER experiment. However, only 44 P1 = D N termini were identified with both HUNTER and CHOPPER (Fig. S26). We attribute the low overlap between these methods to differences in the HUNTER and CHOPPER workflows. These include the Arg-C-like specificity exhibited by trypsin in the context of reductively dimethylated samples, which increases mean peptide length, and the apparent higher sensitivity of CHOPPER for low-abundance N termini (Fig. S26)52.
Together, these results suggest that positive and negative N-terminal isolation strategies are highly complementary and can be combined to more comprehensively define the cellular N terminome.
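
The degree of overlap between the CHOPPER and HUNTER datasets can be put on a common scale using the counts reported above (223 and 227 P1 = D N termini, 44 shared). The Python sketch below is a worked calculation only. The Jaccard index is introduced here as a convenient summary and is not a statistic reported in the study, and the function name is hypothetical.

```python
def overlap_summary(n_a, n_b, n_shared):
    """Summarize overlap between two sets given their sizes and intersection size."""
    if n_shared > min(n_a, n_b):
        raise ValueError("shared count cannot exceed either set size")
    union = n_a + n_b - n_shared
    return {
        "union": union,
        "jaccard": n_shared / union,
        "fraction_of_a": n_shared / n_a,
        "fraction_of_b": n_shared / n_b,
    }


# Counts from the text: CHOPPER = 223, HUNTER = 227, identified by both = 44.
summary = overlap_summary(223, 227, 44)
print(summary["union"])
print(round(summary["jaccard"], 3))
print(round(summary["fraction_of_a"], 3), round(summary["fraction_of_b"], 3))
```

On these counts the two methods together cover 406 distinct P1 = D N termini, of which roughly one in nine is seen by both, which is the quantitative basis for describing the approaches as complementary.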

Discussion

Site-specific chemoproteomic probes are valuable tools for analyzing the properties of proteins, including their activity, localization, and post-translational modification state. Probes exist to modify at least nine amino acid side chains and protein N and C termini with varying degrees of efficiency and specificity13. The utility of any probe depends on at least three factors: 1) predictable site and sequence specificity; 2) optimized modification conditions for proteomics applications; and 3) accessibility to the research community. The expanded 2PCA-based toolbox that we report here addresses these three features through deep profiling of 2PCA specificity, optimization of reaction conditions for modification of proteomic samples, and incorporation of commercially available reagents that remove the need for chemical synthesis of the probes.

The use of proteome-derived peptide libraries for deep profiling of 2PCA specificity overcomes the challenges and limitations associated with other specificity profiling methods. Small synthetic peptide libraries cannot capture how amino acid identity at one position influences modification efficiency at another. Peptide libraries displayed on yeast53,54 or phage55–57 provide high sequence diversity but require a different set of conditions and assays that may not be directly relevant to proteomic samples and do not provide single amino acid resolution. Proteome-derived peptide libraries strike a balance, providing a high level of diversity that can be measured with the same LC-MS/MS assays and modified under the same conditions as proteomic samples3,11. Importantly, the results of proteome-derived peptide library specificity experiments directly translated to the ability of 2PCA to modify protein N termini in cell lysates. For example, the ability of 2PCA to modify P1’ Pro sequences enabled us to identify a larger number of human methionine aminopeptidase substrates than had been identified in previous enrichment N terminomics experiments. Additionally, 2PCA specificity profiling in proteome-derived peptide libraries explains why 2PCA probes can capture N-terminal sequences that were not identified by other methods. Finally, by profiling 2PCA specificity using proteome-derived peptide libraries, we confirmed that 2PCA is unable to modify substrates with P2’ Pro regardless of the identity of amino acids in other positions.

Optimization of 2PCA modification using proteome-derived peptide libraries enabled us to identify conditions for selective, near-complete blocking of peptide N termini and conditions that balance a high level of N-terminal modification with use of low millimolar concentrations of biotinylated or alkyne-modified 2PCA reagent. Both insights were important for implementation of PICS2, a protease specificity profiling method that is compatible with proteases containing Lys or Arg in their cleavage sequences. High concentrations of 2PCA produced the N-terminally blocked proteome-derived peptide libraries that are needed for PICS2, while low millimolar concentrations of 2PCA reagents were sufficient for N-terminal alkyne modification and subsequent click chemistry with biotin and capture on neutravidin resin while avoiding competition from excess reagent. Similarly, low millimolar alkyne-2PCA concentrations were used for CHOPPER profiling of apoptotic protease substrates. The incorporation of a chemically cleavable linker between 2PCA and biotin enhances the efficiency of N-terminal peptide elution, providing more comprehensive substrate and specificity profiling and lowering sample requirements.

Incorporation of alkyne-2PCA as a modular means of N-terminal biotinylation improves accessibility of both PICS2 and CHOPPER to protease researchers. Probes in which 2PCA and biotin are covalently linked must be chemically synthesized and purified before use. Although 6-(1-piperazinylmethyl)-2-pyridinecarboxaldehyde10 recently became commercially available, reducing the synthesis to one step, synthesis and purification nonetheless create a barrier to the use of 2PCA reagent for N terminomics applications. In contrast, alkyne-2PCA and a variety of chemically cleavable biotin azides are commercially available, removing the need for chemical synthesis and purification and making the tools accessible to researchers without the specialized equipment and expertise needed for chemical synthesis. Although beyond the scope of this study, alkyne-2PCA is also likely to be useful in N-terminal bioconjugation to proteins, where it can provide a modular means of attaching any azide-modified payload to the protein N terminus.

The PICS2 and CHOPPER workflows are widely applicable for studying the substrates and specificity of a broad array of proteases. These workflows have well defined specificity with few limitations and are easy to implement using commercially available reagents. We anticipate that the expanded 2PCA-based N terminomics toolbox can be widely adopted based on its N-terminal selectivity, broad sequence compatibility, optimized reaction conditions, and ease of use, and will propel discovery in broad areas of protease research.

Limitations of the study

Assessment of alkyne-2PCA as a probe for cellular protease substrate identification was limited to Jurkat cells. Future work should assess the applicability of the probe in other cell lines and sample types. The amount of input protein required to obtain meaningful data from the PICS2 and CHOPPER workflows has not been assessed systematically. To determine the applicability of these workflows in sample-limited settings, it will be essential to define the amount of input required and to optimize the workflows for small sample amounts. Our CHOPPER experiments used trypsin as the digest protease. Application of alternative proteases is likely to expand N terminome coverage, but has not been evaluated. Although we performed label-free quantification on our etoposide-treated and control Jurkat cell dataset, we have not combined CHOPPER with isobaric tagging-based quantitative proteomics methods that would enable sample multiplexing. Development of strategies that integrate isobaric tagging with PICS2 and CHOPPER is likely to expand the utility of these N terminomics methods for studies that require a large number of samples.

STAR Methods

Resource availability

Lead contact

Further information and requests for resources and reagents should be directed to and will be fulfilled by the lead contact, Amy Weeks (amweeks@wisc.edu).

Materials availability

This study did not generate new unique reagents.

Data and code availability

  • Mass spectrometry data have been deposited at ProteomeXchange and are publicly available as of the date of publication. Accession numbers are listed in the key resources table.

  • All original code has been deposited at Zenodo and is publicly available as of the date of publication. The DOI is listed in the key resources table.

  • Data from Mahrus, et al., 20184 and Crawford, et al. 201345 are available at https://wellslab.ucsf.edu/degrabase/.

  • Any additional information required to reanalyze the data reported in this paper is available from the lead contact upon request.

Experimental Model and Study Participant Details

Microbes

E. coli XL10-Gold (Agilent) were cultured in LB medium at 37°C with shaking at 200 rpm in baffled flasks.

Cell lines

HEK293T cells (ATCC; human; female) were grown in high glucose DMEM supplemented with 10% fetal bovine serum (FBS) at 37°C in a 5% CO2 incubator. Jurkat cells (ATCC; human; male) were grown in RPMI-1640 supplemented with 10% FBS at 37°C in a 5% CO2 incubator. Cell lines were purchased from ATCC and were not authenticated.

Method details

Key chemicals and materials

2-Pyridinecarboxaldehyde (1) and 6-(1-piperazinylmethyl)-2-pyridinecarboxaldehyde bistosylate salt were purchased from Sigma-Aldrich. 5-Ethynylpicolinaldehyde (“alkyne-2PCA”) (4) was purchased from Ambeed. NHS-biotin, NHS-SS-biotin, SOLA HRP SPE cartridges, Pierce Snap-Cap spin columns, and High-Capacity Neutravidin Agarose were purchased from Thermo Fisher Scientific. Biotin azide (5) was purchased from Cayman Chemical. Azide-SS-biotin (6) was purchased from Broadpharm. Biotin-Diazo-azide (9), biotin-Dde-azide (7), and biotin-DADPS-azide (8) were purchased from Click Chemistry Tools. Full chemical structures of key reagents are shown in Fig. S27.

E. coli proteome-derived peptide libraries

LB media (1 L) was inoculated with a single colony of E. coli XL10. Cells were grown overnight at 37°C with shaking at 200 rpm. Cells were harvested by centrifugation at 4,000 × g for 10 min at 4°C. The cell pellet was resuspended in 50 mL of lysis buffer (10 mM HEPES, 1 mM PMSF, 0.5 mM EDTA) and cells were lysed by three passes through a homogenizer at 15,000 psi. Cell debris was pelleted by centrifugation at 10,000 × g for 20 min at 4°C. The supernatant was adjusted to 100 mM HEPES, pH 7.5. DTT was added to a final concentration of 5 mM and the lysate was incubated for 1 h at room temperature. Iodoacetamide was then added to a final concentration of 10 mM and the sample was incubated in the dark for 1 h at room temperature. Protein was precipitated by addition of trichloroacetic acid (TCA) to 15% (w/v) followed by a 2-6 h incubation at −20°C. Proteome-derived peptide libraries were then produced as described previously11,58.

Human proteome-derived peptide libraries

HEK293T cells (ATCC #CRL-3216) were grown to 90% confluence in DMEM media supplemented with 10% (v/v) fetal bovine serum at 37°C under a 5% CO2 atmosphere. Cells were washed once with PBS and versene (10 mL) was added. The flask was incubated for 10 min at 37°C. Cells were resuspended in versene, transferred to a conical tube, and harvested by centrifugation at 300 × g for 5 min. The cell pellet was resuspended in 800 μL hot lysis buffer (100 mM Tris-HCl, pH 8.5, 6 M guanidine hydrochloride, 5 mM TCEP, 10 mM chloroacetamide) that had been preheated to 95°C and the suspension was heated at 95°C for 10 min. Lysis was completed using 10 cycles of probe ultrasonication (20% amplitude; 5 s on / 5 s off). Insoluble material was removed by centrifugation at 20,000 × g at 4°C for 10 min. Protein was then precipitated by addition of TCA to a final concentration of 15% (w/v). Proteome-derived peptide libraries were then produced as described previously11,58. HEK293T cells used for proteome-derived peptide library generation were tested every six months for mycoplasma contamination using the LookOut Mycoplasma PCR Detection Kit (Sigma-Aldrich) according to the manufacturer’s instructions (Fig. S28).

Modification of peptide libraries with 2PCA

E. coli or human proteome-derived peptide library modification reactions contained 1 mg/mL proteome-derived peptide library, the appropriate concentration of 2PCA (10 μM-50 mM), and phosphate buffer, pH 7.5 (10 mM or 50 mM). Temperature dependence reactions contained 50 mM 2PCA and were incubated at 37°C-75°C for the indicated amount of time (4 h or 20 h). For 2PCA specificity profiling (Fig. 1), reactions contained 1 mg/mL peptide library, 10 mM 2PCA, and 50 mM sodium phosphate, pH 7.5 and were incubated for 4 h at 37°C. For N-terminal blocking of peptide libraries, reactions contained 1 mg/mL peptide library, 50 mM 2PCA, and 50 mM sodium phosphate, pH 7.5 and were incubated for 20 h at 37°C. After 2PCA modification, samples were desalted using the single-pot, solid-phase-enhanced sample-preparation (SP3) protocol59. Beads were prepared by mixing hydrophilic and hydrophobic Sera-Mag speedbeads (Cytiva Cat. #45152105050250 and Cat. #65152105050250) in a 1:1 ratio. Tubes were placed on a magnetic stand and supernatants were removed. Bead mixtures were then washed three times with water (1 mL). Supernatants were removed and tubes were removed from the stand. Acetonitrile was added to peptide samples to a final concentration of 95% (v/v) and samples were transferred to tubes containing the prewashed beads in a ratio of 10 μg beads to 1 μg peptide. Tubes were vortexed briefly and then incubated at 30°C on a Thermomixer (Eppendorf) at 1000 rpm for 10 min. Samples were placed on a magnetic stand and supernatant was removed. Beads were then washed three times with 1 mL acetonitrile. After the final wash, beads were air-dried for two minutes. A volume of water equivalent to ten times the original bead volume was added for elution. Beads were sonicated for 1 min in a water bath sonicator and then incubated at 30°C for 5 min on a Thermomixer at 1000 rpm. Tubes were placed on a magnetic stand and eluted peptides were collected by pipetting into a fresh tube.
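
The SP3 quantities described above follow from three simple ratios. The Python sketch below works through them for a hypothetical 50 μL sample containing 50 μg of peptide. The example amounts, the function names, and the assumptions that the sample initially contains no acetonitrile and that volumes are additive are all illustrative and are not taken from the protocol.

```python
def acetonitrile_volume_ul(sample_volume_ul, target_fraction=0.95):
    """Volume of neat acetonitrile needed to reach `target_fraction` (v/v).

    Assumes the sample contains no acetonitrile and volumes are additive.
    """
    if not 0 < target_fraction < 1:
        raise ValueError("target_fraction must be between 0 and 1")
    return sample_volume_ul * target_fraction / (1 - target_fraction)


def bead_mass_ug(peptide_mass_ug, beads_per_peptide=10):
    """Bead mass for the 10 ug beads per 1 ug peptide ratio in the text."""
    return peptide_mass_ug * beads_per_peptide


def elution_volume_ul(bead_volume_ul, fold=10):
    """Water volume for elution, ten times the original bead volume."""
    return bead_volume_ul * fold


# Hypothetical sample: 50 uL containing 50 ug peptide, beads supplied in 10 uL.
print(acetonitrile_volume_ul(50))
print(bead_mass_ug(50))
print(elution_volume_ul(10))
```

Reaching 95% (v/v) requires about 19 volumes of acetonitrile per volume of aqueous sample, so tube capacity is worth checking before scaling up the input.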

Modification of synthetic peptides with 2PCA

Synthetic peptides were purchased from Peptide2.0. Reactions (30 μL) contained 1 mM peptide, 10 mM 2PCA, and 50 mM phosphate buffer, pH 7.5 and were performed in a 384-well plate. Reactions were incubated for 4 h at 37°C and then quenched with an equal volume of 6% formic acid in water. Samples (0.25 μL) were analyzed on an Agilent 6230B time-of-flight (TOF) mass spectrometer equipped with an Agilent JetStream Ionization source. Samples were separated on an Agilent Zorbax Extend C18 column (2.1 × 50 mm, 1.8 μm particle size) heated to 50°C at 0.6 mL/min using the following gradient: 0-0.2 min, 98% A/2% B; 0.21-1.5 min, linear gradient from 98% A to 100% B; 1.5-2.5 min, hold 100% B. Solvent A was water with 0.1% formic acid and solvent B was 100% acetonitrile. Ion chromatograms for reactant and product ions were extracted using a symmetrical window of 0.1 m/z.
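
The LC gradient described above is a piecewise function of time. As a hedged illustration, the Python sketch below returns the percentage of solvent B at a given time. It assumes that the short interval between 0.2 and 0.21 min belongs to the initial hold and that the ramp runs linearly from 2% B to 100% B. The function name `percent_b` is hypothetical.

```python
def percent_b(t_min):
    """Percent solvent B at time `t_min` (minutes) for the 2.5 min gradient."""
    if t_min < 0 or t_min > 2.5:
        raise ValueError("time must be within the 0 to 2.5 min run")
    if t_min <= 0.21:
        # Initial hold at 98% A / 2% B (assumed to extend to 0.21 min).
        return 2.0
    if t_min < 1.5:
        # Linear ramp from 2% B at 0.21 min to 100% B at 1.5 min.
        return 2.0 + (t_min - 0.21) / (1.5 - 0.21) * 98.0
    # Final hold at 100% B from 1.5 to 2.5 min.
    return 100.0


for t in (0.1, 0.855, 1.5, 2.0):
    print(t, round(percent_b(t), 1))
```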

Synthesis of biotin-2PCA

Biotin-2PCA (2) was synthesized as described previously10. To a solution of 6-(1-piperazinylmethyl)-2-pyridinecarboxaldehyde bistosylate salt (25 mg, 0.045 mmol, 1 equiv.) in N,N-dimethylformamide (DMF) was added triethylamine (13.6 mg, 18.8 μL, 0.135 mmol, 3 equiv.) and NHS-biotin (17 mg, 0.05 mmol, 1.1 equiv.). The reaction mixture was stirred for 1 h at room temperature. The mixture was injected onto an Agilent Eclipse XDB-C18 column (9.4 × 250 mm, 5 μm) and purified by reverse-phase HPLC (0-100% B over 90 min at 2 mL/min; A: 0.1% aqueous TFA, B: 0.1% TFA in acetonitrile) using an Agilent 1260 quaternary pump coupled to a diode-array detector. Fractions (1 mL) were flash frozen in liquid nitrogen and lyophilized. The lyophilized biotin-2PCA was dissolved in DMSO and characterized using an Agilent G6230B time-of-flight mass spectrometer (TOF-MS) ([M+H]+ m/z observed, 432.2081; calculated, 432.2069).

Synthesis of biotin-SS-2PCA

Biotin-SS-2PCA (3) was synthesized using the general protocol described previously10. To a solution of 6-(1-piperazinylmethyl)-2-pyridinecarboxaldehyde bistosylate salt (25 mg, 0.045 mmol, 1 equiv.) in N,N-dimethylformamide (DMF) was added triethylamine (13.6 mg, 18.8 μL, 0.135 mmol, 3 equiv.) and NHS-SS-biotin (25 mg, 0.05 mmol, 1.1 equiv.). The reaction mixture was stirred for 1 h at room temperature. The mixture was injected onto an Agilent Eclipse XDB-C18 column (9.4 × 250 mm, 5 μm) and purified by reverse-phase HPLC (0-100% B over 90 min at 2 mL/min; A: 0.1% aqueous TFA, B: 0.1% TFA in acetonitrile) using an Agilent 1260 quaternary pump coupled to a diode-array detector. Fractions (1 mL) were flash frozen in liquid nitrogen and lyophilized. The lyophilized biotin-SS-2PCA was dissolved in DMSO and characterized using an Agilent G6230B time-of-flight mass spectrometer (TOF-MS) ([M+Na]+ m/z observed, 617.2026; calculated, 617.2014).
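
The agreement between observed and calculated m/z values for biotin-2PCA and biotin-SS-2PCA can be expressed as a mass error in parts per million. The Python sketch below performs that calculation on the values reported in the text. The ppm figures are derived here for illustration and are not stated in the study, and the function name is hypothetical.

```python
def mass_error_ppm(observed_mz, calculated_mz):
    """Signed mass error in parts per million."""
    return (observed_mz - calculated_mz) / calculated_mz * 1e6


# m/z values as reported in the text for the two synthesized probes.
probes = {
    "biotin-2PCA [M+H]+": (432.2081, 432.2069),
    "biotin-SS-2PCA [M+Na]+": (617.2026, 617.2014),
}
for name, (observed, calculated) in probes.items():
    print(name, round(mass_error_ppm(observed, calculated), 1), "ppm")
```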

Biotin-2PCA modification of peptide libraries

Biotin-2PCA (2) and biotin-SS-2PCA (3) N-terminal modification reactions (50 μL total volume) contained 1 mg/mL (~1 mM) proteome-derived peptide library (25 μL of 2 mg/mL stock), 0.5 mM biotin-2PCA or biotin-SS-2PCA (2.5 μL of 10 mM stock in DMSO), and 50 mM sodium phosphate, pH 7.5 (6.3 μL of 400 mM stock). Reactions were initiated by addition of biotin-2PCA or biotin-SS-2PCA and were incubated at 37°C for 2 h.

Alkyne-2PCA modification of peptide libraries

Alkyne-2PCA (4) N-terminal modification reactions (50 μL total volume) contained 1 mg/mL (~1 mM) proteome-derived peptide library (25 μL of 2 mg/mL stock), 0.5 mM alkyne-2PCA (2.5 μL of 10 mM stock in DMSO), and 50 mM sodium phosphate, pH 7.5 (6.3 μL of 400 mM stock). Reactions were initiated by addition of alkyne-2PCA and were incubated at 37°C for 2 h.

Biotinylation of alkyne-2PCA peptides

Click reactions (100 μL total volume) for biotinylation of alkyne-2PCA modified peptides contained 50 μL of alkyne-2PCA modification reaction mixture, 0.25 mM biotin-linker-azide (2.5 μL of a 10 mM DMSO stock; linkers were either DADPS (8), Dde (7), Diazo (9), disulfide (6), or no linker (5)), 0.4 mM CuSO4 and 0.8 mM BTTAA (3.2 μL of a solution containing 12.5 mM CuSO4 and 25 mM BTTAA), and 0.5 mg/mL sodium ascorbate (10 μL of a freshly made 5 mg/mL stock), and ddH2O to 100 μL. Components were added to the reaction in the following order: alkyne-modified peptide library, biotinylated azide, CuSO4/BTTAA, sodium ascorbate. Reactions were incubated for 1 h at 37°C.
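The stock volumes quoted for the click reaction follow from the dilution relation C1·V1 = C2·V2. A short sketch of that bookkeeping is shown below, assuming the 100 μL reaction described above; the helper name `stock_volume_ul` and the dictionary layout are illustrative choices only.

```python
def stock_volume_ul(final_conc: float, stock_conc: float, total_volume_ul: float) -> float:
    """Volume of stock (in uL) that gives final_conc in total_volume_ul (C1*V1 = C2*V2).

    final_conc and stock_conc must be in the same units (e.g. both mM or both mg/mL).
    """
    if final_conc > stock_conc:
        raise ValueError("final concentration cannot exceed stock concentration")
    return final_conc * total_volume_ul / stock_conc


TOTAL_UL = 100.0

# (final concentration, stock concentration) pairs taken from the text above.
components = {
    "biotin-linker-azide (mM)": (0.25, 10.0),
    "CuSO4 from CuSO4/BTTAA premix (mM)": (0.4, 12.5),
    "sodium ascorbate (mg/mL)": (0.5, 5.0),
}

volumes = {name: stock_volume_ul(final, stock, TOTAL_UL)
           for name, (final, stock) in components.items()}
volumes["alkyne-2PCA modification reaction"] = 50.0  # added as a fixed volume

# Water makes up the remainder of the reaction volume.
water_ul = TOTAL_UL - sum(volumes.values())

for name, vol in volumes.items():
    print(f"{name}: {vol:.1f} uL")
print(f"ddH2O: {water_ul:.1f} uL")
```

The same helper reproduces the volumes given for the 2PCA modification reactions (for example, 0.5 mM from a 10 mM stock in 50 μL corresponds to 2.5 μL).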

Enrichment of biotinylated peptides

Biotinylated peptides were enriched on High-Capacity Neutravidin Agarose resin (ThermoFisher) (250 μL of 50% resin slurry per reaction). Click reactions were diluted with 4 volumes (400 μL) of 4 M guanidine hydrochloride and added to resin pre-equilibrated with 4 M guanidine hydrochloride. Samples were incubated for 30 min at room temperature on a rotisserie mixer. The resin slurry was transferred into a spin column (Pierce #69725) and attached to the QIAvac 24 plus vacuum manifold (Qiagen) for washing. The resin was washed ten times with 800 μL of 4 M guanidine hydrochloride and ten times with 800 μL of LC-MS grade water. Columns were removed from the vacuum manifold, the bottoms were capped, resin was resuspended in 225 μL of the appropriate elution buffer, and samples were transferred to 1.5 mL microcentrifuge tubes for elution. For biotin azide with no cleavable linker, peptides were eluted twice with 80% acetonitrile, 0.1% formic acid, first at 30°C with shaking for 10 min, second at 72°C with shaking for 10 min. The eluate was dried in a vacuum concentrator and redissolved in 1% TFA before desalting. For biotin-disulfide-azide, peptides were eluted with 5 mM TCEP for 1 h at room temperature. For biotin-DADPS-azide, peptides were eluted with 10% formic acid for 1 h at room temperature. For biotin-Diazo-azide, peptides were eluted with 25 mM dithionite in PBS for 1 h at room temperature. For biotin-Dde-azide, peptides were eluted with 2% aqueous hydrazine for 1 h at room temperature. Biotin-linker-azide samples were adjusted to pH <3 with TFA before desalting. Desalting was performed using SOLA HRP C18 columns (10 mg, Thermo Scientific). Columns were conditioned with 100% acetonitrile (500 μL) and equilibrated with 0.1% TFA (2 × 1 mL) before loading the sample. After sample loading, columns were washed with 0.1% TFA (2 × 1 mL), 0.1% TFA, 5% MeOH (1 mL), and 0.1% formic acid, 2% acetonitrile (1 mL). 
Peptides were eluted with 50% acetonitrile, 0.1% formic acid (2 × 150 μL). The eluted peptides were dried in a vacuum concentrator and then redissolved in 0.1% formic acid prior to LC-MS/MS analysis.

Reductive dimethylation of peptide libraries

N-terminal dimethylation reactions were carried out using a general method described previously27. N-terminal reductive dimethylation reactions (200 μL) contained 1 mg/mL peptide library, 30 mM formaldehyde (0.6 μL of a 10 M stock solution), 30 mM sodium cyanoborohydride (60 μL of a 100 mM stock solution), and 1% acetic acid (2 μL of glacial acetic acid). Reactions were allowed to proceed for 10 min at 37°C and were quenched by loading onto a pre-conditioned and equilibrated SOLA HRP C18 cartridge by centrifugation at 100 × g for 1 min. The cartridge was then washed with 2 × 1 mL of 0.1% TFA and eluted with 300 μL 0.1% formic acid/50% acetonitrile. The solution was evaporated to dryness in a vacuum centrifuge and peptides were resuspended in water.

PICS2 experiments

PICS2 reactions were performed with 100-400 μg of N-terminally blocked proteome-derived peptide library at a concentration of 1 mg/mL. Reaction buffer and protease amount varied. For trypsin, chymotrypsin, GluC, and LysargiNase PICS2 experiments, reactions were performed in 100 mM HEPES, pH 7.5 with a 1:100 (w/w) protease:peptide ratio. For Kex2 PICS2 experiments, reactions were performed in 100 mM Tris-HCl, pH 8.5 at a 1:20 protease:peptide ratio. Furin reactions were performed in 100 mM HEPES, pH 7.5, 1 mM CaCl2 at a 1:50 protease:peptide ratio. PCSK2 experiments were performed in 50 mM sodium acetate, pH 5.0, 100 mM NaCl, 1 mM CaCl2 at a 1:100 protease:peptide ratio. After 16 h, reactions were stopped by heating to 95°C for 10 min. Reactions were allowed to cool and biotin-SS-2PCA was added to a final concentration of 0.5 mM from a 10 mM DMSO stock solution. The biotinylation reaction was allowed to proceed for 4-16 h at 37°C. The reaction mixture was then diluted fivefold with 4 M guanidine hydrochloride and enriched on High-Capacity Neutravidin Agarose resin (1000 μL of 50% resin slurry per reaction) as described above. Biotin-SS-2PCA-modified peptides were eluted with 5 mM TCEP for 1-2 h at room temperature. Samples were desalted on SOLA HRP C18 cartridges prior to analysis by LC-MS/MS. A detailed PICS2 protocol is provided in Supplemental Item 1. PICS experiments performed for comparison were carried out as described previously58.
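As an illustration of the protease:peptide ratios listed above, the amount of protease for a given library input can be computed as follows. This is a sketch only: the dictionary and function names are hypothetical, and the ratios for Kex2, furin, and PCSK2 are assumed here to be weight-to-weight like the 1:100 (w/w) ratio stated for the other proteases.

```python
# Peptide mass per unit protease mass (1:N protease:peptide), from the text above.
# Assumption: all ratios are w/w, as stated explicitly only for the 1:100 reactions.
PEPTIDE_PER_PROTEASE = {
    "trypsin": 100,
    "chymotrypsin": 100,
    "GluC": 100,
    "LysargiNase": 100,
    "Kex2": 20,
    "furin": 50,
    "PCSK2": 100,
}


def protease_amount_ug(protease: str, peptide_ug: float) -> float:
    """Return the protease mass (ug) for a given peptide library mass (ug)."""
    if peptide_ug <= 0:
        raise ValueError("peptide amount must be positive")
    return peptide_ug / PEPTIDE_PER_PROTEASE[protease]


# Example: a 200 ug library, within the 100-400 ug range used for PICS2.
for name in PEPTIDE_PER_PROTEASE:
    print(f"{name}: {protease_amount_ug(name, 200.0):.1f} ug protease")
```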

Induction of apoptosis in Jurkat cells

Jurkat E6.1 (ATCC#TIB-152) cells were grown in RPMI-1640 media supplemented with 10% fetal bovine serum, 2 mM L-glutamine, and 1% penicillin-streptomycin at 37°C under 5% CO2 atmosphere. One T225 flask of cells containing 250 mL of media with cells at a density of 10^6 cells per mL was used for each replicate. Cells were treated with either 50 μM etoposide (from a 50 mM DMSO stock solution) or an equal volume of DMSO for 8 h. Cells were harvested by centrifugation at 300 × g for 5 min and washed once with PBS. Jurkat cells were tested every six months for mycoplasma contamination using the LookOut Mycoplasma PCR Detection Kit (Sigma-Aldrich) according to the manufacturer’s instructions (Fig. S28).

CHOPPER sample preparation

Each cell pellet was resuspended in 800 μL hot lysis buffer (100 mM Tris-HCl, pH 8.5, 6 M guanidine hydrochloride, 5 mM TCEP, 10 mM chloroacetamide) that had been preheated to 95°C and the suspension was heated at 95°C for 10 min. Lysis was completed using 10 cycles of probe ultrasonication (20% amplitude; 5 s on / 5 s off). Insoluble material was removed by centrifugation at 20,000 × g at 4°C for 10 min. After cooling, alkyne-2PCA (100 mM stock solution in DMSO) was added to the supernatants to a final concentration of 1 mM and the lysate was incubated overnight at 37°C. Protein was then precipitated by addition of 15% (w/v) trichloroacetic acid and incubation for 16 h at −20°C. Precipitated protein was pelleted by centrifugation at 20,000 × g for 5 min and the supernatant was removed. Pellets were washed two times with 200 μL ice-cold acetone, centrifuged at 20,000 × g for 5 min, and air dried. Pellets were redissolved in 800 μL 20 mM NaOH. Cu(I)-catalyzed azide-alkyne cycloaddition (CuAAC) was then performed by mixing 257.2 μL of the protein solution with 80 μL 1 M HEPES, pH 7.5 (200 mM final concentration), 10 μL 10 mM biotin-SS-azide (0.25 mM final concentration), 12.8 μL of a pre-mixed solution containing 12.5 mM CuSO4 and 25 mM BTTAA (0.4 mM CuSO4 and 0.8 mM BTTAA final concentration), and 40 μL of 5 mg/mL freshly prepared sodium ascorbate. Click reactions were incubated for 1 h at 37°C. After 1 h, the reaction mixture was diluted with three volumes (1.2 mL) of 4 M guanidine hydrochloride and added to 250 μL of High-Capacity Neutravidin Agarose resin that had been pre-equilibrated with 4 M guanidine hydrochloride. Samples were incubated for 30 min at room temperature on a rotisserie mixer. The resin slurry was transferred into a spin column (Pierce #69725) and attached to the QIAvac 24 plus vacuum manifold (Qiagen) for washing. The resin was washed ten times with 800 μL of 4 M guanidine hydrochloride and ten times with 800 μL of 20 mM HEPES, pH 7.5.

Columns were removed from the vacuum manifold, the bottoms were capped, resin was resuspended in 500 μL of 20 mM HEPES, pH 7.5, and samples were transferred to 1.5 mL microcentrifuge tubes. Trypsin (20 μg) was added and on-bead digestion was allowed to proceed overnight at room temperature on a rotisserie mixer. Following digestion, the slurry was transferred into a spin column and attached to the vacuum manifold for washing. The resin was washed ten times with 800 μL of 4 M guanidine hydrochloride and ten times with 800 μL of 20 mM HEPES, pH 7.5. Columns were removed from the vacuum manifold, the bottoms were capped, resin was resuspended in 500 μL of 20 mM HEPES, pH 7.5, and samples were transferred to 1.5 mL microcentrifuge tubes. TCEP was added to a final concentration of 5 mM and samples were incubated for 2 h at room temperature on a rotisserie mixer. After TCEP reduction, the resin slurry was transferred to a spin column and eluted N-terminal peptides were collected by centrifugation at 500 × g for 2 min. Samples were acidified by addition of TFA to 1% and desalted using SOLA HRP C18 desalting cartridges. Desalted peptides were dried in a vacuum centrifuge. A detailed CHOPPER protocol is provided in Supplemental Item 1.

HUNTER sample preparation

HUNTER samples were prepared according to a previously reported method36. Each cell pellet was resuspended in 500 μL lysis buffer: 1% sodium dodecyl sulfate, 2x protease and phosphatase inhibitor cocktail (Thermo Scientific, cat no. 78442) in 50 mM HEPES, pH 8.0. The lysate was transferred to a 1.5 mL LoBind tube (Fisher Scientific, cat. No. 13-698-794), heated at 95 °C for 5 minutes, then chilled on ice for 5 minutes. Following addition of 1 μL benzonase (Sigma, cat. No. 70664-3), samples were incubated for 30 minutes at 37°C. Protein concentration was determined using the Pierce BCA protein assay kit. Prior to reduction and alkylation, lysate was diluted in the appropriate volume of lysis buffer to achieve a final concentration of 2.2 mg/mL. DTT was added to a final concentration of 10 mM and samples were incubated at 37°C for 30 minutes. 2-Chloroacetamide was added to a final concentration of 50 mM and samples incubated in the dark for 20 minutes at room temperature. For each sample, a volume containing 100 μg of protein was aliquoted into a fresh 1.5 mL LoBind tube for labeling and enrichment. Samples were diluted in lysis buffer to approximately 0.4 mg/mL before SP3 sample cleanup. A mixture of 1:1 hydrophobic to hydrophilic (Cytiva Cat. #45152105050250 and Cat. #65152105050250) magnetic beads at 50 mg/mL was added at a ratio of 10:1 (w/w) beads to protein. Absolute ethanol was added to 80% (v/v) to initiate binding. Samples were incubated at room temperature for 18 minutes before binding to a magnetic rack. Supernatant was removed and lysates washed twice with 400 μL 90% ethanol. Beads were pelleted at 1000 × g for 1 minute to remove remaining ethanol and resuspended in 30 μL 200 mM HEPES, pH 7.0. Formaldehyde and sodium cyanoborohydride were added to final concentrations of 30 and 15 mM, respectively, and samples were incubated at 37°C for 1 hour. 
Following addition of fresh reagents to the same final concentrations, samples were incubated again at 37°C for 1 hour. The dimethylation reaction was quenched by addition of 4 M Tris, pH 6.8 to a final concentration of 600 mM. The 50 mg/mL bead stock was added at a 5:1 w/w bead to protein ratio. Absolute ethanol was added to 80% (v/v) to initiate binding. Samples were incubated at room temperature for 15 minutes before binding to a magnetic rack. Supernatant was removed and lysates washed twice with 400 μL 90% ethanol. Beads were pelleted at 1000 × g for 1 minute to remove remaining ethanol and resuspended in 30 μL trypsin in 200 mM HEPES, pH 7.0 at a trypsin to protein ratio of at least 1:100. Samples were incubated overnight (at least 13 hours) at 37°C. Absolute ethanol was added to the remaining proteome digest to 40% (v/v). Undecanal was added at a ratio of 20:1 (w/w) undecanal to peptide, along with sodium cyanoborohydride to a final concentration of 30 mM. Samples were incubated at 37°C for 1 h before binding to a magnetic rack for 1 minute. The supernatant was transferred to a fresh LoBind tube and acidified with 0.5% TFA (in 40% ethanol) to a final pH of 3-4. Depletion of undecanal-modified peptides was performed using 50 mg Sep-Pak C18 Vac cartridges. Columns were conditioned with 700 μL 100% methanol, followed by 700 μL 0.1% TFA in 40% ethanol. Sample volume was adjusted to 500 μL with 0.1% TFA in 40% ethanol and samples were loaded onto the columns. N-terminal peptides were eluted into fresh 1.5 mL LoBind tubes. Ethanol was removed by vacuum evaporation and peptides were desalted using SOLA HRP C18 desalting cartridges. Desalted peptides were dried in a vacuum centrifuge.

LC-MS/MS data collection

Dried peptide samples were redissolved in 0.1% formic acid prior to LC-MS/MS analysis. Peptide concentration was estimated by absorbance at 280 nm assuming that 1 absorbance unit = 1 mg/mL. For each experiment, 500 ng of peptides were analyzed with an Orbitrap Exploris 480 hybrid quadrupole-Orbitrap mass spectrometer coupled to an UltiMate 3000 RSLCnano liquid chromatography system (ThermoFisher Scientific). Peptides were loaded onto an Acclaim PepMap RSLC column (75 μm × 15 cm, 2 μm particle size, 100 Å pore size, ThermoFisher Scientific) over 15 min in 97% mobile phase A (0.1% formic acid) and 3% mobile phase B (0.1% formic acid, 80% acetonitrile) at 0.5 μL/min. Peptides were eluted at 0.3 μL/min using a linear gradient from 3% mobile phase B to 50% mobile phase B over 120 min. Peptides were electrosprayed through a nanospray emitter tip connected to the column by applying 2000 V through the ion source’s DirectJunction adapter. Full MS scans were performed at a resolution of 60,000 at 200 m/z over a range of 300-1,200 m/z with an AGC target of 300% and the maximum injection time set to ‘auto’. The top 20 most abundant precursors with a charge state of 2-6 were selected for MS/MS analysis with an isolation window of 1.4 m/z and a precursor intensity threshold of 5 × 10^3. A dynamic exclusion window of 20 s with a precursor mass tolerance of ± 10 ppm was used. MS/MS scans were performed using HCD fragmentation with a normalized collision energy of 30% and a resolution of 15,000 with a fixed first mass of 110 m/z. The AGC target was set to ‘standard’ with a maximum injection time of 22 ms.
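Two pieces of arithmetic in this section lend themselves to a short sketch: the injection volume needed to load 500 ng given an A280 reading (using the stated assumption that 1 absorbance unit = 1 mg/mL), and the mobile phase B percentage at a given time in the linear 3-50% elution gradient. The function names below are illustrative and do not correspond to instrument software settings.

```python
def injection_volume_ul(a280: float, target_ng: float = 500.0) -> float:
    """Volume (uL) containing target_ng of peptide, assuming 1 A280 unit = 1 mg/mL."""
    if a280 <= 0:
        raise ValueError("A280 must be positive")
    conc_ng_per_ul = a280 * 1000.0  # 1 mg/mL equals 1000 ng/uL
    return target_ng / conc_ng_per_ul


def percent_b(minutes: float, start: float = 3.0, end: float = 50.0,
              duration_min: float = 120.0) -> float:
    """Percent mobile phase B at a time point within the linear elution gradient."""
    if not 0 <= minutes <= duration_min:
        raise ValueError("time point lies outside the gradient")
    return start + (end - start) * minutes / duration_min


print(injection_volume_ul(0.25))  # a sample reading 0.25 AU needs 2.0 uL
print(percent_b(60))              # halfway through the gradient: 26.5 %B
```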

Mass spectrometry data analysis

Open and offset searches were performed in FragPipe 17.1. For open searches, the precursor mass tolerance was set to −150 to 500 Da. The initial fragment mass tolerance was set to 0.02 Da. Calibration and Optimization was set to ‘Mass calibration, parameter optimization’. Isotope error was set to 0. Cleavage was set to ‘Enzymatic’ with clip N term enabled and enzyme name was set to trypsin, chymotrypsin, or gluc as appropriate. Up to two missed cleavages were allowed. Peptide length was set to 7-50 and peptide mass range was set to 500-5,000 Da. No variable modifications or fixed modifications were specified. Crystal-C was enabled and PeptideProphet was run with the following settings: ‘--nonparam --expectscore --decoyprobs --masswidth 1000.0 --clevel -2’. PTMProphet was disabled. The numbers of peptide-spectrum matches (PSMs) reported in the global.modsummary.tsv file were then plotted against the reported mass shift. Results can be found in Dataset S2.

For offset searches in FragPipe, the precursor mass tolerance was set to 10 ppm and the fragment mass tolerance was set to 0.02 Da. Calibration and Optimization was set to ‘Mass calibration, parameter optimization’. Isotope error was set to 0/1/2/3. Cleavage was set to ‘Enzymatic’ with clip N term enabled and enzyme name was set to trypsin, chymotrypsin, or gluc as appropriate. Up to two missed cleavages were allowed. Peptide length was set to 7-50 and peptide mass range was set to 500-5,000 Da. Oxidation at Met (+15.9949) and acetylation at protein N termini (+42.0106) were searched as variable modifications and Cys carbamidomethylation (+57.02146) was searched as a fixed modification. Mass offsets of 0 and 89.0265 were included with Restrict delta mass set to all. Crystal-C was disabled. PeptideProphet was run with the following settings: ‘--decoyprobs --ppm --accmass --nonparam --expectscore --database databasepath’, where databasepath points to the fasta database that was searched. Results can be found in Dataset S2.
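The 89.0265 Da mass offset used above is consistent with addition of 2PCA (C6H5NO) to the peptide N terminus with loss of one water molecule. The sketch below checks this arithmetic from monoisotopic atomic masses; it assumes that the net modification is C6H3N, and the helper `monoisotopic_mass` is introduced only for illustration.

```python
# Monoisotopic masses of the most abundant isotopes (Da).
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503, "N": 14.0030740, "O": 15.9949146}


def monoisotopic_mass(formula: dict) -> float:
    """Sum monoisotopic atomic masses for a formula given as {element: count}."""
    return sum(MONOISOTOPIC[element] * count for element, count in formula.items())


two_pca = {"C": 6, "H": 5, "N": 1, "O": 1}  # 2-pyridinecarboxaldehyde
water = {"H": 2, "O": 1}                    # lost on condensation (assumption stated above)

offset = monoisotopic_mass(two_pca) - monoisotopic_mass(water)
print(f"2PCA mass offset: {offset:.4f} Da")
```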

Thermo RAW files were searched against either the human or E. coli SwissProt database using the SEQUEST algorithm in Proteome Discoverer 2.4 (Thermo). The precursor mass tolerance was set at 10 ppm and the fragment mass tolerance was set at 0.02 Da. Search parameters included carbamidomethylation at Cys (+57.021 Da) as a constant modification and the following dynamic modifications: oxidation at Met (+15.995 Da), acetylation at protein N termini (+42.011 Da), Met loss at protein N termini (−131.040 Da), Met loss+acetylation at protein N termini (−89.030 Da). Additional dynamic modifications at the peptide N terminus were included as appropriate for the experiment. Structures of chemical modifications are shown in Fig. S29. Modification masses are listed in Table S2. Up to two missed cleavages were allowed. For PICS and CHOPPER data analysis, cleavage specificity was set to semi (C-term), requiring that the C terminus of each peptide have the specificity of the digest protease but allowing the N terminus to vary. The Percolator node of Proteome Discoverer was used for PSM validation at a false discovery rate of 1%. Raw data files, peak lists, and results have been deposited in the ProteomeXchange repository under the accession numbers listed in Table S3. Results are also provided in Microsoft Excel format as Datasets S1–S29 as described in Table S3.
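The semi (C-term) specificity rule described above can be illustrated with a small check for a tryptic digest. This is a simplified sketch of the concept, not the Proteome Discoverer implementation: it ignores the proline exception and missed-cleavage limits, and the protein sequence used is hypothetical.

```python
def is_semi_cterm_tryptic(protein_seq: str, start: int, end: int) -> bool:
    """Check semi (C-term) tryptic specificity for the peptide protein_seq[start:end].

    The C terminus must end in K or R (or coincide with the protein C terminus);
    the N terminus is left unconstrained, so neo-N termini from any protease pass.
    """
    if not 0 <= start < end <= len(protein_seq):
        raise ValueError("peptide coordinates fall outside the protein")
    return end == len(protein_seq) or protein_seq[end - 1] in "KR"


protein = "MASDEVDGKLLTR"  # hypothetical sequence for illustration

print(is_semi_cterm_tryptic(protein, 7, 9))   # "GK": tryptic C terminus, free N terminus
print(is_semi_cterm_tryptic(protein, 7, 11))  # "GKLL": C terminus is not tryptic
```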

Specificity data analysis

Enrichment scores for 2PCA specificity were calculated using custom Python scripts available in the Zenodo repository under the DOI listed in the key resources table. These scores correspond to the standard score (z score) comparing frequencies of amino acids in the total population of peptides to frequencies of amino acids in the 2PCA-modified peptide population and were calculated using the equation

$$Z = \frac{f_{\mathrm{2PCA}} - f_{\mathrm{total}}}{\sigma}$$

where $f_{\mathrm{2PCA}}$ is the frequency of the amino acid among the 2PCA-modified population, $f_{\mathrm{total}}$ is the frequency of the amino acid in the total population of peptides, and $\sigma$ is the population standard deviation.
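A minimal sketch of this enrichment score is given below. It is not the deposited script: the function names and the toy peptide lists are hypothetical, and because the text does not specify over which values the population standard deviation is taken, the example assumes it is computed across the per-residue frequencies of the total population.

```python
from collections import Counter
from statistics import pstdev

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def residue_frequencies(peptides, position=0):
    """Frequency of each amino acid at a given peptide position (0 = N-terminal residue)."""
    counts = Counter(p[position] for p in peptides if len(p) > position)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no peptides long enough for this position")
    return {aa: counts.get(aa, 0) / total for aa in AMINO_ACIDS}


def enrichment_z(f_2pca: float, f_total: float, sigma: float) -> float:
    """Standard score Z = (f_2PCA - f_total) / sigma."""
    return (f_2pca - f_total) / sigma


# Hypothetical toy data: all identified peptides and the 2PCA-modified subset.
total_peptides = ["AEK", "GLR", "SVK", "ATR", "PEK", "GDK"]
modified_peptides = ["AEK", "ATR", "GLR"]

f_total = residue_frequencies(total_peptides)
f_mod = residue_frequencies(modified_peptides)

# Assumption: sigma is the population SD of the total-population frequencies.
sigma = pstdev(f_total.values())

for aa in "AGSP":
    print(aa, round(enrichment_z(f_mod[aa], f_total[aa], sigma), 2))
```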

PICS2 data analysis

The frequencies of each amino acid at each position in PICS samples were calculated using a custom Python script that is available in the Zenodo repository under the DOI listed in the key resources table. This script filters Proteome Discoverer search results for peptides that are modified with cleaved biotin-SS-2PCA and whose cleavage sites do not match the specificity of the protease used for library generation. It then retrieves the protein sequence from a SwissProt xml file containing proteome sequences for either human or E. coli and uses this information to infer the nonprime side sequence of the measured peptide (i.e., the sequence N-terminal to the cleavage site). The script then counts the number of times each amino acid is observed in each position and calculates a frequency by dividing by the total number of observations. IceLogos were constructed using iceLogo 1.2 (ref. 60) with either the human or E. coli SwissProt database as a reference.
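The nonprime-side inference and per-position counting described above can be sketched as follows. This is an illustrative reimplementation under simplifying assumptions, not the deposited script: the protein sequence is passed in directly rather than parsed from a SwissProt xml file, the peptide is mapped to its first occurrence in the protein, and all names and sequences are hypothetical.

```python
from collections import Counter


def cleavage_window(protein_seq, peptide, n_nonprime=4, n_prime=4):
    """Return (nonprime, prime) residues flanking the cleavage that created the peptide N terminus.

    nonprime runs P4..P1 (inferred from the protein sequence); prime runs P1'..P4'
    (read from the peptide itself). Returns None if the peptide is not found or
    starts at the protein N terminus, where no nonprime side exists.
    """
    start = protein_seq.find(peptide)
    if start <= 0:
        return None
    nonprime = protein_seq[max(0, start - n_nonprime):start]
    return nonprime, peptide[:n_prime]


def position_frequencies(windows):
    """Frequency of each amino acid at each position (P4..P1, P1'..P4')."""
    counts = {}
    for nonprime, prime in windows:
        labels = [f"P{i}" for i in range(len(nonprime), 0, -1)]
        labels += [f"P{i}'" for i in range(1, len(prime) + 1)]
        for label, aa in zip(labels, nonprime + prime):
            counts.setdefault(label, Counter())[aa] += 1
    return {label: {aa: n / sum(c.values()) for aa, n in c.items()}
            for label, c in counts.items()}


protein = "MKTAYIAKQRQISFVKSHFSR"  # hypothetical sequence
window = cleavage_window(protein, "QISFVK")
print(window)  # nonprime side inferred from the protein, prime side from the peptide
print(position_frequencies([window, ("GGKR", "SLLE")])["P1"])
```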

CHOPPER data analysis

CHOPPER data analysis was performed using a custom Python script that is available in the Zenodo repository under the DOI listed in the key resources table. This script filters Proteome Discoverer search results for peptides that are modified with alkyne-2PCA clicked to cleaved biotin-2PCA. It then retrieves the protein sequence from a SwissProt (ref. 61) xml file containing all human proteome protein sequences to infer the nonprime side residues of the cleavage site. Cleavage sites were plotted on domain boundaries using a custom Python script that retrieves domain boundaries from the SwissProt xml file in combination with matplotlib. STRING analysis was performed using Cytoscape 3.9.1 (ref. 62) with a confidence score cutoff of 0.99. All proteins known to be cleaved with P1 = D specificity upon etoposide treatment of Jurkat cells were used in the analysis. Figure 5G shows only clusters that contain at least one cleavage identified in the CHOPPER dataset. Fig. S23 shows the full network of STRING interactions.

Label-free quantification

Label-free quantification was performed using MS1 extracted ion chromatograms in Skyline (http://proteome.gs.washington.edu/software/skyline) (ref. 63). Database search results were imported from pdResult files generated by Proteome Discoverer 2.4 (Thermo). Chromatograms were extracted from Thermo RAW files. Integrated peaks were inspected manually to ensure retention time consistency between samples. The total MS1 area for each 2PCA-tagged peptide was exported. Data were normalized across samples based on the areas of eight 2PCA-tagged, non-P1 = D peptides derived from tubulin (Q71436), actin (P60709), RNA-binding protein 14 (Q8C2Q3), stathmin (P16949), and PDZ and LIM domain protein 1 (O00151). The mean normalized areas from etoposide-treated and DMSO-treated samples were used to calculate area ratios. P values were calculated in Prism 9 (GraphPad) using unpaired t-tests. Peak areas are provided as Dataset S30.
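The normalization and ratio calculation can be sketched as below. The text does not state exactly how the eight reference peptide areas were combined, so this example assumes that each sample is scaled so that its summed reference area matches the average across samples; the sample names, areas, and function names are hypothetical.

```python
from statistics import mean


def normalization_factors(reference_areas):
    """Per-sample scale factors from reference peptide MS1 areas.

    reference_areas maps sample name -> list of areas for the reference peptides.
    Assumption: each sample is scaled so its summed reference area equals the
    mean summed reference area across all samples.
    """
    totals = {sample: sum(areas) for sample, areas in reference_areas.items()}
    target = mean(list(totals.values()))
    return {sample: target / total for sample, total in totals.items()}


def area_ratio(treated_areas, control_areas):
    """Ratio of mean normalized areas (treated / control) for one peptide."""
    return mean(treated_areas) / mean(control_areas)


# Hypothetical reference peptide areas for one treated and one control sample.
reference = {"etoposide_1": [100.0, 200.0], "DMSO_1": [200.0, 400.0]}
factors = normalization_factors(reference)

# Hypothetical raw areas for one 2PCA-tagged peptide in each sample.
raw = {"etoposide_1": 80.0, "DMSO_1": 40.0}
normalized = {sample: area * factors[sample] for sample, area in raw.items()}

print(factors)
print(area_ratio([normalized["etoposide_1"]], [normalized["DMSO_1"]]))
```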

Comparison of N terminomics methods

Unique N termini identified by CHOPPER were compared to previously published subtiligase11,45 and HYTANE51 datasets, which were recoded to identify each N terminus by its UniProt ID and the position of the P1 residue. These lists of N termini are provided as Datasets S31 and S32. Overlapping and unique peptides in the datasets were identified using standard set functions available in Python.
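The set-based comparison described above can be sketched as follows, with each N terminus keyed by a (UniProt ID, P1 position) tuple. The identifiers and positions below are placeholders for illustration, not values from the datasets.

```python
def compare_termini(set_a, set_b):
    """Split two collections of (uniprot_id, p1_position) keys into shared and unique sites."""
    set_a, set_b = set(set_a), set(set_b)
    return {
        "shared": set_a & set_b,
        "only_a": set_a - set_b,
        "only_b": set_b - set_a,
    }


# Placeholder keys: (UniProt ID, position of the P1 residue).
chopper = {("PROT_A", 120), ("PROT_B", 45), ("PROT_C", 310)}
subtiligase = {("PROT_A", 120), ("PROT_D", 78)}

result = compare_termini(chopper, subtiligase)
for group, sites in result.items():
    print(group, sorted(sites))
```

Keying on the P1 position rather than the peptide sequence means that peptides of different lengths reporting the same cleavage event are counted as one site.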

Quantification and statistical analysis

Enrichment scores for 2PCA modification of peptide libraries were calculated using custom Python scripts that have been deposited in Zenodo under the DOI listed in the key resources table. The enrichment score corresponds to the standard score (z score) comparing 2PCA-modified peptide sequences to the total population. Three independent replicates were performed for each experiment using peptide libraries generated with trypsin, chymotrypsin, and Glu-C. The number of peptides analyzed in each replicate can be found in Figure S2 and in the corresponding supplementary datasets. Additional details are given in Method Details. For analysis of label-free quantification data, P values were calculated using a two-tailed unpaired t test using Prism 9 (GraphPad). For Figure S22B, two-way ANOVA was performed in Prism 9 (GraphPad). A p value of less than 0.05 (*, p < 0.05; **, p < 0.01; ***, p < 0.001; ****, p < 0.0001) was considered statistically significant. For CHOPPER experiments, two cell culture replicates were performed for each condition. Error bars correspond to the mean ± SD.
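The significance thresholds listed above map onto asterisk labels as in the following sketch. The function name and the "ns" label for non-significant values are assumptions made for illustration.

```python
def significance_label(p_value: float) -> str:
    """Map a p value to the asterisk notation defined above ("ns" if p >= 0.05)."""
    if not 0 <= p_value <= 1:
        raise ValueError("p value must lie between 0 and 1")
    # Thresholds are checked from most to least stringent.
    for threshold, label in ((0.0001, "****"), (0.001, "***"), (0.01, "**"), (0.05, "*")):
        if p_value < threshold:
            return label
    return "ns"


for p in (0.2, 0.03, 0.004, 0.0005, 0.00001):
    print(p, significance_label(p))
```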

Supplementary Material

Table S3. List of ProteomeXchange accession numbers and experimental information, related to STAR Methods.

Supplemental Item 1. Detailed protocols for PICS2 and CHOPPER, related to STAR Methods.

Dataset S1. E. coli proteome-derived peptide libraries made with trypsin, chymotrypsin, and GluC, related to Figure 1.

Dataset S2. E. coli proteome-derived peptide libraries treated with 2PCA for 4 h at 37°C, related to Figure 1.

Dataset S3. Temperature dependence of modification of E. coli proteome-derived peptide libraries with 50 mM 2PCA for 4 h, related to Figure 2.

Dataset S4. Temperature dependence of modification of E. coli proteome-derived peptide libraries with 50 mM 2PCA for 20 h, related to Figure 2.

Dataset S5. Concentration dependence of 2PCA modification of E. coli proteome-derived peptide libraries, related to Figure 2.

Dataset S6. Human proteome-derived peptide library generated with trypsin; treated with 10 mM alkyne-2PCA for 20 h at 37°C, related to Figure 3.

Dataset S7. Human proteome-derived peptide library generated with trypsin; treated with 10 mM alkyne-2PCA for 20 h at 37°C; clicked with biotin azide, related to Figure 3.

Dataset S8. Biotin-SS-2PCA enrichment with E. coli proteome-derived peptide libraries, related to Figure 3.

Dataset S9. Biotin-2PCA enrichment with E. coli proteome-derived peptide libraries, related to Figure 3.

Dataset S10. Alkyne-2PCA clicked to biotin-azide enrichment with E. coli proteome-derived peptide libraries, related to Figure 3.

Dataset S11. Alkyne-2PCA clicked to biotin-disulfide-azide enrichment with E. coli proteome-derived peptide libraries, related to Figure 3.

Dataset S12. Alkyne-2PCA clicked to biotin-DADPS-azide enrichment with E. coli proteome-derived peptide libraries, related to Figure 3.

Dataset S13. Alkyne-2PCA clicked to biotin-Dde-azide enrichment with E. coli proteome-derived peptide libraries, related to Figure 3.

Dataset S14. Alkyne-2PCA clicked to biotin-Diazo-azide enrichment with E. coli proteome-derived peptide libraries, related to Figure 3.

Dataset S15. Human proteome-derived peptide libraries generated with trypsin, chymotrypsin, and GluC, related to Figure 4.

Dataset S16. Human and E. coli proteome-derived peptide libraries N-terminally blocked with 2PCA, related to Figure 4.

Dataset S17. Trypsin PICS with E. coli proteome-derived peptide libraries, related to Figure 4.

Dataset S19. LysargiNase PICS2 with E. coli proteome-derived peptide libraries, related to Figure 4.

Dataset S20. GluC PICS2 with E. coli proteome-derived tryptic peptide library, related to Figure 4.

Dataset S21. Chymotrypsin PICS2 with E. coli tryptic proteome-derived peptide library, related to Figure 4.

Dataset S22. N-terminally dimethylated E. coli proteome-derived peptide libraries, related to Figure 4.

Dataset S23. Kex2 PICS with N-terminally dimethylated E. coli proteome-derived peptide libraries, related to Figure 4.

Dataset S24. Furin PICS2 with N-terminally 2PCA-modified human proteome-derived peptide libraries, related to Figure 4.

Dataset S25. PCSK2 PICS2 with N-terminally dimethylated human proteome-derived peptide libraries, related to Figure 4.

Dataset S26. CHOPPER with etoposide-treated Jurkat cells, related to Figure 5.

Dataset S27. CHOPPER with DMSO-treated Jurkat cells, related to Figure 5.

Dataset S28. HUNTER with etoposide-treated Jurkat cells, related to Figure 5.

Dataset S29. HUNTER with DMSO-treated Jurkat cells, related to Figure 5.

Dataset S30. Label-free quantification of CHOPPER results, related to Figure 5.

Dataset S31. Putative caspase neo-N termini identified by HYTANE, related to Figure 5.

Dataset S32. Putative caspase neo-N termini identified by subtiligase N terminomics, related to Figure 5.

KEY RESOURCES TABLE

REAGENT or RESOURCE SOURCE IDENTIFIER
Bacterial and Virus Strains
E. coli XL10-Gold Agilent Cat#200315
Biological Samples
HEK293T cells ATCC ATCC CRL-3216; RRID:CVCL_0063
Jurkat E6.1 ATCC ATCC TIB-152; RRID:CVCL_0367
Chemicals, Peptides, and Recombinant Proteins
2-pyridinecarboxaldehyde Sigma-Aldrich Cat#P62003
6-(1-piperazinylmethyl)-2-pyridinecarboxaldehyde bistosylate salt Sigma-Aldrich Cat#808571
biotin N-hydroxysuccinimide ester ThermoFisher Scientific Cat#439302500
NHS-SS-biotin ThermoFisher Scientific Cat#21441
5-ethynylpicolinaldehyde Ambeed Inc. Cat#A301822
DADPS biotin azide Click Chemistry Tools Cat#1330
Dde biotin azide Click Chemistry Tools Cat#1136
trypsin, sequencing grade modified Promega Cat#V5113
chymotrypsin, sequencing grade Promega Cat#V1061
Glu-C, sequencing grade Promega Cat#V1651
LysargiNase Huesgen et al., 2015 N/A
recombinant yeast Kex2 PeproTech Cat#450-45
recombinant human furin R&D Systems Cat#1503-SE-010
recombinant human PCSK2 R&D Systems Cat#6018-SE-010
Synthetic peptides Peptide2.0 Custom peptide synthesis; sequences given in Figure S4.
etoposide Sigma-Aldrich Cat#E1383
formaldehyde Sigma-Aldrich Cat#47608
sodium cyanoborohydride Sigma-Aldrich Cat#156159
Critical Commercial Assays
Pierce BCA Protein Assay Kit Thermo Scientific Cat#23227
Deposited Data
2PCA mass spectrometry datasets This study Accession#: PXD040044, PXD040045, PXD040046, PXD040047, PXD040053, PXD040048, PXD040049, PXD040050, PXD040051, PXD040052, PXD040054, PXD040055, PXD040056, PXD040057, PXD040058, PXD040059, PXD040060, PXD040061, PXD040062, PXD040063, PXD040064, PXD040065, PXD040066, PXD040067, PXD040068, PXD040069, PXD040070, PXD043176, PXD043177
HYTANE apoptotic N termini dataset Chen, et al., 2016 Accession# PXD004690
Subtiligase and subtiligase mutant apoptotic N termini dataset Weeks and Wells, 2018 Accession#: PXD007023, PXD007025, PXD007026, PXD007024
Subtiligase apoptotic N termini dataset Mahrus, et al., 2008; Crawford, et al., 2013 https://wellslab.ucsf.edu/degrabase/
Experimental Models: Cell Lines
HEK293T cell line ATCC ATCC CRL-3216; RRID:CVCL_0063
Jurkat E6.1 cell line ATCC ATCC TIB-152; RRID:CVCL_0367
Experimental Models: Organisms/Strains
model organism: E. coli XL-10 Gold: endA1 glnV44 recA1 thi-1 gyrA96 relA1 lac Hte Δ(mcrA)183 Δ(mcrCB-hsdSMR-mrr)173 tetR F’[proAB lacIqZΔM15 Tn10(TetR Amy CmR)] Agilent Cat#200315
Software and Algorithms
Python data analysis scripts This study DOI: 10.5281/zenodo.7606565

Significance.

N terminomics technologies enable global profiling of proteolytic cleavage sites using mass spectrometry-based proteomics. Although existing N terminomics technologies have yielded significant insights into protease function and specificity, they are limited by sequence specificity, low efficiency, sensitivity, and/or the need to synthesize specialized chemical reagents. Here, we report an expanded N terminomics toolbox based on the N-terminally selective reagent 2PCA. We integrate 2PCA reagents with click chemistry-based biotinylation and cleavable linkers for selective N-terminal peptide elution, enabling N terminomics with commercially available reagents. Using these reagents, we develop a positive-enrichment strategy for defining protease sequence specificity in proteome-derived peptide libraries and a method for enrichment of protease substrates from cell lysates. These technologies enable specificity profiling of proteases that are inaccessible to existing positive enrichment methods and discovery of new substrates of apoptotic proteolysis based on the unique advantages of 2PCA. Importantly, we find that 2PCA-based methods are highly complementary to other N terminomics technologies and can identify distinct cohorts of protease substrates, expanding the breadth of the known proteolytic proteome.

Highlights.

  • 2-Pyridinecarboxaldehyde (2PCA) is a broad-specificity N-terminal probe

  • 2PCA probes rapidly define protease sequence specificity

  • Clickable 2PCA probes enable proteomic profiling of protease sites

  • 2PCA-based enrichment is highly complementary to other N terminomics technologies

Acknowledgements

We thank S. Coyle, L. Mazurkiewicz, K. Radziwon, M. Ravalin, and D. Sashital for helpful discussions. This work was supported in part by startup funds from the University of Wisconsin-Madison Department of Biochemistry, by a David and Lucile Packard Fellowship for Science and Engineering, and by a Career Award at the Scientific Interface from the Burroughs Wellcome Fund (1017065) (to A.M.W.). H.N.B. was supported in part by a William R. and Dorothy E. Sullivan Wisconsin Distinguished Graduate Fellowship. W.L. was supported in part by the UW-Madison Chemistry-Biology Interface Training Program under grant number NIH T32 GM008505. C.L.F. was supported in part by the UW-Madison Biotechnology Training Program under grant number NIH 5 T32 GM135066.

Footnotes

Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

Declaration of interests

The authors declare no competing interests.

Inclusion and diversity

We support inclusive, diverse, and equitable conduct of research.

References

  • 1. Puente XS, Sánchez LM, Overall CM, and López-Otín C (2003). Human and mouse proteases: a comparative genomic approach. Nat Rev Genet 4, 544–558. 10.1038/nrg1111.
  • 2. Drag M, and Salvesen GS (2010). Emerging principles in protease-based drug discovery. Nat Rev Drug Discov 9, 690–701. 10.1038/nrd3053.
  • 3. Schilling O, and Overall CM (2008). Proteome-derived, database-searchable peptide libraries for identifying protease cleavage sites. Nat Biotechnol 26, 685–694. 10.1038/nbt1408.
  • 4. Mahrus S, Trinidad JC, Barkan DT, Sali A, Burlingame AL, and Wells JA (2008). Global sequencing of proteolytic cleavage sites in apoptosis by specific labeling of protein N termini. Cell 134, 866–876. 10.1016/j.cell.2008.08.012.
  • 5. Kleifeld O, Doucet A, auf dem Keller U, Prudova A, Schilling O, Kainthan RK, Starr AE, Foster LJ, Kizhakkedathu JN, and Overall CM (2010). Isotopic labeling of terminal amines in complex samples identifies protein N-termini and protease cleavage products. Nat Biotechnol 28, 281–288. 10.1038/nbt.1611.
  • 6. Staes A, Impens F, Damme PV, Ruttens B, Goethals M, Demol H, Timmerman E, Vandekerckhove J, and Gevaert K (2011). Selecting protein N-terminal peptides by combined fractional diagonal chromatography. Nat Protoc 6, 1130–1141. 10.1038/nprot.2011.355.
  • 7. Griswold AR, Cifani P, Rao SD, Axelrod AJ, Miele MM, Hendrickson RC, Kentsis A, and Bachovchin DA (2019). A chemical strategy for protease substrate profiling. Cell Chem Biol 26, 901–907.e6. 10.1016/j.chembiol.2019.03.007.
  • 8. Weeks AM, Byrnes JR, Lui I, and Wells JA (2021). Mapping proteolytic neo-N termini at the surface of living cells. Proc National Acad Sci 118, e2018809118. 10.1073/pnas.2018809118.
  • 9. Schechter I, and Berger A (1967). On the size of the active site in proteases. I. Papain. Biochem Bioph Res Co 27, 157–162. 10.1016/s0006-291x(67)80055-x.
  • 10. MacDonald JI, Munch HK, Moore T, and Francis MB (2015). One-step site-specific modification of native proteins with 2-pyridinecarboxyaldehydes. Nat Chem Biol 11, 326–331. 10.1038/nchembio.1792.
  • 11. Weeks AM, and Wells JA (2018). Engineering peptide ligase specificity by proteomic identification of ligation sites. Nat Chem Biol 14, 50–57. 10.1038/nchembio.2521.
  • 12. Kong AT, Leprevost FV, Avtonomov DM, Mellacheruvu D, and Nesvizhskii AI (2017). MSFragger: ultrafast and comprehensive peptide identification in mass spectrometry-based proteomics. Nat Methods 14, 513–520. 10.1038/nmeth.4256.
  • 13. Zanon PRA, Yu F, Musacchio P, Lewald L, Zollo M, Krauskopf K, Mrdović D, Raunft P, Maher TE, Cigler M, et al. (2021). Profiling the proteome-wide selectivity of diverse electrophiles. 10.33774/chemrxiv-2021-w7rss-v2.
  • 14. Schilling O, and Overall CM (2007). Proteomic discovery of protease substrates. Curr Opin Chem Biol 11, 36–45. 10.1016/j.cbpa.2006.11.037.
  • 15. Verhelst SHL, Fonović M, and Bogyo M (2007). A mild chemically cleavable linker system for functional proteomic applications. Angew. Chem. Int. Ed 46, 1284–1286. 10.1002/anie.200603811.
  • 16. Yang Y-Y, Grammel M, Raghavan AS, Charron G, and Hang HC (2010). Comparative analysis of cleavable azobenzene-based affinity tags for bioorthogonal chemical proteomics. Chem Biol 17, 1212–1222. 10.1016/j.chembiol.2010.09.012.
  • 17. Szychowski J, Mahdavi A, Hodas JJL, Bagert JD, Ngo JT, Landgraf P, Dieterich DC, Schuman EM, and Tirrell DA (2010). Cleavable biotin probes for labeling of biomolecules via azide-alkyne cycloaddition. J Am Chem Soc 132, 18351–18360. 10.1021/ja1083909.
  • 18. Rabalski AJ, Bogdan AR, and Baranczak A (2019). Evaluation of chemically-cleavable linkers for quantitative mapping of small molecule-cysteinome reactivity. ACS Chem Biol 14, 1940–1950. 10.1021/acschembio.9b00424.
  • 19. Sletten EM, and Bertozzi CR (2009). Bioorthogonal chemistry: fishing for selectivity in a sea of functionality. Angew. Chem. Int. Ed 48, 6974–6998. 10.1002/anie.200900942.
  • 20. Xiao H, He M, Xie G, Liu Y, Zhao Y, Ye X, Li X, and Zhang M (2019). The release of tryptase from mast cells promote tumor cell metastasis via exosomes. BMC Cancer 19, 1015. 10.1186/s12885-019-6203-2.
  • 21. Yamashita K, Mimori K, Inoue H, Mori M, and Sidransky D (2003). A tumor-suppressive role for trypsin in human cancer progression. Cancer Res 63, 6575–6578.
  • 22. Ramachandran R, Noorbakhsh F, DeFea K, and Hollenberg MD (2012). Targeting proteinase-activated receptors: therapeutic potential and challenges. Nat Rev Drug Discov 11, 69–86. 10.1038/nrd3615.
  • 23. Belouzard S, Chu VC, and Whittaker GR (2009). Activation of the SARS coronavirus spike protein via sequential proteolytic cleavage at two distinct sites. Proc National Acad Sci 106, 5871–5876. 10.1073/pnas.0809524106.
  • 24. Millet JK, and Whittaker GR (2014). Host cell entry of Middle East respiratory syndrome coronavirus after two-step, furin-mediated activation of the spike protein. Proc National Acad Sci 111, 15214–15219. 10.1073/pnas.1407087111.
  • 25. Benton DJ, Wrobel AG, Xu P, Roustan C, Martin SR, Rosenthal PB, Skehel JJ, and Gamblin SJ (2020). Receptor binding and priming of the spike protein of SARS-CoV-2 for membrane fusion. Nature 588, 327–330. 10.1038/s41586-020-2772-0.
  • 26. Huesgen PF, Lange PF, Rogers LD, Solis N, Eckhard U, Kleifeld O, Goulas T, Gomis-Ruth FX, and Overall CM (2015). LysargiNase mirrors trypsin for protein C-terminal and methylation-site identification. Nat Methods 12, 55–58. 10.1038/nmeth.3177.
  • 27. Qin H, Wang F, Zhang Y, Hu Z, Song C, Wu R, Ye M, and Zou H (2012). Isobaric cross-sequence labeling of peptides by using site-selective N-terminus dimethylation. Chem Commun 48, 6265–6267. 10.1039/c2cc31705b.
  • 28. Seidah NG, and Prat A (2012). The biology and therapeutic targeting of the proprotein convertases. Nat Rev Drug Discov 11, 367–383. 10.1038/nrd3699.
  • 29. Seidah NG (2011). Proprotein convertases. Methods Mol Biology 768, 23–57. 10.1007/978-1-61779-204-5_3.
  • 30. Julius D, Brake A, Blair L, Kunisawa R, and Thorner J (1984). Isolation of the putative structural gene for the lysine-arginine-cleaving endopeptidase required for processing of yeast prepro-α-factor. Cell 37, 1075–1089. 10.1016/0092-8674(84)90442-2.
  • 31. Fuller RS, Brake A, and Thorner J (1989). Yeast prohormone processing enzyme (KEX2 gene product) is a Ca2+-dependent serine protease. Proc National Acad Sci 86, 1434–1438. 10.1073/pnas.86.5.1434.
  • 32. Zhu Y, Zhang X, Cartwright CP, and Tipper DJ (1992). Kex2-dependent processing of yeast K1 killer preprotoxin includes cleavage at ProArg-44. Mol Microbiol 6, 511–520. 10.1111/j.1365-2958.1992.tb01496.x.
  • 33. Day R. (1992). Distribution and regulation of the prohormone convertases PC1 and PC2 in the rat pituitary. Mol Endocrinol 6, 485–497. 10.1210/me.6.3.485.
  • 34. Thomas G. (2002). Furin at the cutting edge: From protein traffic to embryogenesis and disease. Nat Rev Mol Cell Bio 3, 753–766. 10.1038/nrm934.
  • 35. Luo SY, Araya LE, and Julien O (2019). Protease substrate identification using N-terminomics. ACS Chem Biol 14, 2361–2371. 10.1021/acschembio.9b00398.
  • 36. Weng SSH, Demir F, Ergin EK, Dirnberger S, Uzozie A, Tuscher D, Nierves L, Tsui J, Huesgen PF, and Lange PF (2019). Sensitive determination of proteolytic proteoforms in limited microscale proteome samples. Mol Cell Proteomics 18, 2335–2347. 10.1074/mcp.tir119.001560.
  • 37. Weeks AM, and Wells JA (2020). N-terminal modification of proteins with subtiligase specificity variants. Curr Protoc Chem Biology 12, e79. 10.1002/cpch.79.
  • 38. Weeks AM, and Wells JA (2019). Subtiligase-catalyzed peptide ligation. Chem Rev 120, 3127–3160. 10.1021/acs.chemrev.9b00372.
  • 39. Yoshihara HAI, Mahrus S, and Wells JA (2008). Tags for labeling protein N-termini with subtiligase for proteomics. Bioorg Med Chem Lett 18, 6000–6003. 10.1016/j.bmcl.2008.08.044.
  • 40. Nagata S. (2018). Apoptosis and clearance of apoptotic cells. Annu Rev Immunol 36, 1–29. 10.1146/annurev-immunol-042617-053010.
  • 41. Alnemri ES, Livingston DJ, Nicholson DW, Salvesen G, Thornberry NA, Wong WW, and Yuan J (1996). Human ICE/CED-3 protease nomenclature. Cell 81, 171. 10.1016/s0092-8674(00)81334-3.
  • 42. Galluzzi L, Vitale I, Aaronson SA, Abrams JM, Adam D, Agostinis P, Alnemri ES, Altucci L, Amelio I, Andrews DW, et al. (2018). Molecular mechanisms of cell death: recommendations of the Nomenclature Committee on Cell Death 2018. Cell Death Differ 25, 486–541. 10.1038/S41418-017-0012-4.
  • 43. Xiao Q, Zhang F, Nacev BA, Liu JO, and Pei D (2010). Protein N-terminal processing: substrate specificity of Escherichia coli and human methionine aminopeptidases. Biochemistry 49, 5588–5599. 10.1021/bi1005464.
  • 44. Ree R, Varland S, and Arnesen T (2018). Spotlight on protein N-terminal acetylation. Exp Mol Medicine 50, 1–13. 10.1038/s12276-018-0116-z.
  • 45. Crawford ED, Seaman JE, Agard N, Hsu GW, Julien O, Mahrus S, Nguyen H, Shimbo K, Yoshihara HAI, Zhuang M, et al. (2013). The DegraBase: a database of proteolysis in healthy and apoptotic human cells. Mol Cell Proteomics 12, 813–824. 10.1074/mcp.o112.024372.
  • 46. Paysan-Lafosse T, Blum M, Chuguransky S, Grego T, Pinto BL, Salazar GA, Bileschi ML, Bork P, Bridge A, Colwell L, et al. (2022). InterPro in 2022. Nucleic Acids Res 51, D418–D427. 10.1093/nar/gkac993.
  • 47. Dix MM, Simon GM, and Cravatt BF (2008). Global mapping of the topography and magnitude of proteolytic events in apoptosis. Cell 134, 679–691. 10.1016/j.cell.2008.06.038.
  • 48. Agard NJ, Mahrus S, Trinidad JC, Lynn A, Burlingame AL, and Wells JA (2012). Global kinetic analysis of proteolysis via quantitative targeted proteomics. Proc National Acad Sci 109, 1913–1918. 10.1073/pnas.1117158109.
  • 49. Dix MM, Simon GM, Wang C, Okerberg E, Patricelli MP, and Cravatt BF (2012). Functional interplay between caspase cleavage and phosphorylation sculpts the apoptotic proteome. Cell 150, 426–440. 10.1016/j.cell.2012.05.040.
  • 50. Wiita AP, Ziv E, Wiita PJ, Urisman A, Julien O, Burlingame AL, Weissman JS, and Wells JA (2013). Global cellular response to chemotherapy-induced apoptosis. eLife 2, e01236. 10.7554/elife.01236.
  • 51. Chen L, Shan Y, Weng Y, Sui Z, Zhang X, Liang Z, Zhang L, and Zhang Y (2016). Hydrophobic tagging-assisted N-termini enrichment for in-depth N-terminome analysis. Anal. Chem 88, 8390–8395. 10.1021/acs.analchem.6b02453.
  • 52. Wang M, Herrmann CJ, Simonovic M, Szklarczyk D, and Mering C (2015). Version 4.0 of PaxDb: Protein abundance data, integrated across model organisms, tissues, and cell-lines. Proteomics 15, 3163–3168. 10.1002/pmic.201400441.
  • 53. Yi L, Gebhard MC, Li Q, Taft JM, Georgiou G, and Iverson BL (2013). Engineering of TEV protease variants by yeast ER sequestration screening (YESS) of combinatorial libraries. Proc National Acad Sci 110, 7229–7234. 10.1073/pnas.1215994110.
  • 54. Li Q, Yi L, Hoi KH, Marek P, Georgiou G, and Iverson BL (2017). Profiling protease specificity: combining yeast ER sequestration screening (YESS) with next generation sequencing. ACS Chem Biol 12, 510–518. 10.1021/acschembio.6b00547.
  • 55. Matthews DJ, and Wells JA (1993). Substrate phage: selection of protease substrates by monovalent phage display. Science 260, 1113–1117. 10.1126/science.8493554.
  • 56. Deng S-J, Bickett DM, Mitchell JL, Lambert MH, Blackburn RK, Carter HL, Neugebauer J, Pahel G, Weiner MP, and Moss ML (2000). Substrate specificity of human collagenase 3 assessed using a phage-displayed peptide library. J Biol Chem 275, 31422–31427. 10.1074/jbc.m004538200.
  • 57. Zhou J, Li S, Leung KK, O’Donovan B, Zou JY, DeRisi JL, and Wells JA (2020). Deep profiling of protease substrate specificity enabled by dual random and scanned human proteome substrate phage libraries. Proc National Acad Sci 117, 25464–25475. 10.1073/pnas.2009279117.
  • 58. Schilling O, Huesgen PF, Barré O, auf dem Keller U, and Overall CM (2011). Characterization of the prime and non-prime active site specificities of proteases by proteome-derived peptide libraries and tandem mass spectrometry. Nat Protoc 6, 111–120. 10.1038/nprot.2010.178.
  • 59. Hughes CS, Moggridge S, Muller T, Sorensen PH, Morin GB, and Krijgsveld J (2019). Single-pot, solid-phase-enhanced sample preparation for proteomics experiments. Nat Protoc 14, 68–85. 10.1038/S41596-018-0082-x.
  • 60. Colaert N, Helsens K, Martens L, Vandekerckhove J, and Gevaert K (2009). Improved visualization of protein consensus sequences by iceLogo. Nat Methods 6, 786–787. 10.1038/nmeth1109-786.
  • 61. The UniProt Consortium, Bateman A, Martin M-J, Orchard S, Magrane M, Ahmad S, Alpi E, Bowler-Barnett EH, Britto R, Bye-A-Jee H, et al. (2022). UniProt: the Universal Protein Knowledgebase in 2023. Nucleic Acids Res 51, D523–D531. 10.1093/nar/gkac1052.
  • 62. Shannon P, Markiel A, Ozier O, Baliga NS, Wang JT, Ramage D, Amin N, Schwikowski B, and Ideker T (2003). Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res 13, 2498–2504. 10.1101/gr.1239303.
  • 63. Schilling B, Rardin MJ, MacLean BX, Zawadzka AM, Frewen BE, Cusack MP, Sorensen DJ, Bereman MS, Jing E, Wu CC, et al. (2012). Platform-independent and label-free quantitation of proteomic data using MS1 extracted ion chromatograms in Skyline: application to protein acetylation and phosphorylation. Mol. Cell. Proteom 11, 202–214. 10.1074/mcp.m112.017707.

Associated Data

Supplementary Materials

  • Table S3. List of ProteomeXchange accession numbers and experimental information, related to STAR Methods.
  • Supplemental Item 1. Detailed protocols for PICS2 and CHOPPER, related to STAR Methods.
  • Dataset S1. E. coli proteome-derived peptide libraries made with trypsin, chymotrypsin, and GluC, related to Figure 1.
  • Dataset S2. E. coli proteome-derived peptide libraries treated with 2PCA for 4 h at 37°C, related to Figure 1.
  • Dataset S3. Temperature dependence of modification of E. coli proteome-derived peptide libraries with 50 mM 2PCA for 4 h, related to Figure 2.
  • Dataset S4. Temperature dependence of modification of E. coli proteome-derived peptide libraries with 50 mM 2PCA for 20 h, related to Figure 2.
  • Dataset S5. Concentration dependence of 2PCA modification of E. coli proteome-derived peptide libraries, related to Figure 2.
  • Dataset S6. Human proteome-derived peptide library generated with trypsin; treated with 10 mM alkyne-2PCA for 20 h at 37°C, related to Figure 3.
  • Dataset S7. Human proteome-derived peptide library generated with trypsin; treated with 10 mM alkyne-2PCA for 20 h at 37°C; clicked with biotin azide, related to Figure 3.
  • Dataset S8. Biotin-SS-2PCA enrichment with E. coli proteome-derived peptide libraries, related to Figure 3.
  • Dataset S9. Biotin-2PCA enrichment with E. coli proteome-derived peptide libraries, related to Figure 3.
  • Dataset S10. Alkyne-2PCA clicked to biotin-azide enrichment with E. coli proteome-derived peptide libraries, related to Figure 3.
  • Dataset S11. Alkyne-2PCA clicked to biotin-disulfide-azide enrichment with E. coli proteome-derived peptide libraries, related to Figure 3.
  • Dataset S12. Alkyne-2PCA clicked to biotin-DADPS-azide enrichment with E. coli proteome-derived peptide libraries, related to Figure 3.
  • Dataset S13. Alkyne-2PCA clicked to biotin-Dde-azide enrichment with E. coli proteome-derived peptide libraries, related to Figure 3.
  • Dataset S14. Alkyne-2PCA clicked to biotin-Diazo-azide enrichment with E. coli proteome-derived peptide libraries, related to Figure 3.
  • Dataset S15. Human proteome-derived peptide libraries generated with trypsin, chymotrypsin, and GluC, related to Figure 4.
  • Dataset S16. Human and E. coli proteome-derived peptide libraries N-terminally blocked with 2PCA, related to Figure 4.
  • Dataset S17. Trypsin PICS with E. coli proteome-derived peptide libraries, related to Figure 4.
  • Dataset S19. LysargiNase PICS2 with E. coli proteome-derived peptide libraries, related to Figure 4.
  • Dataset S20. GluC PICS2 with E. coli proteome-derived tryptic peptide library, related to Figure 4.
  • Dataset S21. Chymotrypsin PICS2 with E. coli tryptic proteome-derived peptide library, related to Figure 4.
  • Dataset S22. N-terminally dimethylated E. coli proteome-derived peptide libraries, related to Figure 4.
  • Dataset S23. Kex2 PICS with N-terminally dimethylated E. coli proteome-derived peptide libraries, related to Figure 4.
  • Dataset S24. Furin PICS2 with N-terminally 2PCA-modified human proteome-derived peptide libraries, related to Figure 4.
  • Dataset S25. PCSK2 PICS2 with N-terminally dimethylated human proteome-derived peptide libraries, related to Figure 4.
  • Dataset S26. CHOPPER with etoposide-treated Jurkat cells, related to Figure 5.
  • Dataset S27. CHOPPER with DMSO-treated Jurkat cells, related to Figure 5.
  • Dataset S28. HUNTER with etoposide-treated Jurkat cells, related to Figure 5.
  • Dataset S29. HUNTER with DMSO-treated Jurkat cells, related to Figure 5.
  • Dataset S30. Label-free quantification of CHOPPER results, related to Figure 5.
  • Dataset S31. Putative caspase neo-N termini identified by HYTANE, related to Figure 5.
  • Dataset S32. Putative caspase neo-N termini identified by subtiligase N terminomics, related to Figure 5.

Data Availability Statement

  • Mass spectrometry data have been deposited at ProteomeXchange and are publicly available as of the date of publication. Accession numbers are listed in the key resources table.

  • All original code has been deposited at Zenodo and is publicly available as of the date of publication. The DOI is listed in the key resources table.

  • Data from Mahrus et al., 2008 [4] and Crawford et al., 2013 [45] are available at https://wellslab.ucsf.edu/degrabase/.

  • Any additional information required to reanalyze the data reported in this paper is available from the lead contact upon request.
