Abstract
Recent studies have revealed a wide variety of amino acid, post-translational, and noncanonical protein modifications across diverse organisms and tissues. However, their unbiased detection and analysis remain hindered by technical limitations. Here, we present a spectral alignment method for the identification of protein modifications using high-resolution mass spectrometry proteomics. Termed SAMPEI for spectral alignment-based modified peptide identification, this open-source algorithm is designed for the discovery of functional protein and peptide signaling modifications, without prior knowledge of their identities. Using synthetic standards and controlled chemical labeling experiments, we demonstrate its high specificity and sensitivity for the discovery of substoichiometric protein modifications in complex cellular extracts. SAMPEI mapping of mouse macrophage differentiation revealed diverse post-translational protein modifications, including distinct forms of cysteine itaconatylation. SAMPEI’s robust parametrization and versatility are expected to facilitate the discovery of biological modifications of diverse macromolecules. SAMPEI is implemented as a Python package and is available open-source from BioConda and GitHub (https://github.com/FenyoLab/SAMPEI).
Keywords: post-translational chemical modification, spectral alignment, mass spectrometry, functional regulation, signaling, software, itaconate, macrophages, metabolism, SAMPEI
INTRODUCTION
Post-translational modifications (PTMs) control diverse biological processes, including the activities of most cellular proteins. These include reversible enzymatic modifications that are part of biological signaling, as well as various nonenzymatic chemical modifications that arise during development, aging, and disease. As a result, the identification of protein modifications has propelled the discovery of numerous biological phenomena, molecular mechanisms, and disease processes.
Historically, PTM discovery followed the development of specific biochemical assays and affinity reagents, such as those for protein tyrosine phosphorylation.1 The development of high-resolution mass spectrometry provided a versatile and, in principle, unbiased approach for the identification of diverse protein modifications.2,3 This has revealed not only diverse protein PTMs but also an increasing number of unanticipated modifications.4–6 For example, recent mass spectrometry studies have reported lactylation and dopaminylation of histone residues in the regulation of gene expression.7,8 However, the extent and function of most protein modifications observed in biological tissues remain poorly defined, largely due to the technical challenges of their accurate and sensitive detection.
Most commonly, mass spectrometry proteomics is based on the analysis of proteolytic peptides.9–11 Tandem fragmentation of peptides produces high-resolution mass spectra that, when sufficiently complete and unique, enable identification of the peptides and localization of distinct chemical modifications and substitutions. Most current approaches for peptide spectral matching rely on statistical scoring of the similarity between observed spectra and theoretical spectra predicted from peptide sequences and their chemical modifications.12 In practice, this strategy limits the chemical modifications considered to a relatively small number of known adducts on specific residues, whose spectra are computed in modified and unmodified forms. Inclusion of multiple possible modifications expands the theoretical search space exponentially, thereby precluding the routine use of such approaches for unbiased PTM discovery. Indeed, in current experiments, most conventional methods for peptide spectral match (PSM) assignment fail to match the majority of observed fragmentation spectra, at least in part because of diverse peptide modifications.13–15
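To make the scaling argument concrete, the short Python sketch below counts the modification patterns a search engine would have to score for a single peptide. It assumes, for illustration only, that each of `n_sites` candidate residues is either unmodified or carries exactly one of `n_mods` alternative modifications; the function names are hypothetical, and real search engines apply further constraints.

```python
from math import comb


def modified_forms(n_sites: int, n_mods: int) -> int:
    """Modification patterns of one peptide, assuming each of n_sites residues
    is unmodified or carries exactly one of n_mods alternative modifications."""
    return (n_mods + 1) ** n_sites


def forms_with_cap(n_sites: int, n_mods: int, max_mods: int) -> int:
    """Same count, but with at most max_mods modified residues per peptide:
    choose k modified sites, then one of n_mods modifications for each."""
    return sum(comb(n_sites, k) * n_mods ** k
               for k in range(min(max_mods, n_sites) + 1))


for mods in (1, 5, 20):
    print(f"{mods} modifications, 10 sites: "
          f"{modified_forms(10, mods)} patterns, "
          f"{forms_with_cap(10, mods, 3)} with at most 3 modified residues")
```

Under these assumptions, ten candidate sites and five alternative modifications already give 6^10 (about 6 × 10^7) patterns; capping the number of modified residues per peptide, as many search engines do, slows but does not remove this growth.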
Early attempts to circumvent this problem relied on the iterative analysis of unassigned spectra with different subsets of possible modifications.16–18 However, only a relatively small number of known PTMs could be considered, with imprecise control of false discovery.19,20 Restricting the possible sequence search space using de novo sequence tags expanded the repertoire of variable PTMs that could be considered and significantly improved accuracy, but this approach is limited to spectra with nearly complete fragment ion series.21–25 Search space reduction, as currently implemented in various algorithms,26–29 only partially overcomes these limitations and remains computationally expensive and prone to false identifications. Finally, recent implementations of fragment ion indexing and other efficiency improvements support fast assignment of modified mass spectra.5,6,30–32
Protein modifications that occur during biological signaling, such as those produced by enzymes or metabolites, are necessarily substoichiometric and differentially abundant across functional states. As such, spectral alignment techniques, which have been used for the identification of metabolites33 and peptides,17,34,35 are particularly suited for their identification, insofar as sampling of both modified and unmodified states allows for specific detection.36–39 Here, we extend this concept to develop SAMPEI, an open-source algorithm designed for the discovery of functional protein and peptide signaling modifications without prior knowledge of their identities. We present its parametrization to optimize sensitivity and specificity using controlled chemical labeling experiments and synthetic peptides in complex cellular extracts, and we demonstrate its utility by mapping protein signaling during mouse macrophage differentiation. Thus, SAMPEI and improved methods for mass spectrometry proteomics should enable comprehensive studies of protein signaling and chemical biology.
EXPERIMENTAL SECTION
Software Availability
SAMPEI code and documentation are available as open source at https://github.com/FenyoLab/SAMPEI.
Reagents
Iodoacetamide (IAA, ≥99%, NMR grade), acrylamide (ACL, certified reference grade), itaconate (ITA, ≥96.0%, analytical grade), lipopolysaccharides (LPS, from Escherichia coli O111:B4), guanidinium chloride, ammonium bicarbonate (ABC), and dithiothreitol (DTT) were obtained from Sigma-Aldrich. LC-MS grade H2O and acetonitrile (ACN) were from Fisher Scientific. LC-MS grade formic acid (FA) and trifluoroacetic acid (TFA) of >99% purity were obtained from Thermo Scientific. LysC (mass spectrometry grade) was obtained from FUJIFILM Wako Chemicals. Trypsin (modified, sequencing grade), GluC (sequencing grade), and elastase were obtained from Promega. Bovine serum albumin (BSA) was obtained from Sigma-Aldrich. Human recombinant Kelch-like ECH-associated protein 1 (KEAP1, His and GST Tag) was obtained from Sino Biological; the lyophilized formulation was prepared in 20 mM Tris, 500 mM NaCl, 10% glycerol, pH 7.4, 5% trehalose, 5% mannitol. The protease inhibitors 4-(2-aminoethyl)-benzenesulfonyl fluoride hydrochloride (AEBSF) and pepstatin were obtained from Santa Cruz, bestatin from Alfa Aesar, and leupeptin from EMD Millipore.
Synthetic DFSAFINLVEFCR peptides were produced by solid-phase synthesis and purified to 95% purity by liquid chromatography (New England Peptides) in three chemoforms, all with carbamidomethylated C12: (1) with unmodified phenylalanines, (2) with fluoro-F2, and (3) with fluoro-F11. Peptide composition and chemical modifications were confirmed by mass spectrometry.
Cell Culture
Human OCI-AML2 and mouse RAW264.7 cells were obtained from the American Type Culture Collection (ATCC). The identity of cells was verified by STR analysis (MSKCC Integrated Genomics Operation). The absence of Mycoplasma sp. contamination was determined using Lonza MycoAlert (Lonza). OCI-AML2 cells were cultured in RPMI 1640 (Corning) supplemented with 10% fetal bovine serum (FBS), 1% penicillin/streptomycin and 1% L-glutamine. RAW264.7 cells were cultured in DMEM with sodium pyruvate (Corning) supplemented with 10% fetal bovine serum (FBS) and 1% penicillin/streptomycin. Cells were cultured in 5% CO2 in a humidified atmosphere at 37 °C.
Preparation of OCI-AML2 Proteome Alkylated with Iodoacetamide (IAA) or Acrylamide (ACL)
4 × 10⁶ OCI-AML2 cells were pelleted by centrifugation at 400g for 5 min at 4 °C. The cell pellets were washed twice with PBS, resuspended in cold lysis buffer (6 M guanidinium chloride, 100 mM ABC, pH 8.4), and lysed using an S220 adaptive focused sonicator (Covaris). Protein concentration was determined using the bicinchoninic acid (BCA) assay according to the manufacturer’s instructions (Pierce). Proteomes were reduced with dithiothreitol (DTT, 10 mM) at 56 °C for 45 min, divided into two equal fractions, and alkylated with iodoacetamide (IAA, 55 mM) or acrylamide (ACL, 55 mM), respectively (room temperature, 30 min in the dark). Alkylation was quenched by the addition of DTT (10 mM, 56 °C, 30 min). Lysates were diluted with 100 mM aqueous ABC, pH 8.4, to a final concentration of 0.6 M guanidinium chloride, and the proteomes were proteolyzed using LysC endopeptidase at 1:100 w/w (protease:proteome) at 37 °C for 6 h. The solution was further diluted with 100 mM aqueous ABC to 0.3 M guanidinium chloride, and the proteomes were digested using modified trypsin at 1:50 w/w at 37 °C overnight. Digestion was stopped by acidifying the reactions to pH 3 with aqueous trifluoroacetic acid (TFA); peptides were desalted by solid-phase extraction with C18 MacroSpin columns (Nest Group) and lyophilized by vacuum centrifugation (Genevac EZ-2 Elite).
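The dilution and enzyme-to-substrate arithmetic in this protocol follows C1V1 = C2V2 and the stated w/w ratios. The Python sketch below works it through for a hypothetical lysate (100 μL in 6 M guanidinium chloride containing 200 μg of protein); these starting amounts and the helper names are illustrative assumptions, not values from the protocol.

```python
def buffer_to_add(volume_ul, conc_start, conc_target):
    """Volume of diluent (μL) that brings a solute from conc_start to
    conc_target (same units), from C1*V1 = C2*V2."""
    if not 0 < conc_target <= conc_start:
        raise ValueError("target concentration must be positive and not above start")
    return volume_ul * conc_start / conc_target - volume_ul


def protease_amount(proteome_ug, ratio):
    """Protease mass (μg) for a protease:proteome ratio of 1:ratio (w/w)."""
    return proteome_ug / ratio


# Hypothetical starting lysate: 100 μL in 6 M guanidinium chloride, 200 μg protein.
volume, protein = 100.0, 200.0
step1 = buffer_to_add(volume, 6.0, 0.6)  # dilute to 0.6 M before LysC (1:100 w/w)
volume += step1
step2 = buffer_to_add(volume, 0.6, 0.3)  # dilute to 0.3 M before trypsin (1:50 w/w)
print(f"Add {step1:.0f} μL ABC, then {protease_amount(protein, 100):.1f} μg LysC")
print(f"Add {step2:.0f} μL ABC, then {protease_amount(protein, 50):.1f} μg trypsin")
```

For this hypothetical lysate, the first 10-fold dilution requires 900 μL of ABC and the second another 1000 μL; 200 μg of proteome then calls for 2 μg of LysC and 4 μg of trypsin.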
Preparation of RAW264.7 Proteome under LPS Stimulation
RAW264.7 cells were seeded at 10 × 10⁶ cells per 150 mm dish (6 dishes) and allowed to adhere for 24 h. LPS was added to the medium at 50 ng/mL, and after 24 h of treatment, cells were collected by gentle scraping and pelleted by centrifugation at 400g for 5 min at 4 °C. Proteomes were extracted, alkylated with IAA only, proteolyzed, and desalted as described above. Peptides were resolved by strong cation exchange (SCX) chromatography on a Protein Pak Hi-Res SP 7 μm 4.6 × 100 mm column (Waters), using an Alliance liquid chromatograph (Waters) equipped with an automated fraction collector. Peptides were resuspended in 100% buffer A (5% ACN/0.1% FA) and loaded at a constant flow rate of 0.5 mL/min. With the column temperature kept at 30 °C and the flow rate at 0.5 mL/min, peptides were eluted using a segmented gradient of buffer B (1 M potassium chloride (KCl) in 5% ACN/0.1% FA) as follows: 0–5 min (0% B), 5–35 min (0–15% B), 35–50 min (15–50% B), 50–60 min (100% B), 60–75 min (0% B). In total, 18 fractions were obtained, which were concentrated by vacuum centrifugation and desalted using solid-phase extraction with C18 MacroSpin columns (Nest Group). Peptides were lyophilized and stored at −80 °C until MS analysis.
Preparation of BSA/KEAP1 Modified by Itaconate
Bovine serum albumin (BSA) and recombinant human Kelch-like ECH-associated protein 1 (KEAP1) were dissolved in MS-grade H2O and mixed in equimolar amounts at a concentration of 2 μM. The mixture was reduced with 1 mM tris(2-carboxyethyl)phosphine (TCEP) at room temperature for 1 h and dialyzed against 100 mM ABC, 1 μM TCEP at 4 °C overnight. The BSA/KEAP1 solution was divided into 3 equal fractions and incubated at 4 °C overnight with itaconate at 50, 5, and 0 mM, respectively. Itaconate-reacted proteins were concentrated by vacuum centrifugation, denatured using 6 M guanidinium chloride, 100 mM ABC, pH 8.4, and alkylated with IAA as described above. For LysC/trypsin digestion, the solution was diluted with 100 mM aqueous ABC, pH 8.4, to a final concentration of 0.6 M guanidinium chloride and digested using 1:50 w/w LysC at 37 °C for 6 h. The solution was further diluted to 0.3 M guanidinium chloride and digested using 1:25 w/w modified trypsin at 37 °C overnight. For GluC or elastase digestion, the solution was diluted to 0.3 M guanidinium chloride and digested using either GluC or elastase at 1:20 w/w at 37 °C overnight. Digested peptides were purified as described above.
Liquid Chromatography and Peptide Mass Spectrometry
For analysis of proteomes reacted with iodoacetamide and acrylamide, the LC system consisted of a vented trap-elute configuration (Ekspert nanoLC 400, SCIEX) coupled to an Orbitrap Lumos mass spectrometer (Thermo Fisher Scientific, San Jose, CA) via a nano electrospray DPV-565 PicoView ion source (New Objective). The trap column was fabricated from a 5 cm × 150 μm internal diameter silica capillary with a 2 mm silicate frit,40 and pressure loaded with Poros R2-C18 10 μm particles (Life Technologies). The analytical column consisted of a 50 cm × 75 μm internal diameter column packed with ReproSil-Pur C18-AQ 3 μm particles (Dr. Maisch) and connected to a 3 cm electrospray emitter with a 3 μm terminal inner diameter.41 Five μL of peptide solution were resolved using a 150 min linear gradient of 3%–45% buffer B (acetonitrile/0.1% formic acid) in buffer A (water/0.1% formic acid), delivered at 300 nL/min. Precursor ions in the 400–2000 m/z range were filtered using the quadrupole, and their spectra were recorded every 3 s using the Orbitrap (240 000 resolution), with an automatic gain control (AGC) target of 10⁵ ions and a maximum injection time of 100 ms. Data-dependent MS2 selection was enforced, limiting fragmentation to monoisotopic ions with charge 2–6 and dynamically excluding already fragmented precursors for 30 s (10 ppm tolerance). Selected precursors were isolated (Q1 isolation window 1.2 Th) and HCD fragmented (normalized stepped collision energy 24, 32, 40%) using the top speed algorithm. Fragmentation spectra were recorded in the Orbitrap at 50 000 resolution (AGC 50 000 ions, maximum injection time 86 ms).
For analysis of SCX-fractionated RAW264.7 proteomes, peptides were resolved using an Easy-nLC 1000 chromatograph (Thermo Fisher Scientific) in line with a quadrupole-Orbitrap mass spectrometer (Q-Exactive HF, Thermo Fisher Scientific). Peptides were injected onto an EasySpray reversed-phase column (50 cm × 75 μm internal diameter) with an integrated electrospray emitter (Thermo Fisher Scientific) and resolved at a constant flow rate of 300 nL/min using a 2–45% linear gradient of acetonitrile in water (both containing 0.1% v/v formic acid) over 150 min, followed by a 45–90% gradient over 3 min and 25 min at constant 90% acetonitrile. Eluting peptides were transferred into the mass spectrometer via an EasySpray nano electrospray ion source. Precursor ions in the 400–1500 m/z range were filtered using the quadrupole and recorded every 3 s using the Orbitrap (60 000 resolution, with the m/z 445.1200 ion used as lockmass), with an automatic gain control target of 10⁶ ions and a maximum injection time of 50 ms. Data-dependent MS2 selection was enforced, limiting fragmentation to monoisotopic ions with charge 2–5 and MS1 intensity greater than 5 × 10⁴, and dynamically excluding already fragmented precursors for 30 s (10 ppm tolerance). Selected precursors were isolated (Q1 isolation window 1.2 Th) and HCD fragmented (normalized collision energy 30). Product ion spectra were recorded in the Orbitrap at 15 000 resolution (AGC 5 × 10⁴ ions, maximum injection time 54 ms).
Peptides from itaconate-reacted BSA and KEAP1 were resolved using an Easy-nLC 1000 chromatograph (Thermo Fisher Scientific) in line with a quadrupole-Orbitrap mass spectrometer (Fusion Lumos Orbitrap, Thermo Fisher Scientific). Peptides were loaded onto an EasySpray reversed-phase column (50 cm × 75 μm internal diameter) with an integrated electrospray emitter (Thermo Fisher Scientific) and resolved at a constant flow rate of 300 nL/min using a 2–45% linear gradient of acetonitrile in water (both containing 0.1% v/v formic acid) over 60 min, followed by a 45–90% gradient over 3 min and 25 min at constant 90% acetonitrile. Eluting peptides were transferred into the mass spectrometer via an EasySpray nano electrospray ion source. Precursor ions in the 375–3000 m/z range were filtered using the quadrupole and recorded every 3 s using the Orbitrap (60 000 resolution, with the m/z 445.1200 ion used as lockmass), with an automatic gain control target of 10⁶ ions and a maximum injection time of 50 ms. Data-dependent MS2 selection was enforced, limiting fragmentation to monoisotopic ions with charge 2–5 and MS1 intensity greater than 5 × 10⁴, and dynamically excluding already fragmented precursors for 30 s (10 ppm tolerance). Selected precursors were isolated (Q1 isolation window 0.7 Th) and HCD fragmented (normalized collision energy 30) using the top speed algorithm. Product ion spectra were recorded in the Orbitrap at 30 000 resolution (AGC 8 × 10⁴ ions, maximum injection time 54 ms).
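The precursor selection rules used in these acquisitions (charge 2–5, MS1 intensity above 5 × 10⁴, dynamic exclusion for 30 s at 10 ppm) can be illustrated with a toy Python model. This is a simplified sketch for intuition, not instrument firmware logic: the class name, the intensity-ordered selection, and the expiry handling are assumptions, and the sketch ignores monoisotopic peak detection, isolation windows, and top speed cycle timing.

```python
from dataclasses import dataclass, field


@dataclass
class DynamicExclusion:
    """Toy data-dependent selector: once a precursor m/z is fragmented, any
    precursor within tol_ppm of it is skipped for duration_s seconds."""
    duration_s: float = 30.0
    tol_ppm: float = 10.0
    # (m/z, expiry time) of recently fragmented precursors.
    _excluded: list = field(default_factory=list, init=False, repr=False)

    def is_excluded(self, mz: float, t: float) -> bool:
        # Drop entries whose exclusion window has expired at time t.
        self._excluded = [(m, exp) for m, exp in self._excluded if exp > t]
        return any(abs(mz - m) / m * 1e6 <= self.tol_ppm for m, _ in self._excluded)

    def select(self, candidates, t, charges=(2, 3, 4, 5), min_intensity=5e4):
        """Pick precursors for MS2 from (mz, z, intensity) tuples at time t,
        most intense first, applying charge, intensity and exclusion filters."""
        chosen = []
        for mz, z, inten in sorted(candidates, key=lambda c: -c[2]):
            if z in charges and inten > min_intensity and not self.is_excluded(mz, t):
                chosen.append(mz)
                self._excluded.append((mz, t + self.duration_s))
        return chosen


dx = DynamicExclusion()
print(dx.select([(650.3201, 2, 1e6), (650.3230, 2, 8e5), (812.4, 1, 2e6)], t=0.0))
print(dx.select([(650.3215, 2, 1e6)], t=10.0))  # still inside the 30 s window
print(dx.select([(650.3215, 2, 1e6)], t=31.0))  # exclusion has expired
```

In this toy run, the singly charged ion is rejected by the charge filter, the second ion at 650.3230 falls within 10 ppm of the fragmented 650.3201 precursor and is skipped, and the same m/z becomes eligible again only after 30 s.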
Data Analysis
Raw files were converted to the open MGF format using MSConvert within ProteoWizard42 (stable release 3.0). Spectra were matched to the reference human proteome as retrieved from UniProt43 on May 23, 2019, supplemented with contaminant proteins from the cRAP database44 and decoy sequences generated using the generate-peptides script from the Crux mass spectrometry analysis toolkit.45 Peptide spectral matching was performed using MSFragger5 (v20190523, with FragPipe v9.3 and GUI v3.0k), ByOnic26,27 (v2.7.84), MaxQuant28,46 (v1.6.0.16), or X! Tandem47 (version Alanine) combined with SAMPEI. Precursor mass tolerance was set to 5 ppm for conventional searches (MSFragger “closed” search, ByOnic without wildcard, and X! Tandem) and to 200 Da for agnostic PTM identification (MSFragger “open” search, ByOnic with the wildcard function enabled, MaxQuant with the “dependent peptides” function enabled, and SAMPEI). All other search parameters were kept homogeneous across searches, as permitted by the respective algorithms. Specifically, fragment ion mass tolerance was set to 20 ppm, trypsin with up to 3 missed cleavages was set as the protease, and Met oxidation and Asn/Gln deamidation were set as variable modifications. All spectra were analyzed with the Cys fixed modification set to either carbamidomethylation or propionylation, depending on the experimental conditions. PSMs were initially ranked by score (MSFragger expectscore and hyperscore, ByOnic score) and filtered to a PSM-level FDR of 1%. Carbamidomethylated and propionylated cysteine-containing PSMs were then counted, considering both detected mass shifts corresponding to the specific adducts (i.e., +57.02 and +71.03 Da for carbamidomethylation and propionylation, respectively, defined to maintain a mass tolerance <5 ppm at m/z 2000) and apparent mass shifts produced by the difference between the variable and the agnostically discovered modification.
For carbamidomethylated Cys detected while propionylation was set as the variable modification, a mass shift of −14.0 Da was used, corresponding to the difference between the observed 57.02 Da modification and the predicted 71.03 Da mass shift. For propionylated Cys detected while carbamidomethylation was set as the variable modification, a mass shift of +14.0 Da was used, corresponding to the difference between the observed 71.03 Da modification and the predicted 57.02 Da mass shift. For speed comparisons, all programs were executed on 4 Intel Xeon E5-4620 processors (8 CPU cores each, 2.2 GHz) with 512 GB of RAM. Calculations were limited to 16 logical cores, with no restriction on memory usage.
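The adduct arithmetic and the 1% PSM-level FDR filter described above can be sketched in Python. This is a minimal illustration, not code from SAMPEI or from the search engines: the adduct masses are standard monoisotopic values for C2H3NO and C3H5NO (their ±14.016 Da difference is one CH2 unit), while the helper names, the 0.01 Da tolerance, and the simple decoy-to-target FDR estimate are assumptions chosen for illustration.

```python
# Monoisotopic Cys adduct masses (Da); standard values, not taken from the paper.
CARBAMIDOMETHYL = 57.021464  # C2H3NO, from iodoacetamide
PROPIONAMIDE = 71.037114     # C3H5NO, from acrylamide


def ppm_window(mz, ppm):
    """Absolute half-width (Th) of a ppm tolerance at a given m/z."""
    return mz * ppm * 1e-6


def classify_shift(delta, tol=0.01):
    """Label an observed Cys mass shift (Da); tol is a hypothetical tolerance."""
    expected = {
        "carbamidomethyl": CARBAMIDOMETHYL,
        "propionamide": PROPIONAMIDE,
        # Apparent shifts when the other adduct was set in the search:
        "CAM observed, PA searched": CARBAMIDOMETHYL - PROPIONAMIDE,  # about -14.016
        "PA observed, CAM searched": PROPIONAMIDE - CARBAMIDOMETHYL,  # about +14.016
    }
    for label, mass in expected.items():
        if abs(delta - mass) <= tol:
            return label
    return None


def fdr_filter(psms, fdr=0.01):
    """Simple target-decoy filter. psms: (score, is_decoy), higher score = better.
    Keeps target PSMs down to the lowest score at which decoys/targets <= fdr."""
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)
    targets = decoys = 0
    cutoff = None
    for score, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        if targets and decoys / targets <= fdr:
            cutoff = score
    if cutoff is None:
        return []
    return [p for p in ranked if p[0] >= cutoff and not p[1]]


print(f"5 ppm at m/z 2000 = {ppm_window(2000, 5):.3f} Th")
print(classify_shift(-14.0157), "|", classify_shift(71.04))
```

Taking the lowest score at which the decoy-to-target ratio is still within 1% is equivalent to keeping all PSMs with a q-value of at most 1%; production tools may estimate FDR differently.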
Prediction of chemical composition based on accurate molecular weight was obtained using the ChemCalc suite.48
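ChemCalc performs this kind of composition search interactively. As an illustration of the underlying idea only (not ChemCalc's algorithm or interface), the brute-force Python sketch below enumerates C, H, N, O, and S compositions whose monoisotopic mass falls within an absolute tolerance of an observed mass shift. The element ranges, the 0.002 Da tolerance, and the function name are assumptions; because no valence rules are applied, chemically implausible formulas can also be returned and must be filtered by the user.

```python
from itertools import product

# Standard monoisotopic element masses (Da).
MONO = {"C": 12.0, "H": 1.00782503207, "N": 14.0030740048,
        "O": 15.99491461956, "S": 31.97207100}


def compositions(target, tol=0.002, max_counts=None):
    """List (formula, mass error) for CcHhNnOoSs compositions within tol Da of
    target, closest first. Brute force, without valence or unsaturation checks."""
    max_counts = max_counts or {"C": 12, "H": 24, "N": 4, "O": 8, "S": 1}
    elements = list(max_counts)
    hits = []
    for counts in product(*(range(max_counts[e] + 1) for e in elements)):
        mass = sum(n * MONO[e] for n, e in zip(counts, elements))
        if abs(mass - target) <= tol:
            formula = "".join(f"{e}{n if n > 1 else ''}"
                              for e, n in zip(elements, counts) if n)
            hits.append((formula, round(mass - target, 5)))
    return sorted(hits, key=lambda h: abs(h[1]))


print(compositions(57.0215))  # iodoacetamide adduct mass shift
print(compositions(71.0371))  # acrylamide adduct mass shift
```

For the 57.0215 Da shift, this search returns C2H3NO as the closest composition, together with the chemically implausible HN4 that a valence filter would discard.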
Metabolomics
RAW264.7 cells were cultivated and treated as above. After 24 h of LPS treatment, metabolites were extracted and analyzed by liquid chromatography–mass spectrometry (LC-MS) and gas chromatography–mass spectrometry (GC-MS). For LC-MS, metabolites were extracted from cell pellets with ice-cold 80:20 methanol:water. After overnight incubation at −80 °C, samples were vortexed and cleared by centrifugation at 20 000g for 20 min. Supernatants were dried in a vacuum evaporator (Genevac EZ-2 Elite). Dried extracts were resuspended in 40 μL of 97:3 water:methanol containing 10 mM tributylamine and 15 mM acetic acid. Samples were vortexed, incubated on ice for 20 min, and clarified by centrifugation at 20 000g for 20 min at 4 °C. LC-MS analysis used a Zorbax RRHD Extend-C18 column (150 mm × 2.1 mm, 1.8 μm particle size, Agilent Technologies). Solvent A was 10 mM tributylamine, 15 mM acetic acid in 97:3 water:methanol, and solvent B was 10 mM tributylamine and 15 mM acetic acid in methanol, prepared according to the manufacturer’s instructions (MassHunter Metabolomics dMRM Database and Method, Agilent Technologies). LC separation was coupled to a 6470 triple quadrupole mass spectrometer (Agilent Technologies) operated in dynamic MRM scan type and negative ionization mode. Itaconate was identified at a retention time of ~13.4 min with MRM transitions of m/z 129 to 85.1 (primary transition, used for quantitation) and m/z 129 to 41.3 (confirmatory). Retention time was also confirmed by injection of pure itaconate and of the pure standard spiked into a pooled sample.
For GC-MS, metabolites were extracted from cell pellets with ice-cold 80:20 methanol:water containing 2 mM deuterated 2-hydroxyglutarate (d-2-hydroxyglutaric-2,3,3,4,4-d5 acid; deuterated-2HG) as an internal standard. After overnight incubation at −80 °C, samples were vortexed and cleared by centrifugation at 21 000g for 20 min at 4 °C. Extracts were then dried in a vacuum evaporator (Genevac EZ-2 Elite). Dried extracts were resuspended in 50 μL of methoxyamine hydrochloride (40 mg/mL in pyridine) and incubated at 30 °C for 90 min with agitation. Metabolites were further derivatized by the addition of 80 μL of N-methyl-N-(trimethylsilyl)trifluoroacetamide (MSTFA) with 1% chlorotrimethylsilane (TMCS; Thermo Scientific) and 70 μL of ethyl acetate (Sigma), and incubated at 37 °C for 30 min. Samples were diluted 1:2 with 200 μL of ethyl acetate and then analyzed using an Agilent 7890A GC coupled to an Agilent 5977 mass spectrometer. The GC was operated in splitless mode with a constant helium carrier gas flow of 1 mL/min and an HP-5MS column (Agilent Technologies). The injection volume was 1 μL, and the GC oven temperature was ramped from 60 to 290 °C over 25 min. Peaks representing compounds of interest were extracted and integrated using MassHunter vB.08.00 (Agilent Technologies) and then normalized to both the internal standard (deuterated-2HG) peak area and cell number or protein content, as applicable. Ions used for quantification were m/z 259 for itaconate (confirmatory ion m/z 215) and m/z 252 for deuterated-2HG (confirmatory ion m/z 354). Peaks were manually inspected and verified against known spectra for each metabolite. Pure itaconate (Sigma), both alone and spiked into a pooled sample, was used for confirmation.
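The normalization described above (analyte peak area relative to the deuterated-2HG internal standard, then to cell number or protein content) is a simple ratio. The Python sketch below states it explicitly; the function name and the example inputs are hypothetical placeholders, not measured values.

```python
def normalized_abundance(analyte_area, istd_area, denominator):
    """Analyte peak area divided by the internal standard peak area, then by
    the sample denominator (cell number or protein content)."""
    if istd_area <= 0 or denominator <= 0:
        raise ValueError("internal standard area and denominator must be positive")
    return analyte_area / istd_area / denominator


# Hypothetical, illustrative inputs for two samples (not data from this study).
sample_a = normalized_abundance(analyte_area=2.0e4, istd_area=1.0e6, denominator=1.0e6)
sample_b = normalized_abundance(analyte_area=1.0e6, istd_area=1.0e6, denominator=1.0e6)
print(f"sample_b / sample_a = {sample_b / sample_a:.1f}")
```

Dividing by the internal standard first corrects for extraction and derivatization losses shared by both compounds; dividing by the denominator then makes samples with different cell numbers comparable.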
Western Blot Analysis
LPS-stimulated RAW264.7 cells were washed twice with PBS and resuspended in RIPA lysis buffer (150 mM NaCl, 50 mM Tris-HCl, 1% NP-40, 0.5% sodium deoxycholate, 0.1% SDS, pH 8.0) supplemented with a protease inhibitor cocktail (0.5 mM AEBSF, 0.01 mM bestatin, 0.1 mM leupeptin, and 0.001 mM pepstatin). Cells were mechanically disrupted using an S220 adaptive focused sonicator (Covaris), and the lysate was cleared by centrifugation at 15 000g for 10 min. The protein concentration of the clarified lysates was determined using the BCA assay according to the manufacturer’s instructions (Pierce). Thirty μg of proteome were resolved by denaturing electrophoresis (SDS-PAGE) on a NuPAGE 10% Bis-Tris polyacrylamide gel (Invitrogen) with MOPS SDS running buffer and electrotransferred onto an Immobilon FL polyvinylidene difluoride (PVDF) membrane (EMD Millipore). The membrane was blocked with 5% w/v nonfat dry milk for 1 h at room temperature and probed overnight at 4 °C with rabbit anti-IRG1 (1:1000, D6H2Y, Cell Signaling) and anti-β-actin (1:20 000, 13E5, Cell Signaling) antibodies. After three washes with TBST buffer, the membrane was incubated with secondary antibody (donkey anti-rabbit, horseradish peroxidase (HRP)-linked, 1:20 000) for 1 h at room temperature. Membranes were washed five times with TBST buffer and twice with TBS, then incubated for 5 min at room temperature with SuperSignal West Femto Maximum Sensitivity Substrate (Pierce), and peroxidase activity was recorded using a Bio-Rad imager.
Data Availability
Mass spectrometry data have been deposited to the ProteomeXchange Consortium via the PRIDE49 partner repository with the data set identifiers PXD019793, PXD019827, PXD019858, and PXD019853. Processed files are available from Zenodo (10.5281/zenodo.4313808).
RESULTS AND DISCUSSION
Differential display and spectral alignment have a long history in biology and physics for specific signal detection.50,51 To extend this concept to regulatory protein signaling and chemical modifications, we reasoned that biologically functional protein modifications are present in both modified and unmodified forms in different functional states. To implement this for the mass spectrometric detection of differentially modified peptides, we developed the spectral alignment-based modified peptide identification (SAMPEI) algorithm. SAMPEI is based on the rationale that biologically regulated proteins with substoichiometric modifications produce mass spectra from both unmodified and modified peptides. These fragmentation spectra contain fragment ion series that are partly coincident and partly shifted by diagnostic mass differences corresponding to the masses of specific modifications.52 To illustrate this concept, consider the model peptide DFSAFINLVEFCR with two possible modification sites, residues F2 and F11 (Figure 1a). Collision-induced dissociation of the F2-modified peptide is expected to produce fragment ion spectra with largely coincident y-ions, and a- and b-ions with specific mass shifts corresponding to the F2 modification.53 Conversely, modification of the F11 residue is expected to produce y-ions with specific mass shifts relative to those of the unmodified peptide, whereas the observable b- and a-ions remain largely coincident between modified and unmodified peptides (Figure 1a). Thus, given a set of high-resolution tandem mass spectra matched to unmodified sequences by current statistical database matching algorithms, SAMPEI performs pairwise alignment of these query spectra with unassigned fragmentation spectra that partly match the same unmodified sequences. This enables the agnostic identification of possible PTMs based on the presence of fragment ion series with specific alignment features (Figure 1b).
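The expected pattern of shifted and coincident ions can be checked numerically. The Python sketch below computes singly charged b- and y-ion m/z values for DFSAFINLVEFCR with carbamidomethylated C12 and reports which ions move when a +17.99 Da fluorination is placed on F2 or F11. It uses standard monoisotopic residue masses, considers only b and y ions (not a ions), and its function names are hypothetical; it illustrates the principle and is not part of SAMPEI.

```python
# Standard monoisotopic residue masses (Da) for the residues in this peptide.
RESIDUE = {"A": 71.03711, "C": 103.00919, "D": 115.02694, "E": 129.04259,
           "F": 147.06841, "I": 113.08406, "L": 113.08406, "N": 114.04293,
           "R": 156.10111, "S": 87.03203, "V": 99.06841}
PROTON, WATER = 1.007276, 18.010565
CAM = 57.02146     # carbamidomethyl Cys, as in the synthetic standards
FLUORO = 17.99058  # F replaces one H on phenylalanine


def fragments(seq, mods):
    """Singly charged b and y ions (b1.., y1..); mods maps 0-based index -> Da."""
    masses = [RESIDUE[aa] + mods.get(i, 0.0) for i, aa in enumerate(seq)]
    b = [sum(masses[:i]) + PROTON for i in range(1, len(seq))]
    y = [sum(masses[-i:]) + WATER + PROTON for i in range(1, len(seq))]
    return b, y


def shifted(ions_a, ions_b, tol=0.005):
    """0-based indices (ion number - 1) whose m/z differ between two ladders."""
    return [i for i, (p, q) in enumerate(zip(ions_a, ions_b)) if abs(p - q) > tol]


seq = "DFSAFINLVEFCR"
base = {11: CAM}                 # C12 is index 11
b0, y0 = fragments(seq, base)
for site in (1, 10):             # F2 and F11 (0-based 1 and 10)
    b1, y1 = fragments(seq, {**base, site: FLUORO})
    print(f"F{site + 1}: shifted b ions {[i + 1 for i in shifted(b0, b1)]}, "
          f"shifted y ions {[i + 1 for i in shifted(y0, y1)]}")
```

Under these assumptions, fluorination of F2 shifts b2 through b12 but only y12, whereas fluorination of F11 shifts y3 through y12 but only b11 and b12, matching the pattern described above.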
Figure 1.
Outline and parametrization of the spectral alignment-based modified peptide identification (SAMPEI) algorithm, designed for agnostic discovery of peptide modifications. (a) Overlay of specific alignment of fragment ions in high-resolution mass spectra of synthetic peptide DFSAFINLVEFCR, unmodified (black) or containing fluorinated phenylalanine (blue and red; fluorination mass shift of +17.9905 Da) at position 11. Fragment ions from the modified peptides that coincide with those from the unmodified form are marked in blue. Fragment ion mass shifts produced by phenylalanine fluorination are marked in red. (b) Schematic of the SAMPEI algorithm: tandem mass spectra identified using peptide-spectral matching algorithms are used by SAMPEI as queries for partial matching against unmatched spectra, which often result from unpredicted chemical modifications and mutations. On the basis of the presence of partially coincident ion series, SAMPEI agnostically identifies sequences and mass shifts of modified peptides. (c) Experimental schematic for the parametrization of sensitivity and specificity of SAMPEI using specific differentially alkylated proteomes. (d,e) Heatmaps showing optimization of the matched proportional intensity (MPI) and largest gap percentage (LGP) parameters to maximize precision (d) and recall (e), with carbamidomethylation as a known modification. Red symbols denote the default user-specified values.
We implemented SAMPEI in the Python programming language; it is openly available from GitHub (https://github.com/FenyoLab/SAMPEI) and can be installed using conda or pip. SAMPEI requires as input a set of high-resolution fragmentation spectra in MGF format and a list of high-confidence peptide-spectrum matches (PSMs) that defines the set of query spectra. Because it processes mass spectra in the open MGF format, SAMPEI is compatible with spectra recorded on all current high-resolution mass spectrometers, and the query PSMs can be assigned using most current peptide-spectral matching algorithms. Here, we used X! Tandem for the initial implementation of SAMPEI, given its open-source nature and ease of deployment on diverse computer systems, including high-performance computing clusters.47 Using X! Tandem expectation values, we selected high-confidence PSMs (1% false discovery rate) as the query spectra. The remaining unmatched spectra were then considered by SAMPEI as candidates for agnostic identification of modified peptides, serving as targets against which the query spectra are aligned. Users specify the molecular mass range of possible peptide modifications, which by default includes mass shifts greater than 10 Da and smaller than 200 Da. This range includes most currently known functional PTMs, as compiled in Unimod (http://www.unimod.org),54 but excludes mass shifts from natural abundance isotopes. Each candidate target-query match is scored sequentially using two orthogonal measures of similarity (pseudocode provided as Document S1). The matched proportional intensity (MPI) is the ratio of the MS2 ion intensity of the target scan that matches the theoretical fragments of a given peptide sequence to the total MS2 ion intensity of the target scan.
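The candidate pairing by precursor mass difference and the MPI definition above can be sketched in Python. This is a hypothetical illustration, not SAMPEI's code (its pseudocode is given in Document S1): the `Spectrum` container, the function names, the restriction to positive shifts, and the 20 ppm fragment tolerance (borrowed from the search settings in the Experimental Section) are all assumptions.

```python
from bisect import bisect_left
from dataclasses import dataclass


@dataclass
class Spectrum:
    """Minimal container; field names are hypothetical, not SAMPEI's data model."""
    scan: str
    precursor_mass: float  # neutral monoisotopic mass (Da)
    peaks: list            # (m/z, intensity) pairs


def candidate_pairs(queries, targets, min_shift=10.0, max_shift=200.0):
    """Yield (query, target, shift) for target precursors heavier than the query
    precursor by min_shift..max_shift Da (positive shifts only, an assumption)."""
    targets = sorted(targets, key=lambda s: s.precursor_mass)
    masses = [t.precursor_mass for t in targets]
    for q in queries:
        start = bisect_left(masses, q.precursor_mass + min_shift)
        for t in targets[start:]:
            shift = t.precursor_mass - q.precursor_mass
            if shift > max_shift:
                break
            yield q, t, shift


def matched_intensity_fraction(peaks, theoretical_mz, tol_ppm=20.0):
    """MPI-like score: share of total target intensity lying within tol_ppm of
    any theoretical fragment m/z (each peak counted once)."""
    theo = sorted(theoretical_mz)
    total = sum(inten for _, inten in peaks)
    matched = 0.0
    for mz, inten in peaks:
        j = bisect_left(theo, mz)
        neighbors = [theo[k] for k in (j - 1, j) if 0 <= k < len(theo)]
        if any(abs(mz - t) <= t * tol_ppm * 1e-6 for t in neighbors):
            matched += inten
    return matched / total if total else 0.0


query = Spectrum("q1", 1000.0, [])
targets = [Spectrum("t1", 1005.0, []), Spectrum("t2", 1057.02, []),
           Spectrum("t3", 1250.0, [])]
for q, t, shift in candidate_pairs([query], targets):
    print(q.scan, t.scan, round(shift, 2))
print(matched_intensity_fraction([(100.0, 1.0), (200.0, 3.0), (300.0, 1.0)],
                                 [200.001, 300.0]))
```

In this toy example, only t2 falls inside the default 10–200 Da window, and the spectrum scores 0.8 because its single unmatched peak carries one-fifth of the total intensity. Sorting targets by precursor mass lets each query scan only the slice of candidates inside its shift window.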
The largest gap percentage (LGP) is the ratio of the longest run of consecutive missing b/y ions to the length of the peptide sequence. First, SAMPEI aligns discrete m/z ranges of the query and target spectra and ranks putative matches by MPI, that is, by the fraction of total fragment ion current in the target spectrum that is accounted for by the query. Next, for each target-query match, SAMPEI calculates the LGP, the fraction of the peptide sequence spanned by consecutive undetected b- and y-ions. To localize the identified adduct on the peptide sequence, SAMPEI considers each residue in turn as a potential modification site. For each position, SAMPEI generates a theoretical fragment spectrum and aligns it with the empirical target spectrum. The MPI of each theoretical spectrum is then calculated as described above, and the modification is assigned to the position that maximizes the MPI. When no position yields a unique maximum, usually because the corresponding diagnostic ions are absent from the fragment spectrum, SAMPEI by convention localizes the modification to the first residue of the peptide.
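The LGP metric and the localization rule can be sketched as follows. This is an illustrative Python reading of the description above, not SAMPEI's implementation, and several details are assumptions: a cleavage site counts as observed if either its b ion or its complementary y ion is detected, only singly charged b and y ions are considered, "no maximum" is interpreted as a tie for the best MPI, and all function names are hypothetical. The usage example scores an idealized, noise-free spectrum simulated for the F11-fluorinated peptide.

```python
from bisect import bisect_left

# Standard monoisotopic residue masses (Da) for the residues used here only.
RESIDUE = {"A": 71.03711, "C": 103.00919, "D": 115.02694, "E": 129.04259,
           "F": 147.06841, "I": 113.08406, "L": 113.08406, "N": 114.04293,
           "R": 156.10111, "S": 87.03203, "V": 99.06841}
PROTON, WATER = 1.007276, 18.010565


def ladders(seq, mods):
    """Singly charged b and y ions; mods maps 0-based residue index -> added Da."""
    m = [RESIDUE[aa] + mods.get(i, 0.0) for i, aa in enumerate(seq)]
    b = [sum(m[:i]) + PROTON for i in range(1, len(seq))]
    y = [sum(m[-i:]) + WATER + PROTON for i in range(1, len(seq))]
    return b, y


def present(sorted_mz, mz, tol_ppm=20.0):
    """True if any value in sorted_mz lies within tol_ppm of mz."""
    j = bisect_left(sorted_mz, mz)
    return any(abs(sorted_mz[k] - mz) <= mz * tol_ppm * 1e-6
               for k in (j - 1, j) if 0 <= k < len(sorted_mz))


def largest_gap_percentage(seq, mods, peaks):
    """Longest run of cleavage sites with neither the b ion nor its complementary
    y ion observed, divided by peptide length (one reading of LGP)."""
    mz = sorted(p for p, _ in peaks)
    b, y = ladders(seq, mods)
    n, run, longest = len(seq), 0, 0
    for i in range(n - 1):  # cleavage after residue i gives b_(i+1) and y_(n-i-1)
        if present(mz, b[i]) or present(mz, y[n - 2 - i]):
            run = 0
        else:
            run += 1
            longest = max(longest, run)
    return longest / n


def mpi(peaks, theoretical):
    """Fraction of total peak intensity matched by theoretical fragment m/z."""
    theo = sorted(theoretical)
    total = sum(i for _, i in peaks)
    hit = sum(i for p, i in peaks if present(theo, p))
    return hit / total if total else 0.0


def localize(seq, base_mods, shift, peaks):
    """Place shift on each residue in turn; return (site, per-site MPI scores).
    Without a unique maximum, fall back to the first residue (index 0)."""
    scores = []
    for site in range(len(seq)):
        mods = dict(base_mods)
        mods[site] = mods.get(site, 0.0) + shift
        b, y = ladders(seq, mods)
        scores.append(mpi(peaks, b + y))
    best = max(scores)
    winners = [i for i, s in enumerate(scores) if s == best]
    return (winners[0] if len(winners) == 1 else 0), scores


seq, cam, fluoro = "DFSAFINLVEFCR", {11: 57.02146}, 17.99058
b, y = ladders(seq, {**cam, 10: fluoro})  # simulated F11-fluorinated peptide
peaks = [(m, 1.0) for m in b + y]         # idealized, noise-free spectrum
site, scores = localize(seq, cam, fluoro, peaks)
print("assigned residue:", site + 1)
print("LGP:", round(largest_gap_percentage(seq, {**cam, 10: fluoro}, peaks), 3))
```

With a complete ion ladder, every residue other than F11 leaves some observed ions unexplained, so the MPI peaks uniquely at F11; in real spectra with missing diagnostic ions, neighboring positions can tie, which is when the first-residue convention applies.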
We empirically calibrated these parameters to establish the sensitivity and specificity of SAMPEI for the agnostic discovery of specific peptide modifications in complex cellular proteomes, using a controlled labeling experiment. We generated extracts of human OCI-AML2 cells and alkylated their proteomes with either iodoacetamide or acrylamide prior to proteolysis (Figure 1c). This produced otherwise identical complex peptide mixtures with differentially alkylated cysteines, bearing either 57.02 or 71.03 Da adducts, corresponding to cysteine carbamidomethylation or propionylation, respectively. We then calculated the apparent sensitivity/recall and specificity/precision of detecting differentially alkylated peptides in complex mixtures as a function of the MPI and LGP parameters. To do so, we used SAMPEI to compare high-resolution fragmentation spectra derived from human cell extracts containing peptides with either carbamidomethylated or propionylated cysteine residues, or their equimolar mixtures (Figure 1d,e; Figure S1a,b). This allowed us to specify default MPI and LGP values that optimize SAMPEI sensitivity and specificity (marked by red symbols). Users can adjust these parameters to favor sensitivity or specificity, depending on their experimental needs. SAMPEI uses the fraction of matched target and query ion series to assign putative modifications to specific peptide residues; for spectra with incomplete ion series, the modification site is assigned by default to the first (N-terminal) residue of the peptide sequence. These features should allow for specific and sensitive detection of peptide modifications in complex cellular proteomes.
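A parameter sweep of this kind can be sketched as a grid search over MPI and LGP thresholds. The Python code below is a hypothetical illustration, not the procedure used to produce Figure 1d,e: it assumes each candidate match has already been labeled correct or incorrect (in this design, for example, a match might count as correct when it links a carbamidomethylated query to the propionylated form of the same peptide with a shift near +14.02 Da), and it uses F1 as an illustrative objective. The function names, the F1 criterion, and the toy inputs are assumptions.

```python
from itertools import product


def precision_recall(matches, n_expected, min_mpi, max_lgp):
    """matches: (mpi, lgp, is_correct) per candidate match; n_expected: number of
    true modified peptides that could be recovered (ground truth)."""
    kept = [ok for mpi, lgp, ok in matches if mpi >= min_mpi and lgp <= max_lgp]
    tp = sum(kept)
    precision = tp / len(kept) if kept else 1.0  # convention when nothing is kept
    recall = tp / n_expected if n_expected else 0.0
    return precision, recall


def best_thresholds(matches, n_expected, mpi_grid, lgp_grid):
    """Return (f1, min_mpi, max_lgp, precision, recall) for the grid point with
    the highest F1; higher MPI and lower LGP indicate better matches."""
    best = None
    for min_mpi, max_lgp in product(mpi_grid, lgp_grid):
        p, r = precision_recall(matches, n_expected, min_mpi, max_lgp)
        f1 = 2 * p * r / (p + r) if p + r else 0.0
        if best is None or f1 > best[0]:
            best = (f1, min_mpi, max_lgp, p, r)
    return best


# Toy labeled matches (invented for illustration): (MPI, LGP, correct?).
toy = [(0.9, 0.1, True), (0.8, 0.2, True), (0.5, 0.6, False),
       (0.7, 0.5, True), (0.3, 0.1, False)]
print(best_thresholds(toy, 4, [0.2, 0.4, 0.6, 0.8], [0.3, 0.5, 0.7]))
```

Separate heatmaps for precision and recall, as in Figure 1d,e, would simply tabulate the two values returned by `precision_recall` at each grid point instead of collapsing them into a single score.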
To test the ability of SAMPEI to discover unanticipated modifications, we designed a synthetic peptide bearing a specific chemical modification not observed naturally. From the analysis of currently described protein modifications curated in UniMod, we identified +17.99 Da as a unique mass shift not detected among any naturally occurring protein modifications thus far.54 Thus, we used the DFSAFINLVEFCR peptide derived from the human PRKDC protein, which is normally expressed by all human cells, and generated two synthetic versions containing a fluorinated phenylalanine residue at position 2 or 11, expected to produce fragment ion series partly shifted by +17.99 Da due to fluorination. We confirmed the accurate composition of the synthetic peptides by mass spectrometry in neat solvent and acquired high-resolution mass spectra of these peptides diluted in human OCI-AML2 cell extracts (Figure 2a; Figure S2a). For both F2- and F11-fluorinated peptides, SAMPEI alignment correctly identified the +17.99 Da mass shift between the synthetic and endogenous peptides. As the data set contained spectra from both fluorination isomers, as well as from the unmodified peptide, SAMPEI matched the spectra from the PSMs of the unmodified peptide to spectra of the fluorinated peptides, obtained from precursors with m/z shifted by 8.99 Th (z = 2), corresponding to the 17.99 Da shift in peptide molecular weight. In addition, SAMPEI matched spectra of the unmodified peptide that had not been assigned by X!Tandem (visible in the plots as dots at Δm/z = 0). This yielded PSMs for spectra that had not been matched by X!Tandem but were nearly isobaric to the unmodified peptide (such as those from isotopologues or deamidated peptides).
At the same time, spectra of fluorophenylalanine-containing peptides were not matched when the unmodified chemoform was absent from the query pool, as modeled by a set of spectra recorded under identical experimental conditions from E. coli extracts (Figure 2b; Figure S2b). These results indicate that SAMPEI is suitable for unbiased discovery of specific peptide modifications in complex biological proteomes.
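The expected shifts follow from simple arithmetic, sketched below under the assumption that fluorination replaces one aromatic hydrogen with fluorine. The helper names are hypothetical, and the atomic masses are standard monoisotopic values. The second function lists which singly charged b and y ions of the 13-residue DFSAFINLVEFCR peptide contain the fluorinated residue, which explains why only part of each ion series is shifted.

```python
H = 1.00782503   # monoisotopic mass of 1H (Da)
F = 18.99840316  # monoisotopic mass of 19F (Da)

# Replacing one ring hydrogen of phenylalanine with fluorine (assumed chemistry)
FLUORO_SHIFT = F - H


def precursor_mz_shift(delta_mass, charge):
    """m/z displacement of a precursor whose neutral mass changes by delta_mass."""
    return delta_mass / charge


def shifted_ions(length, mod_pos):
    """Indices of b and y ions containing residue mod_pos (1-based) in a peptide of `length` residues."""
    b = [i for i in range(1, length) if i >= mod_pos]               # b_i spans residues 1..i
    y = [j for j in range(1, length) if length - j + 1 <= mod_pos]  # y_j spans residues n-j+1..n
    return b, y


print(f"{FLUORO_SHIFT:.2f} Da -> {precursor_mz_shift(FLUORO_SHIFT, 2):.3f} Th at z = 2")
print(shifted_ions(13, 2))   # F2 isomer of DFSAFINLVEFCR
print(shifted_ions(13, 11))  # F11 isomer
```

For the F2 isomer, ions b2 through b12 carry the shift, but among the y ions only y12 does. For the F11 isomer, the shift appears in b11 and b12 and in y3 through y12.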
Figure 2.
SAMPEI achieves sensitive and specific unbiased discovery of peptide modifications in complex cellular proteomes. (a,b) Fraction of matched spectra as a function of modification mass shift (m/z), demonstrating accurate identification of the synthetically fluorinated DFSAFINLVE(fluoro-F)CR peptide (17.99 Da mass shift, corresponding to 8.995 Th for +2 charged ions, as marked by the dotted gray line). Queries were either a human cell extract containing the unmodified peptide (a) or an E. coli proteome that lacks the unmodified chemoform and thus serves as a negative control (b). Red symbols denote query PSMs matched by SAMPEI to target spectra. Gray symbols denote matches of the fluorinated peptide with unrelated unmodified spectra. (c,d) Comparison of sensitivity (c) and specificity (d) of agnostic identification of specific IAA or ACL cysteine modifications using SAMPEI, ByOnic, MSFragger, and MaxQuant. Bars represent the mean values obtained for agnostic identification of carbamidomethylated and propionylated cysteines. For each search engine, the two symbols represent the values obtained in two analyses, in which the fixed cysteine modification was set either to carbamidomethylation or to propionylation.
To establish the relative sensitivity and specificity of SAMPEI, we compared its performance with the MSFragger (open search), MaxQuant (dependent peptide function), and ByOnic (wild card function) algorithms, which are compatible with open format spectral analysis for unbiased PTM identification.5,26–28,46 We generated chemically modified model proteomes by reacting human OCI-AML2 cell extracts with iodoacetamide or acrylamide, producing carbamidomethylated or propionylated cysteine residues, respectively. The two proteomes were analyzed individually and as a 1:1 mixture. RAW files were converted into MGF format using uniform parameters and analyzed using SAMPEI-X!Tandem, MSFragger, ByOnic, and MaxQuant to detect specific forms of cysteine alkylation (the experimental design is shown in Figure 1c). To determine the relative sensitivity and specificity, we analyzed the spectra with either cysteine carbamidomethylation or propionylation set as a fixed modification and calculated the agnostic recovery of the alternative modification. The fraction of PSMs from specifically modified peptides, a measure of sensitivity, was slightly greater for SAMPEI-X!Tandem (44%) than for ByOnic (39%), nearly identical to MSFragger (44%), and lower than MaxQuant (53%; Figure 2c, Figure S3). The sets of agnostically identified modified peptides were largely shared among all search engines (Figure S2a,b). The absolute numbers of PSMs from agnostically identified alkylated peptides were 556, 783, 1001, and 1203 for SAMPEI-X!Tandem, ByOnic, MSFragger, and MaxQuant, respectively (Figure S2c). These differences reflect the smaller number of PSMs produced by X!Tandem as compared to the other algorithms used for conventional searches with variable modifications. This is consistent with the design of SAMPEI, which requires matching spectra of modified and unmodified peptides.
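Because each proteome is searched with the opposite alkylation set as fixed, the shift that an agnostic search should recover is the difference between the two adduct masses. The small sketch below assumes the standard elemental compositions of the adducts, C2H3NO from iodoacetamide and C3H5NO from acrylamide. The constant and function names are illustrative.

```python
# Monoisotopic adduct masses (Da), computed from elemental composition
CARBAMIDOMETHYL = 57.021464  # C2H3NO, iodoacetamide adduct
PROPIONAMIDE = 71.037114     # C3H5NO, acrylamide adduct (reported as propionylation in the text)


def expected_agnostic_shift(true_adduct, fixed_adduct):
    """Mass offset between the real cysteine adduct and the one assumed by the search."""
    return true_adduct - fixed_adduct


# Propionylated peptides searched with carbamidomethyl-Cys fixed, and the reverse setting
print(round(expected_agnostic_shift(PROPIONAMIDE, CARBAMIDOMETHYL), 5))
print(round(expected_agnostic_shift(CARBAMIDOMETHYL, PROPIONAMIDE), 5))
```

The ±14.016 Da offset equals one CH2 unit, which is also the mass of methylation. In this benchmark, mass alone therefore does not distinguish the alkylation swap from a methylated residue, so localization of the shift to cysteine is also needed.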
To deconvolve the sensitivity of SAMPEI from that of X!Tandem, and to confirm its compatibility with other search engines, we used the output from ByOnic, MSFragger, and MaxQuant (all set to perform conventional searches with variable alkylation) as input for agnostic discovery by SAMPEI. SAMPEI correctly assigned 77% of the spectra identified by the ByOnic wild-card search, 63% of those identified by the MSFragger open search, and 47% of those identified by the MaxQuant dependent peptide search (Figure S2d). Most peptides identified by SAMPEI were a subset of those identified by the native functions for agnostic searches (Figure S2e–g).
To test the specificity of agnostic identification by SAMPEI, we reasoned that spectra produced by peptides whose cysteines are exclusively carbamidomethylated or exclusively propionylated can be used to determine the rate of false identification, for example the detection of apparent propionylation in peptides that are actually carbamidomethylated. Thus, we defined PSMs with apparent cysteine propionylation in samples labeled solely with iodoacetamide (where carbamidomethylation was expected) as false identifications, and vice versa. This analysis showed that, with its default parametrization, SAMPEI has a mean false unbiased identification rate of 0.51% for modified peptides in complex human cell extracts, as compared to 0.57%, 0%, and 0.77% for MSFragger, ByOnic, and MaxQuant, respectively (Figure 2d). Thus, SAMPEI demonstrates excellent specificity and sensitivity for the unbiased discovery of protein modifications in complex proteomes. Computational efficiency was estimated from the duration of the analysis of 48 161 MS2 spectra from mixed carbamidomethylated and propionylated peptides, with either of the two modifications set as fixed. SAMPEI (including the first-pass X!Tandem analysis) completed the task in 7552 s (2 h 6 min), as compared to 1612, 232 911, and 14 390 s for MSFragger (open search), ByOnic (wild card search), and MaxQuant (dependent peptide search), respectively (Figure S4).
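The ground-truth logic of this specificity test can be stated directly in code. The text does not give the denominator of the rate, so the sketch below assumes it is the total number of cysteine-adduct PSMs reported for a single-label sample. Each PSM is reduced to the alkylation it appears to carry. The labels and function names are hypothetical.

```python
from statistics import mean

ALTERNATIVE = {"carbamidomethyl": "propionyl", "propionyl": "carbamidomethyl"}


def false_identification_rate(sample_label, apparent_adducts):
    """Fraction of PSMs reporting the alkylation that was NOT used to label the sample.

    Assumption: the denominator is every cysteine-adduct PSM reported for that sample.
    """
    if not apparent_adducts:
        return 0.0
    wrong = ALTERNATIVE[sample_label]
    return sum(a == wrong for a in apparent_adducts) / len(apparent_adducts)


def mean_false_rate(runs):
    """Average over reciprocal experiments, given (sample_label, apparent_adducts) pairs."""
    return mean(false_identification_rate(label, adducts) for label, adducts in runs)
```

Averaging the two reciprocal settings, iodoacetamide-only and acrylamide-only, mirrors the "mean" rate reported for each search engine.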
SAMPEI is motivated by the hypothesis that unanticipated regulatory protein modifications can be discovered from comprehensive analysis of biological systems. To explore this idea, we sought to map protein modifications induced during mammalian cell differentiation. We chose to study the response of mouse RAW264.7 macrophage cells to lipopolysaccharide (LPS), a potent inducer of macrophage activation and differentiation that involves extensive protein and metabolic signaling.55–58 We treated RAW264.7 cells with LPS using established methods and analyzed the resultant proteomes by two-dimensional nanoscale liquid chromatography and high-resolution mass spectrometry proteomics.59 Using SAMPEI parameters designed to maximize the specificity of unbiased discovery (MPI ≥ 0.5, LGP ≤ 0.4), we identified 21 846 unique PSMs with putative peptide modifications producing mass shift alignments between 10 and 200 Da (Figure 3b, Figure S5, Table S1). SAMPEI assigned approximately 60% of putative modifications to specific peptide residues. Putative modifications identified by SAMPEI include several common ones, such as methionine oxidation (+15.99 Da), lysine acetylation (+42.01 Da), and serine and threonine phosphorylation (+79.97 Da). Consistent with the known reactivity of iodoacetamide at the high concentrations used during sample preparation, and in agreement with prior reports,4,6 we also observed putative carbamidomethylation (+57.02 Da) of noncysteine residues (Figure 3c). SAMPEI also identified peptide modifications with a wide range of mass shifts (Figure S5).
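Filtering and first-pass annotation of the alignments can be sketched as follows. The PSM dictionaries, the 0.01 Da annotation tolerance, and the four-entry lookup table are assumptions for illustration; a real analysis would query the full UniMod database. The MPI, LGP, and mass-window defaults reuse the values stated in this paragraph.

```python
# Small illustrative subset of UniMod monoisotopic mass shifts (Da)
KNOWN_SHIFTS = {
    "Oxidation": 15.994915,
    "Acetyl": 42.010565,
    "Carbamidomethyl": 57.021464,
    "Phospho": 79.966331,
}


def select_and_annotate(psms, mpi_min=0.5, lgp_max=0.4, window=(10.0, 200.0), tol=0.01):
    """Keep PSMs passing the thresholds and label shifts that match a known modification.

    `psms` is a hypothetical list of dicts with 'mpi', 'lgp', and 'shift' (Da) keys.
    """
    annotated = []
    for psm in psms:
        if psm["mpi"] < mpi_min or psm["lgp"] > lgp_max:
            continue
        if not window[0] <= psm["shift"] <= window[1]:
            continue
        names = [name for name, mass in KNOWN_SHIFTS.items() if abs(psm["shift"] - mass) <= tol]
        annotated.append((psm["shift"], names[0] if names else "unannotated"))
    return annotated
```

Shifts left as "unannotated" are the candidates for follow-up. The +130.03 Da cysteine adduct discussed below is one such case.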
It is likely that many apparent amino acid modifications, in particular those on aliphatic residues, represent amino acid substitutions due to somatic nucleotide mutations acquired in cultured cells, as recently documented using proteogenomic approaches.59 Many of the putative modifications have also been observed in prior studies, including transpeptidation and chemical transformations of various amino acid side chains by glycans and ion coordination.5,6 While some of them arise from biochemical reactions during cell lysis and proteome processing, we anticipate that many of them occur biologically and represent unanticipated regulatory processes.
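To show how substitutions can masquerade as modifications, one can enumerate residue pairs whose mass difference matches an observed shift. In the illustrative sketch below (the function name is hypothetical and the 0.005 Da tolerance is an assumption), a +15.995 Da shift, the mass of oxidation, is also produced by Ala→Ser and Phe→Tyr substitutions.

```python
from itertools import permutations

# Monoisotopic residue masses (Da); I and L are isobaric
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def substitutions_matching(shift, tol=0.005):
    """(original, substituted) residue pairs whose mass difference lies within tol of `shift`."""
    return sorted(
        (a, b) for a, b in permutations(RESIDUE_MASS, 2)
        if abs(RESIDUE_MASS[b] - RESIDUE_MASS[a] - shift) <= tol
    )


print(substitutions_matching(15.9949))  # candidate substitutions isobaric with oxidation
```

A +15.995 Da shift localized to alanine, a residue not normally oxidized, is therefore more plausibly an Ala→Ser substitution. Proteogenomic searches can resolve this case.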
Figure 3.
Agnostic protein signaling discovery in LPS-stimulated macrophage differentiation. (a) Schematic of LPS-induced differentiation of RAW264.7 macrophage cells, followed by proteome extraction, proteolysis, peptide chromatography, and high-resolution tandem mass spectrometry. (b) Histogram of peptide-spectrum matches (PSMs) as a function of peptide modification mass shift (21 846 PSMs with MPI ≥ 0.7). Selected spectral matches with putative modifications are labeled with arrows. (c) Frequency and putative amino acid localization of chemical adduct molecular weights discovered by SAMPEI.
The known reactivity of cysteine residues in proteins led us to examine their modifications upon LPS-induced macrophage activation.60–62 Thus, we examined the most abundant putative modifications assigned by SAMPEI to cysteine residues upon LPS treatment (Figure 4a). We attempted to deduce their identities from high-accuracy mass measurements, taking into account possible additional single or double carbamidomethylation by iodoacetamide. While some of the observed mass shift alignments could be attributed to known modifications annotated in UniMod, we also observed 306 spectra with unanticipated +130.03 Da and +146.02 Da mass shifts (Figure 4a). We confirmed these SAMPEI identifications by manual inspection of the corresponding high-resolution fragment ion spectra from several modified and unmodified peptide pairs (Figure 4b,c, Figures S6–S7). The high-accuracy mass measurements of the putative modifications are consistent with C5H6O4 and C5H6O5 elemental compositions for the +130.03 Da and +146.02 Da modifications, respectively.
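Candidate elemental compositions can be checked against measured shifts by summing monoisotopic atomic masses. The minimal sketch below uses a hypothetical function name, and its parser handles only simple formulas such as C5H6O4, without brackets or charges.

```python
import re

# Monoisotopic atomic masses (Da)
MONO = {"C": 12.0, "H": 1.00782503, "N": 14.00307401, "O": 15.99491462, "S": 31.97207117}


def formula_mass(formula):
    """Monoisotopic mass of a simple formula like 'C5H6O4' (no brackets or charges)."""
    total = 0.0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        total += MONO[element] * (int(count) if count else 1)
    return total


for f in ("C5H6O4", "C5H6O5"):
    print(f, f"{formula_mass(f):.4f}")
```

The two compositions evaluate to 130.0266 and 146.0215 Da and differ by exactly one oxygen atom (15.9949 Da). This matches the +15.99 Da oxidation considered later for the +146.02 Da species.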
Figure 4.
Noncanonical cysteine modifications discovered by SAMPEI. (a) Ten most prevalent noncanonical cysteine PTMs discovered by SAMPEI, with annotation of putative adducts based on high-accuracy mass measurements. Concurrent carbamidomethylation was considered in the two rightmost columns. Red denotes modification molecular weights not annotated in UniMod. (b,c) Representative pairs of mass spectra of SPCS2 peptides containing either unmodified (experimentally carbamidomethylated) cysteine or cysteine bearing the +130.03 Da (b) or +146.02 Da (c) modification. Ions shared by unmodified and modified peptides (blue), those exclusive to unmodified peptides (black), and those specific to +130.03 Da or +146.02 Da modified peptides (red) are labeled accordingly.
To confirm the prevalence of the 130.03 and 146.02 Da adducts in the proteome of activated macrophages, we reanalyzed the data by conventional peptide-spectrum matching using X!Tandem, with carbamidomethylation of cysteine set as a fixed modification and an additional variable modification of 73.0051 or 89.0000 Da allowed on the same residue. These values correspond to replacement of the 57.0214 Da iodoacetamide adduct with the 130.0266 and 146.0214 Da adducts, respectively. This analysis assigned 47 007 spectra from peptides containing cysteine residues (FDR < 0.01), of which 770 (1.6%) were modified with the 130.03 Da adduct and 226 (0.5%) with the 146.02 Da adduct (Table S2). SAMPEI had identified 30.7% of the spectra that X!Tandem assigned to peptides bearing the 130.03 and 146.02 Da adducts. Among the spectra that SAMPEI failed to match, 209 were assigned by X!Tandem to peptides that were not detected in their unmodified form. These spectra were therefore not expected to be detected, since SAMPEI by design requires a matching unmodified peptide. Furthermore, peptides detected only in the 130.03 or 146.02 Da modified form were mostly identified from a single spectrum, and the corresponding PSMs had lower statistical confidence than peptides detected as multiple isoforms, suggesting that these analytes produced less complete fragmentation spectra (Figure S8). Conversely, 164 of 170 (96.5%) peptides that X!Tandem identified as multiple modification isoforms had also been identified by SAMPEI.
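The reasoning behind the 209 unmatched spectra can be made concrete: a modified PSM is discoverable by SAMPEI only if its peptide was also identified in unmodified form. Below is a minimal sketch of that partition. The record fields and peptide sequences are hypothetical toy data.

```python
def partition_by_counterpart(modified_psms, unmodified_peptides):
    """Split modified PSMs by whether the same peptide was also identified unmodified.

    Only the first group can be discovered by an approach that, like SAMPEI,
    aligns each modified spectrum to a spectrum of its unmodified counterpart.
    """
    unmodified = set(unmodified_peptides)
    eligible = [p for p in modified_psms if p["peptide"] in unmodified]
    orphans = [p for p in modified_psms if p["peptide"] not in unmodified]
    return eligible, orphans
```

Comparing SAMPEI's recall against the "eligible" group rather than all variable-modification hits gives a fairer estimate of what the method could have found.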
Having confirmed the correct identification of the 130.03 and 146.02 Da modifications, we sought to determine the adducts producing the observed mass shifts. LPS-stimulated macrophages were recently reported to produce itaconate, an electrophilic dicarboxylic acid that is structurally similar to the TCA cycle intermediates succinate and fumarate and that can chemically react with nucleophilic amino acids.63–66 Indeed, we confirmed the generation of free itaconate and the expression of ACOD1 (IRG1), the aconitate decarboxylase responsible for its production, in RAW264.7 cells upon LPS stimulation using small molecule mass spectrometry and Western immunoblotting, respectively (Figure 5a–c, Figures S9–S10). Gene Ontology (GO) annotation of all unique peptides with putative itaconate modifications revealed enrichment for mitochondrion and ribosome localization, and for processes related to glycolysis, protein translation, and stabilization (Figure S11). Interestingly, we observed cysteine itaconatylation of peptides corresponding to gamma-interferon-inducible lysosomal thiol reductase (GILT) and protein disulfide-isomerase A3 (PDIA3), both of which regulate cellular thiol/disulfide redox balance. We also observed itaconatylation of GAPDH and ALDOA, consistent with a recent study of thiol reactivity using chemical profiling.67 While the +130.03 Da modification can be explained by itaconate, we reasoned that the +146.02 Da modified peptides may arise from modification by oxidized itaconate or from oxidation of itaconatylated cysteine to the sulfoxide, an oxidized (+15.99 Da) form of cysteine induced by cellular redox signaling. Given that redox signaling can also contribute to macrophage activation,68,69 these results suggest that LPS-induced mouse macrophage differentiation involves cysteine-dependent signaling by apparent itaconatylation and oxidation of cysteine.
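The text does not state which statistic underlies the GO enrichment. A common choice is a one-sided hypergeometric test, sketched here with hypothetical argument names: N annotated background proteins, K of which carry a given GO term, and n modified proteins, k of which carry the term.

```python
from math import comb


def enrichment_p_value(N, K, n, k):
    """One-sided hypergeometric P(X >= k) for the overlap between a GO term and a protein list."""
    upper = min(K, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)
```

For example, `enrichment_p_value(10, 5, 5, 5)` returns 1/252, the probability that all five drawn proteins carry a term held by half of a ten-protein background. When many GO terms are tested, the resulting p-values are normally corrected for multiple testing, for example by the Benjamini–Hochberg procedure.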
Figure 5.
Validation of itaconate-induced cysteine modifications. (a) RAW264.7 macrophages were unstimulated (left) or stimulated with LPS (right), followed by metabolite extraction and GC-MS analysis. Extracted ion chromatograms (XIC) for m/z 259 are shown, demonstrating induction of itaconate in response to LPS stimulation. Inset: structure of bis(trimethylsilyl)-itaconate. (b) Quantification of itaconate production by RAW264.7 macrophages upon LPS stimulation as measured by LC-MS in cells (n = 3 biologic replicates). (c) Representative Western blot of Irg1 protein expression in RAW264.7 macrophages upon LPS stimulation, with actin as loading control. (d) Itaconate reacts with purified BSA and KEAP1 in vitro to produce specific cysteine modifications, as shown by the distribution of identified spectra as a function of increasing itaconate concentration (pink and red), as compared to PBS control (gray). Blue arrowheads mark concentration-dependent induction of +130 Da and +146 Da modified peptides. (e) Putative site localization of observed modifications in BSA and KEAP1, with specific itaconate induction of cysteine modifications. (f) Schematic representation of KEAP1 protein functional domains, with itaconate-modified cysteine residues.
To test this hypothesis and directly confirm the protein cysteine itaconatylation detected by SAMPEI, we examined purified recombinant proteins upon exposure to physiologically relevant concentrations of itaconate in vitro. We chose BSA, given its established suitability for mass spectrometry analysis, as well as KEAP1, a known protein sensor of cellular redox homeostasis.70 We treated purified BSA and KEAP1 with either 5 or 50 mM synthetic itaconate or with mock control. We analyzed the reaction products independently after digestion with trypsin, GluC, or elastase, followed by high-resolution tandem mass spectrometry (Figure S12). As predicted, we observed concentration-dependent formation of both +130.03 Da and +146.02 Da modifications on cysteine residues, specifically in the presence of itaconate but not in mock control reactions (Figure 5d,e, Figures S13–S15). This confirmed that the two unanticipated cysteine modifications identified in LPS-activated macrophages were indeed due to itaconate. Specifically, we detected four KEAP1 peptides modified with the +130.03 Da adduct on cysteine residues 38, 288, and 489 (Figure 5f). While alkylation of Cys288 was previously observed,65 Cys38 and Cys489 represent additional, potentially regulatory, alkylation sites. We verified the presence of identical itaconatylated cysteine residues in independent, differentially digested experimental replicates of both BSA and KEAP1, such as Cys489 itaconatylation in both trypsin- and elastase-digested samples. Thus, physiologically relevant concentrations of itaconate can induce cysteine itaconatylation in diverse proteins.
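Reporting sites such as Cys489 requires converting a peptide-level localization into protein coordinates. The cross-protease check then amounts to counting how many independent digests support each site. The sketch below uses hypothetical names and a toy sequence; real workflows must also handle peptides that map to several positions.

```python
from collections import Counter


def protein_site(protein, peptide, local_pos):
    """1-based protein residue number for 0-based position `local_pos` within `peptide`.

    Uses the first occurrence of the peptide; repeated sequences would need extra handling.
    """
    start = protein.find(peptide)
    if start < 0:
        raise ValueError(f"{peptide} not found in protein sequence")
    return start + local_pos + 1


def supported_sites(sites_by_digest, min_digests=2):
    """(protein, residue) sites observed in at least `min_digests` different protease digests."""
    counts = Counter(site for sites in sites_by_digest.values() for site in set(sites))
    return {site for site, n in counts.items() if n >= min_digests}
```

Given the site reported in the text, KEAP1 Cys489 observed in both tryptic and elastase digests, `supported_sites` keeps that site at the default two-digest threshold.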
CONCLUSIONS
Covalent modifications of diverse macromolecules contribute to multiple forms of biological signaling. In the case of protein signaling, recent studies have revealed an increasingly diverse and unanticipated repertoire of enzymatic and nonenzymatic amino acid modifications. The ability to comprehensively map and identify protein modifications is critical to understand biochemical mechanisms of biological signaling. SAMPEI is designed for the identification of peptide modifications from high-resolution mass spectrometry data without prior knowledge.
The advantage of SAMPEI is its explicit detection of pairs of modified and unmodified peptides, which enables the identification of potential biochemical signaling events that occur differentially in distinct biological states. Globally, SAMPEI exhibits excellent sensitivity while maintaining high specificity. While false discovery remains a challenge for open and PTM-tolerant searches, the dependence of SAMPEI on nearly complete ion series and high-confidence PSMs leads to accurate identifications, as confirmed by manual inspection of representative spectra. We anticipate that extending future versions of SAMPEI to incorporate high-resolution MS3 fragmentation spectra will enable direct analysis of the composition of amino acid adducts and noncanonical amino acids, as enabled by the development of small molecule spectral libraries such as METLIN.71 While we used X!Tandem for the identification of unmodified peptides, SAMPEI is compatible with all currently used peptide spectral matching algorithms, including open format input. This feature distinguishes SAMPEI from other programs for agnostic search that are integrated with, and consequently restricted to, specific scoring functions and algorithms, and permits reanalysis of PSMs generated by any workflow, such as those leveraging multiple search engines and spectral library-based or de novo sequencing software.
We expect that improved peptide fragmentation methods can boost the sensitivity of protein modification mapping by SAMPEI by generating informative ion series for peptides whose chemical modifications currently alter collision-induced fragmentation. Similarly, improved scoring functions can be integrated to enhance the detection of multiple combined modifications. In addition, proteogenomic strategies for the initial database-driven identification of unmodified peptides can be used to assign spectra with mass shifts produced by amino acid substitutions, thereby improving the sensitivity and specificity of PTM identification. SAMPEI can be directly integrated with programs for quantitative analysis such as MaxQuant,46 which can be used for precise studies of biological PTM dynamics. For example, quantitation of putative PTMs can aid in the identification of mechanisms of their regulation.32
Most importantly, SAMPEI and other methods designed for PTM identification without prior knowledge are ideally suited for biological discovery. We demonstrate this by the identification of unanticipated forms of protein cysteine itaconatylation during LPS-induced macrophage activation, as confirmed by direct labeling experiments. Some of the itaconatylated proteins induced upon macrophage activation are involved in cellular redox homeostasis, suggesting a mechanistic link between TCA cycle reprogramming and redox signaling, as supported by the identification of oxidized forms of itaconatylated cysteine upon macrophage activation. Further studies will be needed to define the mechanisms and functions of this unanticipated signaling interaction. Similarly, future studies will be needed to define the contribution of KEAP1 Cys38 and Cys489 oxidation and itaconatylation to its control of macrophage function.
The extent and function of most protein modifications observed in biological tissues remain poorly defined, largely due to the technical challenges of their accurate and sensitive detection. In addition to enzymatically derived PTMs, recent studies have also revealed protein modifications produced by the spontaneous reaction of cellular metabolites.67,72 Enzymatic and metabolite-derived PTMs may signal by regulating enzymatic activities or protein interactions, as exemplified by the direct link between energetic metabolites and post-translational histone modifications in cells.73 As sensitivity of mass spectrometry now approaches near-complete direct analysis of complex cellular and tissue proteomes,74,75 improved methods for the discovery of protein and macromolecular modifications without prior knowledge, such as SAMPEI, should enable comprehensive studies of biological signaling.
Supplementary Material
ACKNOWLEDGMENTS
We thank Henrik Molina, Soeren Heissel, and Caitlin Streckler for technical assistance. This research was supported by the NIH R01 CA204396, R01 CA214812, R21 CA235285, P30 CA008748, St. Baldrick’s Foundation, Hyundai Hope on Wheels, Burroughs Wellcome Fund, Damon Runyon-Richard Lumsden Foundation, Rita Allen Foundation, Leukemia and Lymphoma Society, the Starr Cancer Consortium, Pershing Square Sohn and Mathers Foundations, and Mr. William H. and Mrs. Alice Goodwin and the Commonwealth Foundation for Cancer Research and the Center for Experimental Therapeutics at MSKCC. AK is a consultant to Novartis and Rgenta.
ABBREVIATIONS
- PTM
post-translational modification
- PSM
peptide-spectrum match
- IAA
iodoacetamide
- ACL
acrylamide
- ACN
acetonitrile
- FA
formic acid
- TFA
trifluoroacetic acid
- BSA
bovine serum albumin
- KEAP1
Kelch-like ECH-associated protein 1
- LPS
lipopolysaccharide
- ABC
ammonium bicarbonate
- DTT
dithiothreitol
- MS
mass spectrometry
- LC
liquid chromatography
- GC
gas chromatography
- SCX
strong cation exchange
- HCD
higher-energy C-trap dissociation
- AGC
automatic gain control
- MRM
multiple reaction monitoring
- 2-HG
2-hydroxyglutarate
- TBS(T)
Tris buffered saline (Tween)
Footnotes
Complete contact information is available at: https://pubs.acs.org/10.1021/acs.jproteome.0c00638
The Supporting Information is available free of charge at https://pubs.acs.org/doi/10.1021/acs.jproteome.0c00638.
Figure S1: SAMPEI parametrization using specific differentially alkylated proteomes; Figure S2: SAMPEI achieves sensitive and specific unbiased discovery of peptide modifications in complex cellular proteomes; Figure S3: Absolute sensitivity of SAMPEI, ByOnic, MSFragger, and MaxQuant; Figure S4: Computational efficiency of SAMPEI; Figure S5: Distribution of putative modified sites for agnostically identified PTMs; Figure S6: Representative mass spectra pairs for peptides with either unmodified experimentally induced +57.02 Da modification or biologic +130.02 Da modification of peptide cysteines; Figure S7: Representative mass spectra pairs for peptides with either unmodified experimentally induced +57.02 Da modification or biologic +146.02 Da modification of peptide cysteines; Figure S8: Spectral quality of chemically modified peptides detected as multiple or single modification isoforms; Figure S9: Quantification of itaconate production by LPS-stimulated macrophages; Figure S10: IRG1 immunoblot–entire membrane; Figure S11: GO term enrichment of proteins with +130.02 Da and +146.02 Da peptide modifications; Figure S12: Schematic of itaconate reactions with purified BSA and KEAP1 proteins, followed by proteolysis and peptide chromatography and high-resolution tandem mass spectrometry; Figure S13: Agnostic PTM profiling of itaconate-reacted purified BSA and KEAP1 in vitro, as studied by tryptic peptide mass spectrometry; Figure S14: Agnostic PTM profiling of itaconate-reacted purified BSA and KEAP1 in vitro, as studied by elastase peptide mass spectrometry; Figure S15: Agnostic PTM profiling of itaconate-reacted purified BSA and KEAP1 in vitro, as studied by GluC peptide mass spectrometry; Document S1: Pseudocode of the key spectral matching scoring functions in SAMPEI (PDF)
Table S1: List of chemically modified peptides agnostically identified using SAMPEI in the proteome of LPS stimulated RAW264.7 cells (XLSX)
Table S2: List of peptides with putative itaconate adducts identified in RAW264.7 proteomes using variable modification search (XLSX)
The authors declare no competing financial interest.
All identification results used to generate figures and tables are available from Zenodo (10.5281/zenodo.4313808).
Contributor Information
Paolo Cifani, Molecular Pharmacology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York 10021, United States.
Zhi Li, Institute for Systems Genetics and Department of Biochemistry and Molecular Pharmacology, NYU Grossman School of Medicine, New York 10016, United States.
Danmeng Luo, Molecular Pharmacology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York 10021, United States.
Mark Grivainis, Institute for Systems Genetics, NYU Grossman School of Medicine, New York 10016, United States.
Andrew M. Intlekofer, Human Oncology & Pathogenesis Program and Department of Medicine, Memorial Sloan Kettering Cancer Center, New York 10021, United States.
David Fenyö, Institute for Systems Genetics and Department of Biochemistry and Molecular Pharmacology, NYU Grossman School of Medicine, New York 10016, United States.
Alex Kentsis, Molecular Pharmacology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York 10021, United States; Tow Center for Developmental Oncology, Department of Pediatrics, Memorial Sloan Kettering Cancer Center, and Departments of Pediatrics, Pharmacology, and Physiology & Biophysics, Weill Medical College of Cornell University, New York 10021, United States.
REFERENCES
- (1).Aebersold R; et al. How many human proteoforms are there? Nat. Chem. Biol 2018, 14, 206–214. [DOI] [PMC free article] [PubMed] [Google Scholar]
- (2).Aebersold R; Mann M Mass spectrometry-based proteomics. Nature 2003, 422, 198–207. [DOI] [PubMed] [Google Scholar]
- (3).Aebersold R; Mann M Mass-spectrometric exploration of proteome structure and function. Nature 2016, 537, 347–355. [DOI] [PubMed] [Google Scholar]
- (4).Chick JM; et al. A mass-tolerant database search identifies a large proportion of unassigned spectra in shotgun proteomics as modified peptides. Nat. Biotechnol 2015, 33, 743–749. [DOI] [PMC free article] [PubMed] [Google Scholar]
- (5).Kong AT; Leprevost FV; Avtonomov DM; Mellacheruvu D; Nesvizhskii AI MSFragger: ultrafast and comprehensive peptide identification in mass spectrometry-based proteomics. Nat. Methods 2017, 14, 513–520. [DOI] [PMC free article] [PubMed] [Google Scholar]
- (6).Devabhaktuni A; et al. TagGraph reveals vast protein modification landscapes from large tandem mass spectrometry datasets. Nat. Biotechnol 2019, 37, 469–479. [DOI] [PMC free article] [PubMed] [Google Scholar]
- (7).Zhang D; et al. Metabolic regulation of gene expression by histone lactylation. Nature 2019, 574, 575–580. [DOI] [PMC free article] [PubMed] [Google Scholar]
- (8).Lepack AE; et al. Dopaminylation of histone H3 in ventral tegmental area regulates cocaine seeking. Science 2020, 368, 197–201. [DOI] [PMC free article] [PubMed] [Google Scholar]
- (9).MacCoss MJ; et al. Shotgun identification of protein modifications from protein complexes and lens tissue. Proc. Natl. Acad. Sci. U. S. A 2002, 99, 7900–7905. [DOI] [PMC free article] [PubMed] [Google Scholar]
- (10). Mann M; Jensen ON. Proteomic analysis of post-translational modifications. Nat. Biotechnol. 2003, 21, 255–261.
- (11). Smith LM; Kelleher NL; Consortium for Top Down Proteomics. Proteoform: a single term describing protein complexity. Nat. Methods 2013, 10, 186–187.
- (12). Eng JK; McCormack AL; Yates JR. An approach to correlate tandem mass spectral data of peptides with amino acid sequences in a protein database. J. Am. Soc. Mass Spectrom. 1994, 5, 976–989.
- (13). Keller A; Nesvizhskii AI; Kolker E; Aebersold R. Empirical statistical model to estimate the accuracy of peptide identifications made by MS/MS and database search. Anal. Chem. 2002, 74, 5383–5392.
- (14). Nielsen ML; Savitski MM; Zubarev RA. Extent of modifications in human proteome samples and their effect on dynamic range of analysis in shotgun proteomics. Mol. Cell. Proteomics 2006, 5, 2384–2391.
- (15). Michalski A; Cox J; Mann M. More than 100,000 detectable peptide species elute in single shotgun proteomics runs but the majority is inaccessible to data-dependent LC-MS/MS. J. Proteome Res. 2011, 10, 1785–1793.
- (16). Jensen ON; Podtelejnikov AV; Mann M. Identification of the components of simple protein mixtures by high-accuracy peptide mass mapping and database searching. Anal. Chem. 1997, 69, 4741–4750.
- (17). Creasy DM; Cottrell JS. Error tolerant searching of uninterpreted tandem mass spectrometry data. Proteomics 2002, 2, 1426–1434.
- (18). Huang X; et al. ISPTM: an iterative search algorithm for systematic identification of post-translational modifications from complex proteome mixtures. J. Proteome Res. 2013, 12, 3831–3842.
- (19). Everett LJ; Bierl C; Master SR. Unbiased statistical analysis for multi-stage proteomic search strategies. J. Proteome Res. 2010, 9, 700–707.
- (20). Savitski MM; et al. Measuring and managing ratio compression for accurate iTRAQ/TMT quantification. J. Proteome Res. 2013, 12, 3586–3598.
- (21). Mann M; Wilm M. Error-tolerant identification of peptides in sequence databases by peptide sequence tags. Anal. Chem. 1994, 66, 4390–4399.
- (22). Searle BC; et al. High-throughput identification of proteins and unanticipated sequence modifications using a mass-based alignment algorithm for MS/MS de novo sequencing results. Anal. Chem. 2004, 76, 2220–2230.
- (23). Na S; Paek E. Prediction of novel modifications by unrestrictive search of tandem mass spectra. J. Proteome Res. 2009, 8, 4418–4427.
- (24). Dasari S; et al. TagRecon: high-throughput mutation identification through sequence tagging. J. Proteome Res. 2010, 9, 1716–1726.
- (25). Han X; He L; Xin L; Shan B; Ma B. PeaksPTM: Mass spectrometry-based identification of peptides with unspecified modifications. J. Proteome Res. 2011, 10, 2930–2936.
- (26). Bern M; Cai Y; Goldberg D. Lookup peaks: a hybrid of de novo sequencing and database search for protein identification by tandem mass spectrometry. Anal. Chem. 2007, 79, 1393–1400.
- (27). Bern MW; Kil YJ. Two-dimensional target decoy strategy for shotgun proteomics. J. Proteome Res. 2011, 10, 5296–5301.
- (28). Cox J; et al. Andromeda: a peptide search engine integrated into the MaxQuant environment. J. Proteome Res. 2011, 10, 1794–1805.
- (29). Shortreed MR; et al. Global Identification of Protein Post-translational Modifications in a Single-Pass Database Search. J. Proteome Res. 2015, 14, 4714–4720.
- (30). Chi H; et al. Comprehensive identification of peptides in tandem mass spectra using an efficient open search engine. Nat. Biotechnol. 2018, 36, 1059.
- (31). Bittremieux W; Laukens K; Noble WS. Extremely Fast and Accurate Open Modification Spectral Library Searching of High-Resolution Mass Spectra Using Feature Hashing and Graphics Processing Units. J. Proteome Res. 2019, 18, 3792–3799.
- (32). Kahnert K; Schlaffner CN; Muntel J. FLEXIQuant-LF: Robust Regression to Quantify Protein Modification Extent in Label-Free Proteomics Data. bioRxiv, May 13, 2020. DOI: 10.1101/2020.05.11.088492.
- (33). Garg N; et al. Mass spectral similarity for untargeted metabolomics data analysis of complex mixtures. Int. J. Mass Spectrom. 2015, 377, 719–727.
- (34). Craig R; Cortens JC; Fenyö D; Beavis RC. Using annotated peptide mass spectrum libraries for protein identification. J. Proteome Res. 2006, 5, 1843–1849.
- (35). Yates JR; Morgan SF; Gatlin CL; Griffin PR; Eng JK. Method to compare collision-induced dissociation spectra of peptides: potential for library searching and subtractive analysis. Anal. Chem. 1998, 70, 3557–3565.
- (36). Pevzner PA; Mulyukov Z; Dancik V; Tang CL. Efficiency of database search for identification of mutated and modified proteins via mass spectrometry. Genome Res. 2001, 11, 290–299.
- (37). Tsur D; Tanner S; Zandi E; Bafna V; Pevzner PA. Identification of post-translational modifications by blind search of mass spectra. Nat. Biotechnol. 2005, 23, 1562–1567.
- (38). Bandeira N; Tsur D; Frank A; Pevzner PA. Protein identification by spectral networks analysis. Proc. Natl. Acad. Sci. U. S. A. 2007, 104, 6140–6145.
- (39). Na S; Bandeira N; Paek E. Fast multi-blind modification search through tandem mass spectrometry. Mol. Cell. Proteomics 2012, 11, M111.010199.
- (40). Dhabaria A; Cifani P; Kentsis A. Fabrication of Capillary Columns with Integrated Frits for Mass Spectrometry. Protoc. Exch. 2015, DOI: 10.1038/protex.2015.049.
- (41). Cifani P; Dhabaria A; Kentsis A. Fabrication of Nanoelectrospray Emitters for LC-MS. Protoc. Exch. 2015, DOI: 10.1038/protex.2015.053.
- (42). Adusumilli R; Mallick P. Data Conversion with ProteoWizard msConvert. Methods Mol. Biol. 2017, 1550, 339–368.
- (43). The UniProt Consortium. UniProt: the universal protein knowledgebase. Nucleic Acids Res. 2017, 45, D158–D169.
- (44). Mellacheruvu D; et al. The CRAPome: a contaminant repository for affinity purification-mass spectrometry data. Nat. Methods 2013, 10, 730–736.
- (45). McIlwain S; Tamura K; Kertesz-Farkas A; Grant CE; Diament B; Frewen B; Howbert JJ; Hoopmann MR; Käll L; Eng JK; MacCoss MJ; Noble WS. Crux: Rapid Open Source Protein Tandem Mass Spectrometry Analysis. J. Proteome Res. 2014, 13, 4488–4491.
- (46). Cox J; Mann M. MaxQuant enables high peptide identification rates, individualized p.p.b.-range mass accuracies and proteome-wide protein quantification. Nat. Biotechnol. 2008, 26, 1367–1372.
- (47). Craig R; Beavis RC. TANDEM: matching proteins with tandem mass spectra. Bioinformatics 2004, 20, 1466–1467.
- (48). Patiny L; Borel A. ChemCalc: a building block for tomorrow’s chemical infrastructure. J. Chem. Inf. Model. 2013, 53, 1223–1228.
- (49). Vizcaíno JA; et al. The PRoteomics IDEntifications (PRIDE) database and associated tools: status in 2013. Nucleic Acids Res. 2012, 41, D1063–D1069.
- (50). Hokao R; Francia G. Differential display. Methods Mol. Med. 2001, 57, 297–305.
- (51). Kentsis A. Between light and eye: Goethe’s science of color and the polar phenomenology of nature. arXiv, November 14, 2005, arXiv:physics/0511130.
- (52). Zolg DP; et al. ProteomeTools: Systematic Characterization of 21 Post-translational Protein Modifications by Liquid Chromatography Tandem Mass Spectrometry (LC-MS/MS) Using Synthetic Peptides. Mol. Cell. Proteomics 2018, 17, 1850–1863.
- (53). Steen H; Mann M. The ABC’s (and XYZ’s) of peptide sequencing. Nat. Rev. Mol. Cell Biol. 2004, 5, 699–711.
- (54). Creasy DM; Cottrell JS. Unimod: Protein modifications for mass spectrometry. Proteomics 2004, 4, 1534–1536.
- (55). Alasoo K; Martinez FO; Hale C; Gordon S; Powrie F; Dougan G; Mukhopadhyay S; Gaffney DJ. Transcriptional profiling of macrophages derived from monocytes and iPS cells identifies a conserved response to LPS and novel alternative transcription. Sci. Rep. 2015, 5, 12524.
- (56). Kamal AHM; Chakrabarty JK; Udden SMN; Zaki MH; Chowdhury SM. Inflammatory Proteomic Network Analysis of Statin-treated and Lipopolysaccharide-activated Macrophages. Sci. Rep. 2018, 8, 164.
- (57). Rattigan KM; et al. Metabolomic profiling of macrophages determines the discrete metabolomic signature and metabolomic interactome triggered by polarising immune stimuli. PLoS One 2018, 13, No. e0194126.
- (58). Seim GL; et al. Two-stage metabolic remodelling in macrophages in response to lipopolysaccharide and interferon-γ stimulation. Nat. Metab. 2019, 1, 731–742.
- (59). Cifani P; et al. ProteomeGenerator: A Framework for Comprehensive Proteomics Based on de Novo Transcriptome Assembly and High-Accuracy Peptide Mass Spectral Matching. J. Proteome Res. 2018, 17, 3681–3692.
- (60). Poole LB; Nelson KJ. Discovering mechanisms of signaling-mediated cysteine oxidation. Curr. Opin. Chem. Biol. 2008, 12, 18–24.
- (61). Seo YH; Carroll KS. Profiling protein thiol oxidation in tumor cells using sulfenic acid-specific antibodies. Proc. Natl. Acad. Sci. U. S. A. 2009, 106, 16163–16168.
- (62). Jeong J; Jung Y; Na S; Jeong J; Lee E; Kim M-S; Choi S; Shin D-H; Paek E; Lee H-Y; Lee K-J. Novel oxidative modifications in redox-active cysteine residues. Mol. Cell. Proteomics 2011, 10, M110.000513.
- (63). Strelko CL; et al. Itaconic acid is a mammalian metabolite induced during macrophage activation. J. Am. Chem. Soc. 2011, 133, 16386–16389.
- (64). Cordes T; et al. Immunoresponsive Gene 1 and Itaconate Inhibit Succinate Dehydrogenase to Modulate Intracellular Succinate Levels. J. Biol. Chem. 2016, 291, 14274–14284.
- (65). Mills EL; et al. Itaconate is an anti-inflammatory metabolite that activates Nrf2 via alkylation of KEAP1. Nature 2018, 556, 113–117.
- (66). Kulkarni RA; Bak DW; Wei D; Bergholtz SE; Briney CA; Shrimp JH; Alpsoy A; Thorpe AL; Bavari AE; Crooks DR; Levy M; Florens L; Washburn MP; Frizzell N; Dykhuizen EC; Weerapana E; Linehan WM; Meier JL. A chemoproteomic portrait of the oncometabolite fumarate. Nat. Chem. Biol. 2019, 15, 391.
- (67). Qin W; Yang F; Wang C. Chemoproteomic profiling of protein-metabolite interactions. Curr. Opin. Chem. Biol. 2020, 54, 28–36.
- (68). West AP; et al. TLR signalling augments macrophage bactericidal activity through mitochondrial ROS. Nature 2011, 472, 476–480.
- (69). Paulsen CE; Carroll KS. Cysteine-mediated redox signaling: chemistry, biology, and tools for discovery. Chem. Rev. 2013, 113, 4633–4679.
- (70). Sihvola V; Levonen A-L. Keap1 as the redox sensor of the antioxidant response. Arch. Biochem. Biophys. 2017, 617, 94–100.
- (71). Guijas C; et al. METLIN: A Technology Platform for Identifying Knowns and Unknowns. Anal. Chem. 2018, 90, 3156–3164.
- (72). Zheng Q; Maksimovic I; Upad A; David Y. Non-enzymatic covalent modifications: a new link between metabolism and epigenetics. Protein Cell 2020, 11, 401–416.
- (73). Gut P; Verdin E. The nexus of chromatin regulation and intermediary metabolism. Nature 2013, 502, 489–498.
- (74). Zhou F; et al. Genome-scale proteome quantification by DEEP SEQ mass spectrometry. Nat. Commun. 2013, 4, 2171.
- (75). Wilhelm M; et al. Mass-spectrometry-based draft of the human proteome. Nature 2014, 509, 582–587.
Data Availability Statement
Mass spectrometry data have been deposited to the ProteomeXchange Consortium via the PRIDE49 partner repository with the data set identifiers PXD019793, PXD019827, PXD019858, and PXD019853. Processed files are available from Zenodo (DOI: 10.5281/zenodo.4313808).