Abstract
Allysine is a pivotal protein post-translational modification that regulates protein interaction and activities. It is also recognized as a marker of oxidative stress under certain metabolic and physiological conditions. In this study, we developed a capture-and-release chemical proteomics workflow with heavy isotopic labeling that enables system-wide enrichment and site-specific identification of allysine as well as other carbonylated peptides, such as peptides containing glutamic semialdehyde derived from the oxidative damage of arginine and proline, with high confidence. The streamlined workflow led to the identification of 434 allysine sites on 349 proteins in human 293T and HCT116 cells and 317 allysine sites on 157 proteins in mouse muscle tissues without any treatment with an oxidative stress-inducing chemical reagent. We identified 48 histone allysine sites, including 38 sites on core histones in human 293T cells, many of which overlapped with well-characterized histone acetylation and methylation epigenetic marks. Bioinformatic analysis revealed notable characteristics of the amino acid preferences of allysine flanking sequences and the significant depletion of allysine sites in the protein secondary structure in cultured human cells. Pathway analysis showed that allysine substrates were involved in diverse cellular processes including translation, protein folding, and RNA processing in human cells and were enriched with muscle contractile fiber proteins and metabolic enzymes in mouse muscle tissue. Thus, our integrated chemical proteomics analysis revealed the structural and functional features of allysine targets under regular growth conditions in cultured human cells and mouse tissues.
INTRODUCTION
Allysine (Alk) is a carbonyl-derivative post-translational modification (PTM) of lysine on proteins, also known as α-aminoadipic acid-δ-semialdehyde.1–3 The chemical process eliminates the ε-amine on the lysine side chain and generates a reactive protein-bound aldehyde. The modification can be generated through either enzymatic or nonenzymatic processes.1–3 Chemicals such as hydrogen peroxide produced by a variety of sources in the cells can oxidize the side chains of amino acids including lysine, arginine, and proline and generate carbonyl groups. These carbonyl-containing protein modifications including allysine have long been recognized as a marker of oxidative stress.2 The increase in carbonyl-containing derivatives of amino acids is often accompanied by the increase in the oxidative damage of lipids and glycans and is associated with the progression of diverse pathophysiological conditions such as aging, diabetes, fibrosis, arthritis, and Alzheimer’s disease.4–6
Allysine can also be produced enzymatically by the lysyl oxidase protein family (LOX).7–9 Lysyl oxidase (LOX) is a family of oxygen- and copper-dependent amine oxidases including LOX, LOXL1, LOXL2, LOXL3, and LOXL4, which are known to target collagen and elastin for allysine formation, promote their cross-linking, and enhance extracellular matrix (ECM) rigidity in the tumor microenvironment to facilitate cancer metastasis or in the cardiac system to promote extensive fibrosis.10–14 Some members of the enzyme family are also known to localize and function inside the cell, regulating transcriptional activities and epigenetics.3 LOX protein interacts with histone H1 and promotes chromatin decondensation.15–17 It also targets basic fibroblast growth factor (bFGF) to promote intersubunit cross-linking and oligomer formation, which significantly inhibits its mitogenic activities and cell growth.18 LOXL2 oxidizes methylated TAF10 and regulates gene expression in neuronal cell differentiation.19 It can also oxidize trimethylated histone H3 lysine 4, potentially inhibiting the gene expression of cadherin 1 (CDH1).20 LOXL3 targets STAT3 for deacetylation and oxidation, thereby inhibiting its transcription activity and STAT3-dependent inflammation response.21
Despite the important functions of allysine in physiology and known enzymatic activities of lysyl oxidase, the knowledge of proteome-wide allysine modification substrates is rather limited due to challenges in developing site-specific strategies to identify allysine targets. Direct MS identification of allysine has been challenging due to its low abundance and chemically reactive nature. The small mass difference (−1 Da) may give rise to false positive identifications due to similar isotopic envelopes of multiply charged unmodified and modified peptides in MS1 and interference from coeluting peptides with close m/z. Spectrophotometric detection of carbonyl-containing amino acids can be readily achieved by 2,4-dinitrophenylhydrazine (DNPH) labeling.22–25 Using DNPH labeling and an exhaustive 2D HPLC fractionation analysis, a previous study has reported the identification of 284 allysine sites in HeLa cells after treatment with 0.5 mM H2O2.23 Given the low abundance of allysine, chemical proteomics strategies have been developed by applying biotin-hydrazide to label the carbonyl group in allysine followed by monoavidin-based enrichment and LCMS identification.25–28 However, this strategy has certain technical limitations.2 First, biotin labeling introduces a large and hydrophobic tag on allysine and significantly affects the LCMS-based peptide identification. Second, the hydrazone formed is not chemically stable. While chemical reduction can stabilize the labeling, it may further increase the sample complexity due to variations in the reduction efficiency. Lastly, abundant endogenously biotinylated peptides may affect the enrichment efficiency and significantly increase the background for LCMS.
In this study, we demonstrated a capture-and-release chemical proteomics strategy for global, site-specific identification of Alk modifications, as well as other carbonyl-containing peptides in cultured human cells and mouse tissues (Figure 1). We identified diverse Alk targets in major cellular pathways in cells and tissues without oxidative stress and revealed widespread Alk modifications on histones as epigenetic marks. Structural and sequence analysis revealed striking features of amino acid preferences in Alk flanking sequences and relative depletion of Alk in the secondary protein structures in cultured human cells.
Figure 1.
Experimental workflow of the chemical enrichment strategy with stable isotope labeling for the confident identification of allysine modification.
EXPERIMENTAL SECTION
Proteomic Sample Preparation for Human Cell Line Analysis.
293T cells or HCT116 cells were cultured in 15 cm plates and collected by scraping in PBS. Five plates of cells were lysed in a lysis buffer containing 9 M urea in PBS with 1 mM PMSF protease inhibitor or 1× cOmplete protease inhibitor cocktail, followed by sonication and centrifugation to remove insoluble pellets. Protein concentrations of the supernatant were measured with Bradford assay, and about 8–12 mg of protein was aliquoted for analysis. Cell lysate from 293T cells was diluted to 3 M urea and then subjected to benzonase digestion of nucleotides (250–290 units per 3 mg of proteins with 1 mM MgCl2) for 30 min at 37 °C before tryptic digestion. Then, the proteins were reduced and alkylated by TCEP and iodoacetamide (10 mM final, respectively) at room temperature with rotation in the dark for 30 min and quenched by cysteine (10 mM final). Sample solutions were diluted with 100 mM TEAB to 1.5 M urea. HCT116 cell lysate was directly subjected to reduction/alkylation in the same way as 293T cell lysate but without benzonase digestion. The protein lysate was further digested with trypsin at an enzyme-to-substrate ratio of 1:50 (w/w) at 37 °C overnight. The digested lysate was desalted with an Oasis C18 cartridge (Waters, Milford, MA, USA) following the manufacturer’s instructions. Finally, the desalted peptides were separated on an XBridge C18 column (4.6 mm × 250 mm, 5 μm I.D. resin, Waters, Milford, MA) for reverse-phase HPLC fractionation with HPLC buffer A (water with 10 mM TEAB) and buffer B (ACN with 10 mM TEAB). The flow rate was set as 1 mL/min with a gradient of 1% B to 95% B in 60 min. The eluate was then combined into 8 fractions through concatenation followed by lyophilization for enrichment and LCMS analysis.
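The bench arithmetic in this protocol (urea dilution, trypsin amount, and pooling of HPLC fractions) can be sketched in a few lines of Python. This is only an illustrative sketch: the function names are hypothetical, the starting volumes are examples, the number of collected HPLC fractions (48 here) is not stated in the text, and the round-robin pooling scheme is one common way to concatenate fractions rather than a confirmed detail of this workflow.

```python
def diluent_volume_ml(v_initial_ml, c_initial, c_final):
    """Diluent volume needed to go from c_initial to c_final (C1*V1 = C2*V2)."""
    if not 0 < c_final <= c_initial:
        raise ValueError("c_final must be positive and not exceed c_initial")
    return v_initial_ml * (c_initial / c_final - 1.0)


def trypsin_mass_mg(protein_mg, substrate_per_enzyme=50.0):
    """Trypsin mass for an enzyme-to-substrate ratio of 1:substrate_per_enzyme (w/w)."""
    return protein_mg / substrate_per_enzyme


def concatenate_fractions(n_fractions, n_pools=8):
    """Round-robin pooling (assumed scheme): fraction i goes to pool (i - 1) mod n_pools."""
    pools = [[] for _ in range(n_pools)]
    for i in range(1, n_fractions + 1):
        pools[(i - 1) % n_pools].append(i)
    return pools


if __name__ == "__main__":
    # 1 mL of lysate in 9 M urea diluted to 3 M for benzonase digestion (example volume)
    print(diluent_volume_ml(1.0, 9.0, 3.0), "mL diluent to reach 3 M urea")
    # the resulting 3 mL diluted further to 1.5 M urea before trypsin digestion
    print(diluent_volume_ml(3.0, 3.0, 1.5), "mL diluent to reach 1.5 M urea")
    # trypsin for 10 mg of protein at 1:50 (w/w)
    print(trypsin_mass_mg(10.0), "mg trypsin")
    # hypothetical 48 collected fractions pooled into 8 concatenated fractions
    print(concatenate_fractions(48)[0])
```

Pooling early, middle, and late fractions together keeps each of the 8 pools spread across the gradient, which is the usual motivation for concatenation.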
Enrichment of Carbonyl-Containing Peptides.
The dried tryptic peptides were resuspended in a conjugation buffer containing 200 mM sodium acetate pH 4.5 and 0.1 M aniline and incubated with hydrazide beads at room temperature with rotation for 2 h. The beads were washed with 1× PBS 6 times. The conjugated peptides on beads were eluted by 0.2 M D3-methoxyamine in 200 mM sodium acetate pH 4.5 and 0.1 M aniline three times (30–45 min each time). Finally, eluted peptides were combined and desalted with an in-house-packed C18 stage tip for LCMS analysis.
Identification of Carbonylated Peptides and Proteins in LCMS Analysis.
LCMS data were analyzed with the MaxQuant search engine (ver 2.2.0.0) against a Uniprot human reference protein database (UP000005640_9606), mouse protein database (UP000000589_10090), or histone database concatenated with reversed decoy sequences and common contaminant sequences. Trypsin was specified as the protease, with the maximum number of missed cleavages specified as 2 for whole cell lysate analysis or 3 for histone sample analysis. Mass tolerances of precursor ions acquired in Orbitrap and MS/MS fragment ions acquired in an ion trap were set at 4.5 ppm and 0.5 Da, respectively. A stringent cutoff false discovery rate of 1% was specified at the levels of protein, peptide, and modification site identifications with an additional minimum score of 40 for the identification of modification sites. Carbamidomethylation on cysteine was set as a fixed modification. Protein N-terminal acetylation and methionine oxidation were assigned as variable modifications. To analyze allysine modification, variable modifications included allysine with D3-methoxyamine conjugation (defined as the composition of CH(−3)D(3)O with a neutral loss of CH(2)D(3)NO, at any position except the C-terminus). To analyze proline-derived glutamic semialdehyde modification, variable modifications included D3-methoxyamine conjugation of glutamic semialdehyde on proline (defined as the composition of CD(3)NO with a neutral loss of CH(2)D(3)NO, at any position). To analyze arginine-derived glutamic semialdehyde modification, variable modifications included D3-methoxyamine conjugation of glutamic semialdehyde on arginine (defined as the composition of H(−5)D(3)N(−2)O with a neutral loss of CH(2)D(3)NO, at any position except the C-terminus).
For histone searching, lysine acetylation (defined as C(2)H(2)O with diagnostic peaks of C(7)H(11)ON and C(7)H(14)ON(2), at any position except the C-terminus) was included. Cysteine carbamidomethylation was not included in histone searching. The max number of modifications per peptide was set at 3 for histone searching and 3 for whole cell and tissue lysate.
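The elemental compositions listed above can be turned into monoisotopic mass shifts with a short calculation. The sketch below is not part of the search-engine configuration; it only illustrates how each composition maps to a delta mass, using standard monoisotopic element masses. The dictionary keys and the function name are hypothetical labels chosen for this example.

```python
# Standard monoisotopic element masses in Da (D = deuterium).
ELEMENT_MASS = {
    "C": 12.0,
    "H": 1.00782503207,
    "D": 2.01410177785,
    "N": 14.0030740048,
    "O": 15.9949146196,
}

# Compositions as given in the search parameters; negative counts are losses.
COMPOSITIONS = {
    "Lys -> Alk + D3-methoxyamine": {"C": 1, "H": -3, "D": 3, "O": 1},
    "Pro -> GSA + D3-methoxyamine": {"C": 1, "D": 3, "N": 1, "O": 1},
    "Arg -> GSA + D3-methoxyamine": {"H": -5, "D": 3, "N": -2, "O": 1},
    "neutral loss (D3-methoxyamine)": {"C": 1, "H": 2, "D": 3, "N": 1, "O": 1},
    "Lys acetylation": {"C": 2, "H": 2, "O": 1},
}


def monoisotopic_mass(composition):
    """Sum element masses weighted by their (possibly negative) counts."""
    return sum(ELEMENT_MASS[element] * count for element, count in composition.items())


if __name__ == "__main__":
    for name, composition in COMPOSITIONS.items():
        print(f"{name}: {monoisotopic_mass(composition):+.4f} Da")
```

Writing the shifts as signed element counts mirrors the notation used in the modification definitions, so a definition can be checked against its intended delta mass before a search is run.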
For additional details in the Experimental Section, please see the Supporting Information.
RESULTS
Streamlined Chemical Proteomics Workflow for Site-Specific Identification of Allysine and Other Carbonyl-Containing Peptides.
Despite recent advances in understanding the roles of allysine in diverse cellular pathways including extracellular structure, transcription, and epigenetic regulation, the knowledge of basal level site-specific Alk targets is rather limited. System-wide profiling of allysine and other carbonyl-containing proteins requires a more efficient enrichment workflow due to their low abundance. To this end, we developed a streamlined chemical proteomics workflow with a capture-and-release strategy for site-specific identification of allysine as well as other carbonyl-containing peptides (Figure 1). In this strategy, cells from cultures or tissues were first rapidly lysed with denaturing buffer, and proteins were digested by trypsin. Then, the peptides were incubated with hydrazide-linked agarose beads, which allowed Schiff base linkage formation and covalent enrichment of carbonyl-containing peptides. After washing, peptides were competitively eluted from the beads with methoxyamine, which forms an oxime that is more stable than the hydrazone.29,30 The eluted peptides were modified by a small methoxyamine group and could be readily identified by a classic LCMS workflow (Figure 1).
It is important to note that methoxyamine-conjugated allysine has a delta mass shift of +28 Da compared to the unmodified lysine, which is very close to the delta mass shift of dimethyllysine and indistinguishable from formyllysine with an identical element composition. Both dimethyl- and formyllysine are known post-translational modifications on histones and other nuclear proteins31 (Figure S1). To eliminate false positive Alk site identifications, we applied D3-methoxyamine for elution, which gave a delta mass of 31.0134 Da that is unique among known lysine post-translational modifications (https://www.unimod.org). This approach ensured high confidence in our Alk target identification in complex biological samples from both cultured cells and tissues.
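The mass argument in this paragraph can be made concrete with a small calculation. The sketch below compares the lysine mass shifts of dimethylation, formylation, and methoxyamine-conjugated allysine with and without deuterium, and expresses the gaps in ppm for a peptide of a given mass. The peptide mass of 1500 Da is a hypothetical example, the function and key names are assumptions for illustration, and the 4.5 ppm figure is the precursor tolerance quoted in the Experimental Section.

```python
# Standard monoisotopic element masses in Da (D = deuterium).
MASS = {"C": 12.0, "H": 1.00782503207, "D": 2.01410177785, "O": 15.9949146196}

# Mass shifts relative to unmodified lysine, as signed element counts.
SHIFTS = {
    "dimethyl": {"C": 2, "H": 4},
    "formyl": {"C": 1, "O": 1},
    "alk_h3_methoxyamine": {"C": 1, "O": 1},
    "alk_d3_methoxyamine": {"C": 1, "H": -3, "D": 3, "O": 1},
}


def shift_da(name):
    """Monoisotopic mass shift in Da for a named modification."""
    return sum(MASS[element] * count for element, count in SHIFTS[name].items())


def gap_ppm(name_a, name_b, peptide_mass):
    """Mass gap between two modified forms of the same peptide, in ppm."""
    return abs(shift_da(name_a) - shift_da(name_b)) / peptide_mass * 1e6


if __name__ == "__main__":
    peptide_mass = 1500.0  # hypothetical peptide mass in Da
    for tag in ("alk_h3_methoxyamine", "alk_d3_methoxyamine"):
        for other in ("dimethyl", "formyl"):
            print(f"{tag} vs {other}: {gap_ppm(tag, other, peptide_mass):.1f} ppm")
```

With the unlabeled reagent, the gap to formyllysine is zero because the two compositions are identical, so no mass accuracy can separate them; the deuterated reagent moves the tag roughly 3 Da away from both alternatives.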
Exploring Unique Neutral Loss Fragmentation Feature of Methoxyamine-Conjugated Allysine for Confident Allysine Identification.
Collision-induced fragmentation of peptides containing methoxyamine-conjugated lysine in tandem mass spectrometry may result in the neutral loss of the methoxyamine group in the gas phase. There are two potential mechanisms for such neutral loss32 (Figure S2). One mechanism is the direct elimination of methoxyamine on the lysine side chain, and the other is the neutral loss facilitated by the nucleophilic attack from a nearby alpha amino group. If the neutral loss followed the former mechanism, then many fragments that contained the modified lysine would be expected to display strong neutral loss signature ions, analogous to the neutral loss of 98 Da in the MS/MS analysis of phosphorylated peptides. On the other hand, if the neutral loss followed the latter mechanism, then we would expect to observe dominant neutral loss only when a nucleophile (such as an alpha amino group) was nearby (Figure S2). To determine the neutral loss mechanism of methoxyamine-conjugated Alk peptides, we compared the Alk identifications using either H3-methoxyamine or D3-methoxyamine for elution and showed the analysis of histone peptide “GTLVQTK(al)GTGASGSFK” (K(al) indicating the allysine site) as an example (Figure 2A). Side-by-side comparison showed that H3- and D3-methoxyamine-conjugated peptide pairs were correctly identified based on the expected masses in high resolution MS1 analysis with very close HPLC elution times in LC chromatogram analysis (Figure 2B,C). When we analyzed the MS/MS spectra, we noticed that among the y9, y10, and y11 fragment ions that flanked the modification site, only the y10 ion that had methoxyamine-conjugated lysine on the N-terminus showed a strong neutral loss of the entire methoxyamine group, which resulted in the same m/z for y10 after neutral loss for both H3- and D3-methoxyamine-conjugated peptides, while most other fragments containing the modified lysine did not show such a strong neutral loss feature (Figure 2D).
Our data suggested that the alpha amino group of the modified lysine was critical in inducing the neutral loss of methoxyamine in the gas phase, and such a unique neutral loss signature may help the identification and localization of methoxyamine-conjugated lysine. We therefore incorporated this mechanism in the database searching to improve the sensitivity and confidence of our analysis.
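The expected masses behind this comparison can be reproduced with a short calculation. The sketch below computes the doubly charged precursor m/z and singly charged y-ion m/z values of the example peptide GTLVQTK(al)GTGASGSFK for both the H3- and D3-methoxyamine tags, and shows that the y10 ion arrives at the same m/z for both tags once the whole methoxyamine group is lost. It uses standard monoisotopic residue and element masses; the function names and the charge-state choices are assumptions made for illustration and do not reproduce the search-engine scoring.

```python
# Standard monoisotopic masses in Da.
ELEM = {"C": 12.0, "H": 1.00782503207, "D": 2.01410177785,
        "N": 14.0030740048, "O": 15.9949146196}
RESIDUE = {"G": 57.02146, "A": 71.03711, "S": 87.03203, "V": 99.06841,
           "T": 101.04768, "L": 113.08406, "Q": 128.05858, "K": 128.09496,
           "F": 147.06841}
WATER = 18.01056
PROTON = 1.00728

PEPTIDE = "GTLVQTKGTGASGSFK"
SITE = 6  # zero-based index of the allysine-bearing lysine (K7 of the peptide)


def comp_mass(**counts):
    """Mass of a signed elemental composition, e.g. comp_mass(C=1, O=1)."""
    return sum(ELEM[element] * n for element, n in counts.items())


# Tag mass relative to lysine, and the neutral methoxyamine lost in MS/MS.
TAG = {"H3": comp_mass(C=1, O=1), "D3": comp_mass(C=1, H=-3, D=3, O=1)}
LOSS = {"H3": comp_mass(C=1, H=5, N=1, O=1), "D3": comp_mass(C=1, H=2, D=3, N=1, O=1)}


def residue_masses(label):
    masses = [RESIDUE[aa] for aa in PEPTIDE]
    masses[SITE] += TAG[label]
    return masses


def precursor_mz(label, charge=2):
    return (sum(residue_masses(label)) + WATER + charge * PROTON) / charge


def y_ion_mz(label, n, neutral_loss=False):
    """Singly charged y_n ion; neutral loss only applies if y_n contains the site."""
    if neutral_loss and n < len(PEPTIDE) - SITE:
        raise ValueError("this y ion does not contain the modified lysine")
    mz = sum(residue_masses(label)[-n:]) + WATER + PROTON
    return mz - LOSS[label] if neutral_loss else mz


if __name__ == "__main__":
    for label in ("H3", "D3"):
        print(label, "precursor 2+:", round(precursor_mz(label), 4))
        print(label, "y9:", round(y_ion_mz(label, 9), 4))
        print(label, "y10:", round(y_ion_mz(label, 10), 4))
        print(label, "y10 after neutral loss:", round(y_ion_mz(label, 10, True), 4))
```

The computed precursor values fall inside the extracted-ion m/z windows quoted in the Figure 2 caption, and the intact y10 ions differ by about 3.019 Da between the two tags while the neutral-loss y10 ions coincide.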
Figure 2.
Validation of methoxyamine conjugation and neutral loss fragment feature at the allysine site with stable isotope labeling. (A) Schematic workflow for the comparative analysis of methoxyamine-conjugated allysine. (B) HPLC chromatograms, (C) precursor ion MS spectra, and (D) MS/MS spectra of an example histone peptide “GTLVQTK(al)GTGASGSFK” (K(al) indicating Alk site) corresponding to histone H1.2 K97 with H3-methoxyamine (KH3‑al) conjugation in the upper panel (HPLC chromatogram MS1 m/z = 783.9027–783.9183) and D3-methoxyamine (KD3‑al) conjugation in the lower panel (HPLC chromatogram MS1 m/z = 785.4106–785.4264), respectively.
Global Analysis Revealed the Widespread Targets and Unique Features of the Allysine Proteome in Human Cells.
With our streamlined chemical proteomics workflow, we identified a total of 434 Alk sites on 349 proteins in human 293T cells and HCT116 cells (Figure S3 and Table S1). Interestingly, comparing Alk sites identified from HCT116 cells and from 293T cells showed a low percentage of overlap between the two sources of proteins (Figure S4A). This may be because either allysine has a cell-type specific target proteome or our Alk proteome coverage needs further improvement to identify low abundant overlapping sites. Analysis of site count distribution showed that nearly 80% of the Alk-containing proteins were identified with a single Alk site, in distinct contrast to some other lysine modifications such as lysine 5-hydroxylation (5-Hyl) with nearly 60% identified with more than one modification site32 (Figure S4B). Five Alk sites potentially interrupt cation−π interactions of Lys with neighboring Trp33 (Table S1). Flanking sequence analysis showed that small nonpolar amino acids such as Gly and Pro tend to locate adjacent to the Alk sites (Figure 3A). Strikingly, acidic residues, especially aspartic and glutamic acids, were strongly preferred for positions in close proximity to the Alk sites in the flanking sequences.
Figure 3.
Systematic enrichment and profiling of allysine sites and proteins in HCT116 and 293T human cell lines. (A) Flanking sequence enrichment analysis with pLogo using default settings (p < 0.05). (B) Secondary structure enrichment analysis comparing the distribution of allysine sites among known secondary structures with the distribution of all lysines identified in this study (HyperG test p < 0.05, n.s. indicates “not significant”). (C) Functional annotation enrichment analysis of allysine sites identified in human cells showing representative terms for Gene Ontology (BP: biological processes; CC: cellular compartments; MF: molecular function) and KEGG pathways (adj. FDR < 0.05).
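Flanking sequence analysis starts from fixed-width sequence windows centered on each modified residue. The sketch below shows one way to extract such windows and tally residues at each relative position. It is only a minimal illustration: the window half-width of 7, the padding character, the function names, and the toy sequence are all assumptions, and the statistical scoring performed by pLogo is not reproduced here.

```python
from collections import Counter


def flanking_window(sequence, site, flank=7, pad="_"):
    """Return the residues from site-flank to site+flank (site is 1-based).

    Positions that run past either end of the protein are filled with `pad`.
    """
    if not 1 <= site <= len(sequence):
        raise ValueError("site is outside the sequence")
    i = site - 1
    left = sequence[max(0, i - flank):i]
    right = sequence[i + 1:i + 1 + flank]
    return pad * (flank - len(left)) + left + sequence[i] + right + pad * (flank - len(right))


def position_counts(windows, flank=7):
    """Count residues at each relative position (-flank..+flank) across windows."""
    counts = {offset: Counter() for offset in range(-flank, flank + 1)}
    for window in windows:
        for offset, residue in zip(range(-flank, flank + 1), window):
            counts[offset][residue] += 1
    return counts


if __name__ == "__main__":
    toy_sequence = "AKDEFGHIKLMNPQRSTVWY"  # hypothetical sequence, not a real protein
    windows = [flanking_window(toy_sequence, site) for site in (2, 9)]
    print(windows)
    print(position_counts(windows)[1])  # residues immediately C-terminal to the site
```

Padding keeps every window the same length, so sites near a protein terminus can be aligned with the rest when position-specific frequencies are compared against a background.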
With only 20 sites annotated in the Uniprot human database, our identification of 434 Alk sites significantly expanded the Alk proteome in human cells (Figure S4C). To determine if allysine had preference in targeting secondary structures, we mapped the Alk sites in this analysis to the manually annotated Uniprot sequence features with secondary structures determined by experimental evidence and performed the secondary structure enrichment analysis of Alk sites using all lysines identified in the peptides of the same batch of samples as background control. Our data showed that Alk sites were significantly depleted in alpha helix (p < 1.8 × 10−6) and had a strong preference for unannotated/unstructured regions (p < 6.9 × 10−8) (Figure 3B). These data suggested that allysine does not favor structured regions and may preferentially target unstructured protein domains, which is similar to 5-hydroxylysine (Hyl) but different from acetyllysine.32,34 To further evaluate the correlation between Alk sites and protein structures, we performed solvent accessibility analysis with NetsurfP 3.035 and compared the relative solvent accessibility (RSA) profiles of all Lys sites and Alk sites from human Alk proteins. Our analysis showed that Alk sites had an overall higher solvent accessibility in distribution compared to the Lys sites, and significantly higher percentages of Alk sites were exposed than Lys sites when an RSA solvent exposure cutoff of 25% or 50% was applied (Figure S4D,E). These data corroborated the secondary structure analysis and agreed with the previous understanding that amino acids in secondary structures have lower solvent exposure and a lower propensity for oxidative damage.36
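The secondary structure comparison described above is a hypergeometric test: given how many background lysines fall in a structure class, how likely is it to see as few (or as many) Alk sites in that class by chance? The sketch below implements the two tail probabilities with the Python standard library. The function names are hypothetical and the counts in the usage example are invented for illustration; they are not values from this study.

```python
from math import comb


def hypergeom_pmf(k, population, successes, draws):
    """P(X = k) when drawing `draws` items without replacement from `population`
    items, of which `successes` carry the feature (e.g. lysines in alpha helix)."""
    if k < 0 or k > draws or k > successes or draws - k > population - successes:
        return 0.0
    return comb(successes, k) * comb(population - successes, draws - k) / comb(population, draws)


def depletion_pvalue(k, population, successes, draws):
    """Lower tail P(X <= k): probability of observing k or fewer by chance."""
    return sum(hypergeom_pmf(i, population, successes, draws) for i in range(0, k + 1))


def enrichment_pvalue(k, population, successes, draws):
    """Upper tail P(X >= k): probability of observing k or more by chance."""
    upper = min(draws, successes)
    return sum(hypergeom_pmf(i, population, successes, draws) for i in range(k, upper + 1))


if __name__ == "__main__":
    # Hypothetical counts: 10,000 background lysines, 4,000 of them in alpha helix,
    # 434 modified sites, 120 of which map to alpha helix.
    print(depletion_pvalue(120, 10_000, 4_000, 434))
```

Using all lysines identified in the same samples as the population, as the text describes, keeps the comparison from being driven by which proteins and regions were detectable in the first place.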
To determine if Alk targets were enriched in specific biological processes and cellular pathways, we performed Gene Ontology and KEGG pathway annotation enrichment analyses (Figure 3C and Table S2). Our data showed that Alk-containing proteins were broadly enriched in major cellular house-keeping processes including cytosolic translation (adj. FDR < 1.44 × 10−16), protein folding (adj. FDR < 3.92 × 10−14), RNA splicing (adj. FDR < 4.45 × 10−7), nuclear localization (adj. FDR < 9.24 × 10−9), and chromosome organization (adj. FDR < 5.25 × 10−5), which matched the enrichment in cellular compartment analysis (Figure 3C). Surprisingly, cytoskeleton organization-related processes were not enriched in this data set (adj. FDR cutoff: 0.05). Enrichment analysis of molecular function showed that Alk proteins were significantly enriched with diverse binding activities including cadherin binding (adj. FDR < 4.31 × 10−22), mRNA binding (adj. FDR < 2.00 × 10−17), unfolded protein binding (adj. FDR < 3.13 × 10−9), and nucleosome binding (adj. FDR < 5.98 × 10−4) (Figure 3C). Enriched KEGG pathways were in line with the enrichment in biological processes and cellular compartments, showing very few signaling or metabolic pathway enrichments except for the biosynthesis of amino acids (adj. FDR < 1.13 × 10−3) and glycolysis/gluconeogenesis (adj. FDR < 4.17 × 10−2) (Figure 3C). Protein interaction network analysis identified top highly connected subnetworks with Alk targets related to protein translation, degradation, and folding (Figure S4D). Our data suggested that Alk pathways in human cells target fundamental cellular processes such as protein translation, folding, and transport and also target diverse protein functions with binding activities.
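Annotation enrichment tests many terms at once, so raw p-values are adjusted before a cutoff such as 0.05 is applied. The text does not state which adjustment procedure produced the reported adj. FDR values, so the sketch below uses the Benjamini-Hochberg procedure purely as an assumed, commonly used example. The function name and the input p-values are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original input order.

    Each p-value is scaled by (number of tests / rank), then a running minimum
    from the largest rank downward enforces monotonicity.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, pvalues[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


if __name__ == "__main__":
    raw = [0.01, 0.04, 0.03, 0.005]  # hypothetical raw p-values for four terms
    adjusted = benjamini_hochberg(raw)
    significant = [i for i, q in enumerate(adjusted) if q < 0.05]
    print(adjusted, significant)
```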
Systematic Profiling of Alk Targets in Mouse Muscle Tissues.
Protein carbonylation from elevated oxidative stress is widely implicated in Duchenne muscular dystrophy.37–39 To determine if our chemical proteomics workflow can be applied to study muscle tissues, we analyzed Alk targets from the quad and gastrocnemius muscle tissues of mice. The tissues were lysed in denaturing urea lysis buffer and subjected to tryptic digestion and Alk enrichment. In total, we identified 317 sites from 157 proteins in mouse muscle tissues (Table S3). To reveal common or differential functional features of the Alk proteome, we compared the Alk targets in mouse muscle with those identified from human cell lines. First, analysis of Alk site counts per protein showed that nearly 70% of mouse Alk proteins were identified with a single site (Figure S5A), which was similar to the site count analysis of human Alk targets (Figure S4B). Second, the amino acid preference analysis of flanking sequences of the mouse muscle Alk sites showed that acidic amino acids, especially glutamic acid, were favored in close proximity to allysine (Figure 4A), which was also similar to the amino acid preference analysis of human Alk sites (Figure 3A). Third, solvent accessibility analysis showed that Alk sites in mouse muscle also had an overall higher solvent accessibility in distribution compared to the Lys sites, similar to the solvent accessibility of Alk sites in humans, although to a lesser extent (Figure S5B,C).
Figure 4.
Systematic enrichment and profiling of allysine sites and proteins in mouse quad and gastrocnemius muscle tissues. (A) Flanking sequence enrichment analysis with pLogo (p < 0.05). (B) Functional annotation enrichment analysis of allysine sites identified in mouse muscle tissue with mouse protein-coding genes as background showing representative terms for Gene Ontology (BP: biological processes; CC: cellular compartments) and KEGG pathways (adj. FDR < 0.05).
With only 36 sites annotated in the Uniprot mouse proteome, our identification of 317 Alk sites in the mouse muscle tissues significantly expanded the knowledge of Alk targets in mice (Figure S5D). Bioinformatic analysis of Alk-containing proteins showed strong characteristics of muscle cells, completely different from those of Alk targets in human cells (Table S4A–C). More specifically, Gene Ontology enrichment analysis of biological processes with mouse protein-coding genes as the background showed a strong enrichment of Alk targets among muscle system processes (adj. FDR < 1.1 × 10−13) and major metabolic processes, including nucleoside triphosphate metabolism (adj. FDR < 9.2 × 10−8) and pyruvate metabolic processes (adj. FDR < 8.85 × 10−5) (Figure 4B). Cellular compartment analysis identified significant enrichment of Alk targets on muscle contractile fibers (adj. FDR < 2.91 × 10−11) including myosin filament (adj. FDR < 3.75 × 10−5) and actin skeleton (adj. FDR < 1.62 × 10−6) (Figure 4B). In line with the enrichment in biological processes and cellular compartments, KEGG pathway enrichment analysis showed significant enrichments in motor proteins (adj. FDR < 5.35 × 10−3), the calcium signaling pathway (adj. FDR < 2.31 × 10−3), and glucagon signaling pathway (adj. FDR < 5.9 × 10−4) (Figure 4B). STRING-based interaction network analysis identified highly interconnected subnetworks of Alk proteins in mouse muscle tissues related to muscle fiber and cell skeleton organization as well as metabolic processes (Figure S5E).
To determine if the significant enrichment of proteins involved in the muscle system process among Alk proteins in mouse skeletal muscle was due to the high abundance of tissue-specific proteins, we repeated the biological process enrichment analysis using the mouse skeletal muscle proteins from a previous deep proteome profiling study as the background40 (Table S4D). Interestingly, our data showed that muscle system processes became even more enriched among Alk proteins after the background adjustment (adj. FDR < 3.9 × 10−15) (Figure S5F). These data suggested that allysine preferentially targets muscle system-related proteins in mouse skeletal muscle.
Identification of Widespread Allysine Sites on Histones.
Our whole cell analysis identified nine histone Alk sites in human cells that contributed to the significant enrichment of nucleosome organization processes (adj. FDR < 1.82 × 10−3) and nucleosome binding activities (adj. FDR < 5.96 × 10−4) among allysine proteins. These Alk sites were mainly located in the central region or close to the C-terminus of histones (Table S1). To comprehensively identify histone Alk sites, we performed histone extraction from 293T cells followed by allysine enrichment and exhaustive LCMS analysis. The focused analysis identified 44 sites from extracted histones and showed that all core human histones as well as linker histone H1 were targets of Alk modification (Table S5, Figure 5 and Figures S6–S10). N-terminal Alk sites on histones such as H4K5, K8, K12, K16 and H3K9, K18, K23 clearly overlapped with known acetylation sites that have important epigenetic regulatory functions. On the other hand, Alk sites in globular domains such as H3K79, K122, H2BK116, K120 and H4K59, K77, K91 may impact nucleosome structure and histone interaction with DNA.41 In comparison to the human cell lines, we identified only two histone sites, H4K31 and H4K91, from the mouse muscle tissue in global analysis, which were also identified in human cells through both whole cell and extracted histone analysis.
Figure 5.
Identification and distribution of representative allysine modification sites (in red) on core histones and linker histone H1 in 293T cells.
Comparing Allysine with Arg and Pro-Derived Carbonylation Targets.
Metal-catalyzed oxidation of proteins leads to the production of carbonyl-containing amino acids. The major products were glutamic and aminoadipic semialdehydes (allysine)5 (Figure S11). The glutamic semialdehyde can be generated by the oxidative damage of either Arg or Pro. Although our chemical proteomics workflow was designed for the unambiguous identification of allysine sites, it can also be broadly applied to other carbonyl-containing targets such as various oxidative products of protein-bound amino acids, lipids, and glycans.4–6 To identify Arg and Pro-derived carbonylation peptides, we reanalyzed the data by specifying D3-methoxyamine-conjugated Arg and Pro as variable modifications.
We identified 472 Pro-derivatized glutamic semialdehyde sites on 318 proteins in cultured human cells and 568 sites on 257 proteins in mouse muscle tissues (Table S6 and Figure S12). Flanking sequence analysis showed a preference for tyrosine and amino acids with small side chains such as serine and glycine in the close vicinity of the modification sites in human cell analysis and a preference for proline, threonine, and alanine next to the modification sites in mouse tissue data (Figure S13A). In comparison, we identified 169 Arg-derivatized glutamic semialdehyde sites on 145 proteins in cultured human cells and 155 sites on 127 proteins in mouse muscle tissues (Table S7 and Figure S14), but flanking sequence analysis did not show a strong pattern of amino acid preference (Figure S13B).
Functional annotation enrichment analysis of glutamic semialdehyde-containing proteins derivatized from Pro showed a similar enrichment in major house-keeping cellular processes as the allysine targets including translation, elongation, RNA splicing, and protein folding in human cell and metabolic biological processes in mouse tissue analysis (Figure S13C,D and Table S8). Annotation enrichment analysis of glutamic semialdehyde-containing proteins derivatized from Arg showed few enriched biological processes (adj FDR cutoff: 0.05), likely due to the low coverage of the modified proteins (data not shown). Compared to Alk sites, only a few Pro and Arg-derivatized glutamic semialdehyde sites on histones were identified including H2B R92, H2A P26, H1.2 P5 and P12 in human whole cell analysis and H1t P13 in mouse muscle tissue (Tables S6 and S7).
DISCUSSION
In this study, we report a streamlined chemical proteomics workflow for the reliable and sensitive identification of site-specific allysine in cells and tissues. The strategy involved capturing carbonyl-containing peptides with hydrazide beads and eluting the peptides with isotope-labeled methoxyamine. The workflow presented two important technical advances. First, methoxyamine competitive elution generates a small chemical tag on Alk peptides that is relatively stable and can be easily identified by LCMS compared to the bulky and hydrophobic biotin tag that was widely used previously. Second, D3-methoxyamine conjugation generates a chemical tag with a unique delta mass, which significantly improves the confidence of Alk site identification. In addition, compared to the biotin-avidin-based enrichment, our approach avoided interference from endogenously biotinylated proteins and lowered the background for higher sensitivity. Application of this workflow led to the unambiguous identification of 434 human Alk sites in cultured cells and 317 mouse Alk sites in muscle tissue without inducing oxidative stress. In addition, we also identified 472 and 169 Pro and Arg-derived carbonylation sites in human cells and 568 and 155 Pro and Arg-derived carbonylation sites in mouse muscle.
These data showed that allysine substrates as well as other carbonyl-derivatized targets are widespread in cell lines and tissues under normal growth conditions. They are involved in diverse fundamental cellular processes, including protein folding, translation, and RNA splicing in both the cytosol and nucleus in cultured cells, and in the muscle system as well as energy metabolic processes in mouse tissues. Interestingly, Alk sites tend to have adjacent acidic amino acids in the flanking sequence, a feature that is conserved between human and mouse targets. Furthermore, we observed that Alk is depleted from structured protein domains such as the alpha helix in cultured human cells and preferentially targets solvent-exposed sites in both human cell and mouse tissue analysis, though the secondary structural analysis of Alk sites in mouse tissue did not reveal significant results, likely due to the sample types or limited availability of mouse proteins with structure determination (data not shown). Although the findings from bioinformatic analysis could be limited by the depth of analysis and sample type, these data sets nevertheless offer new insights into Alk proteome complexity for future studies. Combined with quantitative analysis, our strategy may find wide applications in studying oxidative stress-induced carbonylation in cells and tissues.
Our identification of widespread allysine modification on histones further expands the large inventory of epigenetic marks. The loss of a positively charged amino group and its replacement with a chemically reactive aldehyde on the lysine side chain suggest a potentially strong impact on epigenetic protein interactions and histone–DNA interactions, which may in turn have implications for transcriptional activity. Unlike carbonylation derived from Arg and Pro, allysine can also be generated in vivo enzymatically by lysyl oxidases (LOX and LOXLs).7–9 High expression of LOX enzymes is associated with poor patient prognosis and the promotion of metastasis in breast, lung, liver, and head and neck carcinomas, while inhibition of LOX family enzymes drastically reduces ECM formation and associated tumor progression.10–12 Given the important role of LOX proteins in tumor development and metastasis, it will be intriguing to study how the upregulation of LOX enzymes facilitates cancer progression by mediating gene transcription and epigenetic regulation through diverse targets in the allysine pathway.
Supplementary Material
The Supporting Information is available free of charge at https://pubs.acs.org/doi/10.1021/acs.analchem.5c00660.
Supplementary Materials and Methods, chemical structures, neutral loss fragmentation mechanisms, experiment workflow, proteomic profiling data, MS/MS spectra, and bioinformatic analysis (PDF)
Tables S1–S8: allysine sites in human cells and mouse muscle tissues, enriched annotations for allysine proteins, histone sites, Arg and Pro-derived glutamic semialdehyde sites, and enriched Gene Ontology biological processes of Pro-derived glutamic semialdehyde proteins (ZIP)
ACKNOWLEDGMENTS
We greatly appreciate the discussion and suggestions from the members of the Chen lab and helpful advice from David Bernlohr, Timothy Griffin, and Douglas Mashek. We also acknowledge the funding support from the University of Minnesota and the National Institutes of Health (R35GM124896 to Y.C., R01AR079477 and R01HL122323 to J.M.M.).
Footnotes
Complete contact information is available at:
The authors declare no competing financial interest.
Contributor Information
Yi-Cheng Sin, Department of Biochemistry, Molecular Biology and Biophysics and Bioinformatics and Computational Biology Program, University of Minnesota at Twin Cities, Minneapolis, Minnesota 55455, United States.
Nora Hosny, Department of Integrative Biology and Physiology, University of Minnesota at Twin Cities, Minneapolis, Minnesota 55455, United States.
Addeli Bez Batti Angulski, Department of Integrative Biology and Physiology, University of Minnesota at Twin Cities, Minneapolis, Minnesota 55455, United States.
Do-Hyung Kim, Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota at Twin Cities, Minneapolis, Minnesota 55455, United States.
Joseph M. Metzger, Department of Integrative Biology and Physiology, University of Minnesota at Twin Cities, Minneapolis, Minnesota 55455, United States.
Yue Chen, Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota at Twin Cities, Minneapolis, Minnesota 55455, United States.
Data Availability Statement
The data supporting this article have been included as part of the Supporting Information. The mass spectrometry proteomics data have been deposited to the ProteomeXchange Consortium via the PRIDE42 partner repository with the data set identifier PXD060168.
REFERENCES
- (1) Madian AG; Regnier FE J. Proteome Res. 2010, 9 (8), 3766–3780.
- (2) Fedorova M; Bollineni RC; Hoffmann R Mass Spectrom. Rev. 2014, 33 (2), 79–97.
- (3) Serra-Bardenys G; Peiró S FEBS J. 2022, 289 (24), 8020–8031.
- (4) Curtis JM; Hahn WS; Long EK; Burrill JS; Arriaga EA; Bernlohr DA Trends Endocrinol. Metab. 2012, 23 (8), 399–406.
- (5) Requena JR; Chao C-C; Levine RL; Stadtman ER Proc. Natl. Acad. Sci. U. S. A. 2001, 98 (1), 69–74.
- (6) Dalle-Donne I; Giustarini D; Colombo R; Rossi R; Milzani A Trends Mol. Med. 2003, 9 (4), 169–176.
- (7) Siegel RC Lysyl Oxidase. In International Review of Connective Tissue Research; Hall DA, Jackson DS, Eds.; Elsevier, 1979; Vol. 8, pp 73–118. 10.1016/B978-0-12-363708-6.50009-6.
- (8) Smith-Mungo LI; Kagan HM Matrix Biol. 1998, 16 (7), 387–398.
- (9) Kagan HM; Li W J. Cell. Biochem. 2003, 88 (4), 660–672.
- (10) Xiao Q; Ge G Cancer Microenviron. 2012, 5 (3), 261–273.
- (11) Liburkin-Dan T; Toledano S; Neufeld G Int. J. Mol. Sci. 2022, 23 (11), 6249.
- (12) Löser R; Kuchar M; Wodtke R; Neuber C; Belter B; Kopka K; Santhanam L; Pietzsch J ChemMedChem 2023, 18 (18), No. e202300331.
- (13) Yang J; Savvatis K; Kang JS; Fan P; Zhong H; Schwartz K; Barry V; Mikels-Vigdal A; Karpinski S; Kornyeyev D; Adamkewicz J; Feng X; Zhou Q; Shang C; Kumar P; Phan D; Kasner M; López B; Diez J; Wright KC; Kovacs RL; Chen P-S; Quertermous T; Smith V; Yao L; Tschöpe C; Chang C-P Nat. Commun. 2016, 7, 13710.
- (14) Erasmus M; Samodien E; Lecour S; Cour M; Lorenzo O; Dludla P; Pheiffer C; Johnson R Int. J. Mol. Sci. 2020, 21 (16), 5913.
- (15) Giampuzzi M; Oleggini R; Di Donato A Biochim. Biophys. Acta 2003, 1647 (1–2), 245–251.
- (16) Mello MLS; Alvarenga EM; Vidal B. de C.; Di Donato A Micron 2011, 42 (1), 8–16.
- (17) Oleggini R; Di Donato A Biochem. Cell Biol. 2011, 89 (6), 522–532.
- (18) Li W; Nugent MA; Zhao Y; Chau AN; Li SJ; Chou I-N; Liu G; Kagan HM J. Cell. Biochem. 2003, 88 (1), 152–164.
- (19) Iturbide A; Pascual-Reguant L; Fargas L; Cebrià JP; Alsina B; García de Herreros A; Peiró S Mol. Cell 2015, 58 (5), 755–766.
- (20) Herranz N; Dave N; Millanes-Romero A; Pascual-Reguant L; Morey L; Díaz VM; Lórenz-Fonfría V; Gutierrez-Gallego R; Jerónimo C; Iturbide A; Di Croce L; García de Herreros A; Peiró S FEBS J. 2016, 283 (23), 4263–4273.
- (21) Ma L; Huang C; Wang X-J; Xin DE; Wang L-S; Zou QC; Zhang Y-NS; Tan M-D; Wang Y-M; Zhao TC; Chatterjee D; Altura RA; Wang C; Xu YS; Yang J-H; Fan Y-S; Han B-H; Si J; Zhang X; Cheng J; Chang Z; Chin YE Mol. Cell 2017, 65 (2), 296–309.
- (22) Uchiyama S; Inaba Y; Kunugita N J. Chromatogr. B: Anal. Technol. Biomed. Life Sci. 2011, 879 (17–18), 1282–1289.
- (23) Bollineni RC; Hoffmann R; Fedorova M Free Radical Biol. Med. 2014, 68, 186–195.
- (24) Bollineni RC; Fedorova M; Hoffmann R J. Proteomics 2011, 74 (11), 2351–2359.
- (25) Yan L-J; Forster MJ J. Chromatogr. B: Anal. Technol. Biomed. Life Sci. 2011, 879 (17–18), 1308–1315.
- (26) Mirzaei H; Regnier F Anal. Chem. 2005, 77 (8), 2386–2392.
- (27) Hensley K Methods Mol. Biol. 2015, 1314, 95–100.
- (28) Meany DL; Xie H; Thompson LV; Arriaga EA; Griffin TJ Proteomics 2007, 7 (7), 1150–1163.
- (29) Kalia J; Raines RT Angew. Chem., Int. Ed. Engl. 2008, 47 (39), 7523–7526.
- (30) Xu S; Sun F; Wu R Anal. Chem. 2020, 92 (14), 9807–9814.
- (31) Huang H; Lin S; Garcia BA; Zhao Y Chem. Rev. 2015, 115 (6), 2376–2418.
- (32) Sin Y-C; Park M; Griffin TJ; Yong J; Chen Y Chem. Sci. 2024, 15 (44), 18395–18404.
- (33) Xie X; Moon PJ; Crossley SWM; Bischoff AJ; He D; Li G; Dao N; Gonzalez-Valero A; Reeves AG; McKenna JM; Elledge SK; Wells JA; Toste FD; Chang CJ Nature 2024, 627 (8004), 680–687.
- (34) Choudhary C; Kumar C; Gnad F; Nielsen ML; Rehman M; Walther TC; Olsen JV; Mann M Science 2009, 325 (5942), 834–840.
- (35) Høie MH; Kiehl EN; Petersen B; Nielsen M; Winther O; Nielsen H; Hallgren J; Marcatili P Nucleic Acids Res. 2022, 50 (W1), W510–W515.
- (36) Chang RL; Stanley JA; Robinson MC; Sher JW; Li Z; Chan YA; Omdahl AR; Wattiez R; Godzik A; Matallana-Surget S EMBO J. 2020, 39 (23), No. e104523.
- (37) Kaczor JJ; Hall JE; Payne E; Tarnopolsky MA Free Radical Biol. Med. 2007, 43 (1), 145–154.
- (38) Renjini R; Gayathri N; Nalini A; Srinivas Bharath MM Neurochem. Res. 2012, 37 (4), 885–898.
- (39) Petrillo S; Pelosi L; Piemonte F; Travaglini L; Forcina L; Catteruccia M; Petrini S; Verardo M; D’Amico A; Musarò A; Bertini E Hum. Mol. Genet. 2017, 26 (14), 2781–2790.
- (40) Deshmukh AS; Murgia M; Nagaraj N; Treebak JT; Cox J; Mann M Mol. Cell. Proteomics 2015, 14 (4), 841–853.
- (41) Mersfelder EL; Parthun MR Nucleic Acids Res. 2006, 34 (9), 2653–2662.
- (42) Perez-Riverol Y; Bai J; Bandla C; García-Seisdedos D; Hewapathirana S; Kamatchinathan S; Kundu DJ; Prakash A; Frericks-Zipper A; Eisenacher M; Walzer M; Wang S; Brazma A; Vizcaíno JA Nucleic Acids Res. 2022, 50 (D1), D543–D552.