Abstract
The tyrosine residue of proteins participates in a wide range of activities including enzymatic catalysis, protein–protein interaction, and protein–ligand binding. However, the functional annotation of tyrosine residues on a large scale is still very challenging. Here, we report a novel method integrating azo coupling, bioorthogonal chemistry, and multiplexed proteomics to globally investigate tyrosine reactivity in the human proteome. Based on the azo-coupling reaction between an aryl diazonium salt and the tyrosine residue, two different probes were evaluated, and the better-performing probe was employed to further study the tyrosine residues in the human proteome. Tagged tyrosine-containing peptides were then selectively enriched using bioorthogonal chemistry, and after cleavage, the small tag remaining on the peptides is well suited to site-specific analysis by MS. By coupling this workflow with multiplexed proteomics, we quantified over 5000 tyrosine sites in MCF7 cells, and these sites displayed a wide range of reactivity. The tyrosine residues with high reactivity were found on functionally and structurally diverse proteins, including those with catalytic activity and binding properties. This method can be extensively applied to advance our understanding of protein functions and facilitate the development of covalent drugs to regulate protein activity.
INTRODUCTION
With the rapid development of mass spectrometry (MS)-based proteomics,1–3 large-scale characterization and quantification of proteins have significantly advanced in recent years and provided a wealth of information including protein expression, spatial distribution, dynamics, and interactions.4–6 Nevertheless, the biological functions of many proteins in the eukaryotic and prokaryotic organisms remain elusive. Comprehensive analysis of protein post-translational modifications (PTMs), including glycosylation, phosphorylation, and ubiquitination, has attracted much attention7–15 and is effective in investigating the biological functions of proteins on a system-wide level. Additionally, the amino acid sequences and protein folding create a specific microenvironment and have critical roles in determining the functional activities of proteins. Correspondingly, the same amino acid residue may have different reactivities in different local environments. Previously, global analysis of the reactivity of the cysteine residues of proteins was achieved by integrating a covalent probe (iodoacetamide-alkyne) and quantitative proteomics. The cysteine residues with hyper-reactivity were found to be involved in different types of functional activities, including enzymatic catalysis and redox regulation, which greatly expand our understanding of protein functions.16
Besides revealing their biological functions, large-scale analysis of the reactivity of the amino acid residues also facilitates the discovery of proteins as drug targets that can be targeted by covalent ligands. Although many proteins have been reported to be correlated with various diseases, it is still challenging to develop drugs to specifically target them and some are even considered as undruggable. The development of covalent ligands offers another strategy to expand the landscape of proteins amenable to be targeted by small molecules. Based on the results from the reactivity of the amino acid residues using MS-based proteomics, specific electrophilic reagents can be designed to target certain sites. For example, reactive amino acid residues are often found in the binding pockets of enzymes, which can be targeted by covalent inhibitors,17,18 indicating that comprehensive profiling of the reactivity of the amino acid residues using MS-based proteomics promotes the development of covalent drug inhibitors.
Besides cysteine, other amino acid residues have also recently been studied using MS-based proteomics. The lysine residue is frequently labeled with N-hydroxysuccinimide-esters (NHS-esters), and recently, over 9000 lysine residues were profiled in human cells with an STP (sulfotetrafluorophenyl) ester-based probe.19 Furthermore, the aspartate and glutamate residues in human cells and bacteria were investigated using a 2H-azirine-based probe and a light-activatable 2,5-disubstituted tetrazole-based probe, respectively.20,21 A method called redox-activated chemical tagging (ReACT) was developed to specifically target methionine in biological systems and applied for chemoproteomic identification of functional methionine residues, which revealed a group of proteins with hyperactive methionine residues including enzymes, chaperones, nucleoproteins, and structural proteins.22
The tyrosine side chain plays a critical role in a variety of biological functions of proteins because of its unique structural and electronic features and has been reported to be selectively targeted by covalent inhibitors under certain conditions.23–25 For example, Chen and co-workers were able to specifically modify a lipid-binding protein, i.e., cellular retinoic acid binding protein 2 (CRABP2), using aryl fluorosulfates. The aryl fluorosulfate warhead with low reactivity was used to covalently target tyrosine through the proximity effect.26 Although the tyrosine residue is involved in various biological processes and regulates different protein functions, global analysis of tyrosine still lags behind. Systematic investigation of the tyrosine residues will reveal their biological functions and help identify proteins that can be covalently targeted by small-molecule ligands. Recently, Hahm et al. globally characterized tyrosine residues using sulfur-triazole exchange (SuTEx) chemistry and studied tyrosine phosphorylation changes with pervanadate activation.27,28 Effective methods to analyze tyrosine on proteins will further advance our understanding of the tyrosine residues, which unravels their novel biological functions and expands the scope of covalent drug candidates.
In this work, we developed a novel and effective method integrating azo coupling, bioorthogonal chemistry, and multiplexed quantitative proteomics to globally investigate the tyrosine residues in the human proteome. The azo-coupling reaction between aryl diazonium salt and the aromatic amino acid residues, such as tyrosine, was reported to modify the protein surface.29–33 Through changing the substituent groups on the phenyl ring of the aryl diazonium, its reactivity toward tyrosine can be tuned.34 Reactive tyrosine normally possesses a high level of nucleophilicity on the oxygen atom and tends to have higher electron density, which also enhances the nucleophilicity of the phenyl ring on tyrosine. Therefore, an aryl diazonium probe that specifically targets the tyrosine residues through the electrophilic aromatic substitution process can be used to study the reactivity of tyrosine. Combining the azo coupling with multiplexed quantitative proteomics, we achieved systematic and site-specific analysis of the tyrosine residues in the human proteome. This method can be extensively applied to study the tyrosine residues in the biological and biomedical research fields.
EXPERIMENTAL SECTION
Cell Culture.
MCF7 cells (ATCC) were cultured in Dulbecco’s modified Eagle’s medium (DMEM, Sigma-Aldrich) with high glucose supplemented with 10% fetal bovine serum (FBS, Corning) and 1% penicillin–streptomycin solution (Sigma-Aldrich). When the confluency reached ~90%, the cells were washed with PBS three times, harvested by scraping, and pelleted by centrifugation (300g, 5 min). The cells were further washed with ice-cold PBS.
Tyrosine Labeling, Click Chemistry, and Protein Digestion.
The diazonium-alkyne probe was synthesized according to previous reports.35,36 Briefly, m-ethynylaniline or p-ethynylaniline (Sigma-Aldrich) was mixed with 1 M HCl on ice for 15 min. A freshly prepared aqueous solution of sodium nitrite (NaNO2, Sigma-Aldrich) was chilled on ice for 15 min and added slowly to the ethynylaniline solution above. The mixture was incubated on ice for 45 min. Subsequently, a tetrafluoroboric acid solution (HBF4, 48%, Sigma-Aldrich) was added to precipitate the diazonium-alkyne probes. After filtration, the product was purified by dissolving it in acetonitrile (ACN) and precipitating it with cold ether (Sigma-Aldrich). The purification procedure was repeated once.
The cell pellets were resuspended in PBS and lysed through three rapid freeze–thaw cycles with liquid nitrogen. After centrifugation (5000g, 10 min), the supernatant was transferred to a new tube. The diazonium-alkyne probe 1 was added to the cell lysates to a final concentration of 50 μM (low concentration, L) or 500 μM (high concentration, H). The labeling reaction lasted for 60 min at room temperature (RT) with end-over-end rotation. The probe was removed, and proteins were isolated through the methanol–chloroform precipitation method. The purified proteins were solubilized in PBS with 0.4% SDS via sonication. Biotin-azide (25.0 mM stock solution in DMSO, Click Chemistry Tools) was added to the solution to a final concentration of 250 μM, followed by CuSO4 and tris(3-hydroxypropyltriazolylmethyl)-amine (THPTA, Click Chemistry Tools) to the concentrations of 1.0 and 5.0 mM, respectively.37 Sodium ascorbate and guanidine hydrochloride were added to the solution at 15.0 mM to initiate the copper(I)-catalyzed azide–alkyne cycloaddition (CuAAC) reaction. The reaction was incubated at RT for 2 h. Finally, the proteins were purified again and then digested with trypsin (Promega) in a digestion buffer (50 mM HEPES, pH 8.6, 1.5 M urea) at 37 °C overnight. Trifluoroacetic acid (TFA) was added to quench the digestion by adjusting the pH to <2. The peptides were desalted using a tC18 Sep-Pak cartridge (Waters) and dried under vacuum.
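The stock-to-final dilutions in this step follow the usual relation C_stock × V_stock = C_final × V_final. The following is a minimal sketch of that arithmetic for the biotin reagent (25.0 mM stock, 250 μM final); the function name and the 1000 μL reaction volume are illustrative assumptions, since the reaction volume is not stated here.

```python
def stock_volume_ul(stock_mm: float, final_um: float, final_volume_ul: float) -> float:
    """Volume of stock (in uL) needed so that C_stock * V_stock = C_final * V_final."""
    stock_um = stock_mm * 1000.0  # convert mM to uM so both concentrations share units
    return final_um * final_volume_ul / stock_um


# Hypothetical 1000 uL reaction volume (an assumption, not taken from the protocol).
biotin_ul = stock_volume_ul(stock_mm=25.0, final_um=250.0, final_volume_ul=1000.0)
dilution_factor = (25.0 * 1000.0) / 250.0  # stock is 100-fold more concentrated than the target
print(biotin_ul, dilution_factor)
```

Under these assumptions the stock is diluted 100-fold, so the DMSO carried over from the stock amounts to 1% of the reaction volume.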
Enrichment of Labeled Tyrosine Peptides.
Dried peptides were dissolved in PBS and incubated with NeutrAvidin beads (Fisher Scientific) at RT for 2 h. The beads were transferred to a spin column and washed 10 times with PBS. Then, the tyrosine-modified peptides were eluted from the beads by incubation with a freshly prepared 25.0 mM sodium dithionite (Na2S2O4) solution for 30 min. The elution step was repeated once, and the eluates were combined. The enriched peptides were purified using a tC18 Sep-Pak cartridge.
TMT Labeling and Peptide Fractionation.
The peptides from six samples were labeled with the six-plex tandem mass tag (TMT) reagents (Thermo Scientific) according to the manufacturer’s protocol. Channels 126, 127, and 128 were used to label the peptides from the samples reacted with the probe at the low concentration, and channels 129, 130, and 131 were for the peptides reacted at the high concentration. Briefly, the peptides from each sample were dissolved in the HEPES buffer (pH 8.6, 200 mM, 100 μL). Then, ACN (30 μL) was added to each sample above. Each tube of the TMT reagent was warmed to RT and then dissolved in ACN (41 μL), and 5 μL was used for the peptide labeling. The TMT labeling reaction lasted for 60 min at RT with shaking and subsequently was quenched with hydroxylamine (10 μL, 5%, Sigma-Aldrich) for 15 min. The labeled peptides from each sample were mixed, purified, and dried. The peptides were dissolved in 300 μL of ammonium acetate (pH 10, 10 mM) and then loaded onto a reversed-phase C18 HPLC column (Waters). The peptides were separated into 16 fractions with a 40 min gradient of 5–55% ACN-containing ammonium acetate (pH 10, 10 mM). Each fraction was purified using the StageTip method before LC–MS/MS analysis.
LC–MS/MS Analysis.
The dried, TMT-labeled peptides were dissolved in the loading buffer (5% ACN and 4% formic acid (FA)), and 4 μL was loaded onto a reversed-phase microcapillary column packed with C18 beads through a WPS-3000TPLRS autosampler (UltiMate 3000). The peptides were first separated by HPLC on an UltiMate 3000 HPLC system and then detected in a hybrid dual-cell quadrupole linear ion trap-Orbitrap mass spectrometer (LTQ Orbitrap Elite, Thermo Scientific). A data-dependent Top15 method was used for peptide detection. A full MS scan (resolution: 60,000) was recorded in the Orbitrap at an automatic gain control (AGC) target of 10^6. The top 15 precursor ions with the highest intensities were selected for fragmentation using higher-energy collisional dissociation (HCD), and the normalized collision energy (NCE) was set to 40%. The fragments were then detected in the Orbitrap cell with high resolution and high mass accuracy. The selected precursor ions were excluded for 90 s. Ions with a single or unassigned charge were not fragmented.
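The data-dependent selection logic described above (top 15 precursors by intensity, 90 s dynamic exclusion, no fragmentation of singly charged or unassigned ions) can be sketched as follows. This is an illustrative simplification, not the instrument firmware: the function name, the tuple layout, and the matching of excluded precursors by m/z rounded to two decimals are all assumptions.

```python
def select_precursors(peaks, now_s, excluded_until, top_n=15, exclusion_s=90.0):
    """Pick up to top_n precursors from one full MS scan.

    peaks: list of (mz, intensity, charge); charge is None when unassigned.
    excluded_until: dict mapping rounded m/z -> time (s) when exclusion ends;
                    it is updated in place for the precursors that are chosen.
    """
    eligible = [
        p for p in peaks
        if p[2] is not None and p[2] >= 2                      # skip 1+ and unassigned ions
        and excluded_until.get(round(p[0], 2), 0.0) <= now_s   # skip dynamically excluded ions
    ]
    chosen = sorted(eligible, key=lambda p: p[1], reverse=True)[:top_n]
    for mz, _, _ in chosen:
        excluded_until[round(mz, 2)] = now_s + exclusion_s
    return chosen


# Hypothetical scan with four peaks (values are illustrative only).
scan = [(500.25, 1e5, 2), (600.30, 5e4, 1), (700.35, 2e5, 3), (800.40, 1e4, None)]
exclusion = {}
first = select_precursors(scan, now_s=0.0, excluded_until=exclusion)
second = select_precursors(scan, now_s=30.0, excluded_until=exclusion)
third = select_precursors(scan, now_s=95.0, excluded_until=exclusion)
print(len(first), len(second), len(third))
```

In this toy example the two multiply charged ions are selected in the first scan, skipped 30 s later while they are still excluded, and become eligible again once the 90 s window has passed.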
Database Searching, Data Filtering, and Bioinformatic Analysis.
The SEQUEST algorithm (version 28)38 was used to search the raw files against the database containing all human proteins downloaded from UniProt (Homo sapiens). The search parameters were set as indicated below: precursor mass tolerance (10 ppm), product ion mass tolerance (0.025 Da), digestion enzyme (trypsin), and missed cleavages (up to three). The fixed modifications included the TMT tag of lysine and the peptide N-terminus (+229.1629), and the variable modifications included oxidation of methionine (+15.9949) and tyrosine (Y) with the modified tag (+15.0109). The target–decoy method was employed to evaluate the false discovery rates (FDRs) for peptide and protein identifications.39 Linear discriminant analysis (LDA) that considers multiple parameters (XCorr, precursor mass accuracy, and the charge state) was performed to control the degree of accuracy of probe-modified peptide identifications,40 and peptides with <7 amino acid residues were removed. The FDRs of probe-modified peptides and proteins were both controlled to <1%. The ModScore was used to evaluate the accuracy of the site localization, and sites with ModScore > 13 (P < 0.05) were considered to be well localized.41 The intensities of the TMT reporter ions in the tandem MS were used to quantify the identified peptides.
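Some of the search settings above reduce to simple arithmetic, sketched below under stated assumptions. The +15.0109 shift is consistent with a net gain of NH (an amino group replacing a ring hydrogen after azo cleavage), and the +15.9949 shift with one oxygen atom; the ppm and target–decoy functions are generic textbook forms. In particular, `decoy_fdr` thresholds a single score and is not the LDA-based filtering used in this work, and all function names and example values are hypothetical.

```python
# Monoisotopic masses (Da) of the elements involved.
MASS_N = 14.0030740
MASS_H = 1.0078250
MASS_O = 15.9949146

amino_tag_shift = MASS_N + MASS_H  # net addition of NH to the tyrosine ring
oxidation_shift = MASS_O           # methionine oxidation adds one oxygen


def ppm_error(observed: float, theoretical: float) -> float:
    """Relative mass error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6


def decoy_fdr(psms, threshold: float) -> float:
    """Estimate FDR as decoys/targets among matches scoring at or above threshold.

    psms: iterable of (score, is_decoy) pairs.
    """
    accepted = [is_decoy for score, is_decoy in psms if score >= threshold]
    decoys = sum(accepted)
    targets = len(accepted) - decoys
    return decoys / targets if targets else 0.0


# Hypothetical peptide-spectrum matches (score, is_decoy); values are illustrative.
psms = [(4.5, False), (4.1, False), (3.8, False), (3.2, True), (3.0, False), (2.1, True)]
print(round(amino_tag_shift, 4), round(oxidation_shift, 4))
print(abs(ppm_error(500.0018, 500.0)) <= 10.0)  # within a 10 ppm precursor tolerance
print(decoy_fdr(psms, 3.5), decoy_fdr(psms, 2.0))
```

Raising the score threshold trades identifications for a lower estimated FDR, which is the same trade-off the multi-parameter LDA filter makes with more information per match.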
The tyrosine phosphorylation sites were downloaded from PhosphoSitePlus (HTP score ≥ 1).42 The amino acid frequency was generated with pLogo.43 The protein–protein interactions (PPIs) were extracted from STRING and visualized by Cytoscape.44,45 Functional analysis of the protein interaction network was performed using STRING and PANTHER (Protein Analysis Through Evolutionary Relationships).46 ClusterMaker in Cytoscape was exploited to perform protein cluster analysis with the MCL (Markov Clustering) algorithm.47 The inflation parameter was set to 3. The protein clusters were also compared with the protein complexes deposited in CORUM.48 Protein domains were extracted from SUPFAM.49 The crystal structures were downloaded from RCSB Protein Data Bank and visualized using PyMOL.50 For ENPP1, the crystal structure (PDB: 4B56) of its analog from mouse was used.
Gel-Based Fluorescence Assay.
As described above, proteins were reacted with the diazonium-alkyne probe at different concentrations for 60 min after cell lysis. The diazonium-alkyne probe was removed by passing the cell lysates through a Bio-Gel P-6 column (Bio-Rad). Then, azide-fluor 545 (Sigma-Aldrich) was added to the cell lysates (final concentration of 100 μM), followed by the addition of CuSO4, THPTA, and sodium ascorbate. The reaction lasted for 60 min with shaking at RT. The SDS-PAGE loading buffer without DTT (DTT tends to reduce the formed azo group during SDS-PAGE) was added, and then proteins were separated by SDS-PAGE. The gel was further stained with SimplyBlue. The fluor545 signal and SimplyBlue staining bands were visualized and recorded using a GE Typhoon Trio+ Fluorescence/Phospho-Imager system.
RESULTS AND DISCUSSION
Principle of Large-Scale Analysis of the Tyrosine Residues in the Human Proteome.
Comprehensive analysis of the tyrosine residues on proteins offers an effective and straightforward way to systematically investigate the potentially reactive tyrosine sites and reveals novel biological functions of corresponding proteins on a large scale. This will facilitate the development of small molecules to covalently modulate the activities of proteins and to target proteins previously known as undruggable. Until now, a wide range of small molecule-based probes have been developed to specifically target various amino acid residues. Compared with other common nucleophilic amino acids, such as cysteine, lysine, and serine, the structure of tyrosine is unique because there is a conjugative effect between the benzene ring and the oxygen atom. Therefore, an electrophilic aromatic substitution that targets the tyrosine residues can reveal their nucleophilicity and investigate the reactivity of the tyrosine residues.
The tyrosine residue has been used for chemo- and site-selective protein modification through electrophilic aromatic substitution. The unique properties of the tyrosine residues allow for their selective modification with reactive aryl diazonium salts via electrophilic aromatic substitution to generate azo compounds, a reaction known as azo coupling. However, to date, azo coupling has not been exploited to systematically study the reactivity of the tyrosine residues. Here, we developed an aryl diazonium-alkyne probe to selectively target the tyrosine residues in the human proteome and established a novel method to comprehensively profile the tyrosine residues (Figure 1a). The reactivities of aryl diazonium salts can be tuned by changing the substituent groups. After the aryl diazonium-alkyne probe was conjugated to the tyrosine residue, a biotin tag was added to tyrosine-containing proteins through CuAAC. Then, proteins were digested and peptides containing the biotinylated tyrosine were enriched using NeutrAvidin beads. The reaction between the side chain of tyrosine and the aryl diazonium-alkyne probe produced an azo group, which was cleaved by sodium dithionite. The generated amino group on the modified peptides facilitates peptide ionization for MS analysis. The peptides containing the modified tyrosine can be site-specifically identified by MS.
Figure 1.

(a) Experimental procedure for comprehensive profiling of the tyrosine residues in the human proteome by integrating azo coupling and click chemistry (CuAAC). (b) Comparison of unique peptides with the probe-modified tyrosine residues identified using probe 1 or probe 2. (c) The experimental results demonstrate that both probes are specific to target the tyrosine residues.
Identification of the Tyrosine Residues with Azo Coupling and LC–MS/MS.
To optimize the proteomics workflow for identifying the tyrosine residues in the human proteome, we first compared the protein-centric and peptide-centric enrichment strategies. For the peptide-centric approach, the workflow is described in Figure 1a. For the protein-centric approach (Figure S1a), the tagged proteins via azo coupling were enriched directly by the beads conjugated with the azide group through CuAAC, followed by on-bead digestion and elution with sodium dithionite. After LC–MS/MS analysis, the results showed that the peptide-centric method can identify many more peptides containing the modified tyrosine compared to the protein-centric method (Figure S1b). The reason is that the enrichment at the peptide level can reduce nonspecific binding and eliminate other peptides without the probe-modified tyrosine residues from proteins.
To further improve the coverage, we compared two aryl diazonium-alkyne probes (Figure 1a) with different reactivities for profiling the tyrosine residues. Under the same experimental conditions, probe 1 and probe 2 were each used to label the tyrosine residues, followed by biotinylation, protein digestion, peptide enrichment, and LC–MS/MS analysis. The results demonstrated that probe 1 outperformed probe 2, and 71% more unique tyrosine-containing peptides were identified using probe 1 compared to probe 2 (Figure 1b and Table S1). For both probes, most of the identified modified peptides are labeled at the tyrosine residues (Figure 1c), and with the fast speed of MS, the side reactions with other residues will not affect the quantification of the tyrosine reactivity. These results indicate that probe 1 is as specific as probe 2 but more effective for targeting the tyrosine residues in the human proteome. After finishing the experimental work, we found a recent study revealing that the most dominant modified sites came from cysteine instead of tyrosine for probe 2.51 This may be caused by the radical-based coupling between aryl diazonium salts and amino acid residues rather than azo coupling. Different sample preparation procedures, including the elution of the intact biotin tag in that report and the reduction of the azo group in this work, may be the reasons for the distinct labeling specificities. Compared to probe 2, probe 1 has similar specificity toward the tyrosine residue, but it enabled us to identify many more tyrosine residues in the parallel experiments (3071 vs 1795), as shown in Figure 1b.
Quantification of the Tyrosine Residues in the Human Proteome.
The concentration-dependent labeling strategy has been frequently used to evaluate the reactivity of amino acid residues, including cysteine, lysine, aspartate, and glutamate, and reveal their potential biological functions. In this work, we employed probe 1 with better performance and quantitative proteomics to profile the tyrosine reactivity in the human proteome. The protein labeling with probe 1 was concentration-dependent as evaluated by conjugating the labeled proteins with an azide dye using CuAAC and visualizing via in-gel fluorescence scanning (Figure S2), enabling us to study the reactivity of the tyrosine residues.
We treated the whole cell lysates from MCF7 cells with a high (500 μM) or low (50 μM) concentration of probe 1, followed by quantitative proteomics to evaluate the reactivity of the tyrosine residues. Multiplexed quantitative proteomics with tandem mass tags (TMTs) as the labeling reagents has been widely used to quantify proteins because it can analyze many samples simultaneously, which reduces the experimental time and increases the quantification accuracy. Here, we combined the azo labeling and multiplexed proteomics to evaluate the reactivity of the tyrosine residues in biological triplicate experiments. The peptides labeled with each channel of the TMT reagents generate a unique reporter ion in the tandem MS, and the intensities of the reporter ions can be used to quantify peptides among the six samples (Figure 2a). One potential limitation of the TMT method is ratio suppression. However, it does not seem to be an issue in this work because the enriched sample is much simpler than the whole cell lysate. Furthermore, we performed extensive HPLC fractionation to decrease the complexity of the sample.
Figure 2.
(a) Experimental procedure for quantifying the reactivity of the tyrosine residues by integrating azo coupling and multiplexed proteomics. (b) Example tandem mass spectrum of the peptide ITLDNAY#MEK (# refers to the modified site). (c) PCA of the peptide intensities of the samples treated with the high-concentration (H) and low-concentration (L) probes in the biological triplicate experiments.
An example of peptide identification and quantification is shown in Figure 2b. The peptide of ITLDNAY#MEK was confidently identified with an XCorr of 4.5 and the mass accuracy of 3.6 ppm. The ModScore was determined to be 1000, indicating that the modified site was localized on the tyrosine residue. The peptide is from PKM, an important glycolytic enzyme. The intensities of the reporter ions clearly demonstrated that the modification of Y148 was dependent on the probe concentration. To assess the similarity and difference in the high- and low-concentration-treated samples, we next performed principal component analysis (PCA). As shown in Figure 2c, all the replicates in the high-concentration-treated samples clustered together and segregated from the three replicates in the low-concentration-treated samples. The high Pearson correlation coefficient (>0.9) among the three replicates in each group further demonstrated the reasonable reproducibility of the current quantification approach (Figure S3).
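As a rough illustration of how an H/L ratio can be derived from the six reporter ions, the sketch below uses the channel assignment given in the experimental section (126–128 for the low concentration, 129–131 for the high concentration). How the replicates are aggregated is not specified here, so taking the mean intensity per group is an assumption, as are the function name and the intensity values.

```python
from statistics import mean

LOW_CHANNELS = ("126", "127", "128")   # samples labeled with 50 uM probe
HIGH_CHANNELS = ("129", "130", "131")  # samples labeled with 500 uM probe


def hl_ratio(reporter_intensities: dict) -> float:
    """Ratio of mean reporter-ion intensity in the high vs low concentration channels."""
    low = mean(reporter_intensities[ch] for ch in LOW_CHANNELS)
    high = mean(reporter_intensities[ch] for ch in HIGH_CHANNELS)
    if low == 0:
        raise ValueError("low-concentration channels have zero intensity")
    return high / low


# Hypothetical reporter-ion intensities for one modified peptide (illustrative only).
example = {"126": 100.0, "127": 110.0, "128": 90.0,
           "129": 1200.0, "130": 1300.0, "131": 1100.0}
ratio = hl_ratio(example)
print(ratio)
```

A ratio near the 10-fold difference in probe concentration indicates labeling that scales with the probe, whereas a much smaller ratio indicates a site that is already largely labeled at the low concentration.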
Over 5000 tyrosine sites were characterized in the experiments (Table S2). The majority of the sites (over 95%) are considered well localized with a ModScore larger than 13 (P < 0.05), and about 93% have a ModScore larger than 19 (P < 0.01) (Figure 3a). Nearly half (48%) of the proteins contained only one modified tyrosine, while 18% of the identified proteins had at least four modified tyrosine sites (Figure 3b). For example, glucose-6-phosphate 1-dehydrogenase (G6PD), which plays an important role in the pentose phosphate pathway, was detected with 13 tyrosine sites in this work. We also studied the relationship between the protein length and the number of the modified tyrosine residues per protein, but no significant correlation was found (Figure S4).
Figure 3.
Systematic analysis of the tyrosine residues profiled in the quantification experiment. (a) Distribution of the ModScore values for the modified tyrosine sites. (b) Distribution of the number of the identified tyrosine residues per protein. (c) GRAVY values of proteins with the modified tyrosine residues and the values from all proteins in the human proteome (***P < 0.001). (d) Overlap between the identified tyrosine residues and the tyrosine phosphorylation sites in the PhosphoSitePlus database.
The GRAVY values, indicating the extent of hydrophobicity, from the proteins with the identified tyrosine residues were lower than those from the whole proteome (Figure 3c). The result suggests that the identified proteins were more hydrophilic, which may facilitate the reaction between tyrosine and the aryl diazonium-alkyne probe in aqueous solution. About 50% of the modified tyrosine residues were annotated as phosphorylation sites based on the information from the PhosphoSitePlus database, consistent with the previous results from sulfur-triazole exchange chemistry (Figure 3d).27 Next, we investigated the relationship between the modified sites and protein secondary structures. NetsurfP52 was used to predict the secondary structures of the identified proteins, and the results are shown in Figure S5. Among all the tyrosine residues, the percentage at the coil structure was the highest, while the lowest percentage came from the β-strand structure. The disordered structures of coils may increase the chance of the exposure of the tyrosine residues to the solvent and therefore promote the azo-coupling reaction.
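For readers unfamiliar with the metric, GRAVY (grand average of hydropathy) is the mean Kyte–Doolittle hydropathy value over all residues in a sequence, with lower values indicating more hydrophilic sequences. The sketch below computes it for the example peptide sequence from Figure 2b purely as an illustration; the analysis in this work was performed on whole proteins, and the function name is an assumption.

```python
# Kyte-Doolittle hydropathy scale for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(sequence: str) -> float:
    """Mean hydropathy of a sequence; raises KeyError on non-standard residues."""
    if not sequence:
        raise ValueError("sequence must not be empty")
    return sum(KYTE_DOOLITTLE[aa] for aa in sequence) / len(sequence)


peptide_gravy = gravy("ITLDNAYMEK")
print(round(peptide_gravy, 2))  # -0.44, i.e. slightly hydrophilic
```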
Analysis of the Quantified Tyrosine Residues.
The distribution of the measured H/L ratios (high versus low concentration) is displayed in Figure 4a. The ratios seem to be higher than those from some previously reported results about tyrosine, lysine, and cysteine. The ratio distribution is related to multiple experimental conditions, such as the reaction time and temperature. Furthermore, the higher and lower concentrations play a very important role in the ratio distribution. Comparison of the quantified tyrosine-containing peptides and proteins from this work and the previous report27 showed that the overlap is relatively low (Figure S6), which indicated that different types of probes targeted different tyrosine residues in the proteome. The possible main reason is that the reaction mechanisms are different (the electrophilic aromatic substitution reaction in this work vs the nucleophilic substitution reaction in the previous report). This further emphasizes the importance of developing new chemical strategies to investigate the properties of the amino acid residues.
Figure 4.

(a) Distribution of the measured ratios (H/L) for the quantified tyrosine residues in MCF7 cells labeled with high or low concentration of probe 1. (b) Comparison of the reported phosphorylation sites (pY) and nonphosphorylation sites (non-pY) among the quantified tyrosine residues in the different ratio ranges. (c) Distribution of the relative surface accessibility (RSA) of the tyrosine sites in the groups with different ratio ranges.
Here, the quantified tyrosine residues were separated into three groups based on their ratios, and we then investigated the differences among these groups. Among all the quantified tyrosine sites, 34 and 49% of the sites have ratios of 10–20 and >20, respectively, while only 17% have ratios of <10, of which 75 sites (about 9%) are <5 (Figure 4a and Table S3). The H/L ratio is inversely correlated with the reactivity of the tyrosine residues: a highly reactive site is already extensively labeled at the low probe concentration, so its signal increases little at the high concentration. For example, the site at Y197 of glycogen phosphorylase (PYGB), a critical enzyme in glycogen metabolism, was quantified with a ratio of 5.9 (Figure S7). Previous research showed that Y197 forms a hydrogen bond with AMP (adenosine monophosphate), which promotes the binding of AMP to PYGB and then activates its catalytic function.53 Two other tyrosine residues on this protein were also quantified, but they have much higher ratios: 26.8 for Y473 and 12.2 for Y821.
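The grouping by ratio can be expressed as a small classification step. The sketch below applies the three ranges used here (<10, 10–20, >20) to the PYGB sites mentioned above; the function name is an assumption, and the handling of values exactly at 10 or 20 is a choice made for illustration because the boundaries are not specified in the text.

```python
def ratio_group(hl_ratio: float) -> str:
    """Assign an H/L ratio to one of the three ranges used in the analysis."""
    if hl_ratio < 10:
        return "<10"
    if hl_ratio <= 20:  # boundary handling at exactly 10 and 20 is an assumption
        return "10-20"
    return ">20"


# H/L ratios reported above for three tyrosine sites on PYGB.
pygb_sites = {"Y197": 5.9, "Y473": 26.8, "Y821": 12.2}
groups = {site: ratio_group(r) for site, r in pygb_sites.items()}

# Lower ratio = higher reactivity, so sort ascending to rank sites by reactivity.
ranked = sorted(pygb_sites, key=pygb_sites.get)
print(groups)
print(ranked)
```

Sorting by ascending ratio places Y197 first, in line with its reported role in AMP binding.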
The fraction of the quantified tyrosine sites that were reported as phosphorylation sites differed in each category based on the ratios with the smallest fraction from the group with the ratio of <10 (0.45) (Figure 4b). The distribution of the relative surface accessibility (RSA) of the quantified tyrosine sites was similar among the three groups, which indicates that the reactivity of tyrosine is not determined by its accessibility in the aqueous solution and may be regulated by other factors, such as the microenvironment created by nearby amino acids. For the tyrosine residues in the three groups, the acidic amino acids (D and E) were enriched near the quantified tyrosine sites (±1 and ±2 positions) (Figure S8a–c). However, as the ratio increased, the basic amino acids (K or R) at the positions of +5 and +6 became overrepresented. We reason that this unique pattern that creates specific microenvironment may lead to different reactivities of tyrosine on proteins. The carboxyl group can form a hydrogen bond with tyrosine, which may increase the reactivity of tyrosine. However, when a basic amino acid residue is around, the electrostatic effect between protonated K or R and the carboxyl group becomes dominant, which may prevent the formation of the hydrogen bond between the carboxyl group and tyrosine (Figure S8d). Therefore, it may result in the lower reactivity of the tyrosine.
Functional Analysis of Proteins with the Quantified Tyrosine Residues.
Reactive amino acid residues are central to the biological functions of proteins because they are essential for many important activities of proteins, such as catalytic activity and ligand binding. Here, we identified >700 proteins containing the quantified tyrosine sites with an H/L ratio of <10 (Table S3c). Functional annotation revealed that these proteins are highly enriched in different activities including nucleotide binding, enzyme binding, drug binding, protein kinase binding, and hydrolase activity (Figure 5a). Particularly, about 30% of the proteins have catalytic activity.
Figure 5.

(a) Clustering based on the molecular function for the proteins containing the quantified tyrosine residues with a ratio of <10. (b) Functional annotation of the proteins containing the quantified sites with a ratio of <5. (c) Fraction of proteins possessing unique tyrosine residues with a ratio of <5 found in DrugBank. (d) Domain analysis of the tyrosine sites with a ratio of <5. (e, f) Example structures of the proteins ENPP1 (PDB: 4B56) and UBE3C (PDB: 6K2C) with the reactive tyrosine residues (H/L < 5).
To further analyze the protein functions, we performed network clustering using clusterMaker in Cytoscape, which analyzes the protein complexes in a network. Protein complexes formed through protein–protein interactions (PPIs) are very important to a cellular system. The MCL algorithm generated over 50 protein complexes (Figure S9a). The largest cluster contains proteins related to mRNA splicing. The generated clusters are involved in different pathways including aminoacyl-tRNA biosynthesis, endocytosis, and the insulin signaling pathway. We also compared the generated protein complexes to CORUM, a database deposited with manually curated protein complexes, to further evaluate the biological functions of the network. Two examples are shown in Figure S9b, and the proteins containing the tyrosine sites with a ratio of <10 were reported to participate in the EIF3 complex and the proteasome.
Tyrosine residues with a lower H/L ratio (<5) are expected to be more reactive. Proteins with these tyrosine residues are associated with different types of activities and functions (Figure 5b), with the dominant groups having catalytic activity (45%) and binding (34%). They were also found to participate in different types of protein complexes (Figure S9b). Of the proteins harboring tyrosine residues with a ratio of <5, 23% were found in DrugBank, half of which are enzymes (Figure 5c). The non-DrugBank proteins (77%) came from various protein classes including transporters, transcription factors, and cytoskeletal proteins, which are usually difficult to target with small molecules. Of the sites with a ratio of <5, 59% were located in protein domains (Figure 5d), including binding, enzyme, and protein–protein interaction domains, indicating that these sites may participate in mediating protein functions. We also examined how the reactive sites may affect protein functions. Ectonucleotide pyrophosphatase/phosphodiesterase-1 (ENPP1) is a calcium- and zinc-dependent enzyme that can hydrolyze phosphodiester or pyrophosphate bonds.54 Y803 (Figure 5e) is located in the nuclease-like domain of ENPP1 and was found to be within the binding pocket for Ca2+ (<5 Å).55 A reactive site (Y1061, Figure 5f) was identified on the E3 ligase catalytic domain of ubiquitin-protein ligase E3C (UBE3C), an enzyme involved in protein ubiquitination. Although the site is not near the catalytic site of C1051 (>5 Å), it may affect the function of UBE3C through allosteric regulation since both Y1061 and C1051 are located in the same domain.56 Overall, the tyrosine residues with high reactivity were found on a wide range of protein classes, which provides valuable information for the future development of covalent drugs to manipulate protein activities and expand the scope of drug targets.
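The 5 Å proximity criterion used for the ENPP1 and UBE3C examples can be sketched as a minimum atom-to-atom distance check. The snippet below is an illustrative assumption about how such a check could be computed, not the authors' procedure. The coordinates are hypothetical and are not taken from PDB entries 4B56 or 6K2C.

```python
import math

def min_distance(atoms_a, atoms_b):
    """Smallest Euclidean distance (in Å) between any atom pair of two groups."""
    return min(math.dist(a, b) for a in atoms_a for b in atoms_b)

def within(atoms_a, atoms_b, cutoff=5.0):
    """True if the two atom groups approach closer than `cutoff` Å."""
    return min_distance(atoms_a, atoms_b) < cutoff

# Hypothetical coordinates in Å (x, y, z); illustrative only.
tyrosine_atoms = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
nearby_ion = [(4.0, 3.0, 0.0)]
distant_residue = [(7.0, 8.0, 0.0)]

print(within(tyrosine_atoms, nearby_ion))       # closest pair is under 5 Å
print(within(tyrosine_atoms, distant_residue))  # closest pair is over 5 Å
```

Taking the minimum over all atom pairs, rather than a single representative atom per residue, avoids missing a close contact made by a side-chain atom. With real structures the coordinate lists would come from a structure file parser.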
CONCLUSIONS
The tyrosine residue is involved in diverse protein functions including ligand binding, catalysis, and cell signaling. The functions of tyrosine are often correlated with its intrinsic reactivity. Comprehensive investigation of the reactivity of tyrosine across the whole proteome facilitates the development of chemical probes and covalent drugs to selectively tune the biological activities of proteins. In this work, we developed a new method integrating azo coupling, bioorthogonal chemistry, and multiplexed proteomics to globally study the tyrosine residues in the human proteome. Based on the azo-coupling reaction between aryl diazonium salt and the tyrosine residue, the probe can specifically target the tyrosine residues. After the reaction, tagged tyrosine-containing peptides were selectively enriched using bioorthogonal chemistry, and the small tag left on the peptides after cleavage is well suited to site-specific analysis by MS. Over 5000 tyrosine residues were quantified in MCF7 cells from the biological triplicate experiments. Although most of them displayed a high H/L value (>10) in the concentration-dependent experiment, the quantified tyrosine residues with a low H/L ratio (<5) were located in proteins with different functions including catalytic activity, kinase binding, and hydrolase activity. In combination with multiplexed proteomics, this method enables the global profiling of tyrosine in the proteome, which should improve our understanding of protein functions and facilitate the development of covalent drugs to regulate protein activity.
ACKNOWLEDGMENTS
This work was supported by the National Institute of General Medical Sciences of the National Institutes of Health under Award number R01GM127711.
Footnotes
The authors declare no competing financial interest.
ASSOCIATED CONTENT
Supporting Information
The Supporting Information is available free of charge at https://pubs.acs.org/doi/10.1021/acs.analchem.1c01935.
Experimental procedure of the protein-centric strategy (Figure S1); in-gel fluorescence experimental results (Figure S2); reproducibility evaluation from the biological triplicate experiments (Figure S3); the relationship between the number of the modified tyrosine residues per protein and the protein length (Figure S4); the distribution of the tyrosine residues in different secondary structures of proteins (Figure S5); comparison of the quantified tyrosine-containing peptides and proteins from this work and the previous report (Figure S6); an example of the quantified tyrosine from glycogen phosphorylase (Figure S7); motif analysis for the quantified tyrosine residues with different ratios (Figure S8); a protein–protein interaction network (Figure S9) (PDF)
Identification of the tyrosine residues using probe 1 and probe 2 (Table S1) (XLSX)
Identification of the tyrosine residues from the concentration-dependent experiment (Table S2) (XLSX)
Quantification of the tyrosine residues from the concentration-dependent experiment (Table S3) (XLSX)
Complete contact information is available at: https://pubs.acs.org/10.1021/acs.analchem.1c01935
Contributor Information
Fangxu Sun, School of Chemistry and Biochemistry and the Petit Institute for Bioengineering and Bioscience, Georgia Institute of Technology, Atlanta, Georgia 30332, United States.
Suttipong Suttapitugsakul, School of Chemistry and Biochemistry and the Petit Institute for Bioengineering and Bioscience, Georgia Institute of Technology, Atlanta, Georgia 30332, United States.
Ronghu Wu, School of Chemistry and Biochemistry and the Petit Institute for Bioengineering and Bioscience, Georgia Institute of Technology, Atlanta, Georgia 30332, United States.
REFERENCES
- (1) Aebersold R; Mann M Nature 2016, 537, 347–355.
- (2) Cravatt BF; Simon GM; Yates JR III Nature 2007, 450, 991–1000.
- (3) Altelaar AFM; Munoz J; Heck AJR Nat. Rev. Genet. 2013, 14, 35–48.
- (4) Chen W; Smeekens JM; Wu R Chem. Sci. 2016, 7, 1393–1400.
- (5) Arul AB; Robinson RAS Anal. Chem. 2019, 91, 178–189.
- (6) Huang M; Wang Y Mass Spectrom. Rev. 2021, 40, 215–235.
- (7) Woo CM; Felix A; Byrd WE; Zuegel DK; Ishihara M; Azadi P; Iavarone AT; Pitteri SJ; Bertozzi CR J. Proteome Res. 2017, 16, 1706–1718.
- (8) Suttapitugsakul S; Sun F; Wu R Anal. Chem. 2020, 92, 267–291.
- (9) Yang Y; Franc V; Heck AJR Trends Biotechnol. 2017, 35, 598–609.
- (10) Olsen JV; Blagoev B; Gnad F; Macek B; Kumar C; Mortensen P; Mann M Cell 2006, 127, 635–648.
- (11) Wu R; Haas W; Dephoure N; Huttlin EL; Zhai B; Sowa ME; Gygi SP Nat. Methods 2011, 8, 677–683.
- (12) Aguilar HA; Iliuk AB; Chen IH; Tao WA Nat. Protoc. 2020, 15, 161–180.
- (13) Taylor BC; Young NL Biochem. J. 2021, 478, 511–532.
- (14) Udeshi ND; Svinkina T; Mertins P; Kuhn E; Mani DR; Qiao JW; Carr SA Mol. Cell. Proteomics 2013, 12, 825–831.
- (15) Li Y; Evers J; Luo A; Erber L; Postler Z; Chen Y Angew. Chem., Int. Ed. 2019, 131, 547–551.
- (16) Weerapana E; Wang C; Simon GM; Richter F; Khare S; Dillon MBD; Bachovchin DA; Mowen K; Baker D; Cravatt BF Nature 2010, 468, 790–795.
- (17) Fomenko DE; Xing W; Adair BM; Thomas DJ; Gladyshev VN Science 2007, 315, 387–389.
- (18) Wang J; Liu Y; Liu Y; Zheng S; Wang X; Zhao J; Yang F; Zhang G; Wang C; Chen PR Nature 2019, 509–513.
- (19) Hacker SM; Backus KM; Lazear MR; Forli S; Correia BE; Cravatt BF Nat. Chem. 2017, 9, 1181.
- (20) Ma N; Hu J; Zhang Z-M; Liu W; Huang M; Fan Y; Yin X; Wang J; Ding K; Ye W; Li Z J. Am. Chem. Soc. 2020, 142, 6051–6059.
- (21) Bach K; Beerkens BLH; Zanon PRA; Hacker SM ACS Cent. Sci. 2020, 6, 546–554.
- (22) Lin S; Yang X; Jia S; Weeks AM; Hornsby M; Lee PS; Nichiporuk RV; Iavarone AT; Wells JA; Toste FD; Chang CJ Science 2017, 355, 597–602.
- (23) Mukherjee H; Grimster NP Curr. Opin. Chem. Biol. 2018, 44, 30–38.
- (24) Gu C; Shannon DA; Colby T; Wang Z; Shabab M; Kumari S; Villamor JG; McLaughlin CJ; Weerapana E; Kaiser M; Cravatt BF; van der Hoorn RAL Chem. Biol. 2013, 20, 541–548.
- (25) Koide S; Sidhu SS ACS Chem. Biol. 2009, 4, 325–334.
- (26) Mortenson DE; Brighty GJ; Plate L; Bare G; Chen W; Li S; Wang H; Cravatt BF; Forli S; Powers ET; Sharpless KB; Wilson IA; Kelly JW J. Am. Chem. Soc. 2018, 140, 200–210.
- (27) Hahm HS; Toroitich EK; Borne AL; Brulet JW; Libby AH; Yuan K; Ware TB; McCloud RL; Ciancone AM; Hsu K-L Nat. Chem. Biol. 2020, 16, 150–159.
- (28) Brulet JW; Borne AL; Yuan K; Libby AH; Hsu KL J. Am. Chem. Soc. 2020, 142, 8270–8280.
- (29) Sengupta S; Chandrasekaran S Org. Biomol. Chem. 2019, 17, 8308–8329.
- (30) Addy PS; Erickson SB; Italia JS; Chatterjee A J. Am. Chem. Soc. 2017, 139, 11670–11673.
- (31) Gavrilyuk J; Ban H; Nagano M; Hakamata W; Barbas CF III Bioconjugate Chem. 2012, 23, 2321–2328.
- (32) Hooker JM; Kovacs EW; Francis MB J. Am. Chem. Soc. 2004, 126, 3718–3719.
- (33) Schlick TL; Ding Z; Kovacs EW; Francis MB J. Am. Chem. Soc. 2005, 127, 3718–3723.
- (34) Mo F; Dong G; Zhang Y; Wang J Org. Biomol. Chem. 2013, 11, 1582–1593.
- (35) Evrard D; Lambert F; Policar C; Balland V; Limoges B Chem. – Eur. J. 2008, 14, 9286–9291.
- (36) Ma X; Herzon SB Beilstein J. Org. Chem. 2018, 14, 2259–2265.
- (37) Xiao H; Wu R Anal. Chem. 2017, 89, 3656–3663.
- (38) Eng JK; McCormack AL; Yates JR J. Am. Soc. Mass Spectrom. 1994, 5, 976–989.
- (39) Elias JE; Gygi SP Nat. Methods 2007, 4, 207–214.
- (40) Käll L; Canterbury JD; Weston J; Noble WS; MacCoss MJ Nat. Methods 2007, 4, 923–925.
- (41) Beausoleil SA; Villén J; Gerber SA; Rush J; Gygi SP Nat. Biotechnol. 2006, 24, 1285–1292.
- (42) Hornbeck PV; Zhang B; Murray B; Kornhauser JM; Latham V; Skrzypek E Nucleic Acids Res. 2015, 43, D512–D520.
- (43) O’Shea JP; Chou MF; Quader SA; Ryan JK; Church GM; Schwartz D Nat. Methods 2013, 10, 1211–1212.
- (44) Shannon P; Markiel A; Ozier O; Baliga NS; Wang JT; Ramage D; Amin N; Schwikowski B; Ideker T Genome Res. 2003, 13, 2498–2504.
- (45) Szklarczyk D; Morris JH; Cook H; Kuhn M; Wyder S; Simonovic M; Santos A; Doncheva NT; Roth A; Bork P; Jensen LJ; von Mering C Nucleic Acids Res. 2016, 45, D362–D368.
- (46) Mi H; Muruganujan A; Ebert D; Huang X; Thomas PD Nucleic Acids Res. 2018, 47, D419–D426.
- (47) Morris JH; Apeltsin L; Newman AM; Baumbach J; Wittkop T; Su G; Bader GD; Ferrin TE BMC Bioinf. 2011, 12, 436.
- (48) Giurgiu M; Reinhard J; Brauner B; Dunger-Kaltenbach I; Fobo G; Frishman G; Montrone C; Ruepp A Nucleic Acids Res. 2018, 47, D559–D563.
- (49) Pandurangan AP; Stahlhacke J; Oates ME; Smithers B; Gough J Nucleic Acids Res. 2018, 47, D490–D494.
- (50) Berman HM; Westbrook J; Feng Z; Gilliland G; Bhat TN; Weissig H; Shindyalov IN; Bourne PE Nucleic Acids Res. 2000, 28, 235–242.
- (51) Patrick RAZ; Fengchao Y; Patricia M; Lisa L; Michael Z; Kristina K; Dario M; Patrick R; Thomas EM; Marko C; Christopher C; Kathrin L; F. Dean T; Alexey IN; Stephan MH Profiling the Proteome-Wide Selectivity of Diverse Electrophiles, 2021, DOI: 10.26434/chemrxiv.14186561.v1.
- (52) Petersen B; Petersen TN; Andersen P; Nielsen M; Lundegaard C BMC Struct. Biol. 2009, 9, 51.
- (53) Mathieu C; de la Sierra-Gallay IL; Duval R; Xu X; Cocaign A; Léger T; Woffendin G; Camadro J-M; Etchebest C; Haouz A; Dupret J-M; Rodrigues-Lima F J. Biol. Chem. 2016, 291, 18072–18083.
- (54) Onyedibe KI; Wang M; Sintim HO Molecules 2019, 24, 4192.
- (55) Jansen S; Perrakis A; Ulens C; Winkler C; Andries M; Joosten RP; Van Acker M; Luyten FP; Moolenaar WH; Bollen M Structure 2012, 20, 1948–1959.
- (56) Singh S; Sivaraman J Biochem. J. 2020, 477, 905–923.