eLife. 2023 Mar 16;12:e82345. doi: 10.7554/eLife.82345

High-throughput profiling of sequence recognition by tyrosine kinases and SH2 domains using bacterial peptide display

Allyson Li 1, Rashmi Voleti 1, Minhee Lee 1, Dejan Gagoski 1,2, Neel H Shah 1
Editors: Tony Hunter3, Jonathan A Cooper4
PMCID: PMC10065799  PMID: 36927728

Abstract

Tyrosine kinases and SH2 (phosphotyrosine recognition) domains have binding specificities that depend on the amino acid sequence surrounding the target (phospho)tyrosine residue. Although the preferred recognition motifs of many kinases and SH2 domains are known, we lack a quantitative description of sequence specificity that could guide predictions about signaling pathways or be used to design sequences for biomedical applications. Here, we present a platform that combines genetically encoded peptide libraries and deep sequencing to profile sequence recognition by tyrosine kinases and SH2 domains. We screened several tyrosine kinases against a million-peptide random library and used the resulting profiles to design high-activity sequences. We also screened several kinases against a library containing thousands of human proteome-derived peptides and their naturally-occurring variants. These screens recapitulated independently measured phosphorylation rates and revealed hundreds of phosphosite-proximal mutations that impact phosphosite recognition by tyrosine kinases. We extended this platform to the analysis of SH2 domains and showed that screens could predict relative binding affinities. Finally, we expanded our method to assess the impact of non-canonical and post-translationally modified amino acids on sequence recognition. This specificity profiling platform will shed new light on phosphotyrosine signaling and could readily be adapted to other protein modification/recognition domains.

Research organism: E. coli

Introduction

Cells respond to external stimuli by activating a finely-tuned cascade of enzymatic reactions and protein-protein interactions. This signal transduction is governed, in large part, by post-translational modifications that alter protein activity, stability, and localization, as well as the formation of higher-order macromolecular complexes. Despite its low abundance relative to serine and threonine phosphorylation, tyrosine phosphorylation is an essential post-translational modification in metazoans (Lim and Pawson, 2010). Tyrosine kinases, the enzymes that phosphorylate tyrosine residues on proteins, and Src homology 2 (SH2) domains, protein modules that bind tyrosine-phosphorylated sequences, must have the ability to discriminate among a myriad of potential phosphorylation sites (phosphosites) in the proteome, in order to ensure proper signal transduction. The preferential engagement of specific phosphosites by tyrosine kinases and SH2 domains is dependent on the amino acid sequence surrounding the tyrosine or phosphotyrosine residue (Songyang et al., 1995; Songyang et al., 1993).

Isolated tyrosine kinase domains most efficiently engage phosphosites that conform to specific sequence motifs, which are defined by a small number of key residues that contribute significantly to recognition (Songyang et al., 1995). These motifs suggest a mechanism by which a specific set of phosphosites in a proteome is selectively engaged by an individual kinase, based on the presence of favorable sequence features around that site. Negative selection of specific sequence features can also play a role in kinase specificity (Alexander et al., 2011). For example, the T cell tyrosine kinase ZAP-70 cannot readily phosphorylate co-localized proteins that contain even a modest positive charge (Shah et al., 2016).

Phosphosite sequence recognition by kinase domains is just one mechanism of substrate selection for tyrosine kinases, and other interactions are necessary to achieve efficient substrate targeting in vivo. Binding domains, such as SH2 domains, can strongly influence specificity by localizing kinases to the vicinity of phosphorylation targets (Pawson and Nash, 2000). Secondary interactions between SH2 and kinase domains can also refine the substrate preferences of a tyrosine kinase by stabilizing its active state (Filippakopoulos et al., 2008). Thus, for signaling systems that involve a tyrosine kinase domain and a tethered SH2 domain, the sequence specificities of both domains contribute to the intricate control of phosphotyrosine signaling responses.

Many methods have been developed to characterize sequence recognition by tyrosine kinases and SH2 domains. The most prominent approach employs purified kinases/SH2 domains and oriented peptide libraries, which are synthetic, degenerate peptide libraries with a central tyrosine or phosphotyrosine residue (Songyang et al., 1995; Songyang et al., 1993). Several variations on this technique have been reported to improve the throughput and quantification of sequence preferences (Deng et al., 2014; Huang et al., 2008; Hutti et al., 2004; Mok et al., 2010). Notably, this method is also applicable to serine/threonine kinases, and large swaths of the yeast and human kinomes have been characterized using oriented peptide libraries, providing significant insights into kinase-substrate recognition and phospho-signaling (Deng et al., 2014; Johnson et al., 2023; Mok et al., 2010; Songyang et al., 1995). Oriented peptide library screens have primarily been useful for determining the preference for each amino acid at a given position, independent of sequence context, but evidence suggests that some amino acid preferences may depend on the surrounding sequence (Cantor et al., 2018).

Several groups have developed strategies to compare the phosphorylation of specific sequences, rather than obtain position-averaged amino acid preferences from pooled degenerate libraries. Strategies include ‘one-bead-one-peptide’ combinatorial libraries (Imhof et al., 2006; Ren et al., 2011; Sweeney et al., 2005; Trinh et al., 2013; Wavreille et al., 2007) and protein/peptide microarrays (Amanchy et al., 2008; Jones et al., 2006; Koytiger et al., 2013; Mok et al., 2009; Schutkowski et al., 2004; Uttamchandani et al., 2003). One-bead-one-peptide methods often require manual isolation and individual sequencing of positive (phosphorylated or SH2-bound) beads, making the method technically challenging. Microarrays offer the capacity to analyze thousands of discrete sequences and require small quantities of proteins, but their use can be limited by the high cost of reagents. As an alternative, several groups have conducted mass spectrometry proteomics on heterologously expressed purified peptide libraries, kinase-treated cell extracts, and cells over-expressing a kinase of interest (Barber et al., 2018; Chou et al., 2012; Corwin et al., 2017; Douglass et al., 2012; Finneran et al., 2020; Imamura et al., 2014; Kettenbach et al., 2012; Lubner et al., 2018; Sugiyama et al., 2019; Xue et al., 2012). This strategy has enabled the identification of potential substrates and can also be used to infer position-specific amino acid preferences. Studies using intact proteomes have the added benefit that the kinase of interest is operating on intact proteins, rather than isolated peptides, but interpretation of the results can be confounded by the presence of endogenous kinases.

Molecular display techniques, such as mRNA, phage, yeast, and bacterial display, have also been used for specificity profiling. Early investigations employed phage or mRNA display to profile tyrosine kinase and SH2 specificity. These methods were relatively low-throughput, as they relied on Sanger sequencing of individual clones (Cujec et al., 2002; Dente et al., 1997). The advent of deep sequencing technologies has transformed this style of specificity profiling, by enabling rapid, quantitative analysis of library composition without requiring the sequencing of individual clones. This was demonstrated recently in a series of studies that employed bacterial/yeast peptide display, fluorescence-activated cell sorting (FACS), and deep sequencing to profile tyrosine kinase and SH2 domain specificity (Cantor et al., 2018; Lo et al., 2019; Shah et al., 2018; Shah et al., 2016; Taft et al., 2019). A key facet of these investigations was the facile generation of peptide libraries tailored to specific mechanistic questions: these included scanning mutagenesis libraries derived from individual substrates (Shah et al., 2016), as well as diverse peptide libraries encoding known phosphosites in the human proteome (Shah et al., 2018).

In this report, we describe a high-throughput platform to profile the recognition of large peptide libraries by any tyrosine kinase or SH2 domain. Our approach uses biotinylated bait proteins (pan-phosphotyrosine antibodies or SH2 domains) and avidin-functionalized magnetic beads to isolate tyrosine kinase-phosphorylated bacterial cells, and is coupled to deep sequencing for a quantitative readout (Figure 1A). The use of magnetic bead-based separation, rather than FACS, permits simultaneous, benchtop processing of multiple samples and enables the analysis of larger libraries in less time and at lower cost. Libraries can be custom-made for specific readouts: mutational scanning for structure-activity relationships, libraries derived from natural proteomes to answer specific signaling questions, or degenerate libraries for the generation of predictive models.

Figure 1. High-throughput profiling of tyrosine kinase substrate specificity using bacterial peptide display.

(A) Schematic representation of the workflow for kinase specificity profiling. (B) Heatmap depicting the specificity of the c-Src kinase domain, measured using the X5-Y-X5 library. Enrichment scores were log2-transformed and are displayed on a color scale from blue (disfavored sequence features, negative value), to white (neutral sequence features, near zero value), to red (favored sequence features, positive value). Values in the heatmap are the average of three replicates. (C) Correlation between position-specific amino acid enrichments from screens with the 4G10 Platinum and PY20 biotinylated pan-phosphotyrosine antibodies.

Figure 1—figure supplement 1. Composition of the X5-Y-X5 library.

(A) Table showing the read counts for all amino acids and the stop codon across all positions in the strep-tagged X5-Y-X5 library, from one sequencing run with an unselected (input) library. (B) Correlation of amino acid frequencies at each position from two replicates of the input library. (C) Distribution of frequencies from two replicates of the input library.
Figure 1—figure supplement 1—source data 1. Counts table corresponding to one sequence run from an input X5-Y-X5 library.

Figure 1—figure supplement 2. Phosphorylation of the X5-Y-X5 library by c-Src.

Flow cytometry analysis monitoring the distribution of phosphotyrosine levels over time (left). The mean fluorescence intensities, which represent phosphorylation levels, plotted as a function of time (right).

Figure 1—figure supplement 3. Heatmap and logo depicting the specificity of the c-Src kinase domain, measured using the X5-Y-X5 library.

Only peptides with one central tyrosine were considered in this analysis. Enrichment scores were log2-transformed and are displayed on a color scale from blue (disfavored sequence features, negative value), to white (neutral sequence features, near zero value), to red (favored sequence features, positive value). The same values were used to plot the heatmap and the sequence logo. The height for the central ‘Y’ in the sequence logo is an arbitrary value, chosen for optimal visualization of other features. Values are the average of three replicates.

To demonstrate the versatility of our approach, we designed two new bacterial peptide display libraries that provide distinct insights into tyrosine kinase and SH2 sequence recognition. The first library contains 10^6–10^7 random 11-residue sequences with a central tyrosine (referred to as the X5-Y-X5 library). Screens with the X5-Y-X5 library recapitulate previously reported specificity motifs and can be used to generate highly efficient peptide substrates. The second library contains defined sequences spanning approximately 3000 human tyrosine phosphorylation sites, along with approximately 5000 variant sequences bearing disease-associated mutations and natural polymorphisms (referred to as the pTyr-Var library). Kinase and SH2 screens with the pTyr-Var library reveal hundreds of phosphosite-proximal mutations that significantly impact phosphosite recognition by individual protein domains. These datasets will be a valuable resource in the growing efforts to understand the functional impact of protein variants across the human population that may contribute to disease (Stein et al., 2019). Finally, we show that our peptide display platform is compatible with Amber codon suppression, enabling analysis of how non-canonical or post-translationally modified amino acids impact sequence recognition. Overall, the method described in this report provides an accessible, high-throughput platform to study the specificity of phosphotyrosine signaling proteins.

Results and discussion

A bacterial display and deep sequencing platform to screen tyrosine kinases against large peptide libraries

We expanded upon a previously established screening platform that combines bacterial display of genetically encoded peptide libraries and deep sequencing to quantitatively compare phosphorylation efficiencies across a substrate library (Shah et al., 2016). In the published approach, peptides are displayed on the surface of E. coli cells as fusions to an engineered bacterial surface-display protein, eCPX (Rice and Daugherty, 2008), then phosphorylated by a purified kinase (Henriques et al., 2013). Following this, the cells are labeled with a pan-phosphotyrosine antibody, and cells with high phosphorylation levels are separated by FACS. The DNA encoding the peptides is then amplified and analyzed by Illumina deep sequencing to determine the frequency of each peptide in the library before and after selection (Shah et al., 2018; Shah et al., 2016). To estimate the phosphorylation efficiency of each peptide by a particular kinase, an enrichment score is calculated as the frequency of that peptide in the kinase-selected sample normalized to its frequency in the input sample.
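The enrichment score described above can be illustrated with a minimal sketch. It assumes that "normalized to" means a simple ratio of read frequencies; the function name, the toy peptide sequences, and the read counts are hypothetical, and the published pipeline may apply additional filtering or normalization steps.

```python
from collections import Counter

def enrichment_scores(input_reads, selected_reads):
    """Return {peptide: frequency in selected sample / frequency in input sample}.

    Both arguments are lists of peptide sequences, one entry per sequencing read.
    Peptides absent from the input sample are skipped (their ratio is undefined).
    """
    input_counts = Counter(input_reads)
    selected_counts = Counter(selected_reads)
    n_input = sum(input_counts.values())
    n_selected = sum(selected_counts.values())
    scores = {}
    for peptide, count_in in input_counts.items():
        freq_in = count_in / n_input
        freq_sel = selected_counts.get(peptide, 0) / n_selected
        scores[peptide] = freq_sel / freq_in
    return scores

# Toy reads (made up for illustration only).
pep_a, pep_b, pep_c = "EEEEIYGEFEA", "KKKKGYKKKKK", "AAAAAYAAAAA"
input_reads = [pep_a] * 4 + [pep_b] * 4 + [pep_c] * 2
selected_reads = [pep_a] * 6 + [pep_b] * 2 + [pep_c] * 2

scores = enrichment_scores(input_reads, selected_reads)
for peptide, score in scores.items():
    print(peptide, round(score, 2))
```

In this toy case the first peptide is enriched (score above 1), the second is depleted (score below 1), and the third is unchanged (score of 1).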

While peptide libraries of virtually any composition can theoretically be screened using this approach, previous implementations focused on libraries containing fewer than 5000 peptides, due to the low throughput of FACS (Shah et al., 2018). In those experiments, the objective was to over-sample the library at the cell sorting step by a factor of 100–1000, to ensure that enrichment or depletion of every member of the library could be accurately quantified by deep sequencing. When multiple screens were conducted in parallel, this over-sampling requirement and the throughput of FACS restricted experiments to such small libraries. To improve the scalability and cost-effectiveness of this approach, we switched to a bead-based sorting method, using avidin-coated magnetic beads to enrich highly phosphorylated cells, thus circumventing the need for FACS (Figure 1A). With this approach, the cells are instead labeled with biotinylated pan-phosphotyrosine antibodies and then sorted using magnetic beads. The use of magnetic beads permits simultaneous separation of multiple samples of virtually any size, enabling the analysis of larger libraries in less time and at lower cost. Notably, these screens can be carried out in any laboratory, without the need for a fluorescence-activated cell sorter.

To test our upgraded screening platform, we generated a random library of 11-residue sequences with a central tyrosine (the X5-Y-X5 library, where X is any of the 20 canonical amino acids). The library was generated using a degenerate synthetic oligonucleotide with five NNS codons (N = A, T, G, or C and S = G or C) before and after the central codon that encodes tyrosine (TAT). The NNS triplet has the benefit of encoding all 20 amino acids, but it can still contain an Amber stop codon (TAG) roughly 3% of the time (1 of the 32 possible NNS codons). Therefore, with 10 NNS codons per peptide, up to 30% of the peptide-coding sequences in the library are expected to have an Amber stop codon, a feature that we take advantage of later in this study. The degenerate oligonucleotide mixture was cloned into a plasmid in between the DNA encoding a signal sequence and the eCPX surface-display scaffold. In a previously reported version of this platform, the eCPX scaffold contained a C-terminal strep-tag to detect surface-display level (Shah et al., 2016). Due to the potential background binding of the strep-tag with the avidin-coated magnetic beads during cell enrichment, we cloned both a strep-tagged and a myc-tagged version of the library. Deep sequencing of both versions of the X5-Y-X5 library confirmed that they have 1–10 million unique peptide sequences, 20% of which contain one or more stop codons. Furthermore, all 20 canonical amino acids were well represented at each of the 10 variable positions surrounding the fixed tyrosine residue (Figure 1—figure supplement 1). Notably, our library includes peptides containing Cys residues and non-central Tyr residues, both of which are often excluded from tyrosine kinase specificity screens to avoid oxidation-related artifacts and challenges in interpreting signal from multi-Tyr sequences (Deng et al., 2014). These sequences can be filtered during data analysis, if needed, although they did not pose significant issues in our studies.
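The stop-codon estimates above follow from simple codon counting, sketched below. The calculation assumes that every nucleotide allowed at a degenerate position is incorporated with equal probability and that the 10 variable codons are independent; real oligonucleotide synthesis can deviate from both assumptions. Variable names are illustrative.

```python
from itertools import product

STOP_CODONS = {"TAA", "TAG", "TGA"}

# Enumerate every NNS codon: N is any base, S is G or C.
nns_codons = ["".join(bases) for bases in product("ATGC", "ATGC", "GC")]

# Only the Amber codon (TAG) ends in G or C, so it is the only stop codon in the set.
stops_in_nns = sorted(STOP_CODONS.intersection(nns_codons))

# Probability that a single NNS codon is a stop codon.
p_stop_per_codon = len(stops_in_nns) / len(nns_codons)

# Probability that at least one of the 10 variable codons is a stop codon.
n_variable_codons = 10
p_any_stop = 1 - (1 - p_stop_per_codon) ** n_variable_codons

print(len(nns_codons), stops_in_nns)          # 32 ['TAG']
print(round(p_stop_per_codon, 4))             # 0.0312
print(round(p_any_stop, 3))                   # 0.272
```

Under these assumptions about 27% of sequences would carry at least one Amber codon, which is consistent with the "up to 30%" expectation; the 20% observed by deep sequencing falls below this idealized estimate.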

Using the myc-tagged X5-Y-X5 library, we determined the position-specific amino acid preferences of the kinase domain of c-Src. Cells displaying the library were phosphorylated by c-Src to achieve roughly 20–30% phosphorylation, as determined by flow cytometry (Figure 1—figure supplement 2). The phosphorylated cells were labeled with a biotinylated anti-phosphotyrosine antibody and enriched with magnetic beads, then peptide-coding DNA sequences were counted by deep sequencing. We visualized the sequence preferences of c-Src by generating a heatmap and sequence logo based on the position-specific enrichment scores of each amino acid residue surrounding the central tyrosine (Figure 1B, Figure 1—figure supplement 3). Sequences containing a stop codon were not considered in these calculations, but the depletion of stop codons at each position was separately confirmed and is reported below the heatmap on the same color scale. The preferences determined from this screen matched the sequence specificity of c-Src defined by prior reports using oriented peptide libraries (Deng et al., 2014; Songyang et al., 1995). We observed a strong preference for bulky aliphatic residues (Ile/Leu/Val) at the –1 position relative to the central tyrosine and a phenylalanine at the +3 position (Figure 1B, Figure 1—figure supplement 3). Our results showed modest differences from the specificity observed by oriented peptide libraries, including a strong preference for a +1 Asp/Glu/Ser in addition to the previously reported +1 Gly. To test whether these differences were due to biases introduced by the specific pan-phosphotyrosine antibody used, we obtained a different commercially available biotinylated pan-phosphotyrosine antibody and repeated the screen. The position-specific amino acid enrichments obtained using both antibodies were nearly identical (Figure 1C). This suggests that there is no significant bias in the enrichment of peptides introduced by the pan-phosphotyrosine antibody.
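One way to compute position-specific enrichment values of the kind shown in the heatmaps is sketched below. It is an assumption, not a description of the published analysis code: each score is taken as the log2 ratio of an amino acid's frequency at a position in the selected sample to its frequency at that position in the input sample, and sequences containing a stop codon (written here as "*") are skipped. The function names and toy sequences are hypothetical.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def position_frequencies(peptides, length=11):
    """Per-position amino acid frequencies, ignoring stop-containing peptides."""
    counts = [Counter() for _ in range(length)]
    n_used = 0
    for pep in peptides:
        if len(pep) != length or "*" in pep:
            continue
        n_used += 1
        for i, aa in enumerate(pep):
            counts[i][aa] += 1
    return [{aa: c[aa] / n_used for aa in AMINO_ACIDS} for c in counts]

def log2_enrichment_matrix(input_peptides, selected_peptides, length=11):
    """List of {amino acid: log2(selected freq / input freq)}, one dict per position.

    Residues with zero frequency in either sample are left out of the matrix.
    """
    f_in = position_frequencies(input_peptides, length)
    f_sel = position_frequencies(selected_peptides, length)
    matrix = []
    for i in range(length):
        row = {}
        for aa in AMINO_ACIDS:
            if f_in[i][aa] > 0 and f_sel[i][aa] > 0:
                row[aa] = math.log2(f_sel[i][aa] / f_in[i][aa])
        matrix.append(row)
    return matrix

# Toy reads (made up): only the -1 position (index 4) varies.
input_peptides = ["AAAAIYAAAAA", "AAAALYAAAAA", "AAAAGYAAAAA", "AAAADYAAAAA", "AAAA*YAAAAA"]
selected_peptides = ["AAAAIYAAAAA", "AAAAIYAAAAA", "AAAALYAAAAA", "AAAAGYAAAAA"]

matrix = log2_enrichment_matrix(input_peptides, selected_peptides)
print(matrix[4])
```

In this toy example Ile at the –1 position doubles in frequency after selection, giving a log2 enrichment of 1, while Asp disappears from the selected sample and is therefore omitted rather than assigned an infinite negative score.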

Degenerate library screens capture specificity profiles for diverse tyrosine kinases

We next used the degenerate X5-Y-X5 library to characterize the sequence preferences of four additional tyrosine kinase domains, derived from the non-receptor tyrosine kinases c-Abl and Fer, and the receptor tyrosine kinases EPHB1 and EPHB2. The kinases were selected because they represent a few distinct branches of the tyrosine kinome and can be easily produced through bacterial expression (Albanese et al., 2018). The X5-Y-X5 library was screened against the kinases in triplicate, and the data from replicates were averaged to generate specificity profiles for each kinase (Figure 2A and Figure 2—source data 1). The amino acid preferences for c-Abl are well-characterized and were recapitulated in this screen (Deng et al., 2014; Songyang et al., 1995; Till et al., 1999; Till et al., 1994). Like c-Src, c-Abl preferred bulky aliphatic residues at the –1 position with respect to the central tyrosine. Unlike c-Src, c-Abl preferred an alanine at the +1 position and had a notably strong preference for proline at the +3 position (Figures 1B and 2A, Figure 2—figure supplement 1). Fer showed a specificity pattern distinct from both c-Src and c-Abl, which included a preference for tryptophan residues at the +2, +3, and +4 positions. As expected, the closely related EPHB1 and EPHB2 kinases had similar specificities, which included a unique preference for Asn and Asp at the –1 residue that was not observed for the tested non-receptor tyrosine kinases (Figure 2A, Figure 2—figure supplement 1).

Figure 2. Specificity profiling of tyrosine kinases using the X5-Y-X5 library.

(A) Heatmaps depicting the specificities of c-Abl, Fer, EPHB1, and EPHB2. Enrichment scores were log2-transformed and are displayed on a color scale from blue (disfavored sequence features, negative value), to white (neutral sequence features, near zero value), to red (favored sequence features, positive value). Values in the heatmaps are the average of three replicates. (B) Sequences of consensus peptides identified through X5-Y-X5 screens, compared with previously reported SrcTide and AblTide sequences. (C) Phosphorylation kinetics of five consensus peptides against five kinases. Initial rates were normalized to the rate of the cognate consensus peptide. All peptides were used at a concentration of 100 μM, and the kinases were used at a concentration of 10–50 nM. Error bars represent the standard deviation from at least three measurements.

Figure 2—source data 1. Position-specific amino acid enrichment matrices from the tyrosine kinase X5-Y-X5 library screens.
Matrices calculated with and without inclusion of multi-tyrosine sequences are provided.

Figure 2—figure supplement 1. Heatmaps and logos depicting the specificities of c-Abl, Fer, EPHB1, and EPHB2.

Only peptides with one central tyrosine were considered in this analysis. Enrichment scores were log2-transformed and are displayed on a color scale from blue (disfavored sequence features, negative value), to white (neutral sequence features, near zero value), to red (favored sequence features, positive value). The same values were used to plot the heatmaps and the sequence logos. The height for the central ‘Y’ in the sequence logos is an arbitrary value, chosen for optimal visualization of other features. Values are the average of three replicates.
Figure 2—figure supplement 2. Phosphorylation kinetics of five consensus peptides against five kinases.

Initial rates measured for each kinase were normalized to the rate of the corresponding consensus peptide. All peptides were used at a concentration of 20 μM, and the kinases were used at a concentration of 10–50 nM. Error bars represent the standard deviation from at least three measurements.

Degenerate library screens can be used to design highly efficient peptide substrates

Specificity profiling methods are often used to design consensus sequences that serve as optimal peptide substrates for biochemical assays and biosensor design (Deng et al., 2014; Lin et al., 2019; Songyang et al., 1995). We wanted to assess whether our method could also be used to generate high-efficiency substrates, and whether these would differ from sequences identified using oriented peptide libraries. To test this, we combined the most favorable amino acids in each position flanking the central tyrosine residue in our specificity profiles, excluding tyrosine, to generate unique consensus peptide substrates for c-Src, c-Abl, Fer, EPHB1, and EPHB2. Consensus sequences for c-Src and c-Abl have been identified previously using oriented peptide libraries (Deng et al., 2014; Songyang et al., 1995). These sequences, often referred to as SrcTide and AblTide, differ from our consensus sequences at a few residues surrounding the phospho-acceptor tyrosine (Figure 2B). The SrcTide and AblTide peptides are canonically embedded within a conserved peptide scaffold containing N-terminal Gly and C-terminal (Lys)3-Gly flanks. For direct comparison, we embedded our consensus peptides in the same scaffold and conducted a series of kinetic studies.

First, we used an in vitro continuous fluorimetric assay to compare the steady-state kinetic parameters (kcat and KM) for our c-Src and c-Abl consensus peptides with the SrcTide and AblTide peptides. The Michaelis-Menten parameters for our Src Consensus peptide were on par with one of the previously reported SrcTide substrates (SrcTide 1995, Songyang et al., 1995), but the KM value for a more recently reported SrcTide variant (SrcTide 2014, Deng et al., 2014) was substantially lower, indicating tighter apparent binding (Table 1). Our Src Consensus peptide had a higher maximal catalytic rate (higher kcat) but a lower apparent binding affinity (higher KM) when compared to both SrcTides. We were surprised to see that our Src Consensus peptide had a +1 Asp residue, as opposed to the +1 Gly residue in both SrcTides. Substitution of the +1 Asp with a Gly in a related peptide marginally improved the KM value but reduced kcat (Table 1). These results indicate that our c-Src specificity screens may select for peptides with a high kcat, and that there is a trade-off between kcat and KM for c-Src substrate recognition. For c-Abl, our consensus peptide had both a higher maximal rate (higher kcat) and tighter apparent affinity (lower KM) relative to the previously reported AblTide peptide (Table 1). Collectively, these experiments suggest that different methods may be biased toward slightly different realms of sequence space, and that there are multiple solutions to achieving high-efficiency phosphorylation.

Table 1. Michaelis-Menten parameters for consensus peptides against c-Src and c-Abl kinase domains.

All measurements were carried out using the ADP-Quest assay in three to five replicates. Errors represent the standard error in global fits of all replicates to the Michaelis-Menten equation.

Entry  Kinase  Peptide name                Peptide sequence  kcat (s^-1)  KM (μM)
1      c-Src   Src Consensus               GPDECIYDMFPFKKKG  4.9 ± 0.4    196 ± 38
2      c-Src   Src Consensus (P-5C, D+1G)  GCDECIYGMFPFKKKG  4.4 ± 0.2    97 ± 10
3      c-Src   SrcTide (1995)              GAEEEIYGEFEAKKKG  3.1 ± 0.2    64 ± 10
4      c-Src   SrcTide (2014)              GAEEEIYGIFGAKKKG  1.8 ± 0.1    7 ± 3
5      c-Src   Fer Consensus               GPDEPIYEWWWIKKKG  0.4 ± 0.1    8 ± 4
6      c-Src   Abl Consensus               GPDEPIYAVPPIKKKG  2.0 ± 0.2    159 ± 31
7      c-Abl   Abl Consensus               GPDEPIYAVPPIKKKG  3.0 ± 0.2    6 ± 2
8      c-Abl   AblTide (2014)              GAPEVIYATPGAKKKG  2.5 ± 0.2    35 ± 8

Next, we assayed all of the consensus peptides generated using our approach against their cognate kinases, as well as the other kinases in our screens. For the non-receptor tyrosine kinases (c-Src, c-Abl, and Fer), the corresponding consensus peptides were the best substrates tested. At a higher substrate concentration (100 μM), c-Abl also efficiently phosphorylated the Src and EPHB2 consensus peptides (Figure 2C), but selectivity for the Abl Consensus improved at a lower concentration (20 μM), consistent with selectivity being driven by KM for this set of peptides (Figure 2—figure supplement 2 and Table 1). By contrast, c-Src was selective for the Src Consensus peptide at high concentrations (Figure 2C), but showed significant off-target activity toward the Fer Consensus at low concentrations (Figure 2—figure supplement 2). Michaelis-Menten analysis of the Fer Consensus with c-Src revealed that it has a remarkably tight KM for c-Src, with a low kcat as a trade-off (Table 1). Finally, we observed that the receptor tyrosine kinase EPHB1 showed very little selectivity across the consensus peptides and did not prefer its own cognate consensus sequence (Figure 2C and Figure 2—figure supplement 2). EPHB2, on the other hand, efficiently phosphorylated its own consensus peptide, as well as the Abl Consensus. Both of these sequences contain a –1 Ile and +3 Pro (Figure 2B and C). These experiments demonstrate the applicability of our bacterial peptide display method to the design of high-activity substrates. Our results also suggest that not all consensus peptides will be selective for their given kinase, as there can be overlap in substrate specificities.
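The concentration dependence of c-Src selectivity described above can be rationalized with a short worked example using the Table 1 parameters. The sketch assumes simple Michaelis-Menten kinetics, v/[E] = kcat·[S]/(KM + [S]), with each peptide assayed separately, and it uses only the fitted point values, ignoring their reported errors. The function and dictionary names are illustrative.

```python
def mm_rate_per_enzyme(kcat, km, substrate_um):
    """Michaelis-Menten turnover per enzyme molecule (s^-1) at a given [S] in uM."""
    return kcat * substrate_um / (km + substrate_um)

# (kcat in s^-1, KM in uM) for c-Src, taken from Table 1 (entries 1 and 5).
C_SRC_PARAMS = {
    "Src Consensus": (4.9, 196.0),
    "Fer Consensus": (0.4, 8.0),
}

def fer_rate_relative_to_src(substrate_um):
    """Rate of the Fer Consensus peptide divided by the rate of the Src Consensus peptide."""
    src = mm_rate_per_enzyme(*C_SRC_PARAMS["Src Consensus"], substrate_um)
    fer = mm_rate_per_enzyme(*C_SRC_PARAMS["Fer Consensus"], substrate_um)
    return fer / src

for conc in (100.0, 20.0):
    print(f"{conc:.0f} uM: Fer/Src rate ratio = {fer_rate_relative_to_src(conc):.2f}")
```

With these parameters the Fer Consensus peptide is predicted to be phosphorylated at roughly 0.22 times the rate of the Src Consensus peptide at 100 μM, but at roughly 0.63 times that rate at 20 μM. This is qualitatively in line with the off-target activity seen at the lower concentration, where the tight KM of the Fer Consensus peptide partially compensates for its low kcat.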

Data from X5-Y-X5 library screens can be used to predict the relative phosphorylation rates of peptides

Given that data from the X5-Y-X5 library screens could yield high-efficiency substrates, we investigated whether the same data could be used to quantitatively predict the relative phosphorylation rates of biologically interesting sequences. If so, this would be a potentially powerful tool for the identification of native substrates and the dissection of phosphotyrosine signaling pathways. Indeed, oriented peptide libraries have been applied extensively to predict the native substrates of protein kinases (Johnson et al., 2023; Miller et al., 2008; Obenauer et al., 2003). We are particularly interested in using high-throughput specificity screens to predict how mutations proximal to phosphorylation sites affect tyrosine kinase selectivity. The PhosphoSitePlus database documents thousands of missense mutations within five residues of tyrosine phosphorylation sites, many of which are associated with human diseases or are human polymorphisms, but the functional consequences of most of these mutations are unexplored (Hornbeck et al., 2019; Hornbeck et al., 2015; Krassowski et al., 2018; Landrum et al., 2018).

We used the c-Src X5-Y-X5 screening data to predict the relative phosphorylation rates of six peptide pairs, corresponding to reference and variant sequences derived from human phosphorylation sites. Each peptide sequence was scored using an approach similar to that used for oriented peptide libraries in the Scansite database (Obenauer et al., 2003; Yaffe et al., 2001). For each peptide sequence, we summed the log2-transformed enrichment values for the appropriate amino acid at each position in the peptide (the numerical values that make up the heatmaps in Figures 1B and 2A). This sum was divided by the number of variable positions (10 positions for all peptides in this study), then normalized to be on a scale from 0 (the worst possible sequence) to 1 (the best possible sequence). We compared the predicted scores to in vitro phosphorylation rates measured using a highly sensitive assay based on reverse-phase high-performance liquid chromatography (RP-HPLC) (Figure 3 and Figure 3—figure supplement 1). We found that our predictions, which were derived from log-transformed enrichment scores, correlated moderately well with the log-transformed rates of phosphorylation by c-Src (Figure 3A). The predictions could differentiate high, medium, and low activity substrates but could not accurately rank peptides within these clusters. Focusing specifically on the effects of the mutations in this set of peptides, we found that the X5-Y-X5 screening data could accurately predict the directionality of the effects of five out of six mutations (Figure 3B).
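The scoring scheme described above can be sketched as follows. The normalization step is an assumption: the text states only that scores run from 0 (worst possible sequence) to 1 (best possible sequence), so the sketch uses min-max scaling between the lowest and highest achievable per-position averages. The toy matrix values, peptide sequences, and function name are hypothetical and do not come from the c-Src screen.

```python
def score_peptide(peptide, matrix, center=5):
    """Score a peptide against a position-specific log2 enrichment matrix.

    matrix is a list with one {amino acid: log2 enrichment} dict per position.
    The central tyrosine (index `center`) is not scored.
    """
    variable = [i for i in range(len(peptide)) if i != center]
    n = len(variable)
    raw = sum(matrix[i][peptide[i]] for i in variable) / n
    worst = sum(min(matrix[i].values()) for i in variable) / n
    best = sum(max(matrix[i].values()) for i in variable) / n
    # Min-max scaling (assumed): 0 = worst possible sequence, 1 = best possible.
    return (raw - worst) / (best - worst)

# Toy matrix (made-up values): the same three residues are allowed at every position.
toy_matrix = [{"A": 0.0, "I": 1.0, "K": -1.0} for _ in range(11)]

reference = "AAAAIYAAAAA"   # hypothetical reference phosphosite, -1 Ile
variant = "AAAAKYAAAAA"     # hypothetical variant with a -1 Lys substitution

ref_score = score_peptide(reference, toy_matrix)
var_score = score_peptide(variant, toy_matrix)
print(round(ref_score, 2), round(var_score, 2))
print("predicted effect of mutation:", "decrease" if var_score < ref_score else "increase or none")
```

Because every position contributes additively and independently, a score of this kind can indicate the direction of a mutational effect but cannot capture preferences that depend on neighboring residues.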

Figure 3. Predicting relative phosphorylation rates using data from X5-Y-X5 library screens.

(A) Correlation between measured phosphorylation rates and X5-Y-X5 predictions for 12 peptides with c-Src. All peptides were used at a concentration of 100 μM, and c-Src was used at a concentration of 500 nM. Error bars represent the standard deviation from at least three rate measurements and three separate scores with individual replicates of the X5-Y-X5 screen. (B) Correlation between the magnitude of mutational effects for 6 peptide pairs with mutational effects predicted from X5-Y-X5 library screens. Error bars represent the standard deviation of at least three rate measurements and three separate scores with individual replicates of the X5-Y-X5 screen.


Figure 3—figure supplement 1. Assay to measure peptide phosphorylation rates using reverse-phase HPLC.


Phosphorylation of the CDK5_Y15 peptide (100 μM) by c-Src (500 nM), monitored by RP-HPLC of selected time points. The HPLC chromatogram shows the formation of a phosphorylated species over time, with concomitant loss of the unphosphorylated peptide (left). The areas under the two peaks in the chromatogram were quantified and plotted for each time point (right). Initial phosphorylation rates were extracted by fitting a line to the first few time points.

One drawback to the aforementioned scoring approach, like all models based on position-specific scoring matrices, is that it cannot capture context-dependent amino acid preferences. We recently explored a machine-learning approach, using screening data from a related degenerate library, to model c-Src kinase specificity (Rube et al., 2022). The model not only incorporated pairwise inter-residue dependencies, but also data from multiple time points. This approach could reasonably predict absolute rate constants, as well as the directionality and magnitude of several phosphosite-proximal mutational effects. As an alternative to building models based on random library screens, we reasoned that direct measurements of reference and variant peptides using our screening platform might also provide reliable assessment of mutational effects.

A proteome-derived peptide library accurately measures sequence specificity and phosphorylation rates

To refine our assessment of phosphosite-proximal mutational effects, we designed a library, derived from the PhosphoSitePlus database, that is composed of 11-residue sequences spanning 3159 human phosphosites and 4760 disease-associated variants of these phosphosites bearing a single amino acid substitution (pTyr-Var library; Figure 4—figure supplement 1; Hornbeck et al., 2019). While the majority of sequences in this library contained a single tyrosine residue, some sequences contained multiple tyrosines, for which we included additional variants where the non-central tyrosine residues were mutated to phenylalanine. Including these tyrosine mutants and additional control sequences, such as previously designed consensus substrates, the library totaled ~10,000 unique sequences. As with the X5-Y-X5 library, we generated two versions of this library, bearing a C-terminal strep-tag or myc-tag. We conducted specificity screens with the myc-tagged pTyr-Var library against 7 non-receptor tyrosine kinases (c-Src, Fyn, Hck, c-Abl, Fer, Jak2, and AncSZ, an engineered homolog of Syk and ZAP-70; Hobbs et al., 2022) and 5 receptor tyrosine kinases (EPHB1, EPHB2, FGFR1, FGFR3, and MERTK). The majority of these kinases could be expressed in bacteria and purified in good yield (Albanese et al., 2018; Hobbs et al., 2022). One of these kinases (Jak2) was purchased from a commercial vendor.

Using the catalytically active tyrosine kinase constructs, we identified an optimal concentration (typically between 0.1–1.5 μM) to ensure 20–30% of maximal phosphorylation in three minutes. For some kinases (FGFR1, FGFR3, and MERTK), pre-incubation with ATP was required in order to activate the kinase by auto-phosphorylation (Figure 4—figure supplement 2). We conducted the screens analogously to those with the X5-Y-X5 library, but rather than calculate position-specific residue preferences from the deep sequencing data, we directly calculated enrichment scores for each peptide in the pTyr-Var library (Figure 4A and Figure 4—source data 1). Three to five replicates of the pTyr-Var screen were conducted with each kinase, and the results were reproducible across replicates (Figure 4—figure supplement 3). To validate our pTyr-Var screens, we examined enrichment scores from the c-Src experiments for the same six peptide pairs for which predictions using X5-Y-X5 screening data were only moderately accurate. We found a strong correlation between the pTyr-Var enrichment scores and phosphorylation rates, particularly for high-activity sequences (Figure 4B). Furthermore, the effects of mutations in the screens were consistent with those observed using the in vitro RP-HPLC assay with purified peptides (Figure 4C).
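As an illustration of how per-peptide enrichment scores can be derived from deep sequencing counts, the sketch below computes the frequency of each peptide in the selected population relative to its frequency in the input library, then averages across replicates. The exact definition and any read-count thresholds used in this study are given in the methods rather than in this section, so the formula, the function names (`enrichment_scores`, `average_replicates`), and the `min_input_reads` parameter should be read as assumptions.

```python
from statistics import mean, stdev


def enrichment_scores(selected_counts, input_counts, min_input_reads=1):
    """Return {peptide: enrichment}, where enrichment is the peptide's frequency
    in the selected sample divided by its frequency in the input sample."""
    total_selected = sum(selected_counts.values())
    total_input = sum(input_counts.values())
    scores = {}
    for peptide, n_input in input_counts.items():
        if n_input < min_input_reads:
            continue  # skip peptides that are too poorly sampled in the input
        freq_input = n_input / total_input
        freq_selected = selected_counts.get(peptide, 0) / total_selected
        scores[peptide] = freq_selected / freq_input
    return scores


def average_replicates(replicates):
    """Combine per-replicate score dicts into {peptide: (mean, standard deviation)}.
    Only peptides scored in every replicate are kept; at least two replicates are needed."""
    shared = set.intersection(*(set(r) for r in replicates))
    return {p: (mean(r[p] for r in replicates), stdev(r[p] for r in replicates))
            for p in shared}


# Toy counts for two hypothetical peptides.
toy_input = {"pepA": 100, "pepB": 100}
toy_selected = {"pepA": 300, "pepB": 100}
toy_scores = enrichment_scores(toy_selected, toy_input)
```

With this definition, a score near 1 indicates no change in abundance upon selection, and scores above 1 indicate enrichment in the phosphorylated population.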

Figure 4. Specificity profiling of tyrosine kinases using the pTyr-Var library.

(A) Distribution of enrichment scores from pTyr-Var screens with 12 tyrosine kinases. Each point represents a peptide sequence in the pTyr-Var library. Data points in orange-red represent sequences without a Tyr residue and data points in dark gray represent sequences with a Tyr residue. Each dataset represents the average of three to five replicates. (B) Correlation between enrichment scores and measured phosphorylation rates for 12 peptides (100 μM) with c-Src (500 nM). (C) Correlation between the magnitude of mutational effects for 6 peptide pairs in the pTyr-Var library with mutational effects measured using an in vitro kinetic assay. Error bars in panels B and C represent the standard deviation from 3 to 4 rate measurements and four pTyr-Var screens. (D) Matrix of Pearson’s correlation coefficients for all pairwise comparisons between replicate-averaged pTyr-Var datasets for 12 kinases. (E) Volcano plot depicting mutational effects in the pTyr-Var screen with c-Src kinase domain. Data points represent the average of four replicates. Hits are colored orange-red. (F) Percent phosphorylation of SHP2 wild-type, D61V, and D61N (10 μM) after a one-hour incubation with c-Src, Fyn, and FGFR1 (1 μM). Error bars represent the standard deviation from 2 to 3 measurements.

Figure 4—source data 1. Enrichment scores from tyrosine kinase pTyr-Var screens.
Data are provided in a flat sheet with average and standard deviation values for all kinase-substrate pairs. Data are also provided for each kinase as a side-by-side comparison of enrichment scores for reference and variant sequences, along with whether the mutation was considered significant in our analysis. Three sheets are provided listing substrates for c-Src, Fyn, and c-Abl that are also found in a curated list of kinase-substrate pairs in the PhosphoSitePlus database.
Figure 4—source data 2. Position-specific amino acid enrichment matrices from the tyrosine kinase pTyr-Var library screens for sequences containing a single central tyrosine residue.


Figure 4—figure supplement 1. Properties of the pTyr-Var library.


(A) Frequency of sequences in the library with different numbers of tyrosine residues. (B) Positions of mutations across the library relative to the central tyrosine (zero-position). (C) Frequency of substitutions associated with each phosphosite. (D) Abundance of each possible amino acid substitution across the library.
Figure 4—figure supplement 2. Pre-activation of FGFR1, FGFR3, and MERTK by auto-phosphorylation.


Kinases (25 μM) were incubated with ATP (5 mM) in a magnesium-containing neutral pH buffer for 0.5–2 hr, then desalted and concentrated to remove excess ATP. Proteins were analyzed by electrospray-ionization mass spectrometry. The envelope of multiply-charged states was deconvoluted using the instrument software, and the deconvoluted spectra are shown. The number of phosphorylation events on each kinase is labeled.
Figure 4—figure supplement 3. Matrix of Pearson’s correlation coefficients for all replicates of pTyr-Var screens across all 12 kinases.


Figure 4—figure supplement 4. Assessment of the extent of enrichment in pTyr-Var screens with 12 kinases.


These graphs assess what fraction of the sequences containing no Tyr residue (out of 370 sequences) and what fraction of the sequences containing 1 Tyr residue (out of 7468 sequences) have an enrichment score above the cutoff value indicated on the x-axis.
Figure 4—figure supplement 5. Heatmaps depicting the position-specific amino acid preferences for 12 tyrosine kinase domains, extracted from screens with the pTyr-Var library.


Only sequences with a single central tyrosine were considered in this analysis. Position-specific amino acid enrichment scores were calculated by taking the average log2-transformed enrichment of every sequence with that particular feature. Values are displayed on a color scale from blue (disfavored sequence features, negative value), to white (neutral sequence features, near zero value), to red (favored sequence features, positive value). Values in the heatmaps are the average of three to five replicates.
Figure 4—figure supplement 6. Volcano plots depicting mutational effects in the pTyr-Var screen for 12 kinase domains.


Datasets are the average of three to five replicates. Significant hits are colored in orange-red.
Figure 4—figure supplement 7. Number of significant mutations for each kinase at each position surrounding the central tyrosine residue.


Mutations that added or removed a tyrosine residue were excluded from these counts.
Figure 4—figure supplement 8. Enrichment scores from pTyr-Var screens for phosphorylation of the RET Tyr 981 reference and variant (R982C) peptides by 12 tyrosine kinases.


Error bars represent the standard deviations from three to five replicates.
Figure 4—figure supplement 9. Enrichment scores from pTyr-Var screens for phosphorylation of SHP2 Y62 reference and variant (D61N and D61V) peptides by c-Src, Fyn, and FGFR1.


Error bars represent the standard deviations from three to five replicates.

A total of 370 peptides in the pTyr-Var library contain no tyrosine residues and thus serve as controls to determine background noise in our screens. For every kinase tested, the tyrosine-free sequences showed distinctly low enrichment scores, consistent with signal in these screens being driven by tyrosine phosphorylation of the surface-displayed peptides (Figure 4A). For each kinase, a subset of the library (between 7% and 10%) showed enrichment scores above this background level (Figure 4—figure supplement 4). To confirm that the pTyr-Var screens were reporting on unique substrate specificities across these tyrosine kinases, we calculated Pearson’s correlation coefficients for the average datasets of each kinase pair and visualized position-specific amino acid preferences as heatmaps (Figure 4D, Figure 4—figure supplement 5, and Figure 4—source data 2). We found strong correlation in specificity between kinases of the same family (the Src-family kinases c-Src/Fyn/Hck, and receptor pairs EPHB1/EPHB2 and FGFR1/FGFR3). We also observed that the specificity of Src-family kinases partly overlapped with the Ephrin receptors and MERTK. The specificity of AncSZ and Jak2 correlated with that of FGFRs.
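The pairwise comparison of kinase datasets can be illustrated with a short sketch that computes Pearson's correlation coefficient for every pair of replicate-averaged enrichment profiles over the peptides they share. The data layout (a dictionary of per-kinase score dictionaries), the function names, and the toy values for three hypothetical kinases are assumptions made for illustration and do not come from the study's data.

```python
import math


def pearson(x, y):
    """Pearson's correlation coefficient for two equal-length lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def correlation_matrix(datasets):
    """datasets: {kinase_name: {peptide: average enrichment}}.
    Returns {(kinase_a, kinase_b): r}, computed over peptides present in all datasets."""
    names = sorted(datasets)
    shared = sorted(set.intersection(*(set(datasets[n]) for n in names)))
    return {
        (a, b): pearson([datasets[a][p] for p in shared],
                        [datasets[b][p] for p in shared])
        for a in names for b in names
    }


# Toy profiles: kinase_1 and kinase_2 have proportional scores, kinase_3 is inverted.
toy_datasets = {
    "kinase_1": {"p1": 1.0, "p2": 2.0, "p3": 3.0},
    "kinase_2": {"p1": 2.0, "p2": 4.0, "p3": 6.0},
    "kinase_3": {"p1": 3.0, "p2": 2.0, "p3": 1.0},
}
toy_matrix_r = correlation_matrix(toy_datasets)
```

In a matrix of this kind, kinases with similar substrate preferences give coefficients near 1, whereas kinases with divergent preferences give lower or negative values.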

Next, we compared the results of our pTyr-Var library screens with a curated list of kinase-substrate pairs found in the PhosphoSitePlus database (Hornbeck et al., 2019). For c-Src, Fyn, and c-Abl, out of the sequences that overlapped between our library and the curated list, 30–40% of the kinase-substrate pairs showed efficient phosphorylation in the peptide-display screen (Figure 4—source data 1). This is consistent with a previous study using bacterial display and a different proteome-derived peptide library (Shah et al., 2018). The modest overlap between peptide screens and literature-reported kinase-substrate pairs is not surprising, given that other mechanisms in kinase-substrate recognition, such as localization, may override kinase domain sequence preferences (Miller and Turk, 2018). Furthermore, the curated list of kinase-substrate pairs comes from both in vitro and in vivo studies and may not accurately represent bona fide substrates for each kinase.

Natural variants of tyrosine phosphorylation sites impact kinase recognition

For pairs of peptides in the pTyr-Var library that correspond to a disease-associated variant and a reference sequence, we calculated the log2-fold change in enrichment for the variant relative to the reference. The large number of replicates for each screen afforded a robust analysis of phosphosite-proximal mutational effects for each kinase. We filtered the results in five steps to identify significant mutations: (1) We omitted phosphosite pairs where there was no statistically significant difference in enrichment between the variant and reference (p-value cutoff of 0.05). (2) We then applied a second filtering step to remove phosphosite pairs where the fold-change in enrichment between the variant and reference sequence was less than two. (3) Next, we excluded pairs where both sequences were low-activity substrates (enrichment score less than 1.5). (4) We removed mutations that added or removed a tyrosine residue, as their interpretation is ambiguous in our assay. (5) Lastly, we excluded phosphosite pairs in which the average read count of either the variant or reference sequence was less than 50. This left us with a unique set of 50–400 high-confidence candidates for each tyrosine kinase (Figure 4E, Figure 4—figure supplement 6, and Figure 4—source data 1). From this filtered list, we found that kinases showed distinct patterns of mutational sensitivity at each position around the central tyrosine, consistent with their distinct sequence preferences (Figure 4—figure supplement 7).
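The five filtering steps can be expressed as a single predicate applied to each reference/variant pair. The sketch below is an illustration under stated assumptions rather than the analysis code used in the study: the `PeptidePair` record and its field names are hypothetical, the p-value is taken as a precomputed input because the statistical test is not specified in this section, and the fold-change is evaluated in either direction.

```python
from dataclasses import dataclass


@dataclass
class PeptidePair:
    """Hypothetical record for one reference/variant peptide pair."""
    ref_seq: str
    var_seq: str
    ref_score: float   # replicate-averaged enrichment of the reference peptide
    var_score: float   # replicate-averaged enrichment of the variant peptide
    p_value: float     # precomputed; the statistical test is not specified here
    ref_reads: float   # average read count of the reference peptide
    var_reads: float   # average read count of the variant peptide


def is_significant(pair, p_cutoff=0.05, min_fold=2.0, min_score=1.5, min_reads=50):
    # (1) Require a statistically significant difference in enrichment.
    if pair.p_value >= p_cutoff:
        return False
    # (2) Require at least a two-fold change, in either direction.
    high = max(pair.ref_score, pair.var_score)
    low = min(pair.ref_score, pair.var_score)
    fold = high / low if low > 0 else float("inf")
    if fold < min_fold:
        return False
    # (3) Exclude pairs in which both peptides are low-activity substrates.
    if high < min_score:
        return False
    # (4) Exclude mutations that add or remove a tyrosine residue.
    if pair.ref_seq.count("Y") != pair.var_seq.count("Y"):
        return False
    # (5) Exclude pairs in which either peptide is poorly sampled.
    if min(pair.ref_reads, pair.var_reads) < min_reads:
        return False
    return True


# Toy pair with hypothetical sequences and values.
toy_pair = PeptidePair("AAEEIYGEFEA", "AAEEIYGVFEA", 1.0, 4.0, 0.01, 200, 180)
print(is_significant(toy_pair))
```

Applying the cheap checks in this order mirrors the text, but the outcome does not depend on the order because a pair must pass all five criteria.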

For c-Src, we identified 381 high-confidence mutations (Figure 4E). A number of these mutations were on proteins involved in neurotrophin-regulated signaling, cyclin-dependent serine/threonine kinase activity, and other receptor/non-receptor tyrosine kinase activity. We found notable mutational effects at a known target of c-Src, Tyr 149 of the tumor suppressor protein FHL1 (Wang et al., 2018), as well as on other proteins known to interact with c-Src, such as the lipid and protein phosphatase PTEN and the immune receptor LILRB4 (Kang et al., 2016; Lu et al., 2003). We were particularly interested in cases where a kinase not previously known to phosphorylate a specific phosphosite showed a dramatic gain-of-function upon phosphosite-proximal mutation. For example, we found that the R982C mutation, proximal to Tyr 981 on the receptor tyrosine kinase RET, significantly enhanced phosphorylation by c-Src (Figure 4—figure supplement 8). This phosphosite is known to engage the SH2 domain of c-Src and facilitate c-Src activation upon recruitment to RET, but it is not considered a kinase substrate of c-Src (Encinas et al., 2004). This mutation could potentially rewire signaling by promoting phosphorylation of RET by c-Src and, in doing so, sustaining c-Src activation through its binding to phospho-RET. The RET R982C mutation also enhanced Tyr 981 phosphorylation by several other kinases, most notably Fer (Figure 4—figure supplement 8). These examples show how the pTyr-Var data could be used as a resource to guide mutation-focused signaling studies.

To further validate our approach, we examined the effects of phosphosite-proximal mutations on the phosphorylation of an intact protein, rather than a peptide. Tyr 62 in the tyrosine phosphatase SHP2 sits within a region of this protein that is frequently mutated in various human diseases (Tartaglia et al., 2006), and this residue is highly phosphorylated in receptor tyrosine kinase-driven cancers (Gillette et al., 2020; Pfeiffer et al., 2022). Several Tyr 62-proximal mutations are encoded in the pTyr-Var library. In our screens, the reference peptide for Tyr 62 was preferentially phosphorylated by receptor tyrosine kinases, such as FGFR1, over non-receptors such as c-Src and Fyn, and nearby mutations showed varied effects on Tyr 62 phosphorylation, depending on the kinase tested (Figure 4—figure supplement 9). For example, D61V enhanced and D61N attenuated phosphorylation by Src-family kinases, but these mutations had little impact on recognition by FGFR1. To assess whether the effects of D61 mutations in the screens were retained in the context of the intact protein, we monitored phosphorylation of wild-type, D61V, and D61N SHP2 by c-Src, Fyn, and FGFR1 using intact protein mass spectrometry. We made two modifications to SHP2 to facilitate measurements: (1) substitution of the catalytic residue (C459E) to prevent dephosphorylation by the SHP2 phosphatase domain and (2) deletion of the disordered C-terminal tail to avoid background phosphorylation of an accessible site. Our measurements recapitulated the relative phosphorylation efficiencies for the Tyr 62 reference peptides, with Fyn being the slowest, and FGFR1 being the fastest (Figure 4F and Figure 4—figure supplement 9). Both D61V and D61N dramatically enhanced phosphorylation by all three kinases, consistent with reports that mutations at this site dramatically alter SHP2 structure and probably also increase Tyr 62 accessibility (Keilhack et al., 2005). 
For c-Src and Fyn, but not FGFR1, D61V showed a stronger enhancement of phosphorylation than D61N, consistent with our peptide screens (Figure 4F and Figure 4—figure supplement 9). The effects of these mutations in SHP2 on signal rewiring in cells warrant further investigation.

Position-specific amino acid preferences for tyrosine kinases are context-dependent

As noted earlier, position-specific scoring matrices do not reflect context-dependent sequence preferences. To illustrate this further, we scored peptide sequences in the pTyr-Var library using the position-specific scoring matrices generated from the X5-Y-X5 library. For peptides that showed significant enrichment in the pTyr-Var screens (enrichment >1), there was a modest correlation with the scores predicted using the X5-Y-X5 library, with many outliers (Figure 5A and Figure 5—figure supplement 1). We selected peptides for c-Src and c-Abl that were high-activity sequences based on the pTyr-Var screens (enrichment >4) but deviated significantly from canonical recognition motifs, and therefore were low scoring (score <0.5). The peptides selected for c-Src had unfavorable residues downstream of the central tyrosine (+1 Arg and +3 Gly for MISP_Y95; +1 Asn, +2 Arg, and +3 Glu for HLA-DPB1_Y59_F64L_YF). For c-Abl, the peptides had an unfavorable –1 Glu and +2 Ser (SIRPA_Y496_P491L) or an unfavorable +2 Glu and +3 Gly (HGD_Y166_F169L). We measured phosphorylation rates for these peptides using our RP-HPLC assay. Phosphorylation rates for these peptides deviated from what would be expected based on a position-specific scoring matrix (Figure 5B, Figure 5—figure supplement 1, and Figure 5—source data 1). This suggests that the putatively unfavorable sequence features in these peptides were tolerated in their specific sequence contexts.

Figure 5. Context-dependent effects of tyrosine kinase recognition.

(A) Correlation of enrichment scores measured for c-Src in the pTyr-Var library screen with scores predicted from the X5-Y-X5 library using a position-specific scoring matrix. (B) Correlation between predicted scores and measured phosphorylation rates for 14 peptides (100 μM) with c-Src (500 nM). Peptides that could not be accurately scored by the X5-Y-X5 data are highlighted in orange. (C) Correlation of variant effects measured in the pTyr-Var library screen with those predicted from the X5-Y-X5 library screen for c-Src. Several points lie in the top-left and bottom-right quadrants, indicating a discrepancy between the measured mutational effect in the pTyr-Var screen and the predicted mutational effect from the X5-Y-X5 screen. (D) Effects of serine-to-proline substitution at the –2 position in various assays with c-Src. The left panels show the enrichment levels of –2 serine and proline in the X5-Y-X5 screen (top) and the effect of a –2 serine-to-proline substitution in a specific peptide in the pTyr-Var screen (bottom). The right panels show rate measurements using the RP-HPLC assay for the same substitution in the Src consensus peptide (top) and the peptide from the pTyr-Var screen (bottom).

Figure 5—source data 1. Peptide sequences and their phosphorylation rates by c-Src or c-Abl, measured using the RP-HPLC kinetic assay.
Figure 5—source data 2. Mutational effects measured from the pTyr-Var library screens and their corresponding predictions based on the X5-Y-X5 library screening data.
Only those sequence pairs with high-quality sequencing data (read counts >100) and a single central tyrosine were included in the analysis.


Figure 5—figure supplement 1. Context-dependent effects of c-Abl substrate recognition.


(A) Correlation of enrichment scores measured for c-Abl in the pTyr-Var library screen with scores predicted from the X5-Y-X5 library using a position-specific scoring matrix. (B) Correlation between predicted scores and measured phosphorylation rates for 4 peptides (100 μM) with c-Abl (500 nM). Peptides that showed significant enrichment in the pTyr-Var screen but lower than expected scores from the X5-Y-X5 data are highlighted in orange.
Figure 5—figure supplement 2. Correlation of variant effects measured in the pTyr-Var library screen with those predicted from the X5-Y-X5 library screen for c-Abl, Fer, EPHB1, and EPHB2.


Several points lie in the top-left and bottom-right quadrants, indicating a discrepancy between the measured mutational effect in the pTyr-Var screen and the predicted mutational effect from the X5-Y-X5 screen.

The observation that there are context-dependent sequence preferences for kinase-substrate interactions has important consequences for predicting the effects of phosphosite-proximal mutations. The same substitution could have different effects depending on the composition of the surrounding sequence. This phenomenon is uniquely visible in our screening approach, as we are measuring the phosphorylation of defined peptide sequences, and we are conducting screens with thousands of peptide pairs that vary by only a single amino acid substitution. To test our hypothesis, we assessed whether the directionality of mutational effects observed for specific peptides in the pTyr-Var screen could be predicted using the position-specific scoring matrix derived from the X5-Y-X5 screen (which would represent the effect of making a substitution averaged over all sequence contexts). While the directionality of the effect of most mutations could be predicted by the X5-Y-X5 screen, we observed many mutations that showed a significant effect where none was predicted, as well as mutations where the effect was the opposite of what was predicted (Figure 5C, Figure 5—figure supplement 2, and Figure 5—source data 2).
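The comparison between measured and predicted mutational effects can be sketched as follows. For a single substitution, the prediction from a position-specific scoring matrix reduces to the difference between the log2 enrichment of the variant residue and that of the reference residue at the mutated position; the sign of this difference is then compared with the sign of the measured log2 fold-change. The function names, the matrix layout, and the toy values are assumptions made for illustration, not the study's code or data.

```python
def predicted_delta(log2_matrix, position, ref_residue, var_residue):
    """Predicted effect of one substitution from a position-specific matrix:
    log2 enrichment of the variant residue minus that of the reference residue.
    log2_matrix is assumed to be a list of {residue: log2 enrichment} dicts."""
    return log2_matrix[position][var_residue] - log2_matrix[position][ref_residue]


def sign(value):
    """Return 1, -1, or 0 according to the sign of value."""
    return (value > 0) - (value < 0)


def compare_directions(effects):
    """effects: iterable of (measured_log2_fold_change, predicted_delta) tuples.
    Counts mutations whose measured and predicted directions agree, are opposite,
    or cannot be compared because one of the two values is zero."""
    counts = {"agree": 0, "opposite": 0, "undetermined": 0}
    for measured, predicted in effects:
        m, p = sign(measured), sign(predicted)
        if m == 0 or p == 0:
            counts["undetermined"] += 1
        elif m == p:
            counts["agree"] += 1
        else:
            counts["opposite"] += 1
    return counts


# Toy example: four hypothetical mutations.
toy_effects = [(1.5, 0.3), (-2.0, 0.4), (1.2, -0.1), (2.0, 0.0)]
toy_counts = compare_directions(toy_effects)
```

Mutations counted as "opposite" correspond to points in the top-left and bottom-right quadrants of a measured-versus-predicted scatter plot.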

To validate this observation, we selected a peptide pair in the pTyr-Var library where a mutation (–2 Ser to Pro) had the opposite effect of that predicted by our X5-Y-X5 screen for c-Src (Figure 5D) and by published results with oriented peptide libraries (Begley et al., 2015; Obenauer et al., 2003). Additionally, we made the same substitution in the c-Src consensus peptide to determine whether the X5-Y-X5 predictions would hold true in that context. Rate measurements with these purified peptides and c-Src showed that the same amino acid substitution had different impacts on c-Src recognition, depending on the sequence context (Figure 5D). A previous study that analyzed the specificity of the epidermal growth factor receptor (EGFR) kinase using bacterial peptide display showed that the effect of mutations at the –2 position was sometimes dependent on the identity of the –1 residue (Cantor et al., 2018). Molecular dynamics analyses in that report suggested that the amino acid identity at the –1 position determined how the side chain of the –2 residue was presented to the kinase, and vice versa, thereby dictating context-dependent preferences at both positions. Our pTyr-Var screens suggest that context-dependent sequence preferences may be commonplace. Depending on the kinase, 5–15% of all significant mutations in the pTyr-Var screen had the opposite effect of that predicted using the X5-Y-X5 library data. Mapping these context-dependent effects comprehensively could have a significant impact on our ability to predict native substrates of kinases, and it will improve our understanding of the structural basis for substrate specificity.

Phosphorylation of bacterial peptide display libraries enables profiling of SH2 domains

In previous implementations of our bacterial peptide display and deep sequencing approach, the specificities of phosphotyrosine recognition domains (e.g. SH2 domains and phosphotyrosine binding (PTB) domains) were analyzed in addition to tyrosine kinase domains (Cantor et al., 2018; Lo et al., 2019). This approach required two amendments to the kinase screening protocol. First, the surface-displayed libraries were phosphorylated to saturating levels using a cocktail of tyrosine kinases. Second, because phosphotyrosine recognition domains generally have fast dissociation rates from their ligands (Morimatsu et al., 2007; Oh et al., 2012), making binding-based selection assays challenging, constructs were generated in which two identical copies of an SH2 domain were artificially fused together. The tandem-SH2 constructs enhanced avidity for phosphopeptides displayed on the cell surface through multivalent effects, thereby enabling enrichment of cells via FACS (Cantor et al., 2018).

For this study, we reasoned that a multivalent SH2 construct could be mimicked by functionalizing avidin-coated magnetic beads with biotinylated SH2 domains. These SH2-coated beads could then be used to select E. coli cells displaying enzymatically phosphorylated peptide display libraries, followed by deep sequencing to determine SH2 sequence preferences (Figure 6A). Thus, we first established a protocol to produce site-specifically biotinylated SH2 domains in E. coli, by co-expressing an Avi-tagged SH2 construct with the biotin ligase BirA (Gräslund et al., 2017). This system yielded quantitatively biotinylated SH2 domains, as confirmed by mass spectrometry (Figure 6—figure supplement 1). Since the biotinylated SH2 domains could be produced in high yields through bacterial expression, the recognition domains were immobilized on the magnetic beads at saturating concentrations to ensure a uniform concentration across experiments. This also prevented background binding of strep-tagged libraries to the beads, making this method compatible with previously reported strep-tagged libraries (Cantor et al., 2018; Shah et al., 2018; Shah et al., 2016).

Figure 6. High-throughput profiling of SH2 domain ligand specificity using bacterial peptide display.

(A) Schematic representation of the workflow for SH2 domain specificity profiling. (B) Heatmaps depicting the specificities of the c-Src, SHP2-C, and Grb2 SH2 domains, measured using the X5-Y-X5 library. Enrichment scores were log2-transformed and are displayed on a color scale from blue (disfavored), to white (neutral), to red (favored). Values in the heatmaps are the average of three replicates. (C) Distribution of enrichment scores from pTyr-Var screens with three SH2 domains and the pan-phosphotyrosine antibody 4G10 Platinum. Each point represents a peptide sequence in the library. The antibody selection was done similarly to the kinase screens, with antibody labeling of cells, followed by bead-based enrichment, as opposed to cell enrichment with antibody-saturated beads. Each dataset represents the average of three replicates. (D) Correlation between enrichment scores for 9 peptides from the pTyr-Var screen and binding affinities measured using a fluorescence polarization assay. Error bars represent the standard deviations from three screens or binding measurements. (E) Examples of phosphosite-proximal mutations that selectively enhance binding to specific SH2 domains. Error bars represent the standard deviations from three screens.

Figure 6—source data 1. Position-specific amino acid enrichment matrices from the SH2 domain X5-Y-X5 library screens.
Matrices calculated with and without inclusion of multi-tyrosine sequences are provided.
Figure 6—source data 2. Enrichment scores from SH2 domain pTyr-Var screens.
Data are provided in a flat sheet with average and standard deviation values for all SH2-ligand pairs. Data are also provided for each SH2 domain as a side-by-side comparison of enrichment scores for reference and variant sequences, along with whether the mutation was considered significant in our analysis.
Figure 6—source data 3. Position-specific amino acid enrichment matrices from the SH2 domain pTyr-Var library screens for sequences containing a single central tyrosine residue.


Figure 6—figure supplement 1. Mass spectrometry analysis of biotinylated SH2 domains.


Proteins were analyzed by electrospray-ionization mass spectrometry. The envelope of multiply-charged states was deconvoluted using the instrument software, and the deconvoluted spectra are shown.
Figure 6—figure supplement 2. Flow cytometry analysis of library phosphorylation by a cocktail of tyrosine kinases.


Cells displaying the X5-Y-X5 library were treated with a kinase cocktail containing c-Src, c-Abl, AncSZ, and EPHB1 for 3 hr, then labeled with a pan-phosphotyrosine antibody (PY20 PerCP-eFluor 710) and analyzed by flow cytometry.
Figure 6—figure supplement 3. Heatmaps and logos depicting the specificities of the c-Src, SHP2-C, and Grb2 SH2 domains, measured using the X5-Y-X5 library.


Only peptides with one central tyrosine were considered in this analysis. Enrichment scores were log2-transformed and are displayed on a color scale from blue (disfavored sequence features, negative value), to white (neutral sequence features, near zero value), to red (favored sequence features, positive value). The same values were used to plot the heatmaps and the sequence logos. The height for the central ‘Y’ in the sequence logos is an arbitrary value, chosen for optimal visualization of other features. Values are the average of three replicates.
Figure 6—figure supplement 4. Matrix of Pearson’s correlation coefficients for all replicates of pTyr-Var screens across all 3 SH2 domains and 4G10 Platinum.


Figure 6—figure supplement 5. Volcano plots depicting mutational effects in the pTyr-Var screen for 3 SH2 domains.


Datasets are the average of three replicates. Hits are colored in orange-red.
Figure 6—figure supplement 6. Number of significant mutations for each SH2 domain at each position surrounding the central phosphotyrosine residue.


Mutations that added or removed a tyrosine residue are excluded from these counts.
Figure 6—figure supplement 7. Comparison of the pTyr-Var screens for the c-Src kinase and SH2 domains.


Kinase domain data are the average of four replicates, and SH2 data are the average of three replicates.
Figure 6—figure supplement 8. Divergent effects of phosphosite-proximal mutations on c-Src kinase and SH2 domain recognition.


The graph on the left shows the effects of mutations that were significant for the kinase domain (gray) or the SH2 domain (orange-red). The graph on the right shows examples of phosphosite-proximal mutations that selectively impact the kinase or SH2 domain of c-Src. Error bars for the kinase and SH2 domain indicate the standard deviations from four and three replicates, respectively.

To implement SH2 specificity screens, the strep-tagged X5-Y-X5 library was phosphorylated to a high level using a mixture of c-Src, c-Abl, AncSZ, and EPHB1 (Figure 6—figure supplement 2). The phosphorylated library was screened against three SH2 domains that fall into distinct specificity classes and are derived from three different types of signaling proteins: the SH2 domain from the tyrosine kinase c-Src, the C-terminal SH2 (C-SH2) domain from the tyrosine phosphatase SHP2, and the SH2 domain from the non-catalytic adaptor protein Grb2 (Figure 6B, Figure 6—figure supplement 3, and Figure 6—source data 1). The X5-Y-X5 library screens recapitulated known sequence preferences for each SH2 domain. For c-Src, there was a distinctive preference for –2 His, +1 Asp/Glu, and +3 Ile, as previously reported from oriented peptide libraries (Huang et al., 2008). For Grb2, a characteristic +2 Asn preference dominated the specificity profile (Gram et al., 1997; Huang et al., 2008; Kessels et al., 2002; Rahuel et al., 1996; Songyang et al., 1994). Notably, our Grb2 screen also reveals subtle amino acid preferences at other positions, which could tune the affinity for +2 Asn-containing sequences. Several studies have measured the sequence specificity of the SHP2 C-SH2 domain using diverse methods, including peptide microarrays, oriented peptide libraries, and one-bead-one-peptide libraries (Huang et al., 2008; Miller et al., 2008; Sweeney et al., 2005; Tinti et al., 2013). The results of these reported screens are not concordant. Our method indicates a preference for β-branched amino acids (Thr/Val/Ile) at the –2 position, a small residue (Ala/Ser/Thr) at the +1 position, and a strong preference for an aliphatic residue (Ile/Val/Leu) at the +3 position. Our results are most in line with the one-bead-one-peptide screens (Sweeney et al., 2005).

We next phosphorylated and screened the pTyr-Var library against the same three SH2 domains in triplicate (Figure 6C, Figure 6—source data 2, and Figure 6—source data 3). The replicates for each SH2 domain were highly correlated, but datasets between SH2 domains had poor correlation, suggesting distinct ligand specificities (Figure 6—figure supplement 4). As observed for kinases, we saw negligible enrichment of peptides lacking a tyrosine residue, but each SH2 domain showed strong enrichment of a few hundred peptides containing one or more tyrosines (Figure 6C). With the phosphorylated pTyr-Var library, we also carried out selection with a biotinylated pan-phosphotyrosine antibody to assess the level of bias in phosphorylation across the library. Compared to selection with SH2 domains, selection with the antibody yielded a narrower distribution of enrichment scores, with very few highly enriched sequences, suggesting relatively uniform phosphorylation (Figure 6C). We further validated the SH2 screening method by measuring the binding affinities of 9 peptides from the pTyr-Var library with the c-Src SH2 domain, using a fluorescence polarization binding assay. Enrichment scores from the pTyr-Var screen showed a good linear correlation with measured Kd values over two orders of magnitude (Figure 6D).

The pTyr-Var library screens with the SH2 domains were analyzed and filtered similarly to those with kinase domains. For each SH2 domain, we identified 50–300 phosphosite-proximal mutations that significantly and reproducibly enhanced or attenuated binding (Figure 6—figure supplement 5 and Figure 6—source data 2). As expected, given their distinct specificities, the c-Src, SHP2-C, and Grb2 SH2 domains showed unique sensitivities to mutations (Figure 6—figure supplement 6). We identified several phosphosite-proximal mutations that were selectively gain-of-function for one or two SH2 domains (Figure 6E and Figure 6—source data 2). These mutations could drive the rewiring of signaling pathways by changing which downstream effector engages a phosphosite. This phenomenon was recently reported for lung-cancer associated mutations near phosphorylation sites in EGFR, which impacted the recruitment of Grb2 and SHP2 to the receptor and altered downstream signaling (Lundby et al., 2019).

Finally, we note that our pTyr-Var datasets included screens with both the kinase and SH2 domains of c-Src. When the SH2 domain of c-Src interacts with phosphoproteins, it both localizes the kinase domain in proximity to its substrates and activates the enzyme (Liu et al., 1993). Our screens revealed that the enrichment profiles of the c-Src kinase and SH2 domains against the pTyr-Var library were completely orthogonal (Figure 6—figure supplement 7). Their starkly different activities toward the pTyr-Var library can largely be attributed to the kinase domain preference for a +3 Phe and the SH2 domain preference for a +3 Ile/Val/Leu/Met. This is in contrast to previous observations for c-Abl, which has kinase and SH2 domains with largely overlapping sequence specificities, dominated by a +3 Pro preference (Songyang et al., 1995). For c-Src, phosphosite mutations that impacted recognition by one domain generally had no effect on the other, because preferred sequence features for one domain were typically tolerated (neutral) for the other (Figure 6—figure supplement 8). A consequence of this is that phosphosite-proximal mutations may alter c-Src function in two mechanistically distinct ways: (1) mutations that enhance SH2 binding can alter the localization and local activation of c-Src or (2) mutations that enhance kinase recognition will directly increase phosphorylation rates by c-Src. These insights highlight the value of profiling multiple domains of the same signaling protein against the same peptide library.

Amber codon suppression yields an expanded repertoire of peptides for specificity profiling

The specificity profiling screens described thus far were constrained to sequences that contain the canonical twenty amino acids. Several studies have suggested that non-canonical amino acids and post-translationally modified amino acids can also impact sequence recognition by kinases and SH2 domains (Alfaro-Lopez et al., 1998; Begley et al., 2015; Chapelat et al., 2012; Johnson et al., 2023; Yeh et al., 2001). The most notable example of this is phospho-priming, whereby phosphorylation of one residue on a protein enhances the ability of a kinase to recognize and phosphorylate a proximal residue. This phenomenon was recently described for EGFR, which preferentially phosphorylates sequences containing a tyrosine followed by a +1 phosphotyrosine (Begley et al., 2015). Other prevalent post-translational modifications, such as lysine acetylation, may also impact the ability of kinases or SH2 domains to recognize a particular phosphosite (Parker et al., 2014; Rust and Thompson, 2011).

We sought to expand our specificity profiling method to incorporate non-canonical and post-translationally modified amino acids (Figure 7A). Since our libraries are genetically encoded, we employed Amber codon suppression and repurposing, using engineered tRNA molecules and aminoacyl tRNA synthetases (Amiram et al., 2015; Xie et al., 2007; Zheng et al., 2018). The degenerate (X) positions in our X5-Y-X5 library are encoded using an NNS codon, which means that an Amber codon (TAG) is sampled at each position 3% of the time. Thus, this library theoretically contains a sufficiently large number of diverse sequences to profile specificity with a 21 amino acid alphabet. For Amber suppression in E. coli, tRNA/synthetase pairs are commonly expressed from pEVOL or pULTRA plasmids (chloramphenicol and streptomycin resistant, respectively) (Chatterjee et al., 2013). Both of these systems are incompatible with our surface-display platform, which uses MC1061 cells (streptomycin resistance encoded in the genome) and libraries in a pBAD33 vector (chloramphenicol resistant). Thus, we designed a variant of the pULTRA plasmid in which we swapped the streptomycin resistance gene for an ampicillin resistance gene from a common pET vector for protein expression (pULTRA-Amp).
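The NNS arithmetic in this paragraph can be checked with a short sketch. The code below enumerates the 32 NNS codons, translates them with the standard genetic code, and estimates how much of a ten-position library carries Amber codons. It assumes perfectly uniform nucleotide incorporation at each degenerate position, which real oligonucleotide synthesis only approximates, and the function names are illustrative rather than taken from the authors' analysis code.

```python
from collections import Counter
from itertools import product
from math import comb

# Standard genetic code, written in TCAG order for each codon position.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def nns_codons():
    """All codons matching NNS (N = any base, S = C or G)."""
    return [a + b + c for a in BASES for b in BASES for c in "CG"]


def nns_residue_frequencies():
    """Expected residue frequencies at one NNS position ('*' = Amber/TAG)."""
    codons = nns_codons()
    counts = Counter(CODON_TABLE[c] for c in codons)
    return {aa: n / len(codons) for aa, n in counts.items()}


def fraction_with_k_amber(k, n_positions=10, p_amber=1 / 32):
    """Binomial probability that a sequence has exactly k Amber codons."""
    return comb(n_positions, k) * p_amber**k * (1 - p_amber) ** (n_positions - k)


if __name__ == "__main__":
    freqs = nns_residue_frequencies()
    print(f"Amber frequency per position: {freqs['*']:.4f}")
    print(f"No Amber codon:      {fraction_with_k_amber(0):.3f}")
    print(f"Exactly one Amber:   {fraction_with_k_amber(1):.3f}")
```

Under these assumptions, TAG is the only stop codon reachable with NNS (1 of 32 codons, about 3%), roughly 73% of sequences contain no Amber codon, and roughly 23% contain exactly one.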

Figure 7. Expansion of peptide display libraries using Amber suppression.

(A) Non-canonical amino acids used in this study. CMF = 4-carboxymethyl phenylalanine, AzF = 4-azido phenylalanine, and AcK = N-ε-acetyl-L-lysine. (B) Amber suppression in the strep-tagged X5-Y-X5 library using CMF. Library surface-display level was monitored by flow cytometry using a fluorophore-labeled StrepMAB antibody for samples with or without Amber suppression components. (C) AzF labeling on bacterial cells using a DIBO-conjugated fluorophore. Cells expressing the X5-Y-X5 library, with and without various Amber suppression components, were treated with DIBO-conjugated Alexa Fluor 555 then analyzed by flow cytometry. (D) Heatmaps depicting the specificities of c-Src, Hck, and c-Abl after CMF or acetyl lysine incorporation. Only sequences with one stop codon were used in this analysis. Enrichment scores were log2-transformed and are displayed on a color scale from blue (disfavored), to white (neutral), to red (favored). Values in heatmaps are the average of three replicates.

Figure 7—figure supplement 1. Stop codon enrichment levels in c-Src X5-Y-X5 screens using different analysis methods.

Error bars represent the standard deviations from three screens.

Figure 7—figure supplement 2. Comparison of position-specific enrichments in screens with Amber suppression analyzed in two different ways.

In each plot, the enrichments of specific amino acids or a stop codon, after phosphorylation by c-Src and bead-based selection, were calculated using two different methods. X-values indicate log-transformed enrichment values calculated across all sequences in the library. Y-values indicate log-transformed enrichment values only for sequences that contain exactly one Amber stop codon. The orange-red points correspond to the Amber codon enrichments at all 10 positions, which selectively fall off the x = y diagonal line.

Figure 7—figure supplement 3. Phosphorylation kinetics of Lys- and AcK-containing consensus peptides against c-Src and c-Abl.

Initial rates measured for each kinase were normalized to the rate of the corresponding cognate consensus peptide. Peptides were used at a concentration of 100 or 20 μM, and the kinases were used at a concentration of 10–50 nM. Error bars represent the standard deviation from three measurements.

To confirm that non-canonical amino acids could be incorporated into the X5-Y-X5 library, we co-transformed E. coli with the library and a pULTRA-Amp plasmid encoding a tRNA/synthetase pair that can incorporate 4-carboxymethyl phenylalanine (CMF) via Amber suppression (Figure 7A; Xie et al., 2007). We measured peptide display levels by flow cytometry for cultures that were grown with or without CMF in the media. For the cultures grown without CMF, roughly 20% of the cells had no surface-displayed peptides, consistent with termination of translation at Amber codons within the peptide-coding region (Figure 7B). In the presence of CMF, this premature termination was significantly suppressed, and a larger fraction of the cells displayed peptides. As an additional test, we incorporated 4-azido phenylalanine (AzF) into the X5-Y-X5 library (Figure 7A; Amiram et al., 2015). Cells expressing this expanded library were treated with a dibenzocyclooctyne (DIBO)-functionalized fluorophore, which should selectively react with the azide on AzF via strain-promoted azide-alkyne cycloaddition (Ning et al., 2008). Only cells expressing the synthetase and grown in the presence of AzF showed significant DIBO labeling, confirming Amber suppression and non-canonical amino acid incorporation into our library (Figure 7C).

Using this library expansion strategy, we assessed how substrate recognition by c-Src is impacted by neighboring CMF or acetyl-lysine residues. We subjected CMF- or AcK-containing X5-Y-X5 libraries to c-Src phosphorylation, selection, and sequencing, using the same methods described above. When analyzing X5-Y-X5 libraries in standard kinase and SH2 screens, we typically omit all Amber-containing sequences from our calculations, as they do not encode expressed peptides (Figure 1B and Figure 2A). For these experiments, we included Amber-containing sequences in our analysis. Using this strategy, we found that the Amber codon was less depleted at each position surrounding the central tyrosine than we observed for libraries without Amber suppression, but the log-transformed enrichment scores for Amber codons at all positions surrounding the tyrosine residue were still negative (Figure 7—figure supplement 1). We reasoned that, if Amber suppression efficiency was not 100%, any Amber-containing sequence would still be depleted relative to a sequence lacking a stop codon, due to some premature termination. Thus, we re-analyzed the data by exclusively counting sequences that contained one Amber codon, under the assumption that every sequence would have approximately the same amount of premature termination. This revealed positive enrichment for CMF and AcK at select positions (Figure 7D and Figure 7—figure supplement 1). Although we only included a fraction of the total library in our new analysis, the overall specificity profile was almost identical to that observed when including the whole library, indicating that this sub-sampling approach was valid (Figure 7—figure supplement 2).
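The effect of restricting the analysis to single-Amber sequences can be illustrated with a minimal sketch. It assumes a simple read-weighted, position-specific enrichment (frequency after selection divided by frequency before selection, log2-transformed), which may differ in detail from the scoring used in the actual pipeline. The function names and the three-residue toy peptides are hypothetical, and the read counts are invented purely for illustration.

```python
import math
from collections import Counter


def site_frequencies(read_counts, one_amber_only=False):
    """Read-weighted frequency of each (position, residue) pair.

    read_counts maps a translated peptide (Amber codon written as '*')
    to its number of sequencing reads.
    """
    tallies = Counter()
    total_reads = 0
    for peptide, n_reads in read_counts.items():
        if one_amber_only and peptide.count("*") != 1:
            continue  # keep only peptides with exactly one Amber codon
        total_reads += n_reads
        for position, residue in enumerate(peptide):
            tallies[(position, residue)] += n_reads
    if total_reads == 0:
        raise ValueError("no reads left after filtering")
    return {key: n / total_reads for key, n in tallies.items()}


def log2_enrichment(selected, unselected, one_amber_only=False):
    """log2(frequency after selection / frequency before selection)."""
    f_sel = site_frequencies(selected, one_amber_only)
    f_in = site_frequencies(unselected, one_amber_only)
    return {
        key: math.log2(f_sel[key] / f_in[key])
        for key in f_in
        if key in f_sel
    }


# Invented toy counts for 3-residue peptides (not data from this study).
toy_input = {"AYA": 100, "*YA": 100, "AY*": 100}
toy_selected = {"AYA": 200, "*YA": 60, "AY*": 20}

all_sequences = log2_enrichment(toy_selected, toy_input)
one_amber = log2_enrichment(toy_selected, toy_input, one_amber_only=True)
```

In this toy case the Amber codon at the first position scores negative when all sequences are counted, because every Amber-containing peptide is penalized relative to the stop-free peptide, but it scores positive within the single-Amber subset, where all peptides share the same penalty.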

Next, we compared the preferences for CMF and AcK at each position to their closest canonical amino acids, phenylalanine (Phe) and lysine (Lys). CMF was enriched at the –3 and –2 positions, where Phe is not tolerated by c-Src (Figure 7D). Negatively-charged amino acids (Asp and Glu) are also preferred at these positions, and the negative charge on the carboxymethyl group of CMF at neutral pH may be able to mimic this recognition. c-Src has a strong selective preference for Phe at the +3 position, which it engages via a well-formed hydrophobic pocket near the active site (Bose et al., 2006; Shah et al., 2018). The charged carboxymethyl group on CMF is likely to be incompatible with this mode of binding, consistent with depletion of CMF at this site (Figure 7D). The difference between Lys and AcK was even more striking. Lys is unfavorable for c-Src at every position around the phospho-acceptor tyrosine. By contrast, AcK was not only tolerated, but even favorable at a few positions (Figure 7D).

To determine whether the position-specific responsiveness to lysine acetylation was kinase-dependent, we also performed additional screens of the AcK-containing X5-Y-X5 library with Hck and c-Abl. These screens showed that all three kinases had very similar position-dependent tolerance for AcK over Lys, with the closely-related c-Src and Hck being more similar to one another than their distant relative c-Abl (Figure 7D). Finally, we assessed how the effect of lysine acetylation translated to actual changes in phosphorylation rates. We produced variants of the c-Src and c-Abl consensus peptides with Lys or AcK at various positions and measured their rates of phosphorylation by their respective cognate kinases (Figure 7—figure supplement 3). Of the positions tested (−2, +1, and +5 relative to the tyrosine), we saw the largest effect at the +1 position, consistent with the screens. At the +1 position, where Lys is not tolerated, acetylation enhanced activity as much as five-to-ten-fold, depending on the peptide concentration. In the long term, we envision using this approach to predict sites in the proteome where lysine acetylation creates new, high-activity substrates for tyrosine kinases. Furthermore, the same analysis could be applied to other tyrosine kinases and to SH2 domains, and our strategy could be readily expanded to other post-translational modifications that can be encoded using Amber suppression.

Concluding remarks

In this report, we describe a significant expansion to a previously developed method for profiling the sequence specificities of tyrosine kinases and SH2 (phosphotyrosine recognition) domains (Cantor et al., 2018; Shah et al., 2018; Shah et al., 2016). Our method relies on bacterial display of DNA-encoded peptide libraries and deep sequencing, and it enables the simultaneous analysis of multiple phosphotyrosine signaling proteins against thousands-to-millions of peptides or phosphopeptides. The resulting data can be used to design high-activity consensus sequences, predict the activities of uncharacterized sequences, and accurately measure the effects of amino acid substitutions on sequence recognition. A notable feature of our platform is that it relies on deep sequencing as a readout, yielding quantitative results. Furthermore, the data generated from our screens show a strong correlation with phosphorylation rates and binding affinities measured using orthogonal biochemical assays.

We envision a number of exciting applications of this expanded specificity profiling platform. Several recent reports have aimed to explain the molecular basis for tyrosine kinase and SH2 sequence specificity and affinity, by combining protein sequence and structure analysis with specificity profiling data (Bradley et al., 2021; Creixell et al., 2015a; Kaneko et al., 2010; Liu et al., 2019). The rich datasets generated using our platform will augment these approaches, particularly when coupled with screening data for additional proteins. A long-term goal of these efforts will undoubtedly be to accurately predict the sequence specificity and signaling properties of any uncharacterized phosphotyrosine signaling protein, such as a disease-associated kinase variant (Creixell et al., 2015b). Given the nature of the data generated by our platform, we expect that it will also aid the development and implementation of machine learning models for sequence specificity and design (Creixell et al., 2015a; Cunningham et al., 2020; Kundu et al., 2013). Indeed, our initial efforts in this realm suggest that specificity profiling data using the X5-Y-X5 library, without any protein structural information, may be sufficient to build models of sequence specificity that can accurately predict phosphorylation rates (Rube et al., 2022).

The pTyr-Var library described in this report provides a unique opportunity to investigate variant effects across the human proteome. The vast majority of mutations near tyrosine phosphorylation sites are functionally uncharacterized (Hornbeck et al., 2019; Krassowski et al., 2018). Our screens are yielding some of the first mechanistic biochemical hypotheses about how many of these mutations could impact cell signaling. For example, these datasets will allow us to identify mutations that tune signaling pathways by altering the phosphorylation efficiency of specific phosphosites or the binding of SH2-containing effector proteins to those sites. Alternatively, these screens may help identify instances of network rewiring, in which a phosphosite-proximal mutation alters the canonical topology of a pathway by changing which kinases phosphorylate a phosphosite or which SH2-containing proteins get recruited to that site. The biological effects of signal tuning and rewiring caused by phosphosite-proximal mutations remain largely unexplored.

Our high-throughput platform to profile tyrosine kinase and SH2 sequence recognition is accessible and easy to use in labs that are equipped to culture E. coli and execute common molecular biology and biochemistry techniques. Screens can be conducted on the benchtop with proteins produced in-house or obtained from commercial vendors. Peptide libraries of virtually any composition, tailored to address specific biochemical questions, can be produced using commercially available oligonucleotides and standard molecular cloning techniques. Furthermore, facile chemical changes to the library (e.g. enzymatic phosphorylation or the introduction of non-canonical amino acids via Amber suppression) afford access to new biochemical questions. For example, the tyrosine-phosphorylated libraries described here will also be useful for the characterization of tyrosine phosphatase specificity, and acetyl-lysine-containing libraries could be used to profile lysine deacetylases and bromodomains. Additional amendments to this platform will enable the analysis of serine/threonine kinases and other protein modification or recognition domains, adding to the growing arsenal of robust methods for the high-throughput biochemical characterization of cell signaling proteins.

Materials and methods

Expression and purification of tyrosine kinase domains

Constructs for the kinase domains of c-Src, c-Abl, Fyn, Hck, AncSZ, Fer, FGFR1, FGFR3, EPHB1, EPHB2, and MERTK all contained an N-terminal His6-tag followed by a TEV protease cleavage site. These proteins were co-expressed in E. coli BL21(DE3) cells with the YopH tyrosine phosphatase. Cells transformed with YopH and the tyrosine kinase domains were grown in LB supplemented with 100 μg/mL ampicillin and 100 μg/mL streptomycin at 37 °C. Once cells reached an optical density of 0.5 at 600 nm, 500 μM isopropyl-β-D-1-thiogalactopyranoside (IPTG) was added to induce the expression of proteins and the cultures were incubated at 18 °C for 14–16 hours. Cells were harvested by centrifugation (4000 rpm at 4 °C for 30 min), resuspended in a lysis buffer containing 50 mM Tris, pH 7.5, 300 mM NaCl, 20 mM imidazole, 2 mM β-mercaptoethanol (BME), 10% glycerol, plus protease inhibitor cocktail, and lysed using sonication (Fisherbrand Sonic Dismembrator). After separation of insoluble material by centrifugation (33,000 g at 4 °C for 45 min), the supernatant was applied to a 5 mL HisTrap Ni-NTA column (Cytiva). The resin was washed with 10 column volumes of lysis buffer and wash buffer containing 50 mM Tris, pH 8.5, 50 mM NaCl, 20 mM imidazole, 2 mM BME, 10% glycerol. The protein was eluted with 50 mM Tris, pH 8.5, 300 mM NaCl, 500 mM imidazole, 2 mM BME, and 10% glycerol.

The eluted protein was further purified by anion exchange on a 5 mL HiTrap Q column (Cytiva) and eluted with a gradient of 50 mM to 1 M NaCl in 50 mM Tris, pH 8.5, 1 mM TCEP-HCl and 10% glycerol. The His6-TEV tag in the collected fractions was cleaved by the addition of 0.10 mg/mL TEV protease overnight. The reaction mixture was subsequently flowed through 2 mL of Ni-NTA resin (ThermoFisher). The cleaved protein was collected in the flow-through and washes, then concentrated by centrifugation in an Amicon Ultra-15 30 kDa MWCO spin filter (Millipore). The concentrate was separated on a Superdex 75 16/600 gel filtration column (Cytiva), equilibrated with 10 mM HEPES, pH 7.5, 100 mM NaCl, 1 mM TCEP, 5 mM MgCl2, 10% glycerol. Pure fractions were pooled, aliquoted, and flash frozen in liquid N2 for long-term storage at –80 °C.

Expression and purification of biotinylated SH2 domains

Grb2 SH2 (56-152), c-Src SH2 (143-250), and SHP2 C-SH2 (105-220) domains were cloned into a His6-SUMO-SH2-Avi construct and were co-expressed with biotin ligase BirA in E. coli C43(DE3) cells. Specifically, cells transformed with both BirA and SH2 domains were grown in LB supplemented with 100 µg/mL kanamycin and 100 µg/mL streptomycin at 37 °C until cells reached an optical density of 0.5 at 600 nm. The temperature was brought down to 18 °C, protein expression was induced with 1 mM IPTG, and the media was also supplemented with 250 µM biotin to facilitate biotinylation of the Avi-tagged SH2 domains in vivo. Protein expression was carried out at 18 °C for 14–16 hours. After removal of media by centrifugation, the cells were resuspended in a lysis buffer containing 50 mM Tris, pH 7.5, 300 mM NaCl, 20 mM imidazole, 10% glycerol, and 2 mM BME, supplemented with protease inhibitor cocktail. The cells were lysed using sonication (Fisherbrand Sonic Dismembrator), and the lysate was clarified by ultracentrifugation. The supernatant was applied to a 5 mL Ni-NTA column (Cytiva). The resin was washed with 10 column volumes each of two buffers: first 50 mM Tris, pH 7.5, 300 mM NaCl, 20 mM imidazole, 10% glycerol, and 2 mM BME, and then 50 mM Tris, pH 7.5, 50 mM NaCl, 20 mM imidazole, 10% glycerol, and 2 mM BME. The protein was eluted in a buffer containing 50 mM Tris, pH 7.5, 300 mM NaCl, 500 mM imidazole, and 10% glycerol.

The eluted protein was further purified by ion exchange on a 5 mL HiTrap Q anion exchange column (Cytiva). The following buffers were used: 50 mM Tris, pH 7.5, 50 mM NaCl, 1 mM TCEP and 50 mM Tris, pH 7.5, 1 M NaCl, 1 mM TCEP. The protein was eluted off the column over a salt gradient from 50 mM to 1 M NaCl. The His6-SUMO tag was cleaved by addition of 0.05 mg/mL Ulp1 protease. The reaction mixture was flowed over a 2 mL Ni-NTA column (ThermoFisher) to remove the Ulp1, the uncleaved protein, and His6-SUMO fragments. The cleaved protein was further purified by size-exclusion chromatography on a Superdex 75 16/60 gel filtration column (Cytiva) equilibrated with buffer containing 20 mM HEPES, pH 7.4, 150 mM NaCl, and 10% glycerol. Pure fractions were pooled, aliquoted, and flash frozen in liquid N2 for long-term storage at –80 °C.

Synthesis and purification of peptides for in vitro validation measurements

All the peptides used for in vitro kinetic assays were synthesized using 9-fluorenylmethoxycarbonyl (Fmoc) solid-phase peptide chemistry. All syntheses were carried out using the Liberty Blue automated microwave-assisted peptide synthesizer from CEM under nitrogen atmosphere, with standard manufacturer-recommended protocols. Peptides were synthesized on MBHA Rink amide resin solid support (0.1 mmol scale). Each Nα-Fmoc amino acid (6 eq, 0.2 M) was activated with diisopropylcarbodiimide (DIC, 1.0 M) and ethyl cyano(hydroxyimino)acetate (Oxyma Pure, 1.0 M) in dimethylformamide (DMF) prior to coupling. Each coupling cycle was done at 75 °C for 15 s then 90 °C for 110 s. Deprotection of the Fmoc group was performed in 20% (v/v) piperidine in DMF (75 °C for 15 s then 90 °C for 50 s). The resin was washed (4 x) with DMF following Fmoc deprotection and after Nα-Fmoc amino acid coupling. All peptides were acetylated at their N-terminus with 10% (v/v) acetic anhydride in DMF and washed (4 x) with DMF.

After peptide synthesis was completed, including N-terminal acetylation, the resin was washed (3 x each) with dichloromethane (DCM) and methanol (MeOH) and dried under reduced pressure overnight. The peptides were cleaved and the side chain protecting groups were simultaneously deprotected in 95% (v/v) trifluoroacetic acid (TFA), 2.5% (v/v) triisopropylsilane (TIPS), and 2.5% water (H2O), in a ratio of 10 μL cleavage cocktail per mg of resin. The cleavage-resin mixture was incubated at room temperature for 90 min, with agitation. The cleaved peptides were precipitated in cold diethyl ether, washed in ether, pelleted, and dried under air. The peptides were redissolved in 50% (v/v) water/acetonitrile solution and filtered from the resin.

The crude peptide mixture was purified using reverse-phase high-performance liquid chromatography (RP-HPLC) on a semi-preparatory C18 column (Agilent, ZORBAX 300 SB-C18, 9.4 × 250 mm, 5 μm) with an Agilent HPLC system (1260 Infinity II). Flow rate was kept at 4 mL/min with solvents A (H2O, 0.1% (v/v) TFA) and B (acetonitrile, 0.1% (v/v) TFA). Peptides were generally purified over a 40-min linear gradient from solvent A to solvent B, with the specific gradient depending on the peptide sample. Peptide purity was assessed with an analytical column (Agilent, ZORBAX 300 SB-C18, 4.6 × 150 mm, 5 μm) at a flow rate of 1 mL/min over a 0–90% B gradient in 30 minutes. All peptides were determined to be ≥95% pure by peak integration. The identities of the peptides were confirmed by mass spectrometry (Waters Xevo G2-XS QTOF). Pure peptides were lyophilized and redissolved in 100 mM Tris, pH 8.0, as needed for experiments.

Preparation of the X5-Y-X5 and pTyr-Var libraries for specificity profiling

All bacterial display libraries used in this study are embedded within the pBAD33 plasmid (chloramphenicol resistant), with the surface-display construct inducible by L-(+)-arabinose (Rice and Daugherty, 2008). All libraries have the same general structure:

  • [signal sequence: MKKIACLSALAAVLAFTAGTSVA]-[GQSGQ]-[peptide-coding sequence]-[GGQSGQ]-[eCPX scaffold]-[GGQSGQ]-[strep-tag: WSHPQFEK or myc-tag: EQKLISEEDL]

The X5-Y-X5 library contains 11-residue peptide sequences with five randomized amino acids flanking both sides of a fixed central tyrosine residue. The library was produced using the X5-Y-X5 library oligo, with each X encoded by an NNS codon, and Y encoded by a TAT codon (see key resources table for all primer sequences). This oligo included a 5’ SfiI restriction site and DNA sequences encoding the flanking linkers that connect library peptide sequences to the 5’ signal sequence and 3’ eCPX scaffold.

The sequences in the pTyr-Var library were derived from the PhosphoSitePlus database and include 3,159 human tyrosine phosphorylation sites and 4,760 variants of these phosphosites bearing a single amino acid mutation (Hornbeck et al., 2019). The sequences in this library are named as ‘GeneName_pTyr-position’ for reference sequences and ‘GeneName_pTyr-position_variant-mutation’ for variants (e.g. ‘SRC_Y530’ and ‘SRC_Y530_527K’). In this initial list, about 2,133 sequences had more than one tyrosine residue, and so a second version of each of those sequences was included in which all tyrosines except the central tyrosine were substituted with phenylalanine (denoted with a ‘_YF’ suffix). In addition, 24 previously reported consensus substrate sequences were included (Begley et al., 2015; Deng et al., 2014; Marholz et al., 2018; Rube et al., 2022; Songyang et al., 1995). In total, our designed pTyr-Var library contained 9,898 unique 11-residue peptide sequences, which were then converted into DNA sequences using the most frequently used codon in E. coli. The DNA sequences were further optimized by swapping synonymous codons so that the GC content of every sequence fell between 30% and 70%. Sequences were also inspected and altered to remove any internal SfiI recognition sites. The 33-base peptide-coding sequences were flanked by 5’-GCTGGCCAGTCTGGCCAG-3’ on the 5’ side and 5’-GGAGGGCAGTCTGGGCAGTCTG-3’ on the 3’ side, the same flanks used for the X5-Y-X5 library oligo. An oligonucleotide pool based on all 9,898 sequences was generated by on-chip massively parallel synthesis (Twist Bioscience). This oligo-pool was amplified by PCR in ten cycles with the Oligopool-fwd-primer and Oligopool-rev-primer, using the NEB Q5 polymerase with a slow ramping speed (2 °C/s) and long denaturation times.
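The design steps in this paragraph (building the ‘_YF’ variants, back-translating peptides, and screening for GC content and internal SfiI sites) can be sketched as follows. This is an illustration under stated assumptions, not the authors' script: the one-codon-per-residue table is a commonly cited set of frequent E. coli codons and is assumed here, the example peptide is hypothetical, and the synonymous-codon swapping used to rescue failing sequences is not reproduced because the text does not specify it. The SfiI recognition sequence (GGCCNNNNNGGCC) is standard.

```python
import re

# ASSUMED table of frequently used E. coli codons, one per residue.
# The exact table used for the library is not given in the text.
TOP_CODON = {
    "A": "GCG", "R": "CGT", "N": "AAC", "D": "GAT", "C": "TGC",
    "Q": "CAG", "E": "GAA", "G": "GGC", "H": "CAT", "I": "ATT",
    "L": "CTG", "K": "AAA", "M": "ATG", "F": "TTT", "P": "CCG",
    "S": "AGC", "T": "ACC", "W": "TGG", "Y": "TAT", "V": "GTG",
}

# SfiI recognizes GGCCNNNNNGGCC; the site reads the same on both strands
# apart from the central Ns, so scanning one strand is sufficient.
SFII_SITE = re.compile(r"GGCC[ACGT]{5}GGCC")


def yf_variant(peptide, center=5):
    """Replace every tyrosine except the central one with phenylalanine."""
    return "".join(
        "F" if aa == "Y" and i != center else aa
        for i, aa in enumerate(peptide)
    )


def encode(peptide):
    """Back-translate a peptide using one fixed codon per residue."""
    return "".join(TOP_CODON[aa] for aa in peptide)


def gc_content(dna):
    """Fraction of G and C bases in a DNA string."""
    return (dna.count("G") + dna.count("C")) / len(dna)


def passes_design_checks(dna, gc_min=0.30, gc_max=0.70):
    """True if GC content is within range and no SfiI site is present."""
    in_range = gc_min <= gc_content(dna) <= gc_max
    return in_range and SFII_SITE.search(dna) is None


if __name__ == "__main__":
    peptide = "AEEYIYGEFYK"  # hypothetical 11-mer with a central tyrosine
    dna = encode(peptide)
    print(yf_variant(peptide), dna, round(gc_content(dna), 3))
    print("passes checks:", passes_design_checks(dna))
```

With this assumed table, the hypothetical peptide encodes a 33-base sequence whose GC content falls below 30%, so it would be flagged for synonymous-codon swapping. The same regular expression also finds the SfiI site that is intentionally present in the 5’ flank.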

Next, we integrated the oligonucleotide sequences encoding the X5-Y-X5 and pTyr-Var libraries into a pBAD33 vector as a fusion to the eCPX bacterial display scaffold, in a series of steps. The eCPX gene was previously fused to a sequence encoding a 3’ strep-tag (pBAD33-eCPX-cStrep) (Shah et al., 2018), and we produced a myc-tagged eCPX construct analogously, using standard molecular cloning techniques (pBAD33-eCPX-cMyc). The coding sequences for the eCPX-strep and eCPX-myc constructs were amplified from these plasmids by PCR using the link-eCPX-fwd primer and the link-eCPX-rev primer. These PCR products contained a 3’ SfiI restriction site. The peptide-coding sequences were then fused to the eCPX scaffold at the 5’ end of the scaffold in another PCR step to generate the library-scaffold inserts. For the X5-Y-X5 library, this step used the X5-Y-X5 library oligo and the link-eCPX-rev primer, along with the amplified eCPX gene. For the pTyr-Var library, this step used the amplified oligo-pool, the amplified eCPX gene, and the Oligopool-fwd-primer and link-eCPX-rev primer. The resulting PCR products contained the peptide-scaffold fusion constructs flanked by two unique SfiI sites.

In parallel, the pBAD33-eCPX backbone was amplified by PCR from the pBAD33-eCPX plasmid using the BB-fwd-primer and BB-rev primer. Both the amplified insert and backbone were purified over spin columns and then digested with the SfiI restriction endonuclease overnight at 50 °C. After digestion, the backbone was treated with Quick CIP (NEB) to prevent self-ligation from occurring. Both the digested insert and backbone were gel purified. The purified library insert was ligated into the digested pBAD33-eCPX backbone using T4 DNA ligase (NEB) overnight at 16 °C. Typically, this reaction was done with a total of approximately 1.5 μg of DNA, with a 1:5 molar ratio of backbone:insert. The ligation reaction was concentrated and desalted over a spin column and then used to transform commercial DH5α cells by electroporation. The transformed DH5α cells were grown in liquid culture overnight, and the plasmid DNA was isolated and purified using a commercial midiprep kit (Zymo).

Experimental procedure for high-throughput specificity screening of tyrosine kinases

Preparation of cells displaying peptide libraries

The high-throughput specificity screens for tyrosine kinases using the X5-Y-X5 and the pTyr-Var peptide libraries were carried out as described previously (Shah et al., 2018), with the main difference being the use of magnetic beads to isolate phosphorylated cells, rather than fluorescence-activated cell sorting. 25 µL of electrocompetent E. coli MC1061 F- cells were transformed with 200 ng of library DNA. Following electroporation, the cells were resuspended in 1 mL of LB and allowed to recover at 37 °C for 1 hr with shaking. These cells were resuspended in 250 mL of LB with 25 µg/mL chloramphenicol and incubated overnight at 37 °C. Of the overnight culture, 150 μL was used to inoculate 5.5 mL of LB containing 25 µg/mL chloramphenicol. This culture was grown at 37 °C for 1–2 hr until the cells reached an optical density of 0.5 at 600 nm. Expression of the library was induced by adding arabinose to a final concentration of 0.4% (w/v). The cells were incubated at 25 °C with shaking at 220 rpm for 4 hr. Small aliquots of the cells (75–150 µL) were transferred to microcentrifuge tubes and centrifuged at 1000 g at 4 °C for 10–15 min. The media was removed and the cells were resuspended in PBS and centrifuged again. The PBS was removed and the cells were stored at 4 °C. Experiments were performed with cells that had been stored at 4 °C for 1–4 days. Typical screens were carried out on a 50 μL to 100 μL scale, with cells that were 50% more concentrated than in culture (OD600 value around 1.5). Thus, for a 100 μL reaction, typically 150 μL of cell culture was pelleted and washed.

Phosphorylation of peptides displayed on cells

Phosphorylation reactions of the library were conducted with the purified kinase domain and 1 mM ATP in a buffer containing 50 mM Tris, pH 7.5, 150 mM NaCl, 5 mM MgCl2, 1 mM TCEP, and 2 mM sodium orthovanadate. To achieve similar library phosphorylation levels across the kinases, an optimal concentration of kinase was determined to achieve 20–30% phosphorylation of the library after 3 minutes of incubation at 37 °C. This was assessed by flow cytometry based on anti-phosphotyrosine antibody labeling (Attune NxT, Invitrogen). To label the phosphorylated cells, 50 μL pellets were resuspended with a 1:25 dilution of the PY20-PerCP-eFluor 710 conjugate (eBioscience) in PBS containing 0.2% bovine serum albumin (BSA). The cells were incubated with the antibody for 1 hr on ice in the dark, then centrifuged, washed once with PBS with 0.2% BSA, and finally resuspended in 100 μL of PBS with 0.2% BSA. For flow cytometry analysis, 20 μL of cells were diluted in 130 μL of PBS with 0.2% BSA.

The following kinase concentrations were used: 0.5 μM for Src, 1.5 μM for Abl, 0.4 μM for Fer, 1.5 μM for EPHB1, 1.25 μM for EPHB2, 0.1 μM for JAK2, 0.5 μM for AncSZ, 0.45 μM for FGFR1, 0.5 μM for FGFR3, and 0.7 μM for MERTK. Some tyrosine kinases, such as FGFR1, FGFR3, and MERTK, required pre-activation with ATP to enhance their catalytic activity. To accomplish this, autophosphorylation reactions were performed with 25 μM kinase and 5 mM ATP for 0.5–2 hours at 25 °C. The preactivated kinase mixture was then desalted and concentrated using an Amicon Ultra-15 30 kDa MWCO spin filter (Millipore) to remove the residual ATP.

After the desired time of library phosphorylation, kinase activity was quenched with 25 mM EDTA and the cells were washed with PBS containing 0.2% BSA. Kinase-treated cells were then labeled with a 1:1000 dilution of biotinylated 4G10 Platinum anti-phosphotyrosine antibody (Millipore) for an hour on ice and washed with PBS containing 0.1% BSA and 2 mM EDTA (isolation buffer). The cells were finally resuspended in PBS containing 0.1% BSA. The phosphorylated, antibody-labeled cells were then mixed with magnetic beads from the Dynabeads FlowComp Flexi kit (Invitrogen), at a ratio of 37.5 μL of washed beads per 50 μL of cell suspension, diluted into 450 μL of isolation buffer. The suspension was rotated at 4 °C for 30 minutes, then 375 μL of isolation buffer was added and the beads were separated from the bulk solution on a magnetic rack. The beads were washed once with 1 mL of isolation buffer, and then the supernatant was removed by aspiration. The beads were resuspended in 50 µL of fresh water, vortexed, and boiled at 100 °C for 10 minutes to extract DNA from cells bound to the Dynabeads. The bead/lysate mixture was centrifuged to pellet the beads and the mixture was stored at –20 °C.

DNA sample preparation and deep sequencing

To amplify the peptide-coding DNA sequence for deep sequencing, the supernatant from this lysate was used as a template in a 50 μL, 15-cycle PCR reaction using the TruSeq-eCPX-Fwd and TruSeq-eCPX-Rev primers and Q5 polymerase. The resulting mixture from this PCR reaction was used without purification as a template for a second, 20-cycle PCR reaction to append a unique pair of Illumina sequencing adapters and 5’ and 3’ indices for each sample (D700 and D500 series primers). The resulting PCR products were purified by gel extraction, and the concentration of each sample was determined using the QuantiFluor dsDNA System (Promega). The samples were pooled at equal molarity and sequenced by paired-end Illumina sequencing on a MiSeq or NextSeq instrument using a 150-cycle kit. The number of samples multiplexed in one run, and the loading density on the sequencing chip, were adjusted to obtain at least 1–2 million reads for each index/sample.

Experimental procedure for high-throughput specificity screening of SH2 domains

Preparation of cells displaying peptide libraries

Bacteria displaying peptide libraries for SH2 screens were prepared similarly to the bacteria for the kinase screens, with some small modifications. Specifically, after transformation with the library DNA and outgrowth of an overnight culture, 1.8 mL of the overnight culture was added to 100 mL of LB containing 25 μg/mL of chloramphenicol. This culture was grown at 37 °C until the cells reached an optical density of 0.5 at 600 nm. Then, 20 mL of this culture was transferred to a 50 mL flask, and expression was induced by addition of arabinose to a final concentration of 0.4% (w/v). Expression was carried out at 25 °C for 4 hr, then cells were aliquoted, pelleted, and washed as described for kinase screens.

Phosphorylation of peptides displayed on cells

Phosphorylation of cells was performed in a buffer containing 50 mM Tris, pH 7.5, 150 mM NaCl, 5 mM MgCl2, 1 mM TCEP. A mixture of 2.5 µM of c-Abl kinase domain, 2.5 µM c-Src kinase domain, 2.5 µM of EPHB1 kinase domain, 2.5 µM of AncSZ, 50 µg/mL rabbit muscle creatine phosphokinase, and 5 mM creatine phosphate was prepared in this buffer. Cells were resuspended in this solution such that a pellet derived from 50 μL of cell culture was resuspended in 50 μL of solution. To initiate the phosphorylation reaction, ATP was added from a concentrated stock to a final concentration of 5 mM, and the mixture was incubated at 37 °C for 3 hr. Following this, the kinase activity was quenched by addition of 25 mM EDTA. Library phosphorylation was assessed by flow cytometry based on anti-phosphotyrosine antibody labeling, as described above for the kinase screens (Attune NxT, Invitrogen).

Preparation of magnetic beads functionalized with SH2 domains (SH2-dynabeads)

First, 37.5 µL of magnetic beads from the Dynabeads FlowComp Flexi kit (Invitrogen) were washed with 1 mL of SH2 screen buffer containing 50 mM HEPES, pH 7.5, 150 mM NaCl. After washing, the beads were resuspended in 75 µL of 20 µM biotinylated SH2 domain and incubated at 4 °C for 2.5–3 hr. Unbound SH2 domain protein was removed by washing twice with 1 mL of SH2 screen buffer. The beads were finally resuspended in 37.5 µL of SH2 screen buffer.

Selection with SH2-dynabeads

Fifty µL of the phosphorylated cells were centrifuged at 4000 g at 4 °C for 15 min. After the supernatant was discarded, the cells were resuspended in SH2 screen buffer with 0.1% BSA, mixed with 37.5 µL of SH2-dynabeads, and rotated for 1 hr at 4 °C. Then, the magnetic beads were separated from the bulk solution using a magnetic rack, and the supernatant was removed by aspiration. The SH2-dynabeads were then washed by incubating them with 1 mL of SH2 screen buffer for 30 min at 4 °C. After discarding the wash solution, the beads were resuspended in 50 µL of fresh water, vortexed, and boiled at 100 °C for 10 min to extract DNA from cells bound to SH2-dynabeads. DNA samples were prepared and sequenced as described for the kinase screens.

Procedure for incorporating non-canonical amino acids in the high-throughput specificity screen

General protocol

E. coli MC1061 electrocompetent bacteria were transformed with genetically-encoded peptide libraries and grown in liquid LB media as described for the regular screens, but with an additional plasmid encoding the corresponding non-canonical amino acid aminoacyl-tRNA synthetase and tRNA pair, and with 100 µg/mL ampicillin added to the growth medium. The cells were grown to an optical density of 0.5 at 600 nm. Peptide expression was induced with 0.4% (w/v) arabinose, 1 mM isopropyl β-D-1-thiogalactopyranoside (IPTG), and 5 mM CMF, 5 mM AzF, or 10 mM AcK, and the cells were incubated at 25 °C for 4 hr. Cell pellets were collected and washed in PBS as described for the regular screens. Bacteria bearing surface-displayed peptides containing the non-canonical amino acid of interest were phosphorylated with 0.5 µM Src kinase for 3 min under the same buffer conditions as in the regular kinase screens (50 mM Tris, pH 7.5, 150 mM NaCl, 5 mM MgCl2, 1 mM TCEP, and 2 mM activated sodium orthovanadate). The reactions were initiated with 1 mM ATP and quenched with 25 mM EDTA, and the cells were then washed with PBS containing 0.2% BSA, as described for the regular screens. Downstream processing of the samples, including phosphotyrosine labeling, separation using magnetic beads, and deep sequencing, was done exactly as in the regular kinase screens.

Fluorophore labeling of surface-displayed AzF using click chemistry

The DIBO labeling solution was prepared by dissolving 0.5 mg of DIBO-alkyne Alexa Fluor 555 dye (ThermoFisher) in dimethyl sulfoxide (DMSO) to a concentration of 1 mM, and the solution was kept protected from light. The c-Myc tag labeling solution was prepared by a 1:100 dilution of c-Myc Alexa Fluor 488 conjugate (ThermoFisher) in PBS containing 0.2% BSA. The cell pellets treated with AzF were resuspended in 50 μM of the DIBO labeling solution and incubated overnight at room temperature with gentle nutation, protected from light (Tian et al., 2014). The cell suspension was pelleted and washed four times in PBS containing 0.2% BSA to ensure all excess DIBO dye was removed. The cell pellets were then resuspended in the c-Myc antibody solution and incubated on ice for 1 hr, protected from light. The cell suspension was pelleted and washed using PBS with 0.2% BSA. The pellets were resuspended in PBS with 0.2% BSA and analyzed by flow cytometry (Attune NxT, Invitrogen).

A note about replicates for the bacterial peptide display screens

We define technical replicates as sets of screens conducted with library-expressing cells that are all derived from the same library transformation reaction. Biological replicates are screens done using different transformations with the library DNA, often on different days. The replicates in this study are generally all biological replicates or two biological replicate sets of two to three technical replicates.

Processing and analysis of deep sequencing data from high-throughput specificity screens

The raw paired-end reads for each index pair from an Illumina sequencing run were merged using FLASH (Magoč and Salzberg, 2011). The resulting merged sequences were then searched for the following 5’ and 3’ flanking sequences surrounding the peptide-coding region of the libraries: 5’ flanking sequence = 5’-NNNNNNACCGCAGGTACTTCCGTAGCTGGCCAGTCTGGCCAG-3’, and 3’ flanking sequence = 5’-GGAGGGCAGTCTGGGCAGTCTGGTGACTACAACAAAANNNNNN-3’. These flanks were removed using the software Cutadapt to yield a file named ‘SampleName.trimmed.fastq’ (Martin, 2011). Sequences that did not contain both flanking regions were discarded at this stage (typically less than 5% of reads). From this point onward, all analysis was carried out using Python scripts generated in-house, which can be found in a GitHub repository https://github.com/nshahlab/2022_Li-et-al_peptide-display (copy archived at Li et al., 2023). Trimmed and translated FastQ and FastA files for all data used in this paper can be found in a Dryad repository (https://doi.org/10.5061/dryad.0zpc86727).
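The flank-based trimming step can be illustrated with a short pure-Python sketch. This is not the Cutadapt workflow used in the study: the function name `extract_insert` is hypothetical, only the constant portions of the two flanks are matched (the N positions are omitted), and exact matching without mismatches is assumed. The example insert is made up.

```python
# Constant portions of the two flanking sequences quoted above (N positions omitted).
FLANK_5 = "ACCGCAGGTACTTCCGTAGCTGGCCAGTCTGGCCAG"
FLANK_3 = "GGAGGGCAGTCTGGGCAGTCTGGTGACTACAACAAAA"


def extract_insert(read, flank5=FLANK_5, flank3=FLANK_3):
    """Return the sequence between both flanks, or None if either flank is missing.

    Hypothetical helper: exact string matching only, unlike Cutadapt,
    which tolerates a configurable error rate.
    """
    start = read.find(flank5)
    if start == -1:
        return None
    start += len(flank5)
    end = read.find(flank3, start)
    if end == -1:
        return None
    return read[start:end]


# Made-up 33-base peptide-coding insert, embedded in a made-up merged read.
insert = "GCTGAAGAAGAAATTTATGGTGAATTTGAAGCT"
read = "ACGTAC" + FLANK_5 + insert + FLANK_3 + "TTGACA"

trimmed = extract_insert(read)
discarded = extract_insert("ACGT" + insert)  # no flanks, so this read is dropped
print(trimmed, discarded)
```

Reads for which the helper returns None correspond to the sequences that are discarded for lacking one or both flanks.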

Analysis of data from screens with the pTyr-Var Library

For the samples screened with the pTyr-Var Library, we ran scripts that identify every 33-base trimmed DNA sequence, translate those DNA sequences into amino acid sequences, and count the abundance of each translated sequence that matches a peptide in the pTyr-Var library. In one format of this analysis, we used the countPeptides.py script on a trimmed input file, or the batch-countPeptides.py script for multiple input files, to generate a list of every unique peptide and its corresponding counts. In a second format of this analysis, we used the countPeptides-var-ref.py (or batch-countPeptides-var-ref.py) script, along with paired text files listing each variant (pTyr-Var_variant.txt) and its corresponding reference sequence (pTyr-Var_reference.txt), line-by-line, to yield side-by-side counts for each variant-reference pair. These processing steps were conducted for both selected samples (after kinase phosphorylation or SH2 binding) and unselected input samples. Next, the number of reads for every sequence (npeptide) was normalized to the total number of peptide-coding reads in that sample (ntotal), to yield a frequency (fpeptide, equation 1). Then, the frequency of each peptide in a selected sample (fpeptide,selected) was further normalized to the frequency of that same peptide in the unselected input sample (fpeptide,input) to yield an enrichment score (Epeptide, equation 2).

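$$f_{\text{peptide}} = \frac{n_{\text{peptide}}}{n_{\text{total}}} \qquad (1)$$

$$E_{\text{peptide}} = \frac{f_{\text{peptide,selected}}}{f_{\text{peptide,input}}} \qquad (2)$$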
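Equations 1 and 2 can be sketched in a few lines of Python. The function names and the toy counts below are illustrative and are not taken from the countPeptides.py scripts. Skipping peptides with zero input reads is an assumption made here to avoid division by zero.

```python
from collections import Counter


def frequencies(counts):
    """Equation 1: reads for each peptide divided by total peptide-coding reads."""
    total = sum(counts.values())
    return {peptide: n / total for peptide, n in counts.items()}


def enrichment_scores(selected_counts, input_counts):
    """Equation 2: frequency in the selected sample over frequency in the input.

    Assumption: peptides absent from the input sample are skipped.
    """
    f_selected = frequencies(selected_counts)
    f_input = frequencies(input_counts)
    return {
        peptide: f_selected.get(peptide, 0.0) / f_in
        for peptide, f_in in f_input.items()
        if f_in > 0
    }


# Made-up read counts for two peptides.
input_counts = Counter({"AAAAAYAAAAA": 50, "EEEEEYEEEEE": 50})
selected_counts = Counter({"AAAAAYAAAAA": 10, "EEEEEYEEEEE": 90})

scores = enrichment_scores(selected_counts, input_counts)
print(scores)
```

In this toy example the second peptide rises from 50% of input reads to 90% of selected reads, so its enrichment score is above 1, while the first peptide falls below 1.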

Analysis of data from screens with the X5-Y-X5 Library

For data from the X5-Y-X5 library, we did not calculate enrichments for individual sequences, as the sequencing depth per sample was generally on par with the library size (10^6–10^7 sequences). Instead, we computed the counts for each amino acid (or a stop codon) at every position along peptides of the expected length (11 amino acid residues). To accomplish this, we first translated all of the DNA sequences in the trimmed sequencing files using the translateUnique.py (or batch-translateUnique.py) script. When stop codons were encountered, they were translated as an asterisk symbol. In addition to producing a file of translated reads named ‘SampleName.translate.fasta’, this script also produced lists of every unique translated 11-residue peptide and the corresponding counts for that peptide. These files allowed us to assess whether any individual sequence was disproportionately enriched (not expected for a single round of selection with a library of this size), how many unique sequences were in each sample, and what fraction of the unique sequences contained a stop codon.
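The translation and unique-peptide counting described here can be sketched as follows. This is an illustration and not the translateUnique.py script itself: the function name `translate`, the use of "X" for codons that contain ambiguous bases, and the example reads are all assumptions.

```python
from collections import Counter
from itertools import product

# Standard genetic code in the conventional TCAG ordering; '*' marks stop codons.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate an in-frame DNA string; stop codons become '*'.

    Assumption: codons with ambiguous bases (e.g. N) are written as 'X'.
    """
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, usable, 3))


# Made-up trimmed reads: two identical reads plus one with an amber (TAG) codon.
reads = [
    "GCTGAAGAAGAAATTTATGGTGAATTTGAAGCT",
    "GCTGAAGAAGAAATTTATGGTGAATTTGAAGCT",
    "GCTTAGGAAGAAATTTATGGTGAATTTGAAGCT",
]
peptide_counts = Counter(translate(read) for read in reads)
stop_fraction = sum("*" in pep for pep in peptide_counts) / len(peptide_counts)
print(peptide_counts, stop_fraction)
```

The `Counter` holds the unique peptides and their read counts, and `stop_fraction` is the fraction of unique peptides that contain a stop codon.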

Using the translated read files, we then calculated the position-specific amino acid counts in three formats. In the simplest format, we exclusively counted 11-residue sequences that contained a central tyrosine and no stop codons (AA-count-nostop.py and batch-AA-count-nostop.py). In order to calculate stop codon depletion, we ran a version of the script that counted amino acid and stop codon composition across all 11-residue sequences (AA-count-full.py and batch-AA-count-full.py). Finally, for Amber suppression datasets, we exclusively counted sequences containing one stop codon and a central tyrosine residue (AA-count-1stop.py and batch-AA-count-1stop.py). Each of these scripts generated an 11x21 counts matrix, with each column representing a position in the peptide (from –5 to +5) and each row representing an amino acid (in alphabetical order, with the stop codon in the 21st row). Frequencies of each amino acid at each position were determined by taking the position-specific count for each amino acid and dividing it by the column total. Frequencies in a matrix from a selected sample were further normalized against frequencies from an input sample, and the resulting enrichment values were log2-transformed to yield the data represented in the heatmaps in Figures 1, 2, 6 and 7.
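A minimal sketch of the simplest counting format (central tyrosine, no stop codons) and of the log2 enrichment calculation is given below. It is not the AA-count-nostop.py script. The row order (one-letter codes, alphabetical, stop last), the function names, and the choice to report NaN where a residue has zero counts in either sample are assumptions made for illustration.

```python
import math

# Assumed row order: one-letter codes in alphabetical order, stop codon last.
ROWS = "ACDEFGHIKLMNPQRSTVWY*"
LENGTH = 11
CENTER = LENGTH // 2  # index 5 corresponds to the central tyrosine (position 0)


def count_matrix(peptides):
    """Position-specific counts for 11-mers with a central Y and no stop codon."""
    matrix = {aa: [0] * LENGTH for aa in ROWS}
    for pep in peptides:
        if len(pep) != LENGTH or pep[CENTER] != "Y" or "*" in pep:
            continue
        if any(aa not in matrix for aa in pep):
            continue  # skip peptides with unexpected symbols
        for pos, aa in enumerate(pep):
            matrix[aa][pos] += 1
    return matrix


def log2_enrichment(selected, reference):
    """Column-normalize both matrices, then take log2(selected / input) per cell."""
    result = {aa: [math.nan] * LENGTH for aa in ROWS}
    for pos in range(LENGTH):
        sel_total = sum(selected[aa][pos] for aa in ROWS)
        ref_total = sum(reference[aa][pos] for aa in ROWS)
        for aa in ROWS:
            sel, ref = selected[aa][pos], reference[aa][pos]
            if sel > 0 and ref > 0:  # assumption: leave NaN where counts are zero
                result[aa][pos] = math.log2((sel / sel_total) / (ref / ref_total))
    return result


# Made-up peptide lists: glutamate at position -5 is enriched after selection.
input_peptides = ["AAAAAYAAAAA", "EAAAAYAAAAA"]
selected_peptides = ["EAAAAYAAAAA"] * 3 + ["AAAAAYAAAAA", "A*AAAYAAAAA"]

enr = log2_enrichment(count_matrix(selected_peptides), count_matrix(input_peptides))
print(enr["E"][0], enr["A"][0])
```

The peptide containing a stop codon is excluded from the counts, so only four selected sequences contribute to the selected matrix.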

Scoring sequences using data from the X5-Y-X5 Library

In order to score peptides using position-weighted counts matrices from the X5-Y-X5 Library, we wrote a Python script called score_peptide_nostop.py. This script requires the selected and input counts matrices for a kinase or SH2 domain, produced by the AA-count-nostop.py script, along with a list of peptides as a text file, with one peptide per line. The script first calculates the normalized enrichments for each amino acid at each position across the matrices. Then, it reads each target sequence, sums the log2-normalized enrichments for each residue according to the enrichment matrix, ignoring the central tyrosine, and divides the sum by the number of scored residues (10 for the X5-Y-X5 Library). The script also calculates the scores for the best and worst possible sequences, according to the enrichment matrix. Both unnormalized and normalized scores for the whole peptide list are written out as text files.
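The scoring scheme can be sketched as follows, assuming a log2 enrichment matrix stored as a dictionary of per-position lists. This is not the score_peptide_nostop.py script: the function names and the toy matrix are hypothetical, and the min-max rescaling at the end is only one plausible way to normalize against the best and worst sequences, since the normalization is not spelled out in the text.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
LENGTH = 11
CENTER = LENGTH // 2
FLANKS = [i for i in range(LENGTH) if i != CENTER]  # the 10 scored positions


def score_peptide(peptide, log2_enrichment):
    """Mean log2 enrichment over the flanking residues, ignoring the central Y."""
    return sum(log2_enrichment[peptide[i]][i] for i in FLANKS) / len(FLANKS)


def extreme_scores(log2_enrichment):
    """Scores of the best and worst possible sequences under the matrix."""
    best = sum(max(log2_enrichment[aa][i] for aa in AMINO_ACIDS) for i in FLANKS)
    worst = sum(min(log2_enrichment[aa][i] for aa in AMINO_ACIDS) for i in FLANKS)
    return best / len(FLANKS), worst / len(FLANKS)


# Made-up matrix: glutamate favored and lysine disfavored at every position.
matrix = {aa: [0.0] * LENGTH for aa in AMINO_ACIDS}
matrix["E"] = [1.0] * LENGTH
matrix["K"] = [-2.0] * LENGTH

best, worst = extreme_scores(matrix)
raw = score_peptide("EEEEEYAAAAA", matrix)
# Assumed normalization: rescale so the worst sequence is 0 and the best is 1.
rescaled = (raw - worst) / (best - worst)
print(raw, best, worst, rescaled)
```

Dividing by the number of scored residues keeps scores comparable if the same approach is applied to peptides with fewer scored positions.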

In vitro measurements of phosphorylation rates with purified kinases and peptides

RP-HPLC assay to measure peptide phosphorylation kinetics

To validate the enrichment scores observed in the c-Src screening data, the phosphorylation rates were measured in vitro with the purified catalytic domain of c-Src and synthetic 11-residue peptides derived from sequences in the pTyr-Var library. Kinetic measurements were carried out at 37 °C by mixing 500 nM c-Src and 100 μM peptide in a buffer containing 50 mM Tris, pH 7.5, 150 mM NaCl, 5 mM MgCl2, 1 mM TCEP, and 2 mM activated sodium orthovanadate. Reactions were initiated by adding 1 mM ATP. At various time points, 100 μL aliquots were removed and quenched by the addition of EDTA to a final concentration of 25 mM. Each time point sample was analyzed by analytical RP-HPLC, monitoring absorbance at 214 nm. Forty μL of each time point was injected onto a C18 column (ZORBAX 300 SB-C18, 5 μm, 4.6x150 mm). The solvent system used was water with 0.1% trifluoroacetic acid (solvent A) and acetonitrile with 0.1% trifluoroacetic acid (solvent B). Peptides were eluted at a flow rate of 1 mL/min, using the following set of linear gradients: 0–2 min: 5% B, 2–12 min: 5–95% B, 12–13 min: 95% B, 13–14 min: 95–5% B, and 14–17 min: 5% B. The areas under the peaks corresponding to the unphosphorylated and phosphorylated peptides were calculated using the Agilent OpenLAB ChemStation software. The fractional product peak area was plotted as a function of reaction time, and the initial linear regime of this plot was fitted to a straight line to determine a reaction rate. Rates were corrected for substrate and enzyme concentration. Reactions were done in triplicate or quadruplicate.
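The rate extraction can be illustrated with an ordinary least-squares fit to the early time points. The function names and the time course below are made up, and the correction shown (slope multiplied by the substrate concentration and divided by the enzyme concentration) is one reading of "corrected for substrate and enzyme concentration", not a confirmed description of the authors' calculation.

```python
def least_squares_slope(times, fractions):
    """Ordinary least-squares slope of fraction phosphorylated versus time."""
    n = len(times)
    mean_t = sum(times) / n
    mean_f = sum(fractions) / n
    sxy = sum((t - mean_t) * (f - mean_f) for t, f in zip(times, fractions))
    sxx = sum((t - mean_t) ** 2 for t in times)
    return sxy / sxx


def normalized_rate(times_min, fractions, substrate_uM=100.0, enzyme_uM=0.5):
    """Assumed correction: (fraction/min) x [peptide] / [kinase].

    Defaults follow the assay conditions above (100 uM peptide, 500 nM c-Src).
    The result has units of uM product per uM enzyme per minute.
    """
    slope = least_squares_slope(times_min, fractions)
    return slope * substrate_uM / enzyme_uM


# Made-up time course restricted to the initial linear regime.
times = [0, 2, 4, 6]                    # minutes
fractions = [0.00, 0.02, 0.04, 0.06]    # fractional product peak area
rate = normalized_rate(times, fractions)
print(rate)
```

Only points from the initial linear regime should be passed to the fit; later points, where the curve flattens, would bias the slope downward.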

Michaelis-Menten analysis using the ADP-Quest assay

A fluorescence-based assay from Eurofins (ADP Quest) was used to measure the Michaelis-Menten kinetic parameters for phosphorylation of the consensus peptides by purified tyrosine kinase domains. In this assay, ADP production as a result of kinase activity is coupled to the production of resorufin, a fluorophore that emits signal at 590 nm. For all experiments, the assay reactions were set up as described in the provided assay kit protocol, in a 384-well plate format. The peptide solutions were serially diluted in 100 mM Tris, pH 8.0, and the kinases were diluted to 50 nM in buffer (10 mM HEPES, 100 mM NaCl, 1 mM TCEP, 5 mM MgCl2, 10% (v/v) glycerol). The final reaction mixtures contained 10 nM kinase and 100 μM ATP. Reactions were initiated by adding ATP from a 1 mM stock into the 50 μL reaction mixture. Phosphorylation reaction progress was monitored by measuring fluorescence (excitation at 530 nm, emission at 590 nm) every 2 min at 37 °C on a plate reader (BioTek Synergy Neo 2). The relative fluorescence units (RFU) were converted to μM ADP by comparison to a standard curve, and the initial rates were extracted from the linear regime of the reaction progress curves. Initial rates were also measured for samples containing each kinase but lacking a peptide substrate, to account for background ATP hydrolysis. This background rate was subtracted from the rates measured in the presence of peptide. The background-subtracted rates were plotted as a function of substrate concentration and fit to the Michaelis-Menten equation to extract kcat and KM values.
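For reference, the fit described above uses the standard Michaelis-Menten form, written here with the background-subtracted initial rate on the left-hand side. The symbols are chosen for illustration and do not appear in the original text: $v_0$ is the corrected initial rate, $[E]_0$ is the total kinase concentration (10 nM in these reactions), and $[S]$ is the peptide concentration.

```latex
v_0 = v_{\text{peptide}} - v_{\text{background}},
\qquad
v_0 = \frac{k_{\text{cat}}\,[E]_0\,[S]}{K_M + [S]}
```

At $[S] = K_M$ the corrected rate is half of its maximum value $k_{\text{cat}}[E]_0$, which is why the peptide dilution series needs to span concentrations both below and above $K_M$ for both parameters to be well determined.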

In vitro measurements of binding affinities with purified SH2 and phospho-peptides

Binding affinities of SH2 domains and phospho-peptides were measured using a fluorescence polarization-based competition binding assay, following previously reported methods (Cushing et al., 2008). The fluorescent peptide (FITC-Acp-GDG(pY)EEISPLLL) used for KD measurements was a gift from the Amacher lab. A buffer containing 60 mM HEPES, pH 7.2, 75 mM KCl, 75 mM NaCl, 1 mM EDTA, and 0.05% Tween 20 was used for the experiments. For KD measurement, varying concentrations of the c-Src SH2 protein were incubated with 30 nM fluorescent peptide for 15 min in a black, half-area, 96-well plate. The plate was centrifuged for 5 min at 1000 g to remove air bubbles. Following this, fluorescence polarization data were collected on a plate reader at 25 °C (BioTek Synergy Neo 2). The samples were excited at a wavelength of 485 nm and emission data were collected at 525 nm. Data were analyzed and fitted to a quadratic binding equation to determine the KD for the fluorescent peptide with the c-Src SH2 domain. A KD of 160 nM was obtained, and this value was used in subsequent calculations for the competition binding experiments.
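A quadratic binding equation for 1:1 binding accounts for depletion of free protein by the labeled peptide. The sketch below shows the standard closed-form solution and how a fraction bound maps onto an observed polarization. The function names and the polarization baseline and maximum are hypothetical, and the exact parameterization used for fitting in the study is not specified in the text.

```python
import math


def fraction_bound(protein_total, ligand_total, kd):
    """Fraction of labeled peptide bound, from the quadratic 1:1 binding solution.

    All concentrations must share the same units (nM here).
    """
    b = protein_total + ligand_total + kd
    complex_conc = (b - math.sqrt(b * b - 4.0 * protein_total * ligand_total)) / 2.0
    return complex_conc / ligand_total


def polarization(protein_total, ligand_total, kd, fp_free, fp_bound):
    """Observed signal as a linear mix of free and bound polarization values."""
    fb = fraction_bound(protein_total, ligand_total, kd)
    return fp_free + (fp_bound - fp_free) * fb


# Values from the text: 30 nM fluorescent peptide, KD = 160 nM.
# 240 nM is the final SH2 concentration used in the competition experiments.
fb = fraction_bound(protein_total=240.0, ligand_total=30.0, kd=160.0)

# Hypothetical polarization limits (in mP), for illustration only.
signal = polarization(240.0, 30.0, 160.0, fp_free=50.0, fp_bound=250.0)
print(fb, signal)
```

In a fit, `kd`, `fp_free`, and `fp_bound` would be the adjustable parameters, with the protein concentration as the independent variable.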

Competition binding experiments were performed similarly. A stock solution was prepared by mixing 60 nM fluorescent peptide with SH2 domain at a concentration of 480 nM (3 x KD) and incubating at room temperature for 15 min. Unlabeled competitor peptide was serially diluted in buffer. Each serial dilution was mixed with the fluorescent peptide-SH2 stock solution at a 1:1 ratio in a black, half-area, 96-well plate. After mixing the samples by pipetting, the plate was centrifuged at 1000 g for 5 min to remove air bubbles. The final fluorescent peptide concentration was 30 nM and the final SH2 concentration was 1.5 x KD (240 nM). Fluorescence polarization was measured as described above for the initial KD measurements. Competition binding data were fit to a cubic binding equation as described previously (Cushing et al., 2008).

In vitro measurements of phosphorylation rates with purified kinases and SHP2 substrate

Expression and Purification of SHP2 WT, D61V, and D61N

All SHP2 variants contained a catalytic cysteine mutation (C459E), C-terminal tail (526-593) deletion, and N-terminal His6-tag followed by a TEV protease cleavage site. The same protocol used to express and purify SH2 domains, excluding co-expression of BirA and addition of biotin, was applied to the expression and purification of the SHP2 variants.

LC-MS assay to measure protein phosphorylation kinetics

To pre-activate the kinases, 1 μM of each purified kinase domain was preincubated at 37 °C for 30 min in the same buffer conditions used in the kinase domain peptide display screen, with 1 mM ATP. The reaction of the kinase with SHP2 was initiated with the addition of 10 μM SHP2, and the mixture was incubated at 37 °C for 1 hr. The reaction was then quenched with 200 mM EDTA. The reaction mixture was diluted 3:2 in water and injected onto a BEH C8 column (Waters) on a UPLC-MS system (Xevo QToF, Waters). Reverse-phase liquid chromatography was carried out at 0.3 mL/min with solvents A (H2O, 0.1% (v/v) formic acid) and B (acetonitrile, 0.1% (v/v) formic acid). Proteins were eluted over a gradient of 5–95% B for 8.5 min. The protein peak on the chromatogram was deconvoluted using the MaxEnt1 algorithm from 32,000–65,000 Da with a resolution of 1 Da/channel over 30 iterations. Peaks were chosen according to the theoretical MW of the protein within a range of 5 Da, and integrated for the signal intensity.

Materials and data availability

The key reagents produced in this study (the X5-Y-X5 Library, the pTyr-Var Library, and protein expression plasmids) will be made freely available to any researcher interested in using our specificity profiling platform. Data from the specificity screens in this study, in the form of enrichment scores, are available alongside this publication as source data files. Trimmed and translated deep sequencing data (.fastq and .fasta files) are available via Dryad: https://doi.org/10.5061/dryad.0zpc86727. Code used in this study to process and analyze the data can be found in this GitHub repository: https://github.com/nshahlab/2022_Li-et-al_peptide-display (copy archived at Li et al., 2023). The plasmid libraries and unprocessed data can be requested by directly contacting the corresponding author.

Acknowledgements

We thank Fereshteh Zandkarimi and Brandon Fowler from the Columbia Chemistry mass spectrometry facility for their assistance with mass spectrometry; Jia Ma from the Columbia Precision Biomolecular Characterization Facility for his guidance with biophysical measurements; and the Columbia Genome Center for their support with deep sequencing. We thank Neil Vasan for his guidance with SHP2 phosphorylation assays. We thank Harmen Bussemaker, Tomas Rube, and Chaitanya Rastogi for their insightful discussions, and members of the Shah lab for their technical and conceptual guidance throughout this project. The fluorescently-labeled c-Src SH2 ligand was a gift from Jeanine Amacher. The pULTRA chAcKRS3 plasmid was a gift from Abhishek Chatterjee. Bacterial expression vectors for Fer, FGFR1, FGFR3, EPHB1, EPHB2, and MERTK were gifts from John Chodera, Nicholas Levinson, and Markus Seeliger (Addgene plasmid #s 79686, 79719, 79731, 79694, 79697, and 79705). The pEVOL pAzFRS.2.t1 plasmid was a gift from Farren Isaacs (Addgene plasmid #73546). This work was supported by NIH grant R35 GM138014 and a Damon Runyon-Dale F Frey Award for Breakthrough Scientists (DFS 31–18), awarded to NHS.

Appendix 1

Appendix 1—key resources table.

Reagent type (species) or resource Designation Source or reference Identifiers Additional information
Strain, strain background (E. coli) MC1061 Lucigen Lucigen: 10361012 bacterial cells used for surface-display screens
Strain, strain background (E. coli) DH5α Invitrogen Invitrogen: 18265017 bacterial cells used for
general cloning and library cloning
Strain, strain background (E. coli) BL21(DE3) ThermoFisher Scientific Thermo: C600003 bacterial cells for general protein-expression;
pre-transformed with pCDF-YopH for
tyrosine kinase overexpression
Strain, strain background (E. coli) C43(DE3) Lucigen Lucigen: NC9581214 bacterial cells used for SH2 domain over-expression;
pre-transformed with pCDFDuet-BirA-WT for biotinylation
Antibody 4 G10 Platinum, Biotin (mouse monoclonal) Millipore Sigma Millipore Sigma: 16–452-MI biotin conjugated mouse monoclonal
pan-phosphotyrosine antibody dilution: (1:1000)
Antibody PY20-PerCP-eFluor 710 (mouse monoclonal) eBioscience eBioscience: 46-5001-42 PerCP-eFluor 710-conjugated mouse monoclonal
pan-phosphotyrosine antibody, clone PY20 dilution: (1:25)
Antibody PY20-biotin (mouse monoclonal) Exalpha Exalpha: 50-210-1865 biotin conjugated mouse monoclonal
pan-phosphotyrosine antibody dilution (1:500)
Antibody StrepMAB Chromeo 488 (mouse monoclonal) IBA LifeSciences IBA: 2-1546-050 Chromeo 488-conjugated antibody that
recognizes the strep-tag dilution: (1:50–100).
Discontinued, but can be replaced with IBA
LifeSciences StrepMAB-Classic conjugate
DY-488 (IBA: 2-1563-050)
Recombinant DNA reagent pBAD33-eCPX PMID:18480093 Addgene: 23336 pBAD33 plasmid encoding the eCPX bacterial
display gene with flanking 5' and 3' SfiI restriction sites
Recombinant DNA reagent pBAD33-eCPX-cStrep PMID:29547119
pBAD33 plasmid encoding the eCPX bacterial
display gene with a 3' sequence encoding a
strep-tag and flanking 5' and 3' SfiI restriction sites
Recombinant DNA reagent pBAD33-eCPX-cMyc this paper
pBAD33 plasmid encoding the eCPX bacterial
display gene with a 3' sequence encoding a
myc-tag and flanking 5' and 3' SfiI restriction sites
Recombinant DNA reagent X5-Y-X5 Library (myc-tagged) this paper
peptide display library in the pBAD33 vector, fused
to the eCPX scaffold, containing 1–10 million
unique sequences with the structure X5-Y-X5, where X is
encoded by an NNS codon. The scaffold protein is
encoded to have a C-terminal myc-tag: EQKLISEEDL.
Recombinant DNA reagent X5-Y-X5 Library (strep-tagged) this paper
peptide display library in the pBAD33 vector, fused to
the eCPX scaffold, containing 1–10 million unique
sequences with the structure X5-Y-X5,
where X is encoded by an NNS codon.
The scaffold protein is encoded to have a
C-terminal strep-tag: WSHPQFEK.
Recombinant DNA reagent pTyr-Var Library (myc-tagged) this paper
peptide display library in the pBAD33 vector,
fused to the eCPX scaffold, containing ~10,000
unique sequences encoding reference and
variant phosphosite pairs deried from the
PhosphoSitePlus database. The scaffold
protein is encoded to have a C-terminal
myc-tag: EQKLISEEDL.
Recombinant DNA reagent pTyr-Var Library (strep-tagged) this paper
peptide display library in the pBAD33 vector,
fused to the eCPX scaffold, containing ~10,000
unique sequences encoding reference and variant
phosphosite pairs deried from the
PhosphoSitePlus database. The scaffold protein is
encoded to have a C-terminal strep-tag: WSHPQFEK.
Recombinant DNA reagent pET-23a-His6-TEV-Src(KD) PMID:29547119
bacterial expression vector encoding the human
c-Src kinase domain (residues 260–528), with an
N-terminal His6-tag and TEV protease recognition sequence
Recombinant DNA reagent pET-23a-His6-TEV-Fyn(KD) PMID:29547119
bacterial expression vector encoding the human
Fyn kinase domain (residues 261–529) with an
N-terminal His6-tag and TEV protease recognition sequence
Recombinant DNA reagent pET-23a-His6-TEV-Hck(KD) PMID:29547119
bacterial expression vector encoding the human
Hck kinase domain (residues 252–520) with an
N-terminal His6-tag and TEV protease recognition sequence
Recombinant DNA reagent pET-23a-His6-TEV-Abl(KD) PMID:29547119
bacterial expression vector encoding the mouse
c-Abl kinase domain (residues 232–502) with an
N-terminal His6-tag and TEV protease recognition sequence
Recombinant DNA reagent pET-23a-His6-TEV-AncSZ(KD) DOI:
10.1101/2022.04.24.489292

bacterial expression vector encoding the AncSZ
kinase domain (residues 352–627) with an N-terminal
His6-tag and TEV protease recognition sequence
Recombinant DNA reagent pET23a-His6-TEV-Fer(KD) this paper
bacterial expression vector encoding the mouse
Fer kinase domain (residues 553–823) with an
N-terminal His6-tag and TEV protease recognition sequence
Recombinant DNA reagent pET-His6-TEV-FGFR1(KD) PMID:30004690 Addgene: 79719 bacterial expression vector encoding the human
FGFR1 kinase domain (residues 456–763) with an
N-terminal His6-tag and TEV protease recognition sequence
Recombinant DNA reagent pET-His6-TEV-FGFR3(KD) PMID:30004690 Addgene: 79731 bacterial expression vector encoding the human
FGFR3 kinase domain (residues 449–759) with an
N-terminal His6-tag and TEV protease recognition sequence
Recombinant DNA reagent pET-His6-TEV-EPHB1(KD) PMID:30004690 Addgene: 79694 bacterial expression vector encoding the human
EPHB1 kinase domain (residues 602–896) with an
N-terminal His6-tag and TEV protease recognition sequence
Recombinant DNA reagent pET-His6-TEV-EPHB2(KD) PMID:30004690 Addgene: 79697 bacterial expression vector encoding the human
EPHB2 kinase domain (residues 604–898) with an
N-terminal His6-tag and TEV protease recognition sequence
Recombinant DNA reagent pET-His6-TEV-MERTK(KD) PMID:30004690 Addgene: 79705 bacterial expression vector encoding the human
MERTK kinase domain (residues 570–864) with an
N-terminal His6-tag and TEV protease recognition sequence
Recombinant DNA reagent pCDF-YopH PMID:16260764
bacterial expression vector for co-expression of
untagged YopH phosphatase with tyrosine kinases
Recombinant DNA reagent pET28-His6-TEV-SHP2-C459E-no tail this paper
bacterial expression vector encoding the human SHP2
(residues 1–526) with the C459E mutation, an N-terminal
His6-tag, and TEV protease recognition sequence
Recombinant DNA reagent pET28-His6-TEV-SHP2-C459E-no tail-D61V this paper
bacterial expression vector encoding the human SHP2
(residues 1–526) with C459E and D61V mutations, an
N-terminal His6-tag, and TEV protease recognition sequence
Recombinant DNA reagent pET28-His6-TEV-SHP2-C459E-no tail-D61N this paper
bacterial expression vector encoding the human
SHP2 (residues 1–526) with C459E and D61N mutations,
an N-terminal His6-tag, and TEV protease recognition sequence
Recombinant DNA reagent pCDFDuet-BirA-WT this paper
bacterial expression vector encoding BirA biotin ligase,
co-expressed with the SH2 domain expression
vector to biotinylate the SH2 domain
Recombinant DNA reagent pET-His6-SUMO-Src(SH2) this paper
bacterial expression vector encoding the human
c-Src SH2 domain (residues 143–250) with an
N-terminal His6-SUMO tag
Recombinant DNA reagent pET-His6-SUMO-SHP2(CSH2) this paper
bacterial expression vector encoding the human SHP2
C-SH2 domain (residues 105–220) with an N-terminal His6-SUMO tag
Recombinant DNA reagent pET-His6-SUMO-Grb2(SH2) this paper
bacterial expression vector encoding the human Grb2
SH2 domain (residues 56–152) with an N-terminal His6-SUMO tag
Recombinant DNA reagent pULTRA CMF PMID:28604693
bacterial expression vector encoding the tRNA/synthetase
pair for incorporation of 4-carboxymethyl phenylalanine
via Amber suppression
Recombinant DNA reagent pEVOL pAzFRS.2.t1 PMID:26571098 Addgene: 73546 bacterial expression vector encoding the tRNA/synthetase
pair for incorporation of 4-azido phenylalanine and other
Phe derivatives via Amber suppression
Recombinant DNA reagent pULTRA chAcKRS3 PMID:29544052
bacterial expression vector encoding the tRNA/synthetase
pair for incorporation of acetyl-lysine via Amber suppression;
gift from Abhishek Chatterjee at Boston College
Recombinant DNA reagent pULTRA-Amp CMF this paper
bacterial expression vector encoding the tRNA/synthetase
pair for incorporation of 4-carboxymethyl phenylalanine
via Amber suppression, altered to have an ampicillin resistance marker
Recombinant DNA reagent pULTRA-Amp pAzFRS.2.t1 this paper
bacterial expression vector encoding the tRNA/synthetase
pair for incorporation of 4-azido phenylalanine and other
Phe derivatives via Amber suppression, altered to have
an ampicillin resistance marker
Recombinant DNA reagent pULTRA-Amp chAcKRS3 this paper
bacterial expression vector encoding the tRNA/synthetase
pair for incorporation of acetyl-lysine via Amber suppression,
altered to have an ampicillin resistance marker
Sequence-based reagent X5-Y-X5 library oligo; eCPX-rand-lib this paper, purchased from Millipore Sigma
primer sequence: 5’-GCTGGCCAGTCTGGCCAGNNSNNSNNSNNSNNStatNNSNNSNNSNNSNNSGGAGGGCAGTCTGGGCAGTCTG-3’
Sequence-based reagent Oligopool-fwd-primer this paper, purchased from Millipore Sigma
primer sequence: 5’-GCTGGCCAGTCTG-3’
Sequence-based reagent Oligopool-rev-primer this paper, purchased from Millipore Sigma
primer sequence: 5’-CAGACTGCCCAGACT-3’
Sequence-based reagent link-eCPX-fwd this paper, purchased from Millipore Sigma
5’-GGAGGGCAGTCTGGGCAGTCTG-3’
Sequence-based reagent link-eCPX-rev this paper, purchased from Millipore Sigma
5’-GCTTGGCCACCTTGGCCTTATTA-3’
Sequence-based reagent BB-fwd-primer this paper, purchased from Millipore Sigma
5’-TAATAAGGCCAAGGTGGCCAAGC-3’
Sequence-based reagent BB-rev primer this paper, purchased from Millipore Sigma
5’-CTGGCCAGACTGGCCAGCTACG-3’
Sequence-based reagent TruSeq-eCPX-Fwd sequence from PMID:29547119, purchased from Millipore Sigma
round one amplicon PCR primer; primer sequence: 5’-TGACTGGAGTTCAGACGTGTGCTCTTCCGATCTNNNNNNACCGCAGGTACTTCCGTAGCT-3’
Sequence-based reagent TruSeq-eCPX-Rev sequence from PMID:29547119, purchased from Millipore Sigma
round one amplicon PCR primer; primer sequence: 5’-CACTCTTTCCCTACACGACGCTCTTCCGATCTNNNNNNTTTTGTTGTAGTCACCAGACTG-3’
Sequence-based reagent D701 sequence from Illumina, purchased from Millipore Sigma
round two amplicon/indexing PCR primer; primer sequence: 5'-CAAGCAGAAGACGGCATACGAGATcgagtaatGTGACTGGAGTTCAGACGTG-3'
Sequence-based reagent D702 sequence from Illumina, purchased from Millipore Sigma
round two amplicon/indexing PCR primer; primer sequence: 5'-CAAGCAGAAGACGGCATACGAGATtctccggaGTGACTGGAGTTCAGACGTG-3'
Sequence-based reagent D703 sequence from Illumina, purchased from Millipore Sigma
round two amplicon/indexing PCR primer; primer sequence: 5'-CAAGCAGAAGACGGCATACGAGATaatgagcgGTGACTGGAGTTCAGACGTG-3'
Sequence-based reagent D704 sequence from Illumina, purchased from Millipore Sigma
round two amplicon/indexing PCR primer; primer sequence: 5'-CAAGCAGAAGACGGCATACGAGATggaatctcGTGACTGGAGTTCAGACGTG-3'
Sequence-based reagent D705 sequence from Illumina, purchased from Millipore Sigma
round two amplicon/indexing PCR primer; primer sequence: 5'-CAAGCAGAAGACGGCATACGAGATttctgaatGTGACTGGAGTTCAGACGTG-3'
Sequence-based reagent D706 sequence from Illumina, purchased from Millipore Sigma
round two amplicon/indexing PCR primer; primer sequence: 5'-CAAGCAGAAGACGGCATACGAGATacgaattcGTGACTGGAGTTCAGACGTG-3'
Sequence-based reagent D707 sequence from Illumina, purchased from Millipore Sigma
round two amplicon/indexing PCR primer; primer sequence: 5'-CAAGCAGAAGACGGCATACGAGATagcttcagGTGACTGGAGTTCAGACGTG-3'
Sequence-based reagent D708 sequence from Illumina, purchased from Millipore Sigma
round two amplicon/indexing PCR primer; primer sequence: 5'-CAAGCAGAAGACGGCATACGAGATgcgcattaGTGACTGGAGTTCAGACGTG-3'
Sequence-based reagent D709 sequence from Illumina, purchased from Millipore Sigma
round two amplicon/indexing PCR primer; primer sequence: 5'-CAAGCAGAAGACGGCATACGAGATcatagccgGTGACTGGAGTTCAGACGTG-3'
Sequence-based reagent D710 sequence from Illumina, purchased from Millipore Sigma
round two amplicon/indexing PCR primer; primer sequence: 5'-CAAGCAGAAGACGGCATACGAGATttcgcggaGTGACTGGAGTTCAGACGTG-3'
Sequence-based reagent D711 sequence from Illumina, purchased from Millipore Sigma
round two amplicon/indexing PCR primer; primer sequence: 5'-CAAGCAGAAGACGGCATACGAGATgcgcgagaGTGACTGGAGTTCAGACGTG-3'
Sequence-based reagent D712 sequence from Illumina, purchased from Millipore Sigma
round two amplicon/indexing PCR primer; primer sequence: 5'-CAAGCAGAAGACGGCATACGAGATctatcgctGTGACTGGAGTTCAGACGTG-3'
Sequence-based reagent D501 sequence from Illumina, purchased from Millipore Sigma
round two amplicon/indexing PCR primer; primer sequence: 5'-AATGATACGGCGACCACCGAGATCTACACtatagcctACACTCTTTCCCTACACGAC-3'
Sequence-based reagent D502 sequence from Illumina, purchased from Millipore Sigma
round two amplicon/indexing PCR primer; primer sequence: 5'-AATGATACGGCGACCACCGAGATCTACACatagaggcACACTCTTTCCCTACACGAC-3'
Sequence-based reagent D503 sequence from Illumina, purchased from Millipore Sigma
round two amplicon/indexing PCR primer; primer sequence: 5'-AATGATACGGCGACCACCGAGATCTACACcctatcctACACTCTTTCCCTACACGAC-3'
Sequence-based reagent D504 sequence from Illumina, purchased from Millipore Sigma
round two amplicon/indexing PCR primer; primer sequence: 5'-AATGATACGGCGACCACCGAGATCTACACggctctgaACACTCTTTCCCTACACGAC-3'
Sequence-based reagent D505 sequence from Illumina, purchased from Millipore Sigma
round two amplicon/indexing PCR primer; primer sequence: 5'-AATGATACGGCGACCACCGAGATCTACACaggcgaagACACTCTTTCCCTACACGAC-3'
Sequence-based reagent D506 sequence from Illumina, purchased from Millipore Sigma
round two amplicon/indexing PCR primer; primer sequence: 5'-AATGATACGGCGACCACCGAGATCTACACtaatcttaACACTCTTTCCCTACACGAC-3'
Sequence-based reagent D507 sequence from Illumina, purchased from Millipore Sigma
round two amplicon/indexing PCR primer; primer sequence: 5'-AATGATACGGCGACCACCGAGATCTACACcaggacgtACACTCTTTCCCTACACGAC-3'
Sequence-based reagent D508 sequence from Illumina, purchased from Millipore Sigma
round two amplicon/indexing PCR primer; primer sequence: 5'-AATGATACGGCGACCACCGAGATCTACACgtactgacACACTCTTTCCCTACACGAC-3'
Peptide, recombinant protein Src(KD) this paper, expressed/purified in-house
human c-Src kinase domain (residues 260–528)
Peptide, recombinant protein Fyn(KD) this paper, expressed/purified in-house
human Fyn kinase domain (residues 261–529)
Peptide, recombinant protein Hck(KD) this paper, expressed/purified in-house
human Hck kinase domain (residues 252–520)
Peptide, recombinant protein Abl(KD) this paper, expressed/purified in-house
mouse c-Abl kinase domain (residues 232–502)
Peptide, recombinant protein JAK2 Protein, active Millipore Sigma Millipore Sigma: 14–640 M
active, C-terminal His6-tagged, recombinant human JAK2 (amino acids 808-end),
expressed by baculovirus in Sf21 cells, for use in enzyme assays
Peptide, recombinant protein AncSZ(KD) this paper, expressed/purified in-house
AncSZ kinase domain (residues 352–627)
designed by ancestral sequence reconstruction
Peptide, recombinant protein Fer(KD) this paper, expressed/purified in-house
mouse Fer kinase domain (residues 553–823)
Peptide, recombinant protein FGFR1(KD) this paper, expressed/purified in-house
human FGFR1 kinase domain (residues 456–763)
Peptide, recombinant protein FGFR3(KD) this paper, expressed/purified in-house
human FGFR3 kinase domain (residues 449–759)
Peptide, recombinant protein EPHB1(KD) this paper, expressed/purified in-house
human EPHB1 kinase domain (residues 602–896)
Peptide, recombinant protein EPHB2(KD) this paper, expressed/purified in-house
human EPHB2 kinase domain (residues 604–898)
Peptide, recombinant protein MERTK(KD) this paper, expressed/purified in-house
human MERTK kinase domain (residues 570–864)
Peptide, recombinant protein Src(SH2) this paper, expressed/purified in-house
human c-Src SH2 domain (residues 143–250)
Peptide, recombinant protein SHP2(C-SH2) this paper, expressed/purified in-house
human SHP2 C-SH2 domain (residues 105–220)
Peptide, recombinant protein Grb2(SH2) this paper, expressed/purified in-house
human Grb2 SH2 domain (residues 56–152)
Peptide, recombinant protein SHP2(PTP; C459E) this paper, expressed/purified in-house
human full-length SHP2 (residues 1–526; C459E)
Peptide, recombinant protein SHP2(PTP; C459E, D61V) this paper, expressed/purified in-house
human full-length SHP2 (residues 1–526; C459E, D61V)
Peptide, recombinant protein SHP2(PTP; C459E, D61N) this paper, expressed/purified in-house
human full-length SHP2 (residues 1–526; C459E, D61N)
Peptide, recombinant protein SHP2(PTP; C459E, G60V) this paper, expressed/purified in-house
human full-length SHP2 (residues 1–526; C459E, G60V)
Peptide, recombinant protein Src Consensus this paper, synthesized in-house
peptide sequence: Ac-GPDECIYDMFPFKKKG-NH2
Peptide, recombinant protein Src Consensus (P-5C, D+1 G) this paper, synthesized in-house
peptide sequence: Ac-GCDECIYGMFPFRRRG-NH2
Peptide, recombinant protein Abl Consensus this paper, synthesized in-house
peptide sequence: Ac-GPDEPIYAVPPIKKKG-NH2
Peptide, recombinant protein Fer Consensus this paper, synthesized in-house
peptide sequence: Ac-GPDEPIYEWWWIKKKG-NH2
Peptide, recombinant protein EPHB1 Consensus this paper, synthesized in-house
peptide sequence: Ac-GPPEPNYEVIPPKKKG-NH2
Peptide, recombinant protein EPHB2 Consensus this paper, synthesized in-house
peptide sequence: Ac-GPPEPIYEVPPPKKKG-NH2
Peptide, recombinant protein SrcTide (1995) sequence from PMID:7845468, synthesized in-house
peptide sequence: Ac-GAEEEIYGEFEAKKKG-NH2
Peptide, recombinant protein SrcTide (2014) sequence from PMID:25164267, purchased from Synpeptide
peptide sequence: Ac-GAEEEIYGIFGAKKKG-NH2
Peptide, recombinant protein AblTide (2014) sequence from PMID:7845468, synthesized in-house
peptide sequence: Ac-GAPEVIYATPGAKKKG-NH2
Peptide, recombinant protein HRAS_Y64 sequence from PMID:35606422, purchased from Synpeptide
peptide sequence: Ac-AGQEEYSAMRD-NH2
Peptide, recombinant protein HRAS_Y64_E63K sequence from PMID:35606422, purchased from Synpeptide
peptide sequence: Ac-AGQEKYSAMRD-NH2
Peptide, recombinant protein CDK13_Y716_YF this paper, synthesized in-house
peptide sequence: Ac-IGEGTYGQVFK-NH2
Peptide, recombinant protein CDK13_Y716_G717R_YF this paper, synthesized in-house
peptide sequence: Ac-IGEGTYRQVFK-NH2
Peptide, recombinant protein CDK5_Y15 sequence from PMID:35606422, purchased from Synpeptide
peptide sequence: Ac-IGEGTYGTVFK-NH2
Peptide, recombinant protein CDK5_Y15_G16R sequence from PMID:35606422, purchased from Synpeptide
peptide sequence: Ac-IGEGTYRTVFK-NH2
Peptide, recombinant protein PLCG1_Y210 this paper, synthesized in-house
peptide sequence: Ac-SGDITYGQFAQ-NH2
Peptide, recombinant protein PLCG1_Y210_T209N this paper, synthesized in-house
peptide sequence: Ac-SGDINYGQFAQ-NH2
Peptide, recombinant protein GLB1_Y294 this paper, synthesized in-house
peptide sequence: Ac-VASSLYDILAR-NH2
Peptide, recombinant protein GLB1_Y294_L297F this paper, synthesized in-house
peptide sequence: Ac-VASSLYDIFAR-NH2
Peptide, recombinant protein MISP_Y95 this paper, synthesized in-house
peptide sequence: Ac-EGWQVYRLGAR-NH2
Peptide, recombinant protein HLA-DPB1_Y59_F64L_YF this paper, synthesized in-house
peptide sequence: Ac-LERFIYNREEL-NH2
Peptide, recombinant protein PEAK1_Y797 this paper, synthesized in-house
peptide sequence: Ac-SVEELYAIPPD-NH2
Peptide, recombinant protein SIRPA_Y496_P491L this paper, synthesized in-house
peptide sequence: Ac-LFSEYASVQV-NH2
Peptide, recombinant protein HGD_Y166_F169L this paper, synthesized in-house
peptide sequence: Ac-GNLLIYTELGK-NH2
Peptide, recombinant protein ITGA3_Y237_YF this paper, synthesized in-house
peptide sequence: Ac-WDLSEYSFKDP-NH2
Peptide, recombinant protein ITGA3_Y237_S235P_YF this paper, synthesized in-house
peptide sequence: Ac-WDLPEYSFKDP-NH2
Peptide, recombinant protein Src Consensus (C-2S) this paper, synthesized in-house
peptide sequence: Ac-GPDESIYDMFPFKKKG-NH2
Peptide, recombinant protein Src Consensus (C-2P) this paper, synthesized in-house
peptide sequence: Ac-GPDEPIYDMFPFKKKG-NH2
Peptide, recombinant protein ACTA1_Y171_YF this paper, synthesized in-house
peptide sequence: Ac-QPIFEG(pY)ALPHAG-NH2
Peptide, recombinant protein ACTA1_Y171_A172G_YF this paper, synthesized in-house
peptide sequence: Ac-QPIFEG(pY)GLPHAG-NH2
Peptide, recombinant protein ACTB_Y240 this paper, synthesized in-house
peptide sequence: Ac-QSLEKS(pY)ELPDGG-NH2
Peptide, recombinant protein ACTB_Y240_P243L this paper, synthesized in-house
peptide sequence: Ac-QSLEKS(pY)ELLDGG-NH2
Peptide, recombinant protein CCDC39_Y593 this paper, synthesized in-house
peptide sequence: Ac-QRKQQL(pY)TAMEEG-NH2
Peptide, recombinant protein CLIP2_Y972 this paper, synthesized in-house
peptide sequence: Ac-QSDQRR(pY)SLIDRG-NH2
Peptide, recombinant protein CLIP2_Y972_R977P this paper, synthesized in-house
peptide sequence: Ac-QSDQRR(pY)SLIDPG-NH2
Peptide, recombinant protein CBS_Y308 this paper, synthesized in-house
peptide sequence: Ac-QVEGIG(pY)DFIPTG-NH2
Peptide, recombinant protein CBS_Y308_G307S this paper, synthesized in-house
peptide sequence: Ac-QVEGIS(pY)DFIPTG-NH2
Peptide, recombinant protein fluorescently-labeled c-Src-SH2 consensus peptide sequence from PMID:7680959
peptide sequence: FITC-Ahx-GDG(pY)EEISPLLL-NH2; gift from Jeanine Amacher at Western Washington University
Peptide, recombinant protein Src Consensus (D+1 K) this paper, synthesized in-house
peptide sequence: Ac-GPDECIYKMFPFKKKG-NH2
Peptide, recombinant protein Src Consensus (D1AcK) this paper, synthesized in-house
peptide sequence: Ac-GPDECIY(AcK)MFPFKKKG-NH2
Peptide, recombinant protein Src Consensus (C-2K) this paper, synthesized in-house
peptide sequence: Ac-GPDEKIYDMFPFKKKG-NH2
Peptide, recombinant protein Src Consensus (C-2AcK) this paper, synthesized in-house
peptide sequence: Ac-GPDE(AcK)IYDMFPFKKKG-NH2
Peptide, recombinant protein Abl Consensus (A+1 K) this paper, synthesized in-house
peptide sequence: Ac-GPDEPIYKVPPIKKKG-NH2
Peptide, recombinant protein Abl Consensus (A+1 AcK) this paper, synthesized in-house
peptide sequence: Ac-GPDEPIY(AcK)VPPIKKKG-NH2
Peptide, recombinant protein Abl Consensus (I+5 K) this paper, synthesized in-house
peptide sequence: Ac-GPDEPIYAVPPKKKKG-NH2
Peptide, recombinant protein Abl Consensus (I+5 AcK) this paper, synthesized in-house
peptide sequence: Ac-GPDEPIYAVPP(AcK)KKKG-NH2
Commercial assay or kit MiSeq Reagent Kit v3 (150 cycles) Illumina Illumina: MS-102-3001
Commercial assay or kit NextSeq 500 Mid-Output v2 Kit (150 cycles) Illumina Illumina: FC-404-2001
Commercial assay or kit Promega QuantiFluor dsDNA Sample Kit Promega Promega: E2671
Commercial assay or kit ADP Quest Assay Kit Eurofins Discoverx Eurofins Discoverx: 90-0071
Commercial assay or kit Dynabeads FlowComp Flexi Kit ThermoFisher Scientific ThermoFisher Scientific: 11061D
Chemical compound, drug 4-carboxymethyl phenylalanine (CMF) Millipore Sigma Millipore Sigma: ENA423210770
Chemical compound, drug 4-azido-L-phenylalanine (AzF) Chem-Impex International Chem-Impex: 06162
Chemical compound, drug N-ε-Acetyl-L-Lysine (AcK) MP Biomedicals MP Biomedicals: 02150235.2
Chemical compound, drug Click-iT sDIBO-Alexa Fluor 555 ThermoFisher Thermo: C20021
Other Creatine Phosphokinase from rabbit muscle Millipore Sigma Millipore Sigma: C3755-500UN
purified enzyme extracted from rabbit muscle
Software, algorithm FLASH (version FLASH2-2.2.00) PMID:21903629
https://ccb.jhu.edu/software/FLASH/
Software, algorithm Cutadapt (version 3.5) DOI:10.14806/ej.17.1.200
https://cutadapt.readthedocs.io/en/stable/
Software, algorithm Python scripts for processing and analysis of deep sequencing data this paper (Li et al., 2023)
https://github.com/nshahlab/2022_Li-et-al_peptide-display
Software, algorithm Logomaker PMID:31821414
https://logomaker.readthedocs.io/en/latest/index.html
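
The X5-Y-X5 library oligo listed above randomizes five codons on either side of a fixed tyrosine codon using the degenerate NNS scheme (N = any base, S = C or G). The following is a minimal Python sketch, not taken from the authors' repository, that expands the NNS codon and tallies what the oligo can encode in principle. The function names and the returned dictionary keys are illustrative assumptions; the oligo sequence is copied from the table and the codon table is the standard genetic code.

```python
import re
from itertools import product

# Standard genetic code, indexed with bases ordered T, C, A, G at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# IUPAC degenerate bases that appear in the oligo.
IUPAC = {"N": "ACGT", "S": "CG"}

# X5-Y-X5 library oligo from the Key Resources Table (5' to 3').
# The lowercase "tat" is the fixed central tyrosine codon.
LIBRARY_OLIGO = (
    "GCTGGCCAGTCTGGCCAG"
    "NNSNNSNNSNNSNNS" "tat" "NNSNNSNNSNNSNNS"
    "GGAGGGCAGTCTGGGCAGTCTG"
)


def expand_degenerate_codon(codon):
    """Return every concrete codon matched by a degenerate codon such as 'NNS'."""
    return ["".join(b) for b in product(*(IUPAC.get(base, base) for base in codon))]


def summarize_library(oligo):
    """Tally the theoretical coding capacity of an (NNS)n-codon-(NNS)m oligo."""
    variable = re.search(r"(?:NNS)+[acgt]{3}(?:NNS)+", oligo).group(0)
    n_random = variable.count("NNS")
    fixed_codon = re.search(r"[acgt]{3}", variable).group(0).upper()

    codons = expand_degenerate_codon("NNS")
    stop_codons = [c for c in codons if CODON_TABLE[c] == "*"]
    amino_acids = {CODON_TABLE[c] for c in codons} - {"*"}
    sense_fraction = (len(codons) - len(stop_codons)) / len(codons)

    return {
        "randomized_positions": n_random,
        "fixed_residue": CODON_TABLE[fixed_codon],
        "codons_per_position": len(codons),
        "amino_acids_encoded": len(amino_acids),
        "stop_codons": stop_codons,
        "dna_diversity": len(codons) ** n_random,
        "peptide_diversity": len(amino_acids) ** n_random,
        # Fraction of DNA variants with no stop codon at any randomized position.
        "stop_free_fraction": sense_fraction ** n_random,
    }


if __name__ == "__main__":
    for key, value in summarize_library(LIBRARY_OLIGO).items():
        print(f"{key}: {value}")
```

Under these assumptions, the theoretical peptide diversity (20 amino acids at each of 10 positions) is many orders of magnitude larger than the roughly million-peptide library described in the abstract, so any physical library samples only a small fraction of sequence space. The only stop codon that NNS permits is TAG, the amber codon.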

Funding Statement

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Contributor Information

Neel H Shah, Email: neel.shah@columbia.edu.

Tony Hunter, Salk Institute for Biological Studies, United States.

Jonathan A Cooper, Fred Hutchinson Cancer Center, United States.

Funding Information

This paper was supported by the following grants:

  • National Institute of General Medical Sciences R35GM138014 to Neel H Shah.

  • Damon Runyon Cancer Research Foundation DFS 31-18 to Neel H Shah.

Additional information

Competing interests

No competing interests declared.

Author contributions

Conceptualization, Formal analysis, Validation, Investigation, Visualization, Methodology, Writing – original draft, Writing – review and editing.

Conceptualization, Formal analysis, Validation, Investigation, Visualization, Methodology, Writing – review and editing.

Conceptualization, Validation, Investigation, Methodology, Writing – review and editing.

Validation, Investigation, Methodology, Writing – review and editing.

Conceptualization, Resources, Formal analysis, Supervision, Funding acquisition, Visualization, Methodology, Writing – original draft, Project administration, Writing – review and editing.

Additional files

MDAR checklist

Data availability

All of the processed data from the high-throughput specificity screens are provided as source data files. The raw fastq and fasta sequencing files are available as a Dryad repository (DOI:https://doi.org/10.5061/dryad.0zpc86727). Custom code used to process/analyze screening data can be found in a GitHub repository (copy archived at Li et al., 2023) as specified in the manuscript.

The following dataset was generated:

Li A, Voleti R, Lee M, Gagoski D, Shah NH. 2023. Data from: High-throughput profiling of sequence recognition by tyrosine kinases and SH2 domains using bacterial peptide display. Dryad Digital Repository.

References

  1. Albanese SK, Parton DL, Işık M, Rodríguez-Laureano L, Hanson SM, Behr JM, Gradia S, Jeans C, Levinson NM, Seeliger MA, Chodera JD. An open library of human kinase domain constructs for automated bacterial expression. Biochemistry. 2018;57:4675–4689. doi: 10.1021/acs.biochem.7b01081.
  2. Alexander J, Lim D, Joughin BA, Hegemann B, Hutchins JRA, Ehrenberger T, Ivins F, Sessa F, Hudecz O, Nigg EA, Fry AM, Musacchio A, Stukenberg PT, Mechtler K, Peters JM, Smerdon SJ, Yaffe MB. Spatial exclusivity combined with positive and negative selection of phosphorylation motifs is the basis for context-dependent mitotic signaling. Science Signaling. 2011;4:ra42. doi: 10.1126/scisignal.2001796.
  3. Alfaro-Lopez J, Yuan W, Phan BC, Kamath J, Lou Q, Lam KS, Hruby VJ. Discovery of a novel series of potent and selective substrate-based inhibitors of p60c-src protein tyrosine kinase: conformational and topographical constraints in peptide design. Journal of Medicinal Chemistry. 1998;41:2252–2260. doi: 10.1021/jm9707885.
  4. Amanchy R, Zhong J, Molina H, Chaerkady R, Iwahori A, Kalume DE, Grønborg M, Joore J, Cope L, Pandey A. Identification of c-Src tyrosine kinase substrates using mass spectrometry and peptide microarrays. Journal of Proteome Research. 2008;7:3900–3910. doi: 10.1021/pr800198w.
  5. Amiram M, Haimovich AD, Fan C, Wang YS, Aerni HR, Ntai I, Moonan DW, Ma NJ, Rovner AJ, Hong SH, Kelleher NL, Goodman AL, Jewett MC, Söll D, Rinehart J, Isaacs FJ. Evolution of translation machinery in recoded bacteria enables multi-site incorporation of nonstandard amino acids. Nature Biotechnology. 2015;33:1272–1279. doi: 10.1038/nbt.3372.
  6. Barber KW, Miller CJ, Jun JW, Lou HJ, Turk BE, Rinehart J. Kinase substrate profiling using a proteome-wide serine-oriented human peptide library. Biochemistry. 2018;57:4717–4725. doi: 10.1021/acs.biochem.8b00410.
  7. Begley MJ, Yun C, Gewinner CA, Asara JM, Johnson JL, Coyle AJ, Eck MJ, Apostolou I, Cantley LC. EGF-receptor specificity for phosphotyrosine-primed substrates provides signal integration with Src. Nature Structural & Molecular Biology. 2015;22:983–990. doi: 10.1038/nsmb.3117.
  8. Bose R, Holbert MA, Pickin KA, Cole PA. Protein tyrosine kinase-substrate interactions. Current Opinion in Structural Biology. 2006;16:668–675. doi: 10.1016/j.sbi.2006.10.012.
  9. Bradley D, Viéitez C, Rajeeve V, Selkrig J, Cutillas PR, Beltrao P. Sequence and structure-based analysis of specificity determinants in eukaryotic protein kinases. Cell Reports. 2021;34:108602. doi: 10.1016/j.celrep.2020.108602.
  10. Cantor AJ, Shah NH, Kuriyan J. Deep mutational analysis reveals functional trade-offs in the sequences of EGFR autophosphorylation sites. PNAS. 2018;115:E7303–E7312. doi: 10.1073/pnas.1803598115.
  11. Chapelat J, Berst F, Marzinzik AL, Moebitz H, Drueckes P, Trappe J, Fabbro D, Seebach D. The substrate-activity-screening methodology applied to receptor tyrosine kinases: a proof-of-concept study. European Journal of Medicinal Chemistry. 2012;57:1–9. doi: 10.1016/j.ejmech.2012.08.038.
  12. Chatterjee A, Sun SB, Furman JL, Xiao H, Schultz PG. A versatile platform for single- and multiple-unnatural amino acid mutagenesis in Escherichia coli. Biochemistry. 2013;52:1828–1837. doi: 10.1021/bi4000244.
  13. Chou MF, Prisic S, Lubner JM, Church GM, Husson RN, Schwartz D. Using bacteria to determine protein kinase specificity and predict target substrates. PLOS ONE. 2012;7:e52747. doi: 10.1371/journal.pone.0052747.
  14. Corwin T, Woodsmith J, Apelt F, Fontaine JF, Meierhofer D, Helmuth J, Grossmann A, Andrade-Navarro MA, Ballif BA, Stelzl U. Defining human tyrosine kinase phosphorylation networks using yeast as an in vivo model substrate. Cell Systems. 2017;5:128–139. doi: 10.1016/j.cels.2017.08.001.
  15. Creixell P, Palmeri A, Miller CJ, Lou HJ, Santini CC, Nielsen M, Turk BE, Linding R. Unmasking determinants of specificity in the human kinome. Cell. 2015a;163:187–201. doi: 10.1016/j.cell.2015.08.057.
  16. Creixell P, Schoof EM, Simpson CD, Longden J, Miller CJ, Lou HJ, Perryman L, Cox TR, Zivanovic N, Palmeri A, Wesolowska-Andersen A, Helmer-Citterich M, Ferkinghoff-Borg J, Itamochi H, Bodenmiller B, Erler JT, Turk BE, Linding R. Kinome-wide decoding of network-attacking mutations rewiring cancer signaling. Cell. 2015b;163:202–217. doi: 10.1016/j.cell.2015.08.056.
  17. Cujec TP, Medeiros PF, Hammond P, Rise C, Kreider BL. Selection of v-Abl tyrosine kinase substrate sequences from randomized peptide and cellular proteomic libraries using mRNA display. Chemistry & Biology. 2002;9:253–264. doi: 10.1016/s1074-5521(02)00098-4.
  18. Cunningham JM, Koytiger G, Sorger PK, AlQuraishi M. Biophysical prediction of protein-peptide interactions and signaling networks using machine learning. Nature Methods. 2020;17:175–183. doi: 10.1038/s41592-019-0687-1.
  19. Cushing PR, Fellows A, Villone D, Boisguérin P, Madden DR. The relative binding affinities of PDZ partners for CFTR: a biochemical basis for efficient endocytic recycling. Biochemistry. 2008;47:10084–10098. doi: 10.1021/bi8003928.
  20. Deng Y, Alicea-Velázquez NL, Bannwarth L, Lehtonen SI, Boggon TJ, Cheng HC, Hytönen VP, Turk BE. Global analysis of human nonreceptor tyrosine kinase specificity using high-density peptide microarrays. Journal of Proteome Research. 2014;13:4339–4346. doi: 10.1021/pr500503q.
  21. Dente L, Vetriani C, Zucconi A, Pelicci G, Lanfrancone L, Pelicci PG, Cesareni G. Modified phage peptide libraries as a tool to study specificity of phosphorylation and recognition of tyrosine containing peptides. Journal of Molecular Biology. 1997;269:694–703. doi: 10.1006/jmbi.1997.1073.
  22. Douglass J, Gunaratne R, Bradford D, Saeed F, Hoffert JD, Steinbach PJ, Knepper MA, Pisitkun T. Identifying protein kinase target preferences using mass spectrometry. American Journal of Physiology. Cell Physiology. 2012;303:C715–C727. doi: 10.1152/ajpcell.00166.2012.
  23. Encinas M, Crowder RJ, Milbrandt J, Johnson EM. Tyrosine 981, a novel RET autophosphorylation site, binds c-Src to mediate neuronal survival. The Journal of Biological Chemistry. 2004;279:18262–18269. doi: 10.1074/jbc.M400505200.
  24. Filippakopoulos P, Kofler M, Hantschel O, Gish GD, Grebien F, Salah E, Neudecker P, Kay LE, Turk BE, Superti-Furga G, Pawson T, Knapp S. Structural coupling of SH2-kinase domains links Fes and Abl substrate recognition and kinase activation. Cell. 2008;134:793–803. doi: 10.1016/j.cell.2008.07.047.
  25. Finneran P, Soucheray M, Wilson C, Otten R, Buosi V, Krogan NJ, Swaney DL, Theobald DL, Kern D. Bimodal Evolution of Src and Abl Kinase Substrate Specificity Revealed Using Mammalian Cell Extract as Substrate Pool. bioRxiv. 2020. doi: 10.1101/2020.08.12.248104.
  26. Gillette MA, Satpathy S, Cao S, Dhanasekaran SM, Vasaikar SV, Krug K, Petralia F, Li Y, Liang WW, Reva B, Krek A, Ji J, Song X, Liu W, Hong R, Yao L, Blumenberg L, Savage SR, Wendl MC, Wen B, Li K, Tang LC, MacMullan MA, Avanessian SC, Kane MH, Newton CJ, Cornwell M, Kothadia RB, Ma W, Yoo S, Mannan R, Vats P, Kumar-Sinha C, Kawaler EA, Omelchenko T, Colaprico A, Geffen Y, Maruvka YE, da Veiga Leprevost F, Wiznerowicz M, Gümüş ZH, Veluswamy RR, Hostetter G, Heiman DI, Wyczalkowski MA, Hiltke T, Mesri M, Kinsinger CR, Boja ES, Omenn GS, Chinnaiyan AM, Rodriguez H, Li QK, Jewell SD, Thiagarajan M, Getz G, Zhang B, Fenyö D, Ruggles KV, Cieslik MP, Robles AI, Clauser KR, Govindan R, Wang P, Nesvizhskii AI, Ding L, Mani DR, Carr SA, Clinical Proteomic Tumor Analysis Consortium. Proteogenomic characterization reveals therapeutic vulnerabilities in lung adenocarcinoma. Cell. 2020;182:200–225. doi: 10.1016/j.cell.2020.06.013.
  27. Gram H, Schmitz R, Zuber JF, Baumann G. Identification of phosphopeptide ligands for the Src-homology 2 (SH2) domain of Grb2 by phage display. European Journal of Biochemistry. 1997;246:633–637. doi: 10.1111/j.1432-1033.1997.00633.x.
  28. Gräslund S, Savitsky P, Müller-Knapp S. In vivo biotinylation of antigens in E. coli. Methods in Molecular Biology. 2017;1586:337–344. doi: 10.1007/978-1-4939-6887-9_22.
  29. Henriques ST, Thorstholm L, Huang YH, Getz JA, Daugherty PS, Craik DJ. A novel quantitative kinase assay using bacterial surface display and flow cytometry. PLOS ONE. 2013;8:e80474. doi: 10.1371/journal.pone.0080474.
  30. Hobbs HT, Shah NH, Shoemaker SR, Amacher JF, Marqusee S, Kuriyan J. Saturation mutagenesis of a predicted ancestral Syk-family kinase. Protein Science. 2022;31:e4411. doi: 10.1002/pro.4411.
  31. Hornbeck PV, Zhang B, Murray B, Kornhauser JM, Latham V, Skrzypek E. PhosphoSitePlus, 2014: mutations, PTMs and recalibrations. Nucleic Acids Research. 2015;43:D512–D520. doi: 10.1093/nar/gku1267.
  32. Hornbeck PV, Kornhauser JM, Latham V, Murray B, Nandhikonda V, Nord A, Skrzypek E, Wheeler T, Zhang B, Gnad F. 15 years of PhosphoSitePlus: integrating post-translationally modified sites, disease variants and isoforms. Nucleic Acids Research. 2019;47:D433–D441. doi: 10.1093/nar/gky1159.
  33. Huang H, Li L, Wu C, Schibli D, Colwill K, Ma S, Li C, Roy P, Ho K, Songyang Z, Pawson T, Gao Y, Li SSC. Defining the specificity space of the human Src homology 2 domain. Molecular & Cellular Proteomics. 2008;7:768–784. doi: 10.1074/mcp.M700312-MCP200.
  34. Hutti JE, Jarrell ET, Chang JD, Abbott DW, Storz P, Toker A, Cantley LC, Turk BE. A rapid method for determining protein kinase phosphorylation specificity. Nature Methods. 2004;1:27–29. doi: 10.1038/nmeth708.
  35. Imamura H, Sugiyama N, Wakabayashi M, Ishihama Y. Large-scale identification of phosphorylation sites for profiling protein kinase selectivity. Journal of Proteome Research. 2014;13:3410–3419. doi: 10.1021/pr500319y.
  36. Imhof D, Wavreille AS, May A, Zacharias M, Tridandapani S, Pei D. Sequence specificity of SHP-1 and SHP-2 Src homology 2 domains. Critical roles of residues beyond the pY+3 position. The Journal of Biological Chemistry. 2006;281:20271–20282. doi: 10.1074/jbc.M601047200.
  37. Johnson JL, Yaron TM, Huntsman EM, Kerelsky A, Song J, Regev A, Lin T-Y, Liberatore K, Cizin DM, Cohen BM, Vasan N, Ma Y, Krismer K, Robles JT, van de Kooij B, van Vlimmeren AE, Andrée-Busch N, Käufer NF, Dorovkov MV, Ryazanov AG, Takagi Y, Kastenhuber ER, Goncalves MD, Hopkins BD, Elemento O, Taatjes DJ, Maucuer A, Yamashita A, Degterev A, Uduman M, Lu J, Landry SD, Zhang B, Cossentino I, Linding R, Blenis J, Hornbeck PV, Turk BE, Yaffe MB, Cantley LC. An atlas of substrate specificities for the human serine/threonine kinome. Nature. 2023;613:759–766. doi: 10.1038/s41586-022-05575-3.
  38. Jones RB, Gordus A, Krall JA, MacBeath G. A quantitative protein interaction network for the ErbB receptors using protein microarrays. Nature. 2006;439:168–174. doi: 10.1038/nature04177.
  39. Kaneko T, Huang H, Zhao B, Li L, Liu H, Voss CK, Wu C, Schiller MR, Li SSC. Loops govern SH2 domain specificity by controlling access to binding pockets. Science Signaling. 2010;3:ra34. doi: 10.1126/scisignal.2000796.
  40. Kang X, Kim J, Deng M, John S, Chen H, Wu G, Phan H, Zhang CC. Inhibitory leukocyte immunoglobulin-like receptors: immune checkpoint proteins and tumor sustaining factors. Cell Cycle. 2016;15:25–40. doi: 10.1080/15384101.2015.1121324.
  41. Keilhack H, David FS, McGregor M, Cantley LC, Neel BG. Diverse biochemical properties of shp2 mutants. implications for disease phenotypes. The Journal of Biological Chemistry. 2005;280:30984–30993. doi: 10.1074/jbc.M504699200. [DOI] [PubMed] [Google Scholar]
  42. Kessels H, Ward AC, Schumacher TNM. Specificity and affinity motifs for grb2 SH2-ligand interactions. PNAS. 2002;99:8524–8529. doi: 10.1073/pnas.142224499. [DOI] [PMC free article] [PubMed] [Google Scholar]
  43. Kettenbach AN, Wang T, Faherty BK, Madden DR, Knapp S, Bailey-Kellogg C, Gerber SA. Rapid determination of multiple linear kinase substrate motifs by mass spectrometry. Chemistry & Biology. 2012;19:608–618. doi: 10.1016/j.chembiol.2012.04.011. [DOI] [PMC free article] [PubMed] [Google Scholar]
  44. Koytiger G, Kaushansky A, Gordus A, Rush J, Sorger PK, MacBeath G. Phosphotyrosine signaling proteins that drive oncogenesis tend to be highly interconnected. Molecular & Cellular Proteomics. 2013;12:1204–1213. doi: 10.1074/mcp.M112.025858. [DOI] [PMC free article] [PubMed] [Google Scholar]
  45. Krassowski M, Paczkowska M, Cullion K, Huang T, Dzneladze I, Ouellette BFF, Yamada JT, Fradet-Turcotte A, Reimand J. ActiveDriverDB: human disease mutations and genome variation in post-translational modification sites of proteins. Nucleic Acids Research. 2018;46:D901–D910. doi: 10.1093/nar/gkx973. [DOI] [PMC free article] [PubMed] [Google Scholar]
  46. Kundu K, Costa F, Huber M, Reth M, Backofen R. Semi-supervised prediction of SH2-peptide interactions from imbalanced high-throughput data. PLOS ONE. 2013;8:e62732. doi: 10.1371/journal.pone.0062732. [DOI] [PMC free article] [PubMed] [Google Scholar]
  47. Landrum MJ, Lee JM, Benson M, Brown GR, Chao C, Chitipiralla S, Gu B, Hart J, Hoffman D, Jang W, Karapetyan K, Katz K, Liu C, Maddipatla Z, Malheiro A, McDaniel K, Ovetsky M, Riley G, Zhou G, Holmes JB, Kattman BL, Maglott DR. ClinVar: improving access to variant interpretations and supporting evidence. Nucleic Acids Research. 2018;46:D1062–D1067. doi: 10.1093/nar/gkx1153. [DOI] [PMC free article] [PubMed] [Google Scholar]
  48. Li A, Voleti R, Lee M, Gagoski D, Shah NH. 2022_li-et-al_peptide-display. swh:1:rev:c82bb91c9c02040831a5583176d1586d4b158b79. Software Heritage. 2023. https://archive.softwareheritage.org/swh:1:dir:a6e64c6b113a0131ea3274472ebd9d57594b69d8;origin=https://github.com/nshahlab/2022_Li-et-al_peptide-display;visit=swh:1:snp:bcca3e9f15b71dbf069b4ecadfe2ca33b0e25818;anchor=swh:1:rev:c82bb91c9c02040831a5583176d1586d4b158b79
  49. Lim WA, Pawson T. Phosphotyrosine signaling: evolving a new cellular communication system. Cell. 2010;142:661–667. doi: 10.1016/j.cell.2010.08.023. [DOI] [PMC free article] [PubMed] [Google Scholar]
  50. Lin W, Mehta S, Zhang J. Genetically encoded fluorescent biosensors illuminate kinase signaling in cancer. The Journal of Biological Chemistry. 2019;294:14814–14822. doi: 10.1074/jbc.REV119.006177. [DOI] [PMC free article] [PubMed] [Google Scholar]
  51. Liu X, Brodeur SR, Gish G, Songyang Z, Cantley LC, Laudano AP, Pawson T. Regulation of c-Src tyrosine kinase activity by the Src SH2 domain. Oncogene. 1993;8:1119–1126. [PubMed] [Google Scholar]
  52. Liu H, Huang H, Voss C, Kaneko T, Qin WT, Sidhu S, Li SSC. Surface loops in a single SH2 domain are capable of encoding the spectrum of specificity of the SH2 family. Molecular & Cellular Proteomics. 2019;18:372–382. doi: 10.1074/mcp.RA118.001123. [DOI] [PMC free article] [PubMed] [Google Scholar]
  53. Lo WL, Shah NH, Rubin SA, Zhang W, Horkova V, Fallahee IR, Stepanek O, Zon LI, Kuriyan J, Weiss A. Slow phosphorylation of a tyrosine residue in LAT optimizes T cell ligand discrimination. Nature Immunology. 2019;20:1481–1493. doi: 10.1038/s41590-019-0502-2. [DOI] [PMC free article] [PubMed] [Google Scholar]
  54. Lu Y, Yu Q, Liu JH, Zhang J, Wang H, Koul D, McMurray JS, Fang X, Yung WKA, Siminovitch KA, Mills GB. Src family protein-tyrosine kinases alter the function of PTEN to regulate phosphatidylinositol 3-kinase/Akt cascades. The Journal of Biological Chemistry. 2003;278:40057–40066. doi: 10.1074/jbc.M303621200. [DOI] [PubMed] [Google Scholar]
  55. Lubner JM, Balsbaugh JL, Church GM, Chou MF, Schwartz D. Characterizing protein kinase substrate specificity using the proteomic peptide library (propel) approach. Current Protocols in Chemical Biology. 2018;10:e38. doi: 10.1002/cpch.38. [DOI] [PMC free article] [PubMed] [Google Scholar]
  56. Lundby A, Franciosa G, Emdal KB, Refsgaard JC, Gnosa SP, Bekker-Jensen DB, Secher A, Maurya SR, Paul I, Mendez BL, Kelstrup CD, Francavilla C, Kveiborg M, Montoya G, Jensen LJ, Olsen JV. Oncogenic mutations rewire signaling pathways by switching protein recruitment to phosphotyrosine sites. Cell. 2019;179:543–560. doi: 10.1016/j.cell.2019.09.008. [DOI] [PubMed] [Google Scholar]
  57. Magoč T, Salzberg SL. Flash: fast length adjustment of short reads to improve genome assemblies. Bioinformatics. 2011;27:2957–2963. doi: 10.1093/bioinformatics/btr507. [DOI] [PMC free article] [PubMed] [Google Scholar]
  58. Marholz LJ, Zeringo NA, Lou HJ, Turk BE, Parker LL. In silico design and in vitro characterization of universal tyrosine kinase peptide substrates. Biochemistry. 2018;57:1847–1851. doi: 10.1021/acs.biochem.8b00044. [DOI] [PMC free article] [PubMed] [Google Scholar]
  59. Martin M. Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet.Journal. 2011;17:10. doi: 10.14806/ej.17.1.200. [DOI] [Google Scholar]
  60. Miller ML, Jensen LJ, Diella F, Jørgensen C, Tinti M, Li L, Hsiung M, Parker SA, Bordeaux J, Sicheritz-Ponten T, Olhovsky M, Pasculescu A, Alexander J, Knapp S, Blom N, Bork P, Li S, Cesareni G, Pawson T, Turk BE, Yaffe MB, Brunak S, Linding R. Linear motif atlas for phosphorylation-dependent signaling. Science Signaling. 2008;1:ra2. doi: 10.1126/scisignal.1159433. [DOI] [PMC free article] [PubMed] [Google Scholar]
  61. Miller CJ, Turk BE. Homing in: mechanisms of substrate targeting by protein kinases. Trends in Biochemical Sciences. 2018;43:380–394. doi: 10.1016/j.tibs.2018.02.009. [DOI] [PMC free article] [PubMed] [Google Scholar]
  62. Mok J, Im H, Snyder M. Global identification of protein kinase substrates by protein microarray analysis. Nature Protocols. 2009;4:1820–1827. doi: 10.1038/nprot.2009.194. [DOI] [PMC free article] [PubMed] [Google Scholar]
  63. Mok J, Kim PM, Lam HYK, Piccirillo S, Zhou X, Jeschke GR, Sheridan DL, Parker SA, Desai V, Jwa M, Cameroni E, Niu H, Good M, Remenyi A, Ma JLN, Sheu YJ, Sassi HE, Sopko R, Chan CSM, De Virgilio C, Hollingsworth NM, Lim WA, Stern DF, Stillman B, Andrews BJ, Gerstein MB, Snyder M, Turk BE. Deciphering protein kinase specificity through large-scale analysis of yeast phosphorylation site motifs. Science Signaling. 2010;3:ra12. doi: 10.1126/scisignal.2000482. [DOI] [PMC free article] [PubMed] [Google Scholar]
  64. Morimatsu M, Takagi H, Ota KG, Iwamoto R, Yanagida T, Sako Y. Multiple-state reactions between the epidermal growth factor receptor and grb2 as observed by using single-molecule analysis. PNAS. 2007;104:18013–18018. doi: 10.1073/pnas.0701330104. [DOI] [PMC free article] [PubMed] [Google Scholar]
  65. Ning X, Guo J, Wolfert MA, Boons GJ. Visualizing metabolically labeled glycoconjugates of living cells by copper-free and fast huisgen cycloadditions. Angewandte Chemie. 2008;120:2285–2287. doi: 10.1002/ange.200705456. [DOI] [PMC free article] [PubMed] [Google Scholar]
  66. Obenauer JC, Cantley LC, Yaffe MB. Scansite 2.0: proteome-wide prediction of cell signaling interactions using short sequence motifs. Nucleic Acids Research. 2003;31:3635–3641. doi: 10.1093/nar/gkg584. [DOI] [PMC free article] [PubMed] [Google Scholar]
  67. Oh D, Ogiue-Ikeda M, Jadwin JA, Machida K, Mayer BJ, Yu J. Fast rebinding increases dwell time of src homology 2 (SH2) -containing proteins near the plasma membrane. PNAS. 2012;109:14024–14029. doi: 10.1073/pnas.1203397109. [DOI] [PMC free article] [PubMed] [Google Scholar]
  68. Parker BL, Shepherd NE, Trefely S, Hoffman NJ, White MY, Engholm-Keller K, Hambly BD, Larsen MR, James DE, Cordwell SJ. Structural basis for phosphorylation and lysine acetylation cross-talk in a kinase motif associated with myocardial ischemia and cardioprotection. The Journal of Biological Chemistry. 2014;289:25890–25906. doi: 10.1074/jbc.M114.556035. [DOI] [PMC free article] [PubMed] [Google Scholar]
  69. Pawson T, Nash P. Protein-Protein interactions define specificity in signal transduction. Genes & Development. 2000;14:1027–1047. [PubMed] [Google Scholar]
  70. Pfeiffer A, Franciosa G, Locard-Paulet M, Piga I, Reckzeh K, Vemulapalli V, Blacklow SC, Theilgaard-Mönch K, Jensen LJ, Olsen JV. Phosphorylation of SHP2 at tyr62 enables acquired resistance to SHP2 allosteric inhibitors in FLT3-ITD-driven AML. Cancer Research. 2022;82:2141–2155. doi: 10.1158/0008-5472.CAN-21-0548. [DOI] [PMC free article] [PubMed] [Google Scholar]
  71. Rahuel J, Gay B, Erdmann D, Strauss A, Garcia-Echeverría C, Furet P, Caravatti G, Fretz H, Schoepfer J, Grütter MG. Structural basis for specificity of Grb2-SH2 revealed by a novel ligand binding mode. Nature Structural Biology. 1996;3:586–589. doi: 10.1038/nsb0796-586. [DOI] [PubMed] [Google Scholar]
  72. Ren L, Chen X, Luechapanichkul R, Selner NG, Meyer TM, Wavreille AS, Chan R, Iorio C, Zhou X, Neel BG, Pei D. Substrate specificity of protein tyrosine phosphatases 1B, RPTPα, SHP-1, and SHP-2. Biochemistry. 2011;50:2339–2356. doi: 10.1021/bi1014453. [DOI] [PMC free article] [PubMed] [Google Scholar]
  73. Rice JJ, Daugherty PS. Directed evolution of a biterminal bacterial display scaffold enhances the display of diverse peptides. Protein Engineering, Design & Selection. 2008;21:435–442. doi: 10.1093/protein/gzn020. [DOI] [PMC free article] [PubMed] [Google Scholar]
  74. Rube HT, Rastogi C, Feng S, Kribelbauer JF, Li A, Becerra B, Melo LAN, Do BV, Li X, Adam HH, Shah NH, Mann RS, Bussemaker HJ. Prediction of protein-ligand binding affinity from sequencing data with interpretable machine learning. Nature Biotechnology. 2022;40:1520–1527. doi: 10.1038/s41587-022-01307-0. [DOI] [PMC free article] [PubMed] [Google Scholar]
  75. Rust HL, Thompson PR. Kinase consensus sequences: a breeding ground for crosstalk. ACS Chemical Biology. 2011;6:881–892. doi: 10.1021/cb200171d. [DOI] [PMC free article] [PubMed] [Google Scholar]
  76. Schutkowski M, Reimer U, Panse S, Dong L, Lizcano JM, Alessi DR, Schneider-Mergener J. High-content peptide microarrays for deciphering kinase specificity and biology. Angewandte Chemie. 2004;43:2671–2674. doi: 10.1002/anie.200453900. [DOI] [PubMed] [Google Scholar]
  77. Shah NH, Wang Q, Yan Q, Karandur D, Kadlecek TA, Fallahee IR, Russ WP, Ranganathan R, Weiss A, Kuriyan J. An electrostatic selection mechanism controls sequential kinase signaling downstream of the T cell receptor. eLife. 2016;5:e20105. doi: 10.7554/eLife.20105. [DOI] [PMC free article] [PubMed] [Google Scholar]
  78. Shah NH, Löbel M, Weiss A, Kuriyan J. Fine-Tuning of substrate preferences of the Src-family kinase Lck revealed through a high-throughput specificity screen. eLife. 2018;7:e35190. doi: 10.7554/eLife.35190. [DOI] [PMC free article] [PubMed] [Google Scholar]
  79. Songyang Z, Shoelson SE, Chaudhuri M, Gish G, Pawson T, Haser WG, King F, Roberts T, Ratnofsky S, Lechleider RJ, Neel BG, Birge RB, Fajardo JE, Chou MM, Hanafusa H, Schaffhausen B, Cantley LC. Sh2 domains recognize specific phosphopeptide sequences. Cell. 1993;72:767–778. doi: 10.1016/0092-8674(93)90404-e. [DOI] [PubMed] [Google Scholar]
  80. Songyang Z, Shoelson SE, McGlade J, Olivier P, Pawson T, Bustelo XR, Barbacid M, Sabe H, Hanafusa H, Yi T. Specific motifs recognized by the SH2 domains of Csk, 3BP2, fps/fes, GRB-2, HCP, Shc, Syk, and Vav. Molecular and Cellular Biology. 1994;14:2777–2785. doi: 10.1128/mcb.14.4.2777-2785.1994. [DOI] [PMC free article] [PubMed] [Google Scholar]
  81. Songyang Z, Carraway KL, Eck MJ, Harrison SC, Feldman RA, Mohammadi M, Schlessinger J, Hubbard SR, Smith DP, Eng C. Catalytic specificity of protein-tyrosine kinases is critical for selective signalling. Nature. 1995;373:536–539. doi: 10.1038/373536a0. [DOI] [PubMed] [Google Scholar]
  82. Stein A, Fowler DM, Hartmann-Petersen R, Lindorff-Larsen K. Biophysical and mechanistic models for disease-causing protein variants. Trends in Biochemical Sciences. 2019;44:575–588. doi: 10.1016/j.tibs.2019.01.003. [DOI] [PMC free article] [PubMed] [Google Scholar]
  83. Sugiyama N, Imamura H, Ishihama Y. Large-scale discovery of substrates of the human kinome. Scientific Reports. 2019;9:10503. doi: 10.1038/s41598-019-46385-4. [DOI] [PMC free article] [PubMed] [Google Scholar]
  84. Sweeney MC, Wavreille AS, Park J, Butchar JP, Tridandapani S, Pei D. Decoding protein-protein interactions through combinatorial chemistry: sequence specificity of SHP-1, SHP-2, and SHIP SH2 domains. Biochemistry. 2005;44:14932–14947. doi: 10.1021/bi051408h. [DOI] [PubMed] [Google Scholar]
  85. Taft JM, Georgeon S, Allen C, Reckel S, DeSautelle J, Hantschel O, Georgiou G, Iverson BL. Rapid screen for tyrosine kinase inhibitor resistance mutations and substrate specificity. ACS Chemical Biology. 2019;14:1888–1895. doi: 10.1021/acschembio.9b00283. [DOI] [PubMed] [Google Scholar]
  86. Tartaglia M, Martinelli S, Stella L, Bocchinfuso G, Flex E, Cordeddu V, Zampino G, Burgt I, Palleschi A, Petrucci TC, Sorcini M, Schoch C, Foa R, Emanuel PD, Gelb BD. Diversity and functional consequences of germline and somatic PTPN11 mutations in human disease. American Journal of Human Genetics. 2006;78:279–290. doi: 10.1086/499925. [DOI] [PMC free article] [PubMed] [Google Scholar]
  87. Tian H, Naganathan S, Kazmi MA, Schwartz TW, Sakmar TP, Huber T. Bioorthogonal fluorescent labeling of functional G-protein-coupled receptors. Chembiochem. 2014;15:1820–1829. doi: 10.1002/cbic.201402193. [DOI] [PMC free article] [PubMed] [Google Scholar]
  88. Till JH, Annan RS, Carr SA, Miller WT. Use of synthetic peptide libraries and phosphopeptide-selective mass spectrometry to probe protein kinase substrate specificity. The Journal of Biological Chemistry. 1994;269:7423–7428. doi: 10.1016/S0021-9258(17)37302-7. [DOI] [PubMed] [Google Scholar]
  89. Till JH, Chan PM, Miller WT. Engineering the substrate specificity of the Abl tyrosine kinase. The Journal of Biological Chemistry. 1999;274:4995–5003. doi: 10.1074/jbc.274.8.4995. [DOI] [PubMed] [Google Scholar]
  90. Tinti M, Kiemer L, Costa S, Miller ML, Sacco F, Olsen JV, Carducci M, Paoluzi S, Langone F, Workman CT, Blom N, Machida K, Thompson CM, Schutkowski M, Brunak S, Mann M, Mayer BJ, Castagnoli L, Cesareni G. The SH2 domain interaction landscape. Cell Reports. 2013;3:1293–1305. doi: 10.1016/j.celrep.2013.03.001. [DOI] [PMC free article] [PubMed] [Google Scholar]
  91. Trinh TB, Xiao Q, Pei D. Profiling the substrate specificity of protein kinases by on-bead screening of peptide libraries. Biochemistry. 2013;52:5645–5655. doi: 10.1021/bi4008947. [DOI] [PMC free article] [PubMed] [Google Scholar]
  92. Uttamchandani M, Chan EWS, Chen GYJ, Yao SQ. Combinatorial peptide microarrays for the rapid determination of kinase specificity. Bioorganic & Medicinal Chemistry Letters. 2003;13:2997–3000. doi: 10.1016/s0960-894x(03)00633-4. [DOI] [PubMed] [Google Scholar]
  93. Wang X, Wei X, Yuan Y, Sun Q, Zhan J, Zhang J, Tang Y, Li F, Ding L, Ye Q, Zhang H. Src-Mediated phosphorylation converts FHL1 from tumor suppressor to tumor promoter. The Journal of Cell Biology. 2018;217:1335–1351. doi: 10.1083/jcb.201708064. [DOI] [PMC free article] [PubMed] [Google Scholar]
  94. Wavreille AS, Garaud M, Zhang Y, Pei D. Defining SH2 domain and PTP specificity by screening combinatorial peptide libraries. Methods. 2007;42:207–219. doi: 10.1016/j.ymeth.2007.02.010. [DOI] [PMC free article] [PubMed] [Google Scholar]
  95. Xie J, Supekova L, Schultz PG. A genetically encoded metabolically stable analogue of phosphotyrosine in Escherichia coli. ACS Chemical Biology. 2007;2:474–478. doi: 10.1021/cb700083w. [DOI] [PubMed] [Google Scholar]
  96. Xue L, Wang WH, Iliuk A, Hu L, Galan JA, Yu S, Hans M, Geahlen RL, Tao WA. Sensitive kinase assay linked with phosphoproteomics for identifying direct kinase substrates. PNAS. 2012;109:5615–5620. doi: 10.1073/pnas.1119418109. [DOI] [PMC free article] [PubMed] [Google Scholar]
  97. Yaffe MB, Leparc GG, Lai J, Obata T, Volinia S, Cantley LC. A motif-based profile scanning approach for genome-wide prediction of signaling pathways. Nature Biotechnology. 2001;19:348–353. doi: 10.1038/86737. [DOI] [PubMed] [Google Scholar]
  98. Yeh RH, Lee TR, Lawrence DS. From consensus sequence peptide to high affinity ligand, a library scan strategy. The Journal of Biological Chemistry. 2001;276:12235–12240. doi: 10.1074/jbc.M011232200. [DOI] [PubMed] [Google Scholar]
  99. Zheng Y, Gilgenast MJ, Hauc S, Chatterjee A. Capturing post-translational modification-triggered protein-protein interactions using dual noncanonical amino acid mutagenesis. ACS Chemical Biology. 2018;13:1137–1141. doi: 10.1021/acschembio.8b00021. [DOI] [PMC free article] [PubMed] [Google Scholar]

Editor's evaluation

Tony Hunter 1

This paper reports an improved bacterial surface peptide display technology and its use to survey the primary sequence specificities of a broad range of tyrosine kinases and to assess the effects of naturally-occurring positional variations around sites of tyrosine phosphorylation on the efficiency of phosphorylation. The versatility of this approach was demonstrated by using expanded genetic code technology to investigate the consequences of installing post-translationally modified amino acids, such as acetyl-lysine, at positions upstream and downstream of a target tyrosine on the efficiency of phosphorylation by different tyrosine kinases. In addition, pre-phosphorylated surface peptide display libraries were exploited to interrogate the primary sequence binding specificities of SH2 phosphotyrosine-binding domains.

Decision letter

Editor: Tony Hunter1
Reviewed by: Tony Hunter2

In the interests of transparency, eLife publishes the most substantive revision requests and the accompanying author responses.

Decision letter after peer review:

Thank you for sending your article entitled "High-throughput profiling of sequence recognition by tyrosine kinases and SH2 domains using bacterial peptide display" for peer review at eLife. Your article is being evaluated by 4 peer reviewers, including Tony Hunter as the Reviewing Editor and Reviewer #1, and the evaluation is being overseen by Jonathan Cooper as the Senior Editor.

The reviewers were impressed by the new sequence specificities you have obtained for additional tyrosine kinases and SH2 domains by using your improved bacterial peptide surface display technology. However, in their opinion, the technical improvements you describe are not a significant enough advance to warrant publication of the paper as it stands. The reviewers indicate that validation of the biological significance of at least one of your novel findings is required to establish that the specificities obtained by your approach will be useful to the community. For this reason, prior to requesting you to submit a revised version, we ask you to submit a written plan outlining additional experiments that you could do within a reasonable time frame to validate one of your new findings. Each of the reviewers has suggestions for the sort of experiments you might be able to do.

Reviewer #1 (Recommendations for the authors):

Here, the authors report an improved version of the X5-Tyr-X5 peptide bacterial surface display technology, which they had developed previously to determine the primary sequence specificities of the LCK and ZAP70 tyrosine kinases (TKs) and the EGF receptor. Taking advantage of the new protocol, they defined the primary sequence specificities of five additional TKs: the c-Src, c-Abl, and Fer nonreceptor TKs, and the EPHB1 and EPHB2 RTKs. Consensus peptide substrates were synthesized for the five TKs using the most favorable amino acid at every position, and their kinetic properties and TK selectivity were compared with those of the SrcTide and AblTide substrate peptides. The consensus peptides had reasonable (10-200 μM Km) affinities and were relatively selective substrates for the c-Src, c-Abl, and Fer TKs, respectively. However, although consensus sequences were deduced for EPHB1 and EPHB2, these kinases showed relatively little sequence selectivity. Next, they used their specificity data to predict the consequences of known naturally occurring single amino acid sequence variations – either disease-associated mutations or polymorphisms – in the five residues on either side of c-Src phosphosites on phosphorylation, and compared these predictions with actual experimental data, showing that they could predict relative rates of phosphorylation of the variant peptides with reasonable accuracy. Based on this success, they built a 10,000-member MYC-tagged variant expression library (pTyr-Var), which excluded Tyr in the X5 flanking residues, and tested it against the c-Src, Fyn, and Hck SFKs, c-Abl, Fer, JAK2, and AncSZ, an engineered homologue of the SYK/ZAP-70 family members, and five RTKs – EPHB1, EPHB2, FGFR1, FGFR3, and MERTK. They found that the different TKs exhibited distinct patterns of sensitivity to sequence variation at each position around the Tyr, consistent with individual sequence preferences.
As an example, the variant R982C RET peptide was strongly preferred by several TKs compared to the Tyr981 reference peptide. They also showed that there were sequence context-dependent effects of mutations proximal to a phosphosite, which were not predicted based on the X5-Y-X5 library results. In addition, they exploited expanded genetic code technology to incorporate CMF, a pTyr analogue, or acetyl-lysine (AcK) residues randomly in the X5-Y-X5 library. They found that unlike Lys, AcK was not only tolerated but preferred in some positions for c-Src phosphorylation, whereas CMF could not replace Phe at preferred positions in c-Src peptide substrates. The authors had previously used pre-phosphorylated peptide surface displays to survey the binding specificity of the GRB2 SH2 domain, and here they extended this to two additional SH2 domains, SHP2-C and c-Src. For this purpose, they pre-phosphorylated the X5-Y-X5 library with a mix of c-Src, c-Abl, AncSZ, and EPHB1 TKs, and then used biotinylated SH2 constructs, tandemized to increase binding avidity, to screen the c-Src, SHP2-C and GRB2 SH2 domains. With the exception of SHP2-C, they found sequence preferences largely concordant with those reported using other approaches. By screening the pTyr-Var library for SH2 binding they also found natural variant phosphosite sequences that exhibited gain of function for SH2 binding that could be of functional significance.
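
The consensus-design step summarized above (taking the most favorable amino acid at every position) can be sketched as follows. This is only a minimal illustration, not the authors' analysis pipeline: the function name `consensus_peptide`, the dictionary layout, and the toy enrichment values are all assumptions made for this example.

```python
# Hypothetical sketch: derive a consensus peptide from a position-specific
# enrichment matrix. All names and numbers are illustrative assumptions.

def consensus_peptide(scores, center="Y"):
    """Return the consensus sequence, ordered from N- to C-terminus.

    `scores` maps a position relative to the central tyrosine (e.g. -2, 1)
    to a dict of {amino acid: enrichment score}. Position 0 is always set
    to `center`, the phospho-acceptor residue.
    """
    positions = sorted(set(scores) | {0})
    residues = []
    for pos in positions:
        if pos == 0:
            residues.append(center)  # central tyrosine is fixed
        else:
            column = scores[pos]
            # choose the residue with the highest enrichment at this position
            residues.append(max(column, key=column.get))
    return "".join(residues)


# Toy matrix covering positions -2..+2 only (values invented for illustration).
toy_scores = {
    -2: {"E": 1.2, "C": 1.5, "K": -0.8},
    -1: {"I": 2.0, "V": 1.1, "P": -1.5},
    1: {"G": 0.9, "D": 1.4, "K": -0.6},
    2: {"F": 0.7, "W": 0.3, "R": -0.2},
}

print(consensus_peptide(toy_scores))
```

Because each position is chosen independently, a sketch like this ignores any context-dependent effects between positions, which is the limitation the reviewer notes for predictions based on the X5-Y-X5 library.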

This extension of the authors' previous studies with bacterial surface peptide display technology has provided some additional insights into the primary sequence specificities of the large tyrosine kinase family by examining a broader range of TKs, and by checking the consequences of natural positional variations around sites of Tyr phosphorylation on the efficiency of phosphorylation. They demonstrated the versatility of this approach by using expanded genetic code technology to investigate the consequences of installing post-translationally modified amino acids at specific positions upstream and downstream of the target Tyr on the efficiency of phosphorylation by different TKs. They also exploited pre-phosphorylated peptide display libraries to interrogate the primary sequence specificities of two additional SH2 domains.

One advantage of the surface display method is that in principle it provides individual sequences that are preferred substrates for a TK, but in practice this information is not used, and preferred residues at each position are obtained. The new TK specificities they have determined reveal some interesting motifs, and the use of this method to define the effects of sequence variants in the vicinity of target phosphosites on their phosphorylation or SH2/PTB domain binding will be valuable in predicting possible functional consequences of such variants, for instance in disease. The use of the display method to detect the consequences of posttranslational modifications of amino acids in the vicinity of a phosphosite is also an advance, but it is limited by the availability of cell-permeant, stable unnatural amino acid analogues and cognate evolved tRNA charging enzymes.

Overall, given their prior publications using the original version of this technology, this paper is a relatively modest technical advance, but the significant amount of new TK and SH2 specificity information they have obtained will no doubt be useful to aficionados of tyrosine phosphorylation signaling systems. Even though this paper was submitted for the Tools and Resources category, it would be strengthened by inclusion of a follow-up experimental analysis of at least one or two instances where a novel specificity was observed to demonstrate its biological relevance.

Points:

1. The potential issues with including Cys in the peptide libraries were not discussed. For instance, a Cys residue in a peptide may be partially oxidized on the surface because of its exposure to oxygen, or, because the Cys is unpaired, disulfide bonds may form between adjacent peptide molecules on the surface. In addition, having multiple Tyr in addition to the central Tyr and using high phosphorylation stoichiometries means that the extra tyrosine in the peptide may be the preferred target.

2. As the authors showed, the method will be useful for studying the influence of neighboring modified amino acids on TK phosphosite selectivity. Indeed, it is already known that pTyr can serve as a positive determinant for TK phosphorylation. Presumably, it was for this reason that the authors tested whether incorporation of CMF, a pTyr mimic, at different positions affected the ability of the c-Src kinase to phosphorylate preferred sequences. However, CMF is not a very good pTyr mimic, and if the authors wanted to determine possible roles for pTyr itself they should have used one of the recently described methods for incorporating pTyr as an unnatural amino acid into proteins expressed in bacteria (e.g., PMID: 28604693; PMID: 28604697).

3. From a methods perspective, some of the recombinant catalytic domains were preactivated by incubation with ATP, presumably leading to autophosphorylation. The sites phosphorylated on the activation loop of TKs can affect primary sequence specificity. Electrospray MS was used to show these preparations were multiply phosphorylated, but which autophosphorylation sites were occupied in the protein that was used to phosphorylate the library were not determined.

4. It is not clear why the surface display method should be "tuned to select a high Kcat". Does the E. coli DH5a strain used here secrete a nonspecific phosphatase that could act on the phosphorylated peptides? The assay buffer included orthovanadate, but it is not clear whether this would inhibit such a generic phosphatase.

5. The authors' c-Src results showed a strong preference for a +1 Asp/Glu/Ser in addition to the +1 Gly previously reported using oriented peptide libraries. However, in the end the improvements in kinetic parameters for their Src and Abl consensus peptides were relatively modest compared to prior published examples.

6. Figure 2B and Figure 2S1: Src, Abl, EPHB1 and EPHB2 all showed a preference in their display consensus for Pro at +4. While a preference for Pro at +4 is observed in natural Src and Abl substrates, one wonders whether this is a hidden constraint/bias of the method. In addition, the +2 – +4 WWW motif preferred by Fer is curious. Are there any reported (Fer) TK sites with this WWW motif, and does the closely related Fes TK also exhibit this preference? No Trp residues are found in the PhosphoSitePlus motif logo for Fer. Do these Trp residues contribute to the low μM Km for the Fer consensus peptide? Can the WWW peptide be modeled into the active site of the FES TK catalytic domain? Finally, in this regard, it would help the reader if this panel included -5 to +5 numbering under the alignment.

7. The SRC consensus has a preferred Cys at -2, which was not discussed. Do the authors know whether this Cys was oxidized on the bacterial surface? If so, it could have provided a partial negative charge and served a similar purpose to the -2 Glu in SrcTide. How important is the -2 Cys in the synthetic peptide substrates?

8. Figure 2C: The authors' data showed that EPHB1 displayed little selectivity across the consensus peptides and did not even prefer its own cognate consensus sequence. Moreover, although they did not comment on this, the same seems to be true for EPHB2. The basis for this lack of selectivity needs fuller discussion.

9. Figure 4F: The authors did not explain why the inclusion of Cys at +1 in the Ret Y981 peptide made it a better substrate for several of the 13 tested TKs.

10. While the authors showed that unnatural amino acids incorporation is compatible with the surface display method, only those modified amino acids for which a cell-permeant modified amino acid and a cognate tRNA charging enzyme have been developed can readily be used. In contrast, any modified amino acid can be used in the position-oriented peptide display approach.

Reviewer #2 (Recommendations for the authors):

This is a well-designed study addressing important biological issues: what are the specificities of the tyrosine kinases and SH2 domains, which work in tandem in cell signal transduction? Deregulation of the TK-SH2 signaling axis is associated with a host of diseases, notably cancer. Although these issues have been investigated in previous studies, the current study established a platform that is complementary to previous ones (e.g., those based on synthetic peptide libraries) and potentially more quantitative. The application of this approach to known Tyr/pTyr sites with mutations in the flanking residues is novel and provides insights into the biochemical and functional consequences of the mutations. Although the work is comprehensive, and the data presented are of high quality and supportive of the conclusions in general, there are a few concerns as enumerated below.

1. The authors appear to focus on justifying how their platform identifies the same specificity profiles as previous methods when it is more important to highlight the differences and discuss what they mean. While it is challenging to verify whether the specificity profiles obtained from the current study are closer to the physiological specificity of the TKs/SH2s, it'd be helpful to use the specificity information to predict in vivo substrates and to find out how many known substrates can be identified and how many are missed. After all, the specificity profile is valuable only when it can predict in vivo targets and the effect of mutations on TK/SH2-target interaction (the authors are to be commended for doing a decent job on the latter).

2. A systematic comparison of the context-independent specificity (obtained from the X5-Y-X5 library) with context-dependent effect (obtained from the pTyr-var library) is missing. It would be important to systematically compare the specificity profiles obtained from the two libraries and identify the common and distinct features (and explain why).

3. The platform seems to work better for certain TKs (e.g., Src, Abl) than others (e.g., EphB1/B2). Is this due to the distinct specificity of these kinases or differences in enzymatic activity? In this regard, how is the activity of the TKs benchmarked?

4. By the same token, what's the affinity of an SH2 domain used in the study for its cognate ligand? It should also be discussed/justified in more detail why the SHP2-C SH2 domain was picked when the tandem SH2 domains in SHP2 often work together in ligand recognition.

5. The manuscript would be significantly improved if one or more novel sites/substrates predicted using the specificity data are validated on the peptide and protein level.

6. The heatmaps (e.g., Figure 5-SI5) look similar at a glance. Is there another way to present the data and show the differences in specificity in a more conspicuous manner? A bar graph, although more tedious, may be a better choice, especially for important positions. Grouping residues based on their physicochemical properties may also simplify the profile pattern and make it easier to read and understand.

7. The inclusion of Tyr in the X positions of the X5-Y-X5 library may obscure the specificity pattern, as Tyr at any position may be phosphorylated. I'd suggest repeating a couple of TKs using a library that contains only a central Tyr.

8. The AA read counts (eg. Figure S1) varied widely, why? Does this have an effect on the screening?

9. Figure 5-SI1, provide correlation coefficient; why is the apparent correlation for Abl poorer compared to EphB1 when the former appears to have a more defined specificity pattern?

10. Figure 6 – SI6 – The orthogonal specificity pattern for the CTK kinase and SH2 domains seems at odds with the established notion that the CTK kinase and SH2 domain specificities are related, e.g., the pTyr sites created by the kinase are preferably bound by its own SH2 domain or closely related SH2 domains (Songyang et al. Nature 1995; 373:536-9). This needs to be discussed.

Reviewer #3 (Recommendations for the authors):

This manuscript reports modifications to previously published methods in which the substrate specificity of tyrosine kinases and binding specificity of phosphotyrosine interaction domains is analyzed using a bacterial surface display coupled to next-generation sequencing. This report modifies these methods in that phosphorylated bacteria are selected by magnetic bead immunoprecipitation rather than fluorescence-activated cell sorting (FACS). This allows for larger libraries of higher complexity to be used and for parallel processing to increase throughput. The strength of this manuscript lies in the use of a cutting edge technique offering substantial benefit over other methods. Its main weakness is that the major capabilities of the method are similar to those of the FACS-based approach, and the conclusions drawn here were also reached across the series of papers originally reporting that method. For example, the previous papers reported good correlations between enrichment scores and phosphorylation rates measured in vitro, the capacity to infer changes in phosphorylation rate from single amino acid substitutions, and the context-dependent impact of those substitutions; the orthogonal nature of SH2 binding and kinase specificity observed here has been seen in other studies using different methods. Given the precedent, one would have hoped to see the method applied in a way that new insight is gained to the nature of Tyr kinase specificity in general, or to how a particular kinase signals. One modification to the protocol that had not been previously reported was the use of an amber suppression-based orthogonal translation system allowing incorporation of non-natural or modified amino acids. Libraries incorporating carboxymethyl-Phe (a pTyr mimic) and acetylated lysine residues were generated and screened with the kinase c-Src. This is an interesting extension of the technique, and one could imagine bringing in other known PTMs such as methylated Lys or Arg. 
However, there is a potential limitation in that each non-native residue requires a separate screen to be performed. Overall, this manuscript employs state-of-the-art screening methods, but its impact is tempered by literature precedent.

Manuscript presentation:

1. The introduction provides an overview of existing methods for determining kinase specificity. However, the authors neglect to mention a couple of approaches that arguably best rival methods employed in the current manuscript from the standpoint of identifying context-dependent selectivity. These include MS-based approaches that use proteome-derived peptide libraries, from either protease digests of cell lysates or genetically encoded libraries (the cited Barber et al. paper is mischaracterized as performing analysis on cell lysates, and these papers were not cited: Kettenbach, et al. PMID: 22633412, Douglass et al. PMID: 22723110, Imamura et al. PMID: 24869485 and Xue, et al. PMID: 22451900). In addition, an approach using yeast surface display in which Tyr kinases are targeted to the secretory pathway deserves mention (Taft et al. PMID: 31339688). The authors could also consider less extensive referencing of historical methods from the 1990s that are no longer used.

Data availability:

2. I could not find the deep sequencing data for kinase selections of the X5-Y-X5 library. Did they apply a cutoff for the number of read counts to be included in their analysis?

Technical points:

3. Two versions of the X5-Y-X5 library were made – one with a strep tag and one with a myc tag. Were both subjected to quality control sequencing as shown in Figure 1 supp 1, and which is shown in the table? Were both of them used for screens?

4. It is not clear that the authors estimated the representation of the X5-Y-X5 library other than from the standpoint of pooled representation of each residue at each position ("1-10 million unique peptide sequences" were observed by sequencing). How many transformants were recovered during library cloning, and how does this compare to the number of sequencing reads used for quality control?

5. Selection of Tyr residues from the X5-Y-X5 library is likely due to "off target" phosphorylation of the non-central residue. This could muddy the specificity analysis, as there would be a contribution of other selected residues at the "wrong" positions. For the Tyr-Var library, the authors avoided this issue by including Tyr to Phe substitutions. Do any of the results for X5-Y-X5 change if they exclude all peptides with Tyr at a non-central position?

6. Based on the concentration of bacterial cells used in the kinase reactions, it seems likely that the authors are observing single turnover rather than multiple turnover kinetics. This could explain some of the differences between the results described here and prior analyses with the same kinases. Can the authors estimate the effective substrate concentration in their experiments? Can they comment on the potential impact of single vs multiple turnover kinetics on their results?

7. For Tyr kinases that were analyzed with both libraries, how do the specificity matrices derived from the Tyr-Var data (Figure 4 – Figure supp 5) compare to those from the X5-Y-X5 data (Figure 1B and Figure 2A)?

8. The observation that there is context dependence – the impact of a given amino acid substitution differs depending on the surrounding sequence – is not surprising given the authors' previous work. However, the specific example shown here should be interpreted with caution, as undersampling may impact what is construed to be the "average" signal in the X5-Y-X5 library. It would be most convincing if there were examples where the impact of a given amino acid substitution were observed on two different peptide sequences, i.e. where enrichment scores for both the "WT" and variant sequence could be calculated. One would also want to see some verification by in vitro kinase assay that the same substitution caused different effects on reaction rates in the context of two different peptides.

Reviewer #4 (Recommendations for the authors):

In work by Li and colleagues, they utilized phage display technology to characterize phosphorylation site motifs of tyrosine kinases and pY binding motifs of SH2 domains. Building on previously published work by the corresponding author, they have significantly improved and broadened the application of their platform. The experimental approach here involves the external display of peptides on the surfaces of bacteria. These peptides contain tyrosines surrounded by defined amino acid sequences where each bacterium encodes a single sequence. The cells are subjected to in vitro kinase phosphorylation reactions, and the phosphorylated population is examined by deep sequencing to quantify the enrichment of amino acid residues and infer phosphorylation site motifs. In this manuscript, the authors have improved the sampling rate of their platform from <5000 to 1-10 million unique peptide sequences analyzed per experiment, made it more accessible (no longer requiring FACS instrumentation), combined it with amber codon suppression methods to expand the list of amino acids that can be incorporated, and repurposed it to profile SH2 binding.

Overall, this is a very nice extension of the corresponding author's previous work and is likely to make helpful contributions to the tyrosine kinase signaling field. Their manuscript surveys multiple applications of their phage display platform and provides reliable supporting experimental data. Additionally, the authors have identified 50-400 disease-associated mutation variants proximal to tyrosine phosphorylation sites that potentially alter phosphorylation by the kinases examined in this study, and 50-300 that potentially alter binding to the SH2 domains examined, which are all provided as resources. Along the way, they made interesting observations (e.g., mutations on RET and PTEN predicted to enhance tyrosine phosphorylation). Lastly, the authors utilized Amber codon suppression to incorporate non-canonical and PTM amino acids into their displayed peptides. They then showed that the SRC kinase prefers to phosphorylate tyrosine substrates nearby acetylated lysines versus unmodified lysines. This was perhaps to be expected, considering that SRC and many other tyrosine kinases generally disfavor positively charged amino acids, but it also highlights the interesting possibility of this form of PTM crosstalk between metabolism and growth factor signaling.

1. Concerning the peptide design in their random library (sequence: X5-Y-X5, where X = 20 natural amino acids falling within their 'NNS' codon constraint), neighboring tyrosines (TAC codons) theoretically account for ~3% (1/32) of residues at all random positions, meaning that as much as 30% of the peptide pool contains at least two potential tyrosine phosphoacceptors per peptide. For the heatmap motifs presented in Figure 1, Figure 2, Figure 4—figure supplement 5, and Figure 7, we can observe strong enrichment of tyrosines at multiple positions, especially for JAK2 and the RTKs, and it is not certain whether this is due to positional selection (that facilitates phosphorylation of the tyrosine at position zero) or direct phosphorylation of the neighboring tyrosine. This double phosphorylation effect may limit interpretation of the motif heatmaps presented in the figures that assume phosphorylation occurs only at the tyrosine at position zero. Have the authors considered generating separate heatmaps that omit the subset of enriched peptides containing two or more tyrosines? That might reduce background and further improve the quality of the data.

2. Figure 6B: Given that the authors have demonstrated essentially complete phosphorylation of the displayed peptides by their kinase cocktail in Figure 6—figure supplement 2, it seems more appropriate to replace the label above the heatmaps in 6B "position relative to tyrosine" with "position relative to phosphotyrosine". Moreover, this indicates that they have probably phosphorylated most of the neighboring tyrosines in the displayed peptides containing two or more tyrosines. How would the heatmap motifs for the SH2 domains appear if the authors excluded the subset of enriched peptides containing two or more (likely to be phosphorylated) tyrosines?

3. The authors have not included a single sequence logo to represent their phosphorylation site motifs or pY binding motifs. Is there a specific reason for this? Researchers less familiar with these approaches generally have an easier time interpreting sequence logos than heatmaps.

4. An advantage of the authors' platform over alternative approaches is its potential to decipher pairwise interactions between amino acids on substrate peptides. The authors report that, depending on the kinase, 5-15% of all significant mutations in the pTyr-Var screen (which are pairwise comparisons of single substitutions in fixed sequences) had the opposite effect of predictions from the randomized library (ensembles). And indeed, the correlation plots in Figure 5 —figure supplement 1 show quite a bit of scattering and hence negatively correlated sites between their predictions and their results. The authors explored one example of this for SRC, which preferred proline over serine at -2 in the randomized library, yet it more efficiently phosphorylated a site containing -2 serine versus proline in the context of the sequence XEYSFK, where X is the substituted position. Interestingly, this is the opposite effect of what has been previously published and referred to by the authors for EGFR, where -2 proline was preferred over serine in the context of -1 acidic residues (Cantor et al. 2018), indicating that these pairwise rules may not only be unique to specific kinase groups but mutually exclusive between different groups. Are the authors able to identify kinase-specific trends from their datasets for the pairwise selection of amino acids at positions -1 and -2 that negatively correlate with predictions?

5. Cannot access the code used to process and analyze the deep sequence data in the GitHub repository: (https://github.com/nshahlab/2022_Li-etal_peptide-display)

6. Based on the format of this journal, this work seems appropriate as a "Research Advance."
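
The codon arithmetic behind point 1 above can be checked with a short sketch. It assumes that each base allowed by the NNS scheme is equally likely at every degenerate position and that all ten randomized positions are independent; real libraries deviate from this, so the figures are idealized estimates rather than measured values.

```python
from itertools import product

# Both tyrosine codons in the standard genetic code.
TYR_CODONS = {"TAT", "TAC"}

# NNS: any base at positions 1 and 2, C or G at position 3.
nns_codons = ["".join(c) for c in product("ACGT", "ACGT", "CG")]

# Fraction of NNS codons that encode Tyr (only TAC ends in C or G).
p_tyr = sum(codon in TYR_CODONS for codon in nns_codons) / len(nns_codons)

# Five randomized positions on each side of the fixed central Tyr.
n_random = 10
p_extra_tyr = 1 - (1 - p_tyr) ** n_random

print(len(nns_codons))        # 32
print(p_tyr)                  # 0.03125
print(round(p_extra_tyr, 3))  # 0.272
```

Under these assumptions roughly 27% of peptides carry at least one non-central Tyr, consistent with the reviewer's "as much as 30%" estimate.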

eLife. 2023 Mar 16;12:e82345. doi: 10.7554/eLife.82345.sa2

Author response


Reviewer #1 (Recommendations for the authors):

Points:

1. The potential issues with including Cys in the peptide libraries were not discussed. For instance, a Cys residue in a peptide may be partially oxidized on the surface because of its exposure to oxygen, or, because the Cys is unpaired, disulfide bonds may form between adjacent peptide molecules on the surface. In addition, having multiple Tyr in addition to the central Tyr and using high phosphorylation stoichiometries means that the extra tyrosine in the peptide may be the preferred target.

TCEP is used in the peptide display assay in order to avoid the potential issues discussed above. We agree that having multiple tyrosines in addition to the central Tyr in our peptide display screen is a valid concern. To address this, peptides in which non-central tyrosine residues were mutated to phenylalanine were also included in the pTyr-Var library (labeled “YF” sequences). Furthermore, in the data analysis pipeline for any library used with our platform, sequences containing cysteine residues or multiple tyrosine residues can trivially be filtered out prior to calculating enrichment values. In the revised manuscript, we show for the X5-Y-X5 libraries that this filtering of multi-Tyr sequences has no impact on our specificity maps. We have added the following text to the manuscript:

“Notably, our library includes peptides containing Cys residues and non-central Tyr residues, both of which are often excluded from tyrosine kinase specificity screens to avoid oxidation-related artifacts and challenges in interpreting signal from multi-Tyr sequences (Deng et al. 2014). These sequences can be filtered during data analysis, if needed, although they did not pose significant issues in our studies.”
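
The filtering step described in this response could look something like the following minimal sketch. The function names, the 11-residue layout with the central Tyr at index 5, and the toy read counts are all illustrative assumptions, not taken from the authors' analysis code.

```python
def keep_peptide(seq: str, center: int = 5) -> bool:
    """Return True for a peptide with a central Tyr, no other Tyr, and no Cys."""
    if len(seq) != 2 * center + 1 or seq[center] != "Y":
        return False
    flanks = seq[:center] + seq[center + 1:]
    return "Y" not in flanks and "C" not in flanks


def filter_counts(counts: dict[str, int]) -> dict[str, int]:
    """Drop multi-Tyr and Cys-containing peptides from a read-count table."""
    return {seq: n for seq, n in counts.items() if keep_peptide(seq)}


# Hypothetical read counts, for illustration only.
toy_counts = {
    "AEDEIYGEFEA": 120,  # kept: single central Tyr, no Cys
    "AEDYIYGEFEA": 45,   # dropped: non-central Tyr at -2
    "AECEIYGEFEA": 30,   # dropped: Cys at -3
}
print(filter_counts(toy_counts))  # {'AEDEIYGEFEA': 120}
```

Applying the same filter to both the input and selected count tables before computing enrichment values keeps the two populations comparable.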

2. As the authors showed, the method will be useful for studying the influence of neighboring modified amino acids on TK phosphosite selectivity. Indeed, it is already known that pTyr can serve as a positive determinant for TK phosphorylation. Presumably, it was for this reason that the authors tested whether incorporation of CMF, a pTyr mimic, at different positions affected the ability of the c-Src kinase to phosphorylate preferred sequences. However, CMF is not a very good pTyr mimic, and if the authors wanted to determine possible roles for pTyr itself they should have used one of the recently described methods for incorporating pTyr as an unnatural amino acid into proteins expressed in bacteria (e.g., PMID: 28604693; PMID: 28604697).

In this study, we did not use CMF for the purposes of studying phospho-priming. CMF was used to show that non-canonical amino acids could be incorporated in our screens. We agree that if we were to study the possible roles of phosphotyrosine on substrate specificity, the most ideal amino acid would be phosphotyrosine itself. To avoid confusion, we no longer refer to CMF as a “phosphotyrosine analog” in the main text. Furthermore, the studies mentioned by the reviewer would definitely be a useful starting point for incorporating phosphotyrosine (or its analogs) into our libraries. The method in PMID 28604697 may prove challenging, as it requires a chemical transformation that is not likely to be tolerated by E. coli. Our preliminary efforts with the method described in PMID 28604693 have not been successful and require further optimization beyond the scope of this study.

3. From a methods perspective, some of the recombinant catalytic domains were preactivated by incubation with ATP, presumably leading to autophosphorylation. The sites phosphorylated on the activation loop of TKs can affect primary sequence specificity. Electrospray MS was used to show these preparations were multiply phosphorylated, but which autophosphorylation sites were occupied in the protein that was used to phosphorylate the library were not determined.

The determination of the autophosphorylation sites was not a primary focus for us because we did not think the phosphorylation of the tyrosine kinases would significantly alter substrate recognition. We were not aware of papers showing that tyrosine kinase activation loop phosphorylation alters sequence specificity at the peptide level. Our primary objective was to have kinase domains with sufficient activity for measurements (which often depends on their activation loop phosphorylation status). For the Src-family kinases, under the screening conditions used, the kinase domains rapidly autophosphorylate. This is not likely to be true for c-Abl or AncSZ, but these kinases showed sufficient activity without pre-activation.

4. It is not clear why the surface display method should be "tuned to select a high Kcat". Does the E. coli DH5a strain used here secrete a nonspecific phosphatase that could act on the phosphorylated peptides; the assay buffer included orthovanadate, but it is not clear whether this would inhibit such a generic phosphatase?

In this statement, we were pointing out that for some kinases (e.g. c-Src but not c-Abl), our screens yielded consensus sequences with higher kcat, but weaker KM values than other methods. We simply point this out as an interesting observation for c-Src, but the precise reason for this is unclear. (We discuss this briefly in another response about single-turnover kinetics, below.) Furthermore, we note that the E. coli DH5a strain does not secrete a nonspecific phosphatase. We use sodium orthovanadate to inhibit any residual tyrosine phosphatase, YopH, from co-expression with the tyrosine kinase. While our purification methods remove any detectable amounts of YopH, orthovanadate is added as a precaution. This is a standard practice for many tyrosine kinase activity assays. See some of the following examples: (PMIDs 27700984, 29547119, 25699547, 32479050, 22928736).

5. The authors' c-Src results showed a strong preference for a +1 Asp/Glu/Ser in addition to the +1 Gly previously reported using oriented peptide libraries. However, in the end the improvements in kinetic parameters for their Src and Abl consensus peptides were relatively modest compared to prior published examples.

Based on our screening data, we expect no major difference between +1 Gly and +1 Asp/Glu/Ser in the specificity of c-Src. However, we do notice there is a slight preference for +1 Gly in c-Src when we use the PY20 antibody for labeling (see Author response image 1, which shows the enrichment for each amino acid at the -1 and +1 position). This difference in the antibody used may account for why we see less of an exclusive preference for +1 Gly. Regardless, we have characterized sequences with a change in the +1 position, and the effects on kinetic activity are marginal (Table 1). We also note this broader +1 preference has been observed for c-Src previously (PMID: 29547119). These experiments collectively suggest that the +1 Gly preference for c-Src is not as exclusive as previously observed using oriented peptide libraries.

Author response image 1.

6. Figure 2B and Figure 2S1: Src, Abl, EPHB1 and EPHB2 all showed a preference in their display consensus for Pro at +4. While a preference for Pro at +4 is observed in natural Src and Abl substrates, one wonders whether this is a hidden constraint/bias of the method. In addition, the +2 – +4 WWW motif preferred by Fer is curious. Are there any reported (Fer) TK sites with this WWW motif, and does the closely related Fes TK also exhibit this preference? No Trp residues are found in the PhosphoSitePlus motif logo for Fer. Do these Trp residues contribute to the low μM Km for the Fer consensus peptide? Can the WWW peptide be modeled into the active site of the FES TK catalytic domain? Finally, in this regard, it would help the reader if this panel included -5 to +5 numbering under the alignment.

The observation about the +4 Pro is very interesting. We do not know if this is a hidden bias/constraint of the method, but we do note that Pro is not the most preferred amino acid at the +4 position for every kinase tested (see Fer in the X5-Y-X5 library screens, and this can be seen for other kinases from the pTyr-Var screens). To our knowledge, the +2 to +4 WWW motif preferred by Fer has not been reported elsewhere, nor does it appear to be a feature of the few reported natural Fer substrates. The tryptophan residues seem to be contributing to high binding affinity (low KM), but this might manifest in two different ways that warrant a follow-up study: (1) For Fer, specifically, there is very strong +2 Trp enrichment, and this is also seen in the pTyr-Var library, where +2 Trp sequences are uniquely enriched for Fer. (2) It appears that downstream Trp residues show moderate enrichment for many of the kinases tested against both libraries, particularly at +3 and +4 positions. While this could be an artifact of the screens, indirect evidence suggests that these Trp residues actually contribute to substrate binding. For example, when the Fer consensus peptide is measured against c-Src, we observe a low KM value (12 μM), similar to that of Fer (8 μM). This is much tighter than the Src consensus peptide (196 μM). However, the kcat value for the Fer consensus peptide with c-Src is much lower than that of the Src consensus peptide (0.74 s⁻¹ vs 4.9 s⁻¹). Additionally, at low substrate concentrations, the Fer consensus peptide appears to be the preferred substrate for EPHB1, of the consensus peptides tested. These observations point to a potential role for Trp residues in enhancing kinase-substrate interactions in a productive way.
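
To illustrate how a tighter KM can outweigh a lower kcat at low substrate concentrations, the c-Src values quoted above can be plugged into the Michaelis-Menten equation. This is a sketch only: it assumes that all four parameters were measured for c-Src under the same conditions and that simple Michaelis-Menten kinetics apply. The variable and function names are illustrative.

```python
def mm_rate(s_uM: float, kcat: float, km_uM: float) -> float:
    """Michaelis-Menten turnover per enzyme (s^-1) at substrate concentration s_uM."""
    return kcat * s_uM / (km_uM + s_uM)


# Values quoted in the response for c-Src (kcat in s^-1, KM in uM).
fer_pep = {"kcat": 0.74, "km_uM": 12.0}   # Fer consensus peptide
src_pep = {"kcat": 4.9, "km_uM": 196.0}   # Src consensus peptide

# Catalytic efficiency kcat/KM (uM^-1 s^-1) governs rates when [S] << KM.
eff_fer = fer_pep["kcat"] / fer_pep["km_uM"]
eff_src = src_pep["kcat"] / src_pep["km_uM"]

# Substrate concentration at which the two rate curves intersect.
crossover_uM = (
    fer_pep["kcat"] * src_pep["km_uM"] - src_pep["kcat"] * fer_pep["km_uM"]
) / (src_pep["kcat"] - fer_pep["kcat"])

print(round(eff_fer, 4), round(eff_src, 4))  # 0.0617 0.025
print(round(crossover_uM, 1))                # 20.7
```

With these numbers the Fer consensus peptide would be turned over faster than the Src consensus peptide below roughly 21 μM substrate, and more slowly above it.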

7. The SRC consensus has a preferred Cys at -2, which was not discussed – do the authors know whether this Cys was oxidized on the bacterial surface. If so, it could have provided a partial negative charge and serve a similar purpose to the -2 Glu in SrcTide. How important is the -2 Cys in the synthetic peptide substrates?

We expect that the cysteine is predominantly reduced in our screening conditions due to the addition of TCEP. Regarding the importance of the -2 cysteine, we have synthesized variants of the Src consensus peptide in which the cysteine was replaced with an aspartate and did not see a substantial effect on the catalytic activity. Additionally, with the recent revisions, we measured the activity of the Src consensus peptide with a -2 proline using the RP-HPLC assay and observed activity comparable to that seen for the original Src consensus. Therefore, we do not think the -2 cysteine plays a critical role for c-Src, but we also do not think that its enrichment is due to oxidation.

8. Figure 2C: The authors' data showed that EPHB1 displayed little selectivity across the consensus peptides and did not even prefer its own cognate consensus sequence. Moreover, although they did not comment on this, the same seems to be true for EPHB2. The basis for this lack of selectivity needs fuller discussion.

It was surprising to us that both EPHB1 and EPHB2 displayed little selectivity across the consensus peptides. It is unclear why this is the case for EPHB1; however, upon closer inspection, we find that EPHB2 actually does show some selectivity. This was difficult to see in our original figures, where data for every kinase were displayed on an absolute scale. We have reformatted those graphs so that the data for each kinase are normalized to its own consensus peptide. Now, we can see that EPHB2 actually phosphorylates its own consensus, and the c-Abl consensus, preferentially over the other three peptides. Notably, the c-Abl and EPHB2 peptides are very similar, as noted in the main text.

We also wondered if our poor consensus designs for EPHB1/2 might reflect some unfavorable coupling between the most enriched amino acids in the position-weighted scoring matrices. To test this, we looked at the EPHB2 consensus sequence and calculated the enrichment scores of every possible residue pair in that sequence from data generated in the X5-Y-X5 screen. Based on this analysis, we observed that the -3 Glu sequences were enriched overall, but were distinctly depleted in the context of a +1 Glu. Thus, we modified the -3 position in the EPHB2 sequence to an apparently more favorable residue in the +1 Glu context, a tryptophan. Measurement of the activity of EPHB2 against this peptide showed an enhancement of activity with the -3 Glu to Trp substitution. This suggests that the combination of the most favorable amino acids in each position does not always yield the best substrate. Finally, although this is an enticing approach to sequence design, we hope that the reviewers appreciate that we are still developing this idea and an in-depth exposition of this approach is out of the scope of this manuscript.
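
A pairwise enrichment calculation of the kind described here might be sketched as follows. The scoring definition (log2 ratio of the pair's read-weighted frequency in the selected versus input population), the function names, and the toy counts are assumptions for illustration; the response does not specify the exact formula used.

```python
from math import log2


def pair_frequency(counts, pos_a, aa_a, pos_b, aa_b, center=5):
    """Read-weighted frequency of peptides with aa_a at pos_a and aa_b at pos_b.

    Positions are numbered relative to the central Tyr (e.g. -3, +1).
    """
    total = sum(counts.values())
    hits = sum(
        n for seq, n in counts.items()
        if seq[center + pos_a] == aa_a and seq[center + pos_b] == aa_b
    )
    return hits / total


def pair_enrichment(selected, unselected, pos_a, aa_a, pos_b, aa_b):
    """log2(selected / input) frequency for a residue pair; None if unobserved."""
    f_sel = pair_frequency(selected, pos_a, aa_a, pos_b, aa_b)
    f_in = pair_frequency(unselected, pos_a, aa_a, pos_b, aa_b)
    if f_sel == 0 or f_in == 0:
        return None
    return log2(f_sel / f_in)


# Hypothetical counts chosen to mimic a pair that is depleted in combination.
input_counts = {"AAEAAYEAAAA": 10, "AAEAAYGAAAA": 10,
                "AAWAAYEAAAA": 10, "AAWAAYGAAAA": 10}
selected_counts = {"AAEAAYEAAAA": 2, "AAEAAYGAAAA": 18,
                   "AAWAAYEAAAA": 12, "AAWAAYGAAAA": 8}

glu_glu = pair_enrichment(selected_counts, input_counts, -3, "E", +1, "E")
trp_glu = pair_enrichment(selected_counts, input_counts, -3, "W", +1, "E")
print(round(glu_glu, 2), round(trp_glu, 2))  # -2.32 0.26
```

In this toy data the -3 Glu/+1 Glu pair scores negative while -3 Trp/+1 Glu scores positive, the same qualitative pattern that motivated the Glu to Trp substitution.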

Author response image 2.

9. Figure 4F: The authors did not explain why the inclusion of Cys at +1 in the Ret Y981 peptide made it a better substrate for several of the 13 tested TKs.

The mutational effect is not true for every tyrosine kinase, which is why we think that it is a real kinase-specific effect and not an artifact of the assay. Furthermore, this mutational effect is conserved for Src, FGFR1, FGFR3, and MERTK in another peptide: CRYAA_Y48, and Src and MERTK in the peptide U2SURP_Y634. This might be due to the removal of an unfavorable positive charge from the +1 Arg, coupled with the other favorable sequence features in the resulting peptide. It is also possible that other mutations to +1 Arg, such as Ser, could show the same enhancement, but these other substitutions are not represented in our pTyr-Var library. Due to the addition of new data from revision experiments, we have moved the Ret Y981 panel to a figure supplement.

Author response image 3.

10. While the authors showed that unnatural amino acids incorporation is compatible with the surface display method, only those modified amino acids for which a cell-permeant modified amino acid and a cognate tRNA charging enzyme have been developed can readily be used. In contrast, any modified amino acid can be used in the position-oriented peptide display approach.

We agree that this is a limitation to the incorporation of non-canonical amino acids in the peptide display platform.

Reviewer #2 (Recommendations for the authors):

1. The authors appear to focus on justifying how their platform identifies the same specificity profiles as previous methods when it is more important to highlight the differences and discuss what they mean. While it is challenging to verify whether the specificity profiles obtained from the current study are closer to the physiological specificity of the TKs/SH2s, it'd be helpful to use the specificity information to predict in vivo substrates and to find out how many known substrates can be identified and how many are missed. After all, the specificity profile is valuable only when it can predict in vivo targets and the effect of mutations on TK/SH2-target interaction (the authors are to be commended for doing a decent job on the latter).

To address this, we compared a curated list of reported kinase-substrate pairs from the PhosphositePlus database with our screening results. Although there is not a lot of overlap between our library and the substrates reported in this curated list, we were able to compare dozens of sequences for c-Src, c-Abl, and Fyn. Based on this analysis, we find that approximately 30-40% of the reported substrates are efficiently phosphorylated in our screen (c-Src: 26/79, c-Abl: 8/21, Fyn: 6/17). This disparity is not surprising because we know there are other mechanisms for gaining kinase specificity and the curated list may not accurately represent bona fide substrates for each kinase. We have added this discussion to the main text and added this annotated list to Figure 4-source data 1.

2. A systematic comparison of the context-independent specificity (obtained from the X5-Y-X5 library) with context-dependent effect (obtained from the pTyr-var library) is missing. It would be important to systematically compare the specificity profiles obtained from the two libraries and identify the common and distinct features (and explain why).

The position-specific amino acid preferences obtained using both libraries are very similar, as shown in Author response image 4. Importantly, those residues that are significantly enriched or depleted in one library show the same effect in the other library, and there are generally very few outlier features for the five kinases tested. While we can observe specific cases where amino acid substitutions have a context-dependent effect, as shown in Figure 5 and discussed in the associated text, the pTyr-Var library does not sample a large enough number of these mutations to extract specific rules. We can envision future experiments that focus on a small number of cognate sequences for each kinase, where libraries of single, double, and triple mutant peptides could be screened to dissect the rules for sequence context dependence.

Author response image 4.

3. The platform seems to work better for certain TKs (eg., Src, Abl) than others (eg., EphB1/B2). Is this due to the distinct specificity of these kinases or differences in enzymatic activity? In this regard, how is the activity of the TKs benchmarked?

We did not base the success of our platform for each kinase on whether we were able to obtain good selectivity for its consensus sequence. As mentioned earlier, we could not determine the best consensus sequences for EPHB1 and EPHB2 potentially due to unfavorable coupling between amino acid residues in those sequences. We believe our platform worked just as well for EPHB1/2 as the other TKs because we obtained a distinct specificity profile for EPHB1/2 which showed concordance between the two libraries tested. This is further supported by the fact that the two enzymes have very similar specificity (as expected for close paralogs), and that these enzymes showed distinct specificity when compared with other families of TKs. Finally, the activity of each TK was benchmarked by monitoring library phosphorylation rates using flow cytometry under a standardized set of conditions. We adjusted the concentration of each tyrosine kinase so that the phosphorylation levels of the libraries were similar to what we had achieved with c-Src. We chose c-Src as a reference point because much of the methods development and validation was done using c-Src.

4. By the same token, what's the affinity of an SH2 domain used in the study for its cognate ligand? It should also be discussed/justified in more detail why the SHP2-C SH2 domain was picked when the tandem SH2 domains in SHP2 often work together in ligand recognition.

The affinity of an SH2 domain for one of its cognate ligands is usually in the nM range; however, this value can vary from single-digit to triple-digit nM. For c-Src, for which validation experiments were shown in Figure 6, the “cognate” ligand that was fluorescently labeled for competition binding assays had a KD of 160 nM. For SHP2, while we acknowledge that the tandem SH2 domains do work together in ligand recognition, there is a benefit to studying the specificity of the SH2 domains independently. There is evidence that the C-SH2 domain of SHP2 can act independently of the N-SH2 domain. For example, the C-SH2 domain of SHP2 can bind one phosphosite on PD1 with high affinity (13 nM), driving SHP2 localization to PD1, whereas the N-SH2 domain of SHP2 binds ligands (e.g. other sites on PD1) with weaker affinity (2 μM) to drive phosphatase activation (see PMID: 32064351). This is in contrast to more tightly coupled tandem SH2 domains, such as those in ZAP-70 and Syk, where there is more interdependence. A follow-up study to further disentangle the functional importance of SHP2 N- and C-SH2 specificity is currently underway.

5. The manuscript would be significantly improved if one or more novel sites/substrates predicted using the specificity data are validated on the peptide and protein level.

We agree with the reviewer’s comment, and we attempted to validate the phosphorylation of a near-full-length version of SHP2 by c-Src, Fyn, and FGFR1. In this context, we also tested whether the mutational effects of D61V and D61N in our screen could be observed in the context of the full-length protein. We have included these experiments in our manuscript (Figure 4F and Figure 4—figure supplement 9).

6. The heatmaps (eg., Figure 5-SI5) look similar at a glance. Is there another way to present the data and show the differences in specificity in a more conspicuous manner? Bar graph – which is more tedious- may be a better choice especially for important positions. Grouping residues based on their physiochemical properties may also simplify the profile pattern and make it easier to read and understand.

For the X5-Y-X5 datasets, we added sequence logos to the figure supplements to accompany the heatmaps. We have chosen not to convert the pTyr-Var data visualizations to logos, both for space considerations and because we feel that heatmaps reflect the data more clearly. We note that the amino acids in our heatmaps are already ordered in a way that correlates with their physicochemical properties.

7. The inclusion of Tyr in the X positions of the X5-Y-X5 library may obscure the specificity pattern as Tyr at any position may be phosphorylated. I'd suggest repeating a couple of TKs using a library that contains only a central Tyr.

We agree that the inclusion of non-central tyrosines could have potentially obscured the specificity patterns. We re-analyzed our data by filtering out peptides with more than one tyrosine residue. We see that the specificity patterns are retained for each tyrosine kinase. This is now reflected in the main text and in figure supplements.
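The filtering step described above can be illustrated with a minimal sketch. It assumes that each library member is an 11-residue string with the fixed tyrosine at index 5 (X5-Y-X5); the function name and the toy peptides are hypothetical and are not taken from the authors' analysis code.

```python
def has_single_central_tyr(peptide: str, center: int = 5) -> bool:
    """Return True if the only tyrosine in the peptide is at the central position."""
    if len(peptide) <= center:
        return False  # too short to contain the central position
    return peptide[center] == "Y" and peptide.count("Y") == 1


# Toy peptides (hypothetical sequences, for illustration only)
peptides = [
    "AEDEIYGEFEA",  # single central Tyr: kept
    "AYDEIYGEFEA",  # extra Tyr at a flanking position: removed
    "AEDEIFGEFEA",  # no central Tyr: removed
]

single_tyr_peptides = [p for p in peptides if has_single_central_tyr(p)]
print(single_tyr_peptides)
```

Specificity profiles can then be recalculated from the filtered list and compared with the profiles obtained from the full dataset.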

8. The AA read counts (eg. Figure S1) varied widely, why? Does this have an effect on the screening?

The variation in read counts depends primarily on the uniformity of the degenerate oligonucleotide mixtures used to construct the library, but it also reflects codon redundancy in an NNS context (some amino acids are still encoded by 2 or 3 codons). The amino acid read counts for each position can be measured with high reproducibility (see the revised Figure 1—figure supplement 1). The range of frequency values is also narrow; almost all position-specific amino acid frequencies are within 5-fold of the mean value (see the revised Figure 1—figure supplement 1). The variation in read counts does not affect the screen because we normalize the read counts in a selected sample to those in an unselected (input) sample.
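To illustrate why uneven input frequencies cancel out under this kind of normalization, here is a minimal sketch of a position-specific enrichment calculation. It is a generic log2 ratio of selected to input frequencies, not the authors' exact procedure (which is in their GitHub repository); the pseudocount, the function names, and the toy 3-residue peptides are all assumptions made for illustration.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def position_frequencies(peptides, pseudocount=1.0):
    """Per-position amino acid frequencies, with a pseudocount to avoid zeros."""
    length = len(peptides[0])
    freqs = []
    for pos in range(length):
        counts = Counter(p[pos] for p in peptides)
        total = sum(counts[aa] + pseudocount for aa in AMINO_ACIDS)
        freqs.append({aa: (counts[aa] + pseudocount) / total for aa in AMINO_ACIDS})
    return freqs


def log2_enrichment(selected, unselected, pseudocount=1.0):
    """log2(frequency in selected sample / frequency in input sample) per position."""
    sel = position_frequencies(selected, pseudocount)
    inp = position_frequencies(unselected, pseudocount)
    return [
        {aa: math.log2(s[aa] / i[aa]) for aa in AMINO_ACIDS}
        for s, i in zip(sel, inp)
    ]


# Toy data: the input samples E and K equally at the middle position,
# but only E-containing peptides survive the selection.
input_sample = ["AEY", "GEY", "AKY", "GKY"]
selected_sample = ["AEY", "GEY", "AEY", "GEY"]

scores = log2_enrichment(selected_sample, input_sample)
print(round(scores[1]["E"], 3), round(scores[1]["K"], 3), scores[0]["A"])
```

Because each score is a ratio relative to the input, a residue that is over- or under-represented in the starting library is not scored as enriched or depleted unless its frequency changes upon selection.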

9. Figure 5-SI1, provide correlation coefficient; why is the apparent correlation for Abl poorer compared to EphB1 when the former appears to have a more defined specificity pattern?

The correlation coefficient has been included in the revised figure (now Figure 5-supplement figure 2). We do not know exactly why the apparent correlation for Abl is poorer than that for the other tyrosine kinases. We speculate that inter-residue coupling may play more of a role in the mutational effects observed in the screen for Abl than for the other tyrosine kinases.

10. Figure 6 – SI6 – The orthogonal specificity pattern for the CTK kinase and SH2 domains seems at odds with the established notion that the CTK kinase and SH2 domain specificities are related, e.g., the pTyr sites created by the kinase are preferably bound by its own SH2 domain or closely related SH2 domains (Songyang et al. Nature 1995; 373:536-9). This needs to be discussed.

We were also initially surprised by the orthogonality in specificity between the kinase and SH2 domain of c-Src, based on the mentioned Songyang et al. paper. However, we noticed that our c-Src kinase and SH2 domain specificity patterns matched previously reported data (see the Scansite database). The major difference for c-Src appears to be that the kinase domain prefers a +3 Phe, whereas the SH2 domain prefers a +3 aliphatic residue. By contrast, much of the old data in this field focused on c-Abl, where both the kinase and SH2 domains have a distinctive +3 Pro preference. This is now explicitly addressed in the main text.

Reviewer #3 (Recommendations for the authors):

Manuscript presentation:

1. The introduction provides an overview of existing methods for determining kinase specificity. However, the authors neglect to mention a couple of approaches that arguably best rival methods employed in the current manuscript from the standpoint of identifying context-dependent selectivity. These include MS-based approaches that use proteome-derived peptide libraries, from either protease digests of cell lysates or genetically encoded libraries (the cited Barber et al. paper is mischaracterized as performing analysis on cell lysates, and these papers were not cited: Kettenbach, et al. PMID: 22633412, Douglass et al. PMID: 22723110, Imamura et al. PMID: 24869485 and Xue, et al. PMID: 22451900). In addition, an approach using yeast surface display in which Tyr kinases are targeted to the secretory pathway deserves mention (Taft et al. PMID: 31339688). The authors could also consider less extensive referencing of historical methods from the 1990s that are no longer used.

The manuscript was edited to include the relevant citations and to omit a few older ones. We have also altered the phrasing describing the Barber et al. paper to indicate that some of these studies have been done using purified genetically-encoded peptide libraries.

Data availability:

2. I could not find the deep sequencing data for kinase selections of the X5-Y-X5 library. Did they apply a cutoff for the number of read counts to be included in their analysis?

All of our fastq and fasta files are now publicly accessible in this dataset: https://doi.org/10.5061/dryad.0zpc86727. A cutoff for the number of read counts was not applied in the analysis of the X5-Y-X5 library. When sequencing, we aimed for around a million reads per sample, and that determined the number of read counts for each amino acid in our analysis. You can see an example of the read counts per amino acid per position in Figure 1—figure supplement 1, which shows thousands of reads per feature (except at the central tyrosine, where other amino acids were omitted by design).

Technical points:

3. Two versions of the X5-Y-X5 library were made – one with a strep tag and one with a myc tag. Were both subjected to quality control sequencing as shown in Figure 1 supp 1, and which is shown in the table? Were both of them used for screens?

Both the myc and strep-tag versions of the X5-Y-X5 libraries were sequenced for quality control and were found to be comparable in quality. The data shown in Figure 1—figure supplement 1 was obtained using the strep-tag library. For the kinase domain screens, we used the myc-tagged X5-Y-X5 library to avoid any potential background binding between strep-tag and avidin bead. We used the strep-tagged library for the SH2 domain screens since our protocol was modified so that the avidin beads were already saturated with SH2 domains, although both libraries work fine in this format.

4. It is not clear that the authors estimated the representation of the X5-Y-X5 library other than from the standpoint of pooled representation of each residue at each position ("1-10 million unique peptide sequences" were observed by sequencing). How many transformants were recovered during library cloning, and how does this compare to the number of sequencing reads used for quality control?

Based on our estimations, we recovered ~40,000,000 transformants for the X5-Y-X5 library during library screening. We get ~1,000,000 reads per sample by deep sequencing, which does not cover the number of possible sequences in the library. While each replicate samples a different subset of the library, we find that the distribution of amino acid frequencies is conserved across replicates (see the correlation graph in Figure 1—figure supplement 1 for an example). The way we have estimated the total sequence diversity of the library is through an analysis of the unique sequences observed across all of our input and selected sequencing runs. Over the course of dozens of sequencing runs with the X5-Y-X5 library, we have observed roughly 10 million unique translated sequences.

5. Selection of Tyr residues from the X5-Y-X5 library are likely due to "off target" phosphorylation of the non-central residue. This could muddy the specificity analysis as there would contribution of other selected residues at the "wrong" positions. For the Tyr-Var library, the authors avoided this issue by including Tyr to Phe substitutions. Do any of the results for X5-Y-X5 change if they exclude all peptides with Tyr at a non-central position?

As mentioned in the responses to previous reviewers’ comments, we amended our X5-Y-X5 analysis to exclude peptides with tyrosines at non-central positions. We observed that this amendment of our analysis did not significantly alter the specificity profiles obtained for each kinase.

6. Based on the concentration bacterial cells used in the kinase reactions, it seems likely that the authors are observing single turnover rather than multiple turnover kinetics. This could explain some of the differences between the results described here and prior analyses with the same kinases. Can the authors estimate the effective substrate concentration in their experiments? Can they comment on the potential impact of single vs multiple turnover kinetics on their results?

We have also suspected that the concentration of bacterial cells might dictate what the resulting consensus sequences are and how they differ from other reported consensus sequences. In the past, we have conducted experiments with the library at ~10-fold lower cell densities and did not see a major difference in specificity. Unfortunately, it is difficult to achieve higher cell densities. In our experiments, the cells are approximately at an OD600 of 1, which equates to around 10^9 cells/mL (1-2 pM cells). Even at a surface display density of 1,000 or 10,000 molecules per cell, this could put the kinase (500 nM) in excess of substrate, putting us in a single-turnover regime. Notably, both the kinase and substrate concentrations are significantly below typical tyrosine kinase-substrate KM values. Taken together, it might be the case that our screens are reporting on sequence parameters that dictate enzyme-substrate complex formation and the phosphoryl transfer reaction, but not product release, which is critical for a high kcat value. Strangely, our Src consensus peptide has a higher kcat value than previously reported SrcTide peptides, although it is unclear if those peptides were designed from screens done in the single- or multi-turnover regime.
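The concentration estimate in this response can be reproduced with a short calculation. The sketch below converts a cell density of roughly 10^9 cells/mL into a molar concentration and scales it by the two display densities mentioned above; the function and variable names are hypothetical, and the display densities are the illustrative values from the text rather than measured quantities.

```python
AVOGADRO = 6.02214076e23  # particles per mole


def molar_concentration(particles_per_ml: float) -> float:
    """Convert a particle density (per mL) into a concentration in mol/L."""
    return particles_per_ml * 1000.0 / AVOGADRO


cells_per_ml = 1e9   # approximate cell density at OD600 of 1 (value from the text)
kinase_nM = 500.0    # kinase concentration quoted in the text

cell_pM = molar_concentration(cells_per_ml) * 1e12
print(f"cells: {cell_pM:.2f} pM")

substrate_nM = {}
for copies_per_cell in (1_000, 10_000):
    conc = molar_concentration(cells_per_ml * copies_per_cell) * 1e9
    substrate_nM[copies_per_cell] = conc
    print(f"{copies_per_cell} peptides/cell: {conc:.2f} nM substrate, "
          f"kinase excess ~{kinase_nM / conc:.0f}-fold")
```

Under these assumptions the displayed peptide is present at roughly 2 to 17 nM, so a 500 nM kinase would be in about 30- to 300-fold excess over substrate.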

7. For Tyr kinases that were analyzed with both libraries, how do the specificity matrices derived from the Tyr-Var data (Figure 4 – Figure supp 5) compare to those from the X5-Y-X5 data (Figure 1B and Figure 2A)?

For each kinase analyzed, specificity profiles obtained from both the pTyr-Var and the X5-Y-X5 datasets were similar (Figure 1B, 2A, Figure 4—figure supplement 5). A comparison of the position-specific amino acid enrichments from each library, excluding multi-Tyr sequences, is shown above, in response to another reviewer. In the manuscript, we chose to emphasize the specificity profile obtained using the X5-Y-X5 library because the amino acid composition of the peptides in this library is not biased. In contrast, the amino acid composition of peptides in the pTyr-Var library is constrained by the ~10,000 sequences derived from the human proteome. So, we reason that any discrepancies in the specificity profiles between the pTyr-Var and X5-Y-X5 library may be due to biases from the pTyr-Var library itself.

8. The observation that there is context dependence – the impact of a given amino acid substitution differs depending on the surrounding sequence – is not surprising given the author's previous work. However, the specific example shown here should be interpreted with caution, as undersampling may impact what is construed to be the "average" signal in the X5-Y-X5 library. It would be most convincing if there were examples where the impact of a given amino acid substitution were observed on two different peptide sequences, i.e. where enrichment scores for both the "WT" and variant sequence could be calculated. One would also want to see some verification by in vitro kinase assay that the same substitution caused different effects on reaction rates in the context of two different peptides.

We agree that the specific example in our paper would be most convincing if we observed the opposite effect of the same amino acid substitution in a different sequence context. To address this, we made -2 proline and -2 serine mutations to the Src consensus peptide sequence and tested the activity of Src against these peptides. We found that the mutations had the same effect as we would have predicted from the X5-Y-X5 library, and this effect is opposite to what we observed in our specific example. These new data can be found in Figure 5 of the revised manuscript.

Reviewer #4 (Recommendations for the authors):

1. Concerning the peptide design in their random library (sequence: X5-Y-X5, where X = 20 natural amino acids falling within their 'NNS' codon constraint), neighboring tyrosines (TAC codons) theoretically account for ~3% (1/32) of residues at all random positions, meaning that as much as 30% of the peptide pool contains at least two potential tyrosine phosphoacceptors per peptide. For the heatmap motifs presented in Figure 1, Figure 2, Figure 4—figure supplement 5, and Figure 7, we can observe strong enrichment of tyrosines at multiple positions, especially for JAK2 and the RTKs, and it is not certain whether this is due to positional selection (that facilitates phosphorylation of the tyrosine at position zero) or direct phosphorylation of the neighboring tyrosine. This double phosphorylation effect may limit interpretation of the motif heatmaps presented in the figures that assume phosphorylation occurs only at the tyrosine at position zero. Have the authors considered generating separate heatmaps that omit the subset of enriched peptides containing two or more tyrosines? That might reduce background and further improve the quality of the data.

The strong enrichment of tyrosines at multiple positions, especially for JAK2 and the RTKs, could potentially be evidence for phospho-priming. This has also been observed in past screens with another receptor kinase, EGFR (PMIDs: 26551075, 30012625). As mentioned in the response to previous reviewers’ comments, we amended our X5-Y-X5 analysis to exclude peptides with tyrosines at non-central positions (figure supplements associated with Figures 1 and 2). This amendment of our analysis did not significantly alter the specificity profiles obtained for each kinase.
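The reviewer's estimate of the multi-tyrosine fraction can be checked by enumerating the NNS codons directly. The sketch below assumes equimolar nucleotide mixtures at each degenerate position and counts the stop codon in the denominator, both of which are simplifications; the variable names are hypothetical.

```python
from collections import Counter
from itertools import product

BASES = "TCAG"
# Standard genetic code, with codons ordered TTT, TTC, TTA, TTG, TCT, ...
CODE = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), CODE)}

# NNS: any base at codon positions 1 and 2, and C or G (S) at position 3
nns_codons = [codon for codon in CODON_TABLE if codon[2] in "CG"]
codon_counts = Counter(CODON_TABLE[codon] for codon in nns_codons)

p_tyr = codon_counts["Y"] / len(nns_codons)  # Tyr is encoded only by TAC
n_random_positions = 10                      # the ten X positions in X5-Y-X5
p_extra_tyr = 1 - (1 - p_tyr) ** n_random_positions

print(len(nns_codons), dict(codon_counts))
print(f"P(Tyr) per random position: {p_tyr:.4f}")
print(f"P(at least one non-central Tyr): {p_extra_tyr:.3f}")
```

With these assumptions, about 27% of library members would carry at least one tyrosine in addition to the central one, consistent with the reviewer's figure of "as much as 30%". The same enumeration shows the codon redundancy that remains in an NNS library: Leu, Ser, and Arg have three codons each, and Ala, Gly, Pro, Thr, and Val have two.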

2. Figure 6B: Given that the authors have demonstrated essentially complete phosphorylation of the displayed peptides by their kinase cocktail in Figure 6—figure supplement 2, it seems more appropriate to replace the label above the heatmaps in 6B "position relative to tyrosine" with "position relative to phosphotyrosine". Moreover, this indicates that they have probably phosphorylated most of the neighboring tyrosines in the displayed peptides containing two or more tyrosines. How would the heatmap motifs for the SH2 domains appear if the authors excluded the subset of enriched peptides containing two or more (likely to be phosphorylated) tyrosines?

We agree with the reviewer and have replaced the label on the heatmaps in Figure 6B with “position relative to phosphotyrosine.” We also constructed the heatmap profiles of the SH2 domains with the exclusion of peptides with non-central tyrosine residues and saw no difference in the specificity (Figure 6-supplement figure 3). Omitting sequences that contain non-central tyrosine (probably phosphotyrosine) residues does not appear to have any significant impact on these analyses.

3. The authors have not included a single sequence logo to represent their phosphorylation site motifs or pY binding motifs. Is there a specific reason for this? Researchers less familiar with these approaches generally have an easier time interpreting sequence logos than heatmaps.

The specific reason for this is that our group prefers heatmaps over logos. Nonetheless, we appreciate the reviewer’s point that others might find logos more intuitive, so we have added sequence logos for the X5-Y-X5 screens to the figure supplements. In addition, we have provided the numerical position-weighted matrices used to make every heatmap and logo in the paper (for both libraries) as source data files, so that other researchers can render the data in the way that best suits their needs.

4. An advantage of the authors' platform over alternative approaches is its potential to decipher pairwise interactions between amino acids on substrate peptides. The authors report that, depending on the kinase, 5-15% of all significant mutations in the pTyr-Var screen (which are pairwise comparisons of single substitutions in fixed sequences) had the opposite effect of predictions from the randomized library (ensembles). And indeed, the correlation plots in Figure 5 —figure supplement 1 show quite a bit of scattering and hence negatively correlated sites between their predictions and their results. The authors explored one example of this for SRC, which preferred proline over serine at -2 in the randomized library, yet it more efficiently phosphorylated a site containing -2 serine versus proline in the context of the sequence XEYSFK, where X is the substituted position. Interestingly, this is the opposite effect of what has been previously published and referred to by the authors for EGFR, where -2 proline was preferred over serine in the context of -1 acidic residues (Cantor et al. 2018), indicating that these pairwise rules may not only be unique to specific kinase groups but mutually exclusive between different groups. Are the authors able to identify kinase-specific trends from their datasets for the pairwise selection of amino acids at positions -1 and -2 that negatively correlate with predictions?

The opposite coupling effect for Src and EGFR is a very good point. This likely reflects a combination of the different sequence specificities for Src and EGFR, coupled with differences in what kinds of peptide conformations are tolerated by each kinase. This definitely warrants a deeper structural investigation. Unfortunately, there was not enough “depth” across mutations and sites in the pTyr-Var library to infer any significant rules for coupling (certainly not kinase-specific rules). We did notice that many of the mutations that could not be predicted well using the X5-Y-X5 library were at the -2, -1, +1 and +2 positions, further suggesting a role for local sequence context dictating peptide conformation.

5. Cannot access the code used to process and analyze the deep sequence data in the GitHub repository: (https://github.com/nshahlab/2022_Li-etal_peptide-display)

Due to a formatting error, a hyphen was missing from the link. Here is the correct link: https://github.com/nshahlab/2022_Li-et-al_peptide-display

6. Based on the format of this journal, this work seems appropriate as a "Research Advance."

We agree that this paper could be a Research Advance, and indeed a precursor to this paper (eLife 7:e35190) was a Research Advance coupled with (eLife 5:e20105). However, given the significant expansion in technical scope for the peptide screening platform described in this paper, relative to the previous articles, we felt that a “Tools and Resources” article was appropriate. We defer to the editors as to whether this work should be reported as a “Research Advance” or a “Tools and Resources” article.

Associated Data


    Data Citations

    1. Li A, Voleti R, Lee M, Gagoski D, Shah NH. 2023. Data from: High-throughput profiling of sequence recognition by tyrosine kinases and SH2 domains using bacterial peptide display. Dryad Digital Repository.

    Supplementary Materials

    Figure 1—figure supplement 1—source data 1. Counts table corresponding to one sequence run from an input X5-Y-X5 library.
    Figure 2—source data 1. Position-specific amino acid enrichment matrices from the tyrosine kinase X5-Y-X5 library screens.

    Matrices calculated with and without inclusion of multi-tyrosine sequences are provided.

    Figure 4—source data 1. Enrichment scores from tyrosine kinase pTyr-Var screens.

    Data are provided in a flat sheet with average and standard deviation values for all kinase-substrate pairs. Data are also provided for each kinase as a side-by-side comparison of enrichment scores for reference and variant sequences, along with whether the mutation was considered significant in our analysis. Three sheets are provided listing substrates for c-Src, Fyn, and c-Abl that are also found in a curated list of kinase-substrate pairs in the PhosphositePlus database.

    Figure 4—source data 2. Position-specific amino acid enrichment matrices from the tyrosine kinase pTyr-Var library screens for sequences containing a single central tyrosine residue.
    Figure 5—source data 1. Peptide sequences and their phosphorylation rates by c-Src or c-Abl, measured using the RP-HPLC kinetic assay.
    Figure 5—source data 2. Mutational effects measured from the pTyr-Var library screens and their corresponding predictions based on the X5-Y-X5 library screening data.

    Only those sequence pairs with high-quality sequencing data (read counts >100) and a single central tyrosine were included in the analysis.

    Figure 6—source data 1. Position-specific amino acid enrichment matrices from the SH2 domain X5-Y-X5 library screens.

    Matrices calculated with and without inclusion of multi-tyrosine sequences are provided.

    Figure 6—source data 2. Enrichment scores from SH2 domain pTyr-Var screens.

    Data are provided in a flat sheet with average and standard deviation values for all SH2-ligand pairs. Data are also provided for each SH2 domain as a side-by-side comparison of enrichment scores for reference and variant sequences, along with whether the mutation was considered significant in our analysis.

    Figure 6—source data 3. Position-specific amino acid enrichment matrices from the SH2 domain pTyr-Var library screens for sequences containing a single central tyrosine residue.
    MDAR checklist

    Data Availability Statement

    All of the processed data from the high-throughput specificity screens are provided as source data files. The raw fastq and fasta sequencing files are available as a Dryad repository (DOI:https://doi.org/10.5061/dryad.0zpc86727). Custom code used to process/analyze screening data can be found in a GitHub repository (copy archived at Li et al., 2023) as specified in the manuscript.

    The following dataset was generated:

    Li A, Voleti R, Lee M, Gagoski D, Shah NH. 2023. Data from: High-throughput profiling of sequence recognition by tyrosine kinases and SH2 domains using bacterial peptide display. Dryad Digital Repository.


    Articles from eLife are provided here courtesy of eLife Sciences Publications, Ltd
