Author manuscript; available in PMC: 2025 Sep 18.
Published in final edited form as: Cell. 2019 Oct 3;179(2):543–560.e26. doi: 10.1016/j.cell.2019.09.008

Oncogenic mutations rewire signaling pathways by switching protein recruitment to phosphotyrosine sites

Alicia Lundby 1,2,#, Giulia Franciosa 1, Kristina B Emdal 1, Jan C Refsgaard 1, Sebastian P Gnosa 3, Dorte B Bekker-Jensen 1, Anna Secher 1,3, Svetlana R Maurya 2, Indranil Paul 1, Blanca L Mendez 1, Christian D Kelstrup 1, Chiara Francavilla 1, Marie Kveiborg 3, Guillermo Montoya 1, Lars J Jensen 1, Jesper V Olsen 1,5,#
PMCID: PMC7618132  EMSID: EMS208661  PMID: 31585087

Summary

Tyrosine phosphorylation regulates multi-layered signaling networks with broad implications in (patho)physiology, but high-throughput methods for functional annotation of phosphotyrosine sites are lacking. To decipher phosphotyrosine signaling directly in tissue samples, we developed a mass spectrometry-based interaction proteomics approach. We measured the in-vivo EGF-dependent signaling network in lung tissue, quantifying >1000 phosphotyrosine sites. To assign function to all EGF-regulated sites, we determined their recruited protein signaling complexes in lung tissue by interaction proteomics. We demonstrate how mutations near tyrosine residues introduce molecular switches that rewire cancer signaling networks, and we reveal oncogenic properties of one such lung cancer EGFR mutant. To demonstrate the scalability of the approach, we performed >1000 phosphopeptide pulldowns and analyzed them by rapid mass spectrometry, revealing tissue-specific differences in interactors. Our approach is a general strategy for functional annotation of phosphorylation sites in tissues, enabling in-depth mechanistic insights into oncogenic rewiring of signaling networks.

Introduction

Tyrosine phosphorylation controls physiological signaling networks and represents a mechanism for cells to transiently alter protein function, such as enzymatic activity, protein-protein interactions and protein localization. In the absence of signaling events, tyrosine phosphorylation is maintained at low stoichiometric levels (Sharma et al., 2014). Despite its low abundance, it is a central post-translational modification (Hunter and Sefton, 1980; Sefton et al., 1980) that upon deregulation is critically involved in disease, notably cancer. Tyrosine kinases have accordingly become prominent drug targets (Cohen, 2002; Klaeger et al., 2017; Rix and Superti-Furga, 2009; Zhou et al., 2013). For example, oncogenic driver mutations are prevalent in the epidermal growth factor receptor (EGFR) in lung adenocarcinomas (Kandoth et al., 2013), which are targeted with anti-EGFR therapies (Paez et al., 2004; Soria et al., 2018). The physiological importance of tyrosine kinases is recognized, but our knowledge of the molecular consequences of aberrant wiring of phosphotyrosine signaling in pathophysiological states is limited. The signaling response downstream of tyrosine kinases is complex and difficult to measure. The immediate signaling response induces changes in phosphorylation site stoichiometry. Such changes have been analyzed by quantitative mass spectrometry-based phosphoproteomics, which can readily analyze thousands of phosphorylation sites (Francavilla et al., 2013; Olsen et al., 2006; Sharma et al., 2014). For the more abundant serine and threonine phosphorylation responses, some studies have even done so in tissues (Humphrey et al., 2015; Liu et al., 2018; Lundby et al., 2013). Investigating the signaling network in a tissue-specific context is preferable as it provides insight into the mechanisms encoding specificity (Liu et al., 2018).
Site-specific tyrosine phosphorylation predominantly occurs within intrinsically disordered protein regions (IDRs), which are peptide segments without stable tertiary structure (van der Lee et al., 2014). The accessibility of short linear peptide motifs consisting of 3-7 amino acid stretches in IDRs allows peptide-protein interactions to occur (Tompa et al., 2014). Upon phosphorylation, tyrosine residues often function as docking sites for recruitment of adaptor proteins containing SH2 or PTB domains (Pawson and Scott, 1997). That is, as a consequence of the changes in phosphorylation site stoichiometry, dynamic protein-protein complexes are formed at regulated phosphorylation sites. This recruitment of adaptor proteins to regulated phosphotyrosine sites wires an assembly of protein complexes. The dynamically recruited protein complexes exert their individual functions and expand the signaling response. It remains a formidable challenge to assign molecular function, such as recruitment of protein complexes, to phosphorylated residues. An essential step in understanding the complex wiring of phosphotyrosine signaling is to develop methods to detect which phosphotyrosine sites are regulated upon a stimulus, as well as which protein complexes are recruited to the regulated sites. Here we have developed a proteomics-based approach to assess i) which tyrosine residues are phosphorylated in-vivo upon tyrosine kinase activation, and ii) which interacting proteins are recruited as readers of the regulated phosphotyrosine sites (Fig. 1a). We present a first depiction of the EGF-dependent phosphotyrosine signaling network in-vivo in lung tissue and unravel the protein complexes assembled at EGF-dependent phosphotyrosine sites.
We demonstrate how the lung cancer mutation EGFR P1019L induces a switch in adaptor protein interactions at position pY1016, which leads to sustained activation of downstream kinase signaling pathways ultimately resulting in enhanced cell migration and invasiveness. Cancer mutations in vicinity of phosphotyrosine sites are recurrent, and as demonstrated for EGFR P1019L their detrimental effect can be caused by introduction of molecular switches that alter protein signaling networks. To enable rapid and high-throughput analysis of such mutations, and of protein interactions at phosphotyrosine sites in general, we present a strategy based on cutting-edge mass spectrometry instrumentation and data independent acquisition. For nine cancer mutations in vicinity of phosphotyrosine sites we show that their functional consequence is introduction of switches in the molecular wiring of recruited protein complexes. Our strategy presents an approach for multilayered tissue-based investigation of phosphotyrosine signaling in general, and for functional assignment of cancer mutations in proximity to phosphotyrosine sites specifically, providing a molecular explanation of their mechanism of action.

Figure 1. Quantitative proteomics of EGF-dependent signaling in rat lung tissue.

(A) When a phosphotyrosine signaling response is elicited it results in i) changed phosphorylation site stoichiometry induced by kinases, ii) binding of adaptor proteins to the phosphotyrosine sites and iii) removal of phosphate groups by phosphatases. Here, we performed an MS-based quantitative phosphoproteomics analysis of EGF-dependent signaling in lung tissue (two biological replicates of n=4 rats in each group injected with either saline or EGF). Total peptide mixtures, titanium dioxide (TiO2)-enriched serine/threonine phosphopeptides and antibody-enriched tyrosine phosphopeptides extracted from the lung tissues were analyzed by high-resolution LC-MS/MS. Tyrosine phosphorylation sites regulated upon EGF stimulation were identified, and for each regulated phosphotyrosine site we performed peptide-based pulldowns in tissue lysate to identify protein complexes recruited to each site. From these data, we can build the signaling network activated upon EGF stimulation, covering both the regulated phosphorylation sites and the function of regulated phosphotyrosines in terms of their recruited protein complexes. (B) Immunoblots for EGFR, downstream signaling molecules (SHC and ERK) and tubulin as a loading control for four rats in each of the two groups (control versus EGF stimulated). (C) Volcano plot analysis of the phosphotyrosine-proteomics data. Each dot represents a peptide; those in blue are unmodified peptides from the proteome measurements and those in red are tyrosine phosphorylated peptides. Significantly regulated peptides are highlighted as filled circles, and for some of them the gene name and the regulated site are indicated (significance criteria, Student's t-test p<0.05 and ratio>4 for EGF-stimulated versus control rats, are shown as dashed lines). (D) We synthesized peptides corresponding to the regulated phosphotyrosine residues and their six flanking amino acids for all regulated phosphotyrosine sites. These peptides were used as baits in pulldown experiments in three different lung tissue homogenates in a 96-well plate format. Proteins bound to synthesized peptides were digested on beads and subsequently analyzed by high-resolution LC-MS/MS. See also Figures S1 and S2, and Tables S1 and S2.

Results

Phosphotyrosine signaling activated in lung tissue by EGF stimulation

Amplified EGFR signaling is a common causal oncogenic mechanism in non-small cell lung cancer (Janne et al., 2005; Rikova et al., 2007), but the EGF-dependent signaling response in lungs is poorly understood. To evaluate the signaling layer orchestrated by tyrosine phosphorylation in-vivo, we designed animal experiments, where two groups of rats received intravenous saline or EGF injections for five minutes, respectively (Figure 1A). As expected (Zhang et al., 2005), a strong phosphorylation response was elicited and validated by immunoblotting (Figure 1B). From each set of lungs three samples were prepared: (1) a sample for proteome measurements, (2) titanium dioxide (TiO2)-enriched phosphopeptides for serine/threonine phosphoproteomics and (3) antibody-enriched phosphotyrosine peptides for tyrosine phosphoproteomics (Rush et al., 2005). All samples were analyzed by high-resolution LC-MS/MS on a Q-Exactive Orbitrap mass spectrometer by single-shot measurements. Raw LC-MS/MS files were processed with MaxQuant (www.maxquant.org) and led to identification of >45,000 unique peptides in the proteome samples, >13,000 unique serine/threonine phosphorylated peptides and more than a thousand quantified unique tyrosine phosphorylated peptides (Table S1, Figure S1A). In previous in-vivo phosphoproteomics studies, phosphotyrosines made up approximately 1% of phosphorylation sites (Humphrey et al., 2015; Huttlin et al., 2010; Lundby et al., 2012). Here, using specific antibodies the number of identified phosphotyrosine sites increased 20-fold representing 13% of the total phosphoproteome (Figure S1B). Motif analysis of the amino acid sequences proximal to regulated phosphosites confirm that the signaling response is mainly mediated by ERK (PxS/TP sequence motif) and basophilic kinases such as AKT (RxxS sequence motif) (Figure S1C).

To evaluate the importance of investigating the signaling response in lung tissue rather than in a cell line to ensure relevance to lung physiology, we compared our EGF-dependent lung phosphotyrosine response with corresponding measurements from A549 human non-small cell lung cancer cells (Table S2) and HeLa human cervix carcinoma cells (Francavilla et al., 2016b). The tyrosine phosphoproteome of A549 cells resembled that of HeLa cells more than that of orthologous sites in rat lung tissue (Figure S1D). For five regulated sites that were conserved between lung tissue and A549 cells, we confirmed their EGF-dependence by immunoblotting (Figure S1E-H). Yet, less than half of the proteins with regulated tyrosine phosphorylation sites in rat lung tissue were EGF-dependent in A549 cells. As previously suggested by Liu and colleagues (Liu et al., 2018), these observations underscore the importance of performing tissue-specific investigations of signaling pathways to ensure physiological relevance of the findings.

All three sample types (proteomes, serine/threonine phosphoproteomes and tyrosine phosphoproteomes) were reproducible between biological replicates with Pearson correlation coefficients 0.89<R<0.99 for quantile-based normalized peptide MS signal intensities (Figure S2A). Based on label-free quantification we identified more than eighty tyrosine residues with increased phosphorylation stoichiometry in response to 5 min EGF stimulation (t-test p < 0.05 and fold change > 4, Figure 1C). The cut-offs applied in this analysis are stringent and correspond to an estimated false discovery rate of 0.002 (Figure S2B). Gene Ontology enrichment analysis on the set of proteins harboring the 88 regulated phosphotyrosines couples the response to an overrepresentation of tyrosine kinases and adaptor protein activity, which supports a comprehensive activation of the EGF-dependent signaling pathway (Figure S2C). Proteins with regulated phosphotyrosine sites are distributed across the entire protein abundance spectrum, despite EGFR being among the least abundant proteins detected in our lung proteomes (Figure S2D). All EGF-dependent phosphotyrosine sites identified in lung tissue are provided in Table 1, where proteins are grouped by their biological function. That is, an in-vivo quantitative phosphoproteomics approach allowed us to identify 88 tyrosine residues with increased phosphorylation stoichiometry in lungs in response to 5 minutes EGF stimulation.
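The two cut-offs stated above (t-test p < 0.05 and fold change > 4, that is a log2 ratio above 2) can be written as a simple filter. The following is a minimal sketch only: the function names are illustrative, the p-value is taken as an input rather than computed, and the use of group means to form the ratio is an assumption, since the exact per-peptide calculation is not described in this section.

```python
import math


def log2_ratio(egf_intensities, control_intensities):
    """Log2 of mean EGF intensity over mean control intensity (assumed ratio definition)."""
    egf_mean = sum(egf_intensities) / len(egf_intensities)
    control_mean = sum(control_intensities) / len(control_intensities)
    return math.log2(egf_mean / control_mean)


def is_regulated(log2_fc, p_value, min_fold=4.0, alpha=0.05):
    """Apply both cut-offs: p < alpha and fold change > min_fold (up-regulation only)."""
    return p_value < alpha and log2_fc > math.log2(min_fold)
```

A site passes only when both criteria hold, which is what the dashed lines in a volcano plot such as Figure 1C delimit.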

Table 1. List of all proteins with significantly regulated phosphotyrosine sites upon EGF stimulation in lung tissue.

The gene names of the proteins with significantly regulated phosphotyrosine sites are listed along with the site information for the regulated phosphotyrosine site. The label-free log2-transformed intensity ratio measurements between EGF stimulated and control rats are given. The proteins are grouped according to their known biological functions. The amino acid positions of tyrosine phosphorylation sites for EGFR are indicated according to the human amino acid sequence. In five cases ratios measured in non-small cell lung cancer A549 cells (*) or HeLa cells (**) are provided due to lack of coverage in the rat lung tissue. Related to Figure 2.

Entries are given as gene name, phosphotyrosine site, log2 ratio (EGF stimulated versus control).

Protein kinases and regulators
EGFR Y869 2.1*; EGFR Y998 5.2*; EGFR Y1016 3.5**; EGFR Y1069 6.1*; EGFR Y1092 4.2*; EGFR Y1110 6.5; EGFR Y1138 6.7; EGFR Y1172 7.3; EGFR Y1197 2.6; EPHA1 Y782 3.1; FGR Y197 2.5; GAREM Y453 4.3; LYN Y244 2.4; MAPK1 Y185 3; MAPK13 Y182 2.6; MAPK8 Y185 2.4; PIK3AP1 Y694 2.9; PIK3R1 Y431 2.9; PIK3R1 Y580 3.5; PRPF4B Y849 2.2; PTK2 Y928 2.2; SGK269 Y631 5.6; TAOK1 Y43 2; TNK2 Y518 3.3; TNK2 Y859 3.5; ZAP70 Y392 3.3

Protein phosphatases and regulators
PRDX1 Y193 2; PTPN11 Y546 2.6; PTPRC Y652 4.1; SLAMF6 Y322 7.5

SH3/SH2 adaptor proteins
CRK Y108 3.9; CRK Y136 3.6; CRKL Y127 2; GAB1 Y317 2.5; GAB1 Y659 4.4; GAB2 Y603 4.6; HCLS1 Y323 3; PLCG1 Y1253 3.6; SH2B1 Y55 3.2; SHANK3 Y197 2.1; SHC1 Y423 6.1; STAM Y199 5.5; VAV1 Y826 3.1

Small GTPase mediated signaling
ARHGDIA Y133 2.8; ARHGEF6 Y715 2.2; GPSM1 Y150 3.7; RAB10 Y6 2.4; TBC1D9B Y854 2.7

Actin and cytoskeleton organization
EPB41L2 Y606 2.9; DLG2 Y726 2.2; DLG3 Y368 2.8; LMO7 Y133 3.4; SPTAN1 Y942 4.3; WDR1 Y72 2.2; WDR6 Y332 2.6

Focal adhesion proteins
EVL Y38 2.5; SDCBP Y92 2.5; TNS2 Y652 2.6; TNS2 Y668 2.7

Metabolic processes
ACTB Y169 2.1; BPGM Y92 3.6; CBR1 Y194 3.6; DBI Y29 4.3; EIF2B5 Y578 3.1; FBP2 Y216 2; PABPC1 Y54 5.5; PABPC1 Y56 5.5; PGAM2 Y96 2.9; PGM1 Y353 3.1; RPSA Y139 3.5

Membrane ruffles
CLASP2 Y1360 2.5; FGD5 Y1450 2.4; FGD5 Y898 2; FRMD4B Y871 2.9

Replication and transcription regulators
ERH Y92 4.2; HNRNPUL1 Y510 2.5; PCNA Y211 3.2; PTBP1 Y126 2

Calcium binding proteins
CALM1 Y100 4.6; CALU Y47 3.2

Endocytic proteins
CPNE8 Y423 2.1; EHD4 Y451 5.3; RUFY1 Y314 4.8; TRAPPC9 Y573 4.8

Apoptotic and proteolytic proteins
EFHD2 Y83 2.1; GSTP1 Y109 2.7; TJP2 Y1093 2.2; USP14 Y285 3.2

Peptide-based interaction screen of regulated phosphotyrosines

Regulated phosphotyrosines often lead to formation of dynamical protein complexes (Blagoev et al., 2003) by presenting docking sites for adaptor proteins containing for instance SH2 domains (Bae et al., 2009). Cellular specificity of phosphotyrosine-binding proteins is achieved through the amino acid sequences surrounding the phosphotyrosine site, the higher order structures and co-expression patterns (Schlessinger and Lemmon, 2003; Songyang et al., 1993). Recruitment of such protein binders expands the signaling response. For a few phosphotyrosines, protein binders have been identified in lysates from cell lines by phosphopeptide pulldown strategies with quantitative mass spectrometry as read-out (Boersema et al., 2010; Schulze et al., 2005) or by two-hybrid assays (Petschnigg et al., 2014). To identify protein complex formations assembled at EGF-dependent phosphotyrosines directly in lung tissue, we synthesized peptides corresponding to the 88 regulated phosphotyrosine sites and the six flanking amino acids on either side of each phosphotyrosine site and used these as baits in pulldown experiments in lysates from lungs. Each peptide was synthesized with a biotin tag and a hydrophilic linker. Phosphopeptide pulldowns were performed in a 96-well plate format based on an established strategy (Eberl et al., 2013) that we modified for large-scale interaction proteomics in tissue samples. Pulldowns were performed in triplicates from three sets of lung tissue lysates resulting in a total of 264 peptide pulldowns (Figure 1D). Interacting proteins were digested on-beads and the 264 pulldown samples were subsequently analyzed by high-resolution LC-MS/MS with 1h gradients.
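The bait design described above (the phosphotyrosine plus six flanking residues on either side) can be sketched as a small helper. This is an illustration under stated assumptions: `bait_window` is a hypothetical name, the sequences in the usage note are toy strings rather than real protein sequences, and truncating the window at a protein terminus is an assumption, because the text does not say how sites close to a terminus were handled.

```python
def bait_window(sequence, site, flank=6):
    """Return the residues around a 1-based tyrosine position, with the site written as 'pY'.

    Assumption: the window is simply truncated if the site lies near a terminus.
    """
    index = site - 1
    if sequence[index] != "Y":
        raise ValueError(f"position {site} is {sequence[index]!r}, not a tyrosine")
    start = max(0, index - flank)
    left = sequence[start:index]
    right = sequence[index + 1:index + 1 + flank]
    return left + "pY" + right
```

For a toy sequence, `bait_window("ACDEFGHYKLMNPQRST", 8)` yields the 13-residue window centred on the tyrosine, while a site six residues from the N-terminus yields a shorter bait under the truncation assumption.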

Web platform with analysis tool for quantitative interaction proteomics data

In pulldown experiments, it is essential to discriminate between specific and unspecific binding partners. For phosphotyrosine sites this is particularly challenging as unspecific interactions occur even in the low micro-molar range (Sharma et al., 2009). To overcome this challenge, we developed an analytical framework that takes into consideration quantitative information from multiple pulldown experiments as well as the abundance of each protein in the proteome. The principle of the strategy is explained here and illustrated in Figure 2A, but extensive details on the analytical strategy and computational framework are provided in the STAR Methods. To enable others to apply this analysis strategy, we present a web platform, where analyses of quantitative interaction proteomics datasets can be performed.

Figure 2. Protein complexes recruited to regulated phosphotyrosine sites.

(A) Schematic representation of the analytical framework we developed for analysis of pulldown datasets. Bioinformatics analyses of the pulldowns are performed in the web-interface we developed (https://pulldown.jensenlab.org). Protein intensities are normalized and their variances are calculated as a function of intensity. Missing values in control experiments are inferred according to a hierarchy and a modified Welch's t-test is performed for each pulldown. Statistically significant protein interactors for each pulldown are determined based on Significance C and represented in volcano plots. Fractional stoichiometries of significant protein interactors are provided, highlighting the likely direct interaction partner. An example result is shown for pulldowns with a peptide covering the region flanking EGFR tyrosine 1172 in its phosphorylated form, highlighting GRB2 as direct interaction partner. (B) Clustering of interaction partners identified in the 88 interaction complexes. Proteins are color-coded by their fractional stoichiometry in the complex. For six of the 88 complexes the information contained in the clustering plot is illustrated. Results represent three replicates per experiment group with p < 0.01. (C) The core signaling complex of the in-vivo EGF-dependent signaling response in lung tissue. A schematic of EGFR with the tyrosine residues that we identified as significantly regulated upon EGF stimulation is shown. For each site, the proteins identified to significantly interact with the phosphotyrosine residue are depicted together with regulated phosphorylation sites that we identified on these proteins. (D) Western blot validation of EGF-dependent interaction partners of EGFR by co-immunoprecipitation. (E) The protein interaction network of the 11 key protein complexes in the EGFR signaling network is depicted along with the protein-protein interactions we have measured. The 11 key protein complexes are color-coded and the shape indicates if the protein contains a SH2, SH3/PTB, kinase or phosphatase domain. P denotes a regulated phosphorylation site. See also Figure S3 and Table S3.

The analytical framework first normalizes log-transformed protein intensities by median subtraction and calculates the protein variance for each set of pulldowns as a function of the protein intensity (Figure 2A). We exploit the strong interdependency between MS-based protein intensities and variance to determine a sigmoid function that explains the relationship between the two parameters. A key feature of our analytical tool is that the statistical test we perform to identify specific interactors is based on these estimated variances. Phosphotyrosine-interacting proteins are frequently missed in classical control experiments due to their low abundance, which hampers accurate quantitation. We solved this missing value problem by imputing values according to a hierarchy, in which median protein intensity measurements across all other pulldown experiments are prioritized over empty bead control experiments, which in turn are prioritized over scaled total proteome measurements. The benefit of this imputation scheme comes from the empirical observation that the abundance of an unspecific background binder in a pulldown strongly correlates with its expression level in the proteome (see STAR Methods for details). For each protein in a pulldown, a modified Welch's t-test termed significance C is performed using the estimated variance, and the data are represented in a volcano plot. A combined evaluation of the protein p-value and intensity ratio measurement determines whether the protein is classified as a specific interactor or not.
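The steps of this framework can be sketched in a few functions. This is not the implementation behind the web platform, whose details are given in the STAR Methods; it is a minimal illustration in which all function names, the sigmoid parameterization (`low`, `high`, `midpoint`, `slope`) and the plain Welch-type statistic are assumptions.

```python
import math
from statistics import mean, median


def median_normalize(log_intensities):
    """Subtract the pulldown's median from each log-transformed protein intensity."""
    centre = median(log_intensities.values())
    return {protein: value - centre for protein, value in log_intensities.items()}


def impute_control(protein, other_pulldowns, bead_controls, scaled_proteome):
    """Choose a control value by the stated hierarchy.

    1. median across all other pulldowns, 2. empty bead control,
    3. scaled total proteome. Returns None if no source has the protein.
    """
    observed = [p[protein] for p in other_pulldowns if protein in p]
    if observed:
        return median(observed)
    for source in (bead_controls, scaled_proteome):
        if protein in source:
            return source[protein]
    return None


def sigmoid_variance(intensity, low, high, midpoint, slope):
    """Variance modelled as a sigmoid of intensity (hypothetical parameterization)."""
    return low + (high - low) / (1.0 + math.exp(slope * (intensity - midpoint)))


def welch_like_t(bait_values, control_value, var_bait, var_control, n_control):
    """Welch-type statistic computed from model-estimated variances (assumed form)."""
    difference = mean(bait_values) - control_value
    return difference / math.sqrt(var_bait / len(bait_values) + var_control / n_control)
```

The point of the hierarchy is that a protein absent from the empty bead control still receives a background estimate, so low-abundance phosphotyrosine binders are not dropped from the test.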

Benchmarking our analysis strategy against the classical approach of phosphopeptide pulldown versus matched unmodified peptide pulldown demonstrates an improved ability to discriminate for unspecific binding of SH2 domain containing proteins to phosphotyrosine sites (Figure S3). For example, our strategy reduced the number of significant interactors of EGFR pY1172 from several hundred to only six, among these were the two known interactors of EGFR pY1172, SHC1 and GRB2 (Schulze et al., 2005). Our approach effectively filters the data, highlighting fewer but likely more relevant interactors. Importantly, our strategy has the advantage that it eliminates the necessity of pulldowns using unmodified peptides by exploiting the quantitative information from multiple phosphopeptide pulldowns, thereby reducing the number of peptides that need to be synthesized and tested to half.

To identify which particular protein interactor is the most abundant in each pulldown, and thereby the likely direct interactor, we calculated the fractional stoichiometry of the significant proteins based on their iBAQ intensities (Schwanhausser et al., 2011) from each pulldown (Hein et al., 2015). The fractional stoichiometry can be used to prioritize direct binders among the significant interactors and is provided on a normalized scale next to each volcano plot (Figure 2A).
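As a small illustration of this ranking, the sketch below scales the iBAQ intensities of the significant proteins in one pulldown to the most abundant one. Scaling to the maximum is an assumption about what "normalized scale" means here, the function names are illustrative, and the intensities in the usage note are invented numbers.

```python
def fractional_stoichiometry(ibaq_by_protein):
    """Scale iBAQ intensities of significant interactors to the most abundant protein."""
    top = max(ibaq_by_protein.values())
    return {protein: value / top for protein, value in ibaq_by_protein.items()}


def likely_direct_interactor(ibaq_by_protein):
    """Return the protein with the highest iBAQ intensity in the pulldown."""
    return max(ibaq_by_protein, key=ibaq_by_protein.get)
```

With invented intensities `{"GRB2": 8.0, "SHC1": 4.0, "SOS1": 2.0}`, the first function returns fractions of 1.0, 0.5 and 0.25, and the second returns `"GRB2"` as the candidate direct binder.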

Protein complexes recruited to EGF-dependent phosphotyrosine sites in lung tissue

We applied the analytical framework presented above to analyze all of our 264 phosphotyrosine peptide pulldown experiments from lung tissues. This enabled us to determine specific protein complexes recruited to the 88 EGF-regulated phosphotyrosine sites in lung tissue (Table S3). Volcano plots for all 88 peptide pulldowns performed are provided on the web platform. Volcano plots for two representative examples including validation experiments by co-immunoprecipitation are shown in Figure S4. For each of the recruited protein complexes we calculated the relative stoichiometry of the protein interactors, thereby allowing us to pinpoint the most likely direct interactor. The relative stoichiometries are provided next to each volcano plot. The recruited interaction partners, direct as well as indirect, cover hundreds of proteins (Figure 2B), which underscores the complexity of the EGF-dependent signaling response beyond regulation of phosphorylation site stoichiometry. Some of the complexes identified are highlighted in Figure 2B. We integrated our dataset with information in the most comprehensive protein-protein interaction databases (BioGRID (Stark et al., 2006) and InWeb (Li et al., 2017)) and phosphoprotein resources (PhosphoSitePlus (Hornbeck et al., 2012) and UniProt (UniProt, 2015)). Only 27 of the 88 sites were known to be EGF-dependent, and of the 503 protein-protein interactions we identified, 72 were previously reported but only 16 were known to depend on the specific phosphotyrosine site (Table S4). The quantitative dataset of site interactions of in-vivo detected phosphotyrosine sites presented here is the largest resource of EGF-dependent and phosphorylation site-specific protein-protein interactions.

Core protein complexes in EGFR signaling

The EGF response is initiated by dimerization and auto-phosphorylation of EGFR at numerous tyrosine sites, which both amplifies its kinase activity and creates docking sites for adaptor proteins that wire the assembly of dynamic protein complexes. To visualize the two types of regulation on EGFR, we depicted all regulated tyrosine sites together with the proteins that we identified as interacting with them (Figure 2C). Our data show that protein interactors of EGFR are site-specific. For instance, a CBL complex interacts with EGFR pY1069, a CRK/VAV complex interacts with EGFR pY1016, and a SHIP complex interacts with EGFR pY998. Seven EGFR interaction partners were independently confirmed by co-immunoprecipitations to be EGF-dependent (Figure 2D). We observe that a GRB2/GRAP protein complex interacts with multiple phosphotyrosine sites on the receptor, such as pY1091, pY1109, pY1137 and pY1172. GRB2/GRAP is the key adaptor protein activating the RAS-RAF-MEK-ERK pathway, which controls cell proliferation downstream of EGFR (Francavilla et al., 2016b; Kolch, 2005). The redundancy of GRB2/GRAP complex recruitment to multiple EGFR sites represents a means by which the cell can ensure a robust ERK response and indicates that this signaling axis is the most important part of the EGF response.

The protein interaction network of key players in the EGF response is illustrated in Figure 2E. Information to build this network was extracted by clustering protein interactors identified in at least three different peptide pulldowns (Figure 3A). We identified 11 key protein complexes, which given their frequency of co-occurrence are deemed central for the EGF response in lung tissue. For the 74 proteins in these key complexes, we depicted their protein interaction network (Figure 2E). For six of the high stoichiometry interactions shown, we measured the binding affinities by isothermal titration calorimetry (Figure 3B). All six measurements showed direct interactions between the individual phosphotyrosine sites and the SH2 domains at sub-micromolar affinities, confirming our MS-based stoichiometries. We also confirmed EGF-dependence of three interactions by co-immunoprecipitations (Figure 3C-E). From the network representation in Figure 2E, it is evident that the majority of core members of the EGFR signaling network harbor interaction domains, in particular SH2/PTB and SH3, or are kinases or phosphatases. The amino acids flanking phosphotyrosine sites are important determinants of the affinity for individual SH2 domain containing proteins (Kavanaugh et al., 1995; Schlessinger and Lemmon, 2003), and this led us to search for sequence motifs in the protein complexes recruited to regulated phosphotyrosine sites. To identify sequence preferences for each of the 11 key protein complexes we analyzed the amino acid sequences surrounding the regulated phosphotyrosine sites they interact with and identified overrepresented sequence motifs using IceLogo (Colaert et al., 2009). We found specific sequence motifs for different phosphotyrosine-interacting protein complexes: GRB2/GRAP based complexes preferentially bind to the amino acid sequence motif pYxN, whereas PTPN6/PTPN11 based complexes preferentially bind to pYVxL (Figure 3F). These findings underscore the importance of the amino acid sequence in the vicinity of phosphotyrosine sites for recruitment of specific protein complexes. Fifty-one of the proteins in the network in Figure 2E also contain EGF-regulated phosphorylation sites, highlighting their role in extending the EGF signaling response. This combined representation of regulated tyrosine phosphorylation sites and their interactors highlights the complex core network of signaling proteins involved in the early EGF-dependent response in lung tissue.
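The co-occurrence analysis behind the key complexes can be illustrated with binary occurrence vectors and cosine similarity, the measure named in the Figure 3A legend. The sketch below covers only the filtering step (proteins significant in at least three pulldowns) and the pairwise similarity, not the clustering itself; the function names and the bait labels in the usage note are illustrative assumptions.

```python
import math


def occurrence_vectors(pulldown_hits, min_pulldowns=3):
    """Build 0/1 vectors over baits for proteins significant in >= min_pulldowns pulldowns.

    pulldown_hits maps a bait label to the set of its significant interactors.
    """
    baits = sorted(pulldown_hits)
    proteins = {protein for hits in pulldown_hits.values() for protein in hits}
    vectors = {}
    for protein in sorted(proteins):
        vector = [1 if protein in pulldown_hits[bait] else 0 for bait in baits]
        if sum(vector) >= min_pulldowns:
            vectors[protein] = vector
    return vectors


def cosine_similarity(u, v):
    """Cosine of the angle between two occurrence vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0
```

Proteins that are repeatedly pulled down by the same baits obtain a similarity close to 1 and would be grouped into the same complex by a subsequent clustering step.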

Figure 3. Validation and bioinformatic analysis of core complexes.

(A) Clustering of protein co-occurrence in peptide pulldowns. To identify key players in the EGF-dependent signaling response, we clustered protein occurrence for proteins identified as significant interactors in at least 3 peptide pulldowns by cosine clustering. We identify 11 protein complexes that co-occur in at least 3 pulldowns. Given the frequency of their co-occurrence these 11 protein complexes are deemed central complexes in the EGF signaling response in lung tissue. (B) For three phosphotyrosine sites their binding affinities for CRK and CRKL were evaluated by isothermal titration calorimetry measurements of the corresponding phosphopeptide and purified SH2 domains from the two proteins. For each site the dissociation constant is indicated as well as the relative stoichiometry calculated from our mass spectrometry measurements. (C) Western blot confirmation of EGF-dependent interaction of GAB2 and EGFR by co-immunoprecipitation in A549 lung cancer cells. (D) Western blot confirmation of EGF-dependent interaction of PLCG1 and PIK3CA by co-immunoprecipitation in A549 lung cancer cells. (E) Western blot confirmation of EGF-dependent interaction of SH2B1 and EGFR by co-immunoprecipitation in A549 lung cancer cells. (F) For each of the key protein complexes we analyzed the amino acid sequences flanking the regulated phosphotyrosine residues that were used in the peptide pulldown experiments in which the proteins were identified. A sequence motif was generated with IceLogo. See also Figure S4 and Table S4.

MS method for functional assignment of phosphotyrosines at large scale

The ability to identify protein complexes recruited to regulated phosphotyrosines is important for depicting a more complete picture of the complex signaling networks downstream of tyrosine kinases in general. The methodological approach we have developed represents a general strategy to functionally annotate phosphotyrosine sites. To enable larger scale studies, we optimized the MS-based workflow for high-throughput analyses. In doing so, we used state-of-the-art data-independent acquisition (DIA) in combination with short LC gradients on an Evosep One system. The Evosep system is based on a fundamentally new LC concept (Bache et al., 2018) that significantly increases protein coverage when analyzed on a Q Exactive HF-X mass spectrometer with fast MS/MS scanning capabilities (Kelstrup et al., 2018). These advancements provide the possibility for a scalable methodology, where 60 pulldown experiments can be analyzed in just one day of LC-MS instrument time. To demonstrate the scalability of this method for large-scale DIA-based interaction analyses, we synthesized more than three hundred biotin-tagged peptides, which were selected based on a literature analysis prioritizing known phosphotyrosine sites in signaling proteins such as receptor tyrosine kinases and adaptor proteins. Peptide pulldowns using these 300 peptides were performed in four biological replicates of tissue lysates (Figure 4A). That is, more than 1,200 peptide based pulldown experiments were performed. These experiments were performed in liver tissue to investigate for potential tissue-specific differences in recruited protein complexes. Bound proteins were digested with trypsin on-bead and resulting peptides analyzed with 21-minute LC-MS gradients. 
To generate project specific spectral libraries for the DIA matching and identification, we performed deep proteome profiling of the tissue lysates by massive offline high pH chromatography and analyzed each fraction with the same LC-MS setup (Bekker-Jensen et al., 2017). Proteins identified in each individual pulldown were analyzed against all other pulldowns by label-free quantitation to identify significantly-enriched interacting proteins. This resulted in 918 significant interactors for 225 different phosphopeptide baits (Figure 4B, Table S5). As expected based on the lung pulldown experiments, several SH2 domain containing adaptor proteins were also identified as interactors of multiple phosphopeptide baits in liver lysates. For example, PTPN6, VAV2, GRB7 and SH2B1 were identified as interactors in more than fifty different pulldowns. Knowledge of the protein interactors of phosphotyrosine sites in tissues is important for identifying their tissue-specific roles and phenotypes. However, most phosphosite-protein interactions reported have no tissue-specific context. For a subset of baits, we evaluated their interaction partners in both lung and liver tissue lysates. From this data, we evaluated potential tissue-specific compositions of the interacting protein complexes. There are indeed differences in the protein complexes recruited to a particular phosphotyrosine site across tissues. For instance, for EGFR pY1109 we identified a GRB2/GRAP/GRAP2 complex as the strongest interaction in lung tissue (Figure 4C), whereas the same site preferentially bound a GRB7/GRB14 complex in liver tissue (Figure 4D). For this site, the tissue-specific differences in interactors can be explained by their differential proteome abundance. We find that GRB7 is highly expressed in liver tissue but hardly detectable in lung and, conversely, that GRAP and GRAP2 are expressed in lung tissue but not detected in liver proteomes. 
This tissue-specific difference in interactors highlights the importance of performing interaction screens in the appropriate tissue and context-specific lysates (Figure 4E).
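As a rough check of the throughput figures quoted above, the instrument time can be estimated from the number of baits, the number of replicates and the daily sample capacity. This is a back-of-the-envelope sketch only: it takes 300 baits as a lower bound ("more than three hundred") and assumes no blanks, quality-control runs or library fractions are interleaved.

```python
import math

def instrument_days(n_baits, n_replicates, pulldowns_per_day):
    """Return (total pulldowns, whole days of LC-MS time needed)."""
    total = n_baits * n_replicates
    return total, math.ceil(total / pulldowns_per_day)

# Values from the text: 300 peptides, four replicates, 60 pulldowns per day.
total_pulldowns, days = instrument_days(300, 4, 60)
print(total_pulldowns, days)
```

Under these assumptions the 1,200 pulldowns correspond to about 20 days of acquisition.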

Figure 4. Method optimization for large scale studies and tissue dependency of recruited protein complexes.

(A) Experimental outline for large-scale pulldown experiment. More than 300 peptide-based pulldowns were performed in four biological replicates in lysates from either liver or lung tissues. The MS-based method was based on data-independent acquisition (DIA) and fast online liquid chromatography using the Evosep One system coupled to a Q Exactive HF-X Orbitrap tandem mass spectrometer. (B) For 225 phosphotyrosine sites, specific interaction partners were identified. These are represented as a clustering profile highlighting the fractional stoichiometry of the interacting proteins. For five example complexes identified, the adaptor proteins recruited are illustrated. (C) In lung tissue, phosphorylated EGFR Y1109 interacts with a GRB2/GRAP/GRAP2-based complex. Results represent three replicates per experiment group with significant interactors determined by Significance C. (D) In liver tissue, the same phosphotyrosine site, EGFR Y1109, interacts with a GRB7-based complex. Results represent four replicates per experiment group and significant differential interactors were determined using significance thresholds (FDR < 0.05, s0 = 0.1). (E) Model representation of the tissue-specific interaction partners of EGFR Y1109 in lung and liver, respectively. See also Table S5.

Oncogenic mutation near phosphotyrosine site causes molecular switch in recruited protein complex

The finding of sequence motifs for recruitment of protein complexes to phosphotyrosine sites suggests that mutations near a regulated phosphotyrosine site may impact the protein complex recruited to the site. It is known that disease-associated missense mutations can affect short linear motifs in IDRs of proteins and thereby interfere with their functions by disrupting or changing protein interactions (Vacic et al., 2012). EGFR P1019L is reported as a lung cancer mutation (Bamford et al., 2004), and it affects a highly conserved amino acid in an IDR of EGFR (Figure 5A). The P1019L mutation is located three residues downstream from a regulated phosphotyrosine site that interacts with a CRK complex also harboring VAV and RASA1 (see resource on web-portal). The CRK complex preferentially binds to a pYxxP sequence (Figure 3F), as also reported previously (Miller et al., 2008). Given this knowledge, we hypothesized that the EGFR P1019L patient mutation may hinder the protein complex formation between EGFR and CRK upon activation. Peptide-based proteomic pulldown screens have recently been employed to investigate the impact of mutations in IDRs on protein-protein interactions (Meyer et al., 2018). Accordingly, we tested the hypothesis by synthesizing a phosphorylated peptide carrying the patient mutation and comparing its protein interactors with those of the corresponding wildtype peptide. The cancer mutation completely changes the recruited protein complex. It introduces a molecular switch that abolishes the CRK interaction and instead leads to binding of a SH2B1/SHIP2 complex (Figure 5B, Table S6). This is consistent with our sequence motif analysis that revealed an amino acid sequence preference of VAV3, RASA1, CRK and CRKL for binding to pYxxP, and a preference of SH2B1, ZAP70, SYK and SHIP for pYxxL (Figure 5C).
To confirm the specific interactions of the patient mutant versus wildtype, we produced and purified SH2 domains of the identified binding partners and performed isothermal titration calorimetric experiments to measure their affinities for the wildtype and the patient-mimicking mutant (Figure S5, Table S7). All interactions were confirmed to require tyrosine phosphorylation, and high affinities for CRK and CRKL were established for wildtype EGFR pY1016, whereas the affinities of patient mutant P1019L were switched towards SH2B1 and SYK (Figure 5D-E). That is, the lung cancer mutation P1019L introduces a molecular switch that alters the outcome of phosphorylation at EGFR Y1016.
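The motif logic behind this switch can be illustrated with a small sketch that inspects the residue at position +3 relative to the phosphotyrosine. It is a deliberate simplification for illustration, not a binding predictor: the two classes reflect only the pYxxP and pYxxL preferences reported here, the function names are hypothetical, and the four-residue wildtype sequence pY-L-V-P is the one given for EGFR pY1016 in the Discussion.

```python
def plus3_residue(sequence, py_index):
    """Return the residue three positions C-terminal of the tyrosine."""
    if sequence[py_index] != "Y":
        raise ValueError("py_index must point at a tyrosine (Y)")
    pos = py_index + 3
    return sequence[pos] if pos < len(sequence) else None

def motif_class(sequence, py_index=0):
    """Classify a site by its +3 residue (simplified, illustrative only)."""
    residue = plus3_residue(sequence, py_index)
    if residue == "P":
        return "pYxxP (CRK/CRKL/VAV3/RASA1 preference)"
    if residue == "L":
        return "pYxxL (SH2B1/ZAP70/SYK/SHIP preference)"
    return "unclassified"

wildtype = "YLVP"  # EGFR Y1016 to P1019, as stated in the Discussion
mutant = "YLVL"    # same stretch carrying the P1019L substitution

print(motif_class(wildtype))
print(motif_class(mutant))
```

A single substitution at +3 moves the site from the pYxxP class to the pYxxL class, which matches the direction of the switch observed in the pulldowns and ITC measurements.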

Figure 5. Molecular switch caused by EGFR lung cancer mutation P1019L.

(A) Amino acid sequence conservation for EGFR P1019 site across species. (B) Interaction partners for EGFR phosphorylated at Y1016 (left) and interaction partners for the P1019L lung cancer mutation of EGFR phosphorylated at Y1016 (right). Significant interactors are determined by Significance C; their gene names are displayed and phosphotyrosine-binding domains are indicated with stars. Note CRK interaction with wildtype and SHIP2 interaction with P1019L mutant. (C) Sequence motifs generated from our large-scale pulldown dataset show amino acid sequence preference of YxxP for peptides significantly interacting with CRK, whereas there is an amino acid sequence preference of YxxL for peptides significantly interacting with SHIP. (D) ITC measurements of binding affinities between the SH2 domain of SH2B1 and EGFR peptides covering phosphorylated Y1016 in both wildtype and P1019L-mutant versions. Black circles indicate mutant peptide, whereas red triangles are wildtype. (E) Binding affinities determined by ITC for SH2 domains of CRK, CRKL, SH2B1 and SYK against EGFR peptides covering phosphorylated Y1016 in both wildtype and P1019L mutant versions. (F) Western blot validation of differential EGF-dependent interactors for wildtype and mutant EGFR by co-immunoprecipitation. Parental A549 cells are used as negative control of the GFP-based pulldown experiment. Results are quantified in the bar graph displaying fold-change of mutant relative to WT (mean ±SEM of two independent experiments, each representing two WT and two P1019L clones, p-values evaluated by two-sided Student’s t-test). (G) Western blot evaluation of SHC1 interaction from GFP-based pulldown of EGFR-P1019L in EGF-stimulated cells after SHIP2 knock-down. (H) Immunoblots of EGFR and ERK phosphorylation dynamics for wildtype and mutant receptor as a function of EGF stimulation. (I) Wound-healing assay to quantify cell migration.
Images of wildtype and mutant EGFR expressing cells after EGF stimulation for 0, 24h and 48h. Cell migration was quantified as the absolute migration rate, where 0 indicates no migration and 1 equals full migration. Quantification is displayed to the right as mean ±SEM of two independent experiments, each including three WT and two P1019L clones, p-values evaluated by two-sided Student’s t-test. (J) Mutant EGFR P1019L expressing cells were transfected with Ctrl or SHIP2 siRNA and an invasion assay was performed. Representative images of the transwell matrigel-based invasion assay are shown on the left. EGF-dependent invasion rate is shown on the right as mean ±SEM of two independent experiments, each with a different P1019L clone, p-values evaluated by two-sided Student’s t-test. (K) Example of local and tail foci determination in embryonic zebrafish injected with EGFR P1019L expressing cells. (L) Quantification of foci in local and tail regions 48h after injection with wildtype or EGFR P1019L expressing cells. (M) Model representing the molecular switch introduced by the cancer mutation EGFR P1019L. Upon phosphorylation of Y1016, EGFR interacts with a CRK complex, which induces a transient ERK and AKT activation response. For the cancer mutant, a SHIP2 complex is instead recruited to the receptor, leading to sustained activation of ERK and AKT. See also Figures S5, S6, S7 and Tables S6 and S7.

Oncogenic properties of lung cancer mutation EGFR P1019L

To evaluate the functional consequence of the molecular switch caused by the EGFR P1019L cancer mutation, we created a CRISPR/Cas9 EGFR knockout version of the lung cancer A549 cell line (Figure S6) and reintroduced GFP-tagged EGFR in either wildtype or P1019L versions (Figure S7A-C). First, to investigate differential interaction partners of the wildtype and P1019L mutant receptor, we performed quantitative interaction proteomics of the receptors as a function of EGF stimulation. This experiment confirmed the specific interaction of the wildtype receptor and VAV observed in the peptide pulldown. Our results further demonstrated that GRB2 and SHC1 interacted more strongly with the mutant receptor (Figure S7D-E). By co-immunoprecipitation experiments, we confirmed the increased interaction with GRB2 and SHC1 for the mutant receptor as well as the specific interaction between SHIP2 and the mutant receptor identified in the peptide pulldown experiment (Figure 5F). The lipid phosphatase SHIP2 has been reported to enhance EGFR signaling in breast cancer (Prasad, 2009), and we therefore hypothesized that binding of SHIP2 to the mutant receptor may be responsible for the increased interactions with GRB2 and SHC1. We confirmed a SHIP2-mediated interaction between SHC1 and EGFR P1019L by co-immunoprecipitation experiments showing decreased interaction between SHC1 and the mutant receptor in SHIP2-depleted cells (Figure 5G, Figure S7F). The SHC1-GRB2 complex is the master regulator of ERK activity in EGFR signaling (Bisson et al., 2011; van Biesen et al., 1995). Accordingly, the mutant receptor altered ERK signaling dynamics by changing it from a transient to a sustained response (Figure 5H, Figure S7G). Furthermore, in contrast to the wildtype receptor, activation of the mutant receptor also resulted in sustained AKT phosphorylation dynamics (Figure S7H-I).
We verified that the sustained ERK signaling dynamics are mediated by the EGFR P1019L interaction with SHIP2, as the dynamics were switched back to a transient response upon SHIP2 depletion (Figure 5H). We have previously shown that sustained ERK and AKT signaling downstream of EGFR can lead to increased cell migration and proliferation (Francavilla et al., 2016b). To evaluate if the mutant receptor encodes oncogenic properties, we compared the migratory potential of lung cancer cells expressing mutant versus wildtype receptors. Indeed, expression of EGFR P1019L resulted in significantly increased cell migration (Figure 5I). Similarly, cells expressing the mutant receptor also had an increased proliferation rate (Figure S7J). We also found the cell migration properties of EGFR P1019L expressing cells to be SHIP2-dependent (Figure S7K). The SHIP2 dependency is likely mediated by its adaptor protein function, as inhibition of its catalytic activity does not significantly alter migration rates (Figure S7L). To further validate the SHIP2 dependency of the oncogenic properties of EGFR P1019L, we performed a matrigel-based cell invasion assay in EGFR P1019L expressing cells after SHIP2 knock-down. The EGF-dependent invasion potential was significantly impaired in SHIP2-knockdown cells (Figure 5J). Finally, to verify the EGFR P1019L-enhanced invasion in-vivo, we made use of a zebrafish xenograft invasion model (Rouhi et al., 2010). Briefly, wildtype EGFR or EGFR-P1019L expressing cells were DiI-labelled and injected into the perivitelline space of eGFP-transgenic zebrafish embryos 48h post-fertilization in the presence of EGF ligand. Local tumor invasion and dissemination to the tail were quantified 48h later (Figure 5K). A significantly greater number of cells expressing EGFR P1019L compared to wildtype EGFR invaded locally and disseminated to the tail (Figure 5L).
Accordingly, the in-vivo as well as the cell-based experiments validate and confirm the molecular switch introduced by the cancer mutation in vicinity of a regulated EGFR phosphotyrosine site. The mutation changes the recruited interaction complex from a CRK complex to a SHIP2 complex. This change leads to increased binding of SHC1, which ultimately results in a sustained activation of AKT and ERK, which endows the receptor with oncogenic properties (Fig. 5M).

Molecular switches in vicinity of phosphotyrosine sites as mechanism of action of cancer mutations

Identification of the molecular switch introduced by the oncogenic mutation EGFR P1019L led us to ask whether molecular switches in protein complex assembly at phosphotyrosine sites could represent a general mechanism of action for oncogenic mutations. To probe for additional cancer-specific molecular switches, we evaluated a set of phosphopeptides covering 12 known cancer mutations (Bamford et al., 2004) by peptide pulldown experiments. For these experiments we compared the mutated phosphopeptide pulldowns to their corresponding wildtype phosphopeptide experiment. For eight of the twelve cancer mutations tested, we identified switches in their recruited protein complexes (Table S5). For example, GAB1 phosphorylated at Y317 specifically pulls down a complex consisting of CRKL and RASA1, whereas the cancer mutation GAB1 P320S completely abolishes this interaction (Figure 6A). The importance of proline in the +3 position relative to the phosphorylated site for the interaction with CRKL was validated by ITC experiments, by which we established that the dissociation constant for the interaction between the wildtype GAB1 pY317 peptide and the SH2 domain of CRKL was 650 nM, whereas no binding was observed with the P320S mutated phosphopeptide (Figure 6B). Likewise, analyzing the protein interactions of phosphorylated CRK Y136 identified the lipid phosphatase Sacm1l as the main binder, whereas the cancer mutation CRK A134V, two amino acids upstream of Y136, changed the specific interactions to the tyrosine phosphatases PTPN6 (SHP-1) and PTPN11 (SHP-2). This molecular switch was found in both lung and liver tissues.
We further identified molecular switches introduced by five EGFR cancer mutations: EGFR P1019S, EGFR N1094Y, EGFR N1112S, EGFR N1140S and EGFR P1170S. The EGFR P1019S mutant is an alternative amino acid substitution compared to P1019L, which affects the interactors of the pY1016 site in a different manner. Both mutants result in loss of RASA/CRK complex recruitment to pY1016. However, where P1019L leads to recruitment of a SHIP2 complex, P1019S silences the signaling response. EGFR N1094Y, N1112S, and N1140S all affect asparagines in the +2 position relative to a phosphotyrosine site. Consequently, for all three mutations recruitment of GRB7 in liver is abolished. For the cancer mutation EGFR N1094Y, the molecular switch has different outcomes depending on whether the introduced tyrosine residue is phosphorylated or not, where the doubly phosphorylated pY1092/pY1094 mutant recruits a GRB14/SRC complex. The last of the molecular switches that we have profiled is EGFR P1170S. This mutation leads to a gain in recruitment of the oncogenic transcription factors STAT1/2/5 to EGFR pY1172. As illustrated by the eight oncogenic mutations investigated here, cancer mutations near phosphotyrosine sites can indeed introduce molecular switches in recruited signaling complexes.

Figure 6. Molecular switches introduced by cancer mutations.

(A) The GAB1 P320S cancer mutation abolishes interaction at Y317 with CRKL and RASA1. Peptide-based pulldown experiments were performed in liver tissue. Interaction partners for GAB1 phosphorylated at Y317 are seen on the right side of the plot and interaction partners for the cancer mutation P320S are seen on the left side. Fold enrichment of P320S over WT is plotted against the t-test p-value (-log10). Black lines indicate significance thresholds (FDR < 0.05, s0 = 0.1). Four replicates were performed for each peptide pulldown experiment. (B) ITC experiments measuring the affinity of GAB1 phosphorylated at Y317, wildtype (left) and P320S (right) for CRKL. High affinity interaction is only measured for wildtype. Binding affinity (KD) for wildtype was 0.65 μM ± 0.06. No binding was observed for P320S. (C) Model representations of the eight molecular switches for cancer mutations: CRK A134V, GAB1 P320S, EGFR P1019S, EGFR N1094Y, EGFR N1112S, EGFR N1140S and EGFR P1170S. Models are based on phosphopeptide pulldown experiments comparing interaction partners of wildtype and mutated sequences. Significant differential interactors between wildtype and mutant peptide pulldowns were determined by volcano plot analysis using significance thresholds (FDR < 0.05, s0 = 0.1). Four replicates were performed for each peptide pulldown experiment. See also Table S5.
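The role of the s0 parameter in such volcano plot thresholds can be illustrated with a short sketch. It assumes that s0 acts as a small constant added to the standard error of a two-sample t-statistic (a SAM-style modified statistic), which down-weights proteins with small fold changes but very low variance. The source does not spell out this formula, the permutation-based FDR estimation is not reproduced, and the intensity values below are hypothetical.

```python
from math import sqrt
from statistics import mean, variance

def s0_modified_t(group_a, group_b, s0=0.1):
    """Two-sample t-like statistic with s0 added to the standard error.

    With s0 = 0 this reduces to the ordinary pooled-variance t-statistic.
    """
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    std_err = sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(group_a) - mean(group_b)) / (std_err + s0)

# Hypothetical log2 intensities for one protein, four replicates per bait.
mutant_log2 = [5.0, 5.2, 4.8, 5.0]
wildtype_log2 = [1.0, 1.1, 0.9, 1.0]

plain_t = s0_modified_t(mutant_log2, wildtype_log2, s0=0.0)
damped_t = s0_modified_t(mutant_log2, wildtype_log2, s0=0.1)
print(plain_t, damped_t)
```

Under this assumption, the curved significance lines in a volcano plot arise because a fixed cutoff on the damped statistic requires a larger fold change when the variance is small.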

Discussion

We describe and validate a general proteomics strategy to resolve phosphotyrosine signaling in-vivo, including both quantitative analysis of phosphotyrosine sites and systematic examination of the dynamically regulated protein complexes they recruit. The approach thereby enables assignment of function to phosphotyrosine sites on a large scale. Phosphotyrosine interaction screens have successfully been performed using a yeast two-hybrid approach identifying hundreds of phosphotyrosine-dependent protein complexes (Grossmann et al., 2015), but this approach does not reveal which tyrosine site is responsible for the interaction. Here we based our approach on the principle of phosphopeptide pulldowns in combination with quantitative mass spectrometry (Hanke and Mann, 2009; Schulze et al., 2005) combined with the ability to perform such experiments at a large scale. To identify the protein complexes recruited to a particular regulated phosphotyrosine site, we developed an analytical framework that handles the inherent challenge of a general high affinity of SH2 domain-containing proteins towards phosphotyrosine sites. This is achieved by integration of affinity information from multiple phosphopeptide pulldowns quantitatively analyzing each phosphopeptide pulldown against multiple other phosphopeptide pulldowns and by implementing information from deep proteome measurements. We have made our analytical framework accessible to other researchers via a web-based tool http://pulldown.jensenlab.org. We have developed a strategy that allows for functional annotation of phosphotyrosine sites at large scale based on peptide pulldowns. To fully characterize recruited protein complexes at a given phosphotyrosine site at the level of the full length protein, it will be necessary to perform independent validation experiments.

Our approach represents several important advances. Firstly, we performed the peptide pulldowns in the relevant tissue lysate to establish the molecular interactions between proteins in the appropriate biological setting. This is important to ensure coverage of tissue-specific findings (Lundby et al., 2014). Our comparisons between interaction partners identified in lung and liver tissues further underscore this. It is important to note that tissues consist of different cell types and our strategy does not enable us to identify the specific cell type in which the interaction takes place. Secondly, to the best of our knowledge, our dataset represents the first analysis of the composition of an entire phosphotyrosine protein network activated by a distinct growth factor in-vivo. Specifically, it is the first comprehensive and quantitative map of EGFR signaling in lung tissue and it identifies key components of the dynamic protein complexes formed. We demonstrate that 5 min of EGF stimulation in-vivo leads to assembly of 11 key signaling protein complexes encompassing 74 different proteins. The nodal proteins in each of the 11 complexes contain SH2 domains and their individual amino acid sequence preferences surrounding the phosphotyrosine sites dictate their binding affinities. The notion that SH2 domains recognize specific phosphopeptide sequences was established more than two decades ago, when the amino acid preferences for two groups of SH2 domains were revealed (Songyang et al., 1993). We confirmed the very specific motif pY-E/D-E/D-I for the Src family kinases (LCK, FYN, YES, and SRC) and the preference for the SH2 domains of NCK1/2 to bind pY-hydrophilic-hydrophilic-P motifs. However, although the SH2 domains of CRK were also originally established to prefer pY-hydrophilic-hydrophilic-P motifs, we find that the preference for hydrophilic amino acids in positions +1 and +2 is not needed.
On the contrary, the EGFR pY1016 site binding CRK and CRKL has the sequence pY-L-V-P indicating a stronger preference for hydrophobic amino acids in +1 and +2 positions.

The notion of sequence motifs explains why single amino acid mutations in the vicinity of phosphorylated tyrosine sites can switch the interacting adaptor protein complexes and thereby clarifies the molecular mechanism of oncogenic driver mutations. Such a mechanism has previously been shown for a cancer mutation in the fibroblast growth factor receptor FGFR4 (Ulaganathan et al., 2015), and it represents a molecular mechanism of how cancer mutations can corrupt dynamic transmission properties in signaling pathways, changing cell decisions in a pathological manner (Bugaj et al., 2018). Here, we show that the lung cancer mutation EGFR P1019L introduces a switch in the signaling complex recruited to the receptor at pY1016, changing the signaling response in lung tissue. The phosphoinositol phosphatase SHIP2 is a critical regulator of signaling pathways as it negatively controls phosphatidylinositol-3,4,5-trisphosphate levels (Erneux et al., 2011). In breast cancer cells SHIP2 positively affects EGF-regulated AKT activation and induces EGF-dependent cell proliferation and tumor growth (Prasad, 2009). We found that SHIP2 enhances SHC1 interaction with the mutant receptor in an EGF-dependent manner, which leads to sustained ERK and AKT signaling, ultimately increasing cell migration and invasiveness. This can explain the cellular outcome of the cancer-driving mutation EGFR P1019L. That is, we identify a specific adaptor protein, SHIP2, mediating the switch in signaling cascades responsible for the oncogenic properties of the mutant receptor. This suggests SHIP2 as a potential target for anti-cancer therapy in EGFR mutated lung cancers. We confirmed the oncogenic properties of the mutant in-vivo by using a zebrafish xenograft model.
We have previously demonstrated that activation of sustained phospho-ERK and AKT signaling downstream of EGFR leads to long-term transcriptional changes inducing a gene expression program associated with cell migration, signaling and cytoskeletal rearrangements (Francavilla et al., 2016b). Similarly, this response may also explain the invasiveness and tumor progression that we identified in the mutant EGFR P1019L expressing cells. Targeting the induced proteins responsible for the cell migration phenotype may represent a treatment option for cancer patients who harbor such mutations and have developed resistance to EGFR inhibitors.

From our large-scale efforts beyond EGFR mediated signaling, we identified and characterized eight other cancer mutations that also introduce molecular switches in the recruited protein complex formation upon activation of nearby phosphotyrosine sites. For GAB1, we found that an oncogenic mutation at position 320 abolishes recruitment of CRKL to pY317 upon activation, and for CRK a cancer mutation at position 134 introduces recruitment of phosphatases PTPN6 and PTPN11 upon activation of pY136. These oncogenic mutations in the vicinity of phosphotyrosine sites thereby introduce molecular switches that alter the ultimate outcome of the signaling response. Delineating such tissue-specific signaling networks is important as they have the potential to offer novel candidate targets for developing alternative therapeutic interventions. We demonstrate the scalability of our approach and present it as a general strategy to evaluate functional effects of cancer mutations near phosphotyrosine sites, as well as a general strategy to investigate regulated protein–protein interactions in a tissue-specific manner.

STAR Methods

Lead Contact and Materials Availability

Further information and requests for resources and reagents should be directed to the Lead Contact, Jesper V. Olsen, by email at jesper.olsen@cpr.ku.dk

Experimental Model and Subject Details

Animal experiments

The study was carried out following approved national regulations in Denmark and with an animal experimental license granted by the Animal Experiments Inspectorate, Ministry of Justice, Denmark. Two groups of eight Sprague Dawley rats (Crl:SD, male, 200g, Charles River, Germany) were anesthetized with isoflurane. In each group, four rats were administered epidermal growth factor in isotonic saline (EGF, 100 µg/kg bodyweight) intravenously and four rats were administered isotonic saline intravenously. 3.5 min post injection the animals were perfused (1.5 min, 30 ml/min) with isotonic saline containing protease inhibitors (0.120 mM EDTA, 14 µM aprotinin, 0.3 nM valine-pyrrolidide and Roche Complete Protease Inhibitor tablets (Roche), pH = 7.4). Lungs were quickly removed and snap frozen in isopentane on dry ice. The total time from dosing to tissue collection was 5-8 min. The tissues were transferred to a solubilization buffer (1 % Triton x-100, 150 mM NaCl, 10 mM KCl, 5 mM EDTA, 50 mM Tris pH8.5) containing protease inhibitor (Roche Complete Protease Inhibitor tablets, Roche) and phosphatase inhibitors (1 mM ortho-vanadate, 5 mM sodium fluoride, 5 mM beta-glycerophosphate) in 5 µl extraction buffer per mg tissue and homogenized by ceramic beads (Precellys 24, Bertin Technologies, France), essentially as described previously (Lundby et al., 2013; Lundby et al., 2012). The samples were incubated for 2 h (20 rpm, 4 °C) and the soluble fractions were collected (10 min, 15,000 x g, 4 °C). Protein concentrations were determined (Bradford, BioRad) and from each lung extract 30 mg protein was used for further analysis.

Cell culture and ligand stimulation for EGFR-knock-out cells

A549 cells were purchased from ATCC and authenticated through the ATCC authentication service based on STR profiling (135-XV-20). They were maintained in DMEM-Glutamax (Gibco), 100 µg/mL penicillin (Invitrogen), 100 µg/mL streptomycin (Invitrogen) and supplemented with 10% FBS (Gibco). They have been checked monthly for mycoplasma. For ligand stimulation experiments, cells were serum starved overnight in serum-free medium, then stimulated for the indicated time points with 100 ng/ml of EGF or TGFα (Peprotech). Ligands were replenished every 24 hours for long-term stimulation.

Zebrafish xenograft model

Zebrafish embryos of the transgenic strain (kdrl:EGFP), expressing enhanced green fluorescent protein (EGFP) under the kdrl promoter, were raised at 28°C in humidified ambient air. At 24 hours post fertilization (hpf), the embryos were transferred to aquarium water containing 0.2 mmol/L 1-phenyl-2-thio-urea (PTU, Sigma) for 24h. Cell lines were labelled in vitro with 1,1’-dioctadecyl-3,3,3’,3’-tetramethylindocarbocyanine (DiI, Sigma Aldrich) at a concentration of 5 ng/ml in PBS for 1h, re-plated in complete medium and incubated at 37°C and 5% CO2 for 24h. After labelling, the cells were collected, treated with 100 ng/ml EGF at a concentration of 20 million cells/ml in DMEM containing 1% FBS plus antibiotics and microinjected into the perivitelline space of dechorionated embryos, which were anaesthetized with 0.04 mg/ml tricaine (MS-222, Sigma). Between 100 and 300 cancer cells were injected per fish. After injection, the embryos with labeled cells in the circulation were excluded and the remaining embryos were transferred to PTU-containing aquarium water and incubated at 34°C in humidified ambient air. After 48h, the embryos were monitored using a fluorescence stereo microscope (Zeiss stereo lumar with AxioCam MRm, Carl Zeiss). Local tumor invasion and dissemination were determined by counting the red cells in the tumor region and the tail region, respectively. All animal experiments were approved by the Danish animal experiments inspectorate (2018-15-0202-0012).
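The stated cell density and cell numbers per fish imply an approximate injection volume, which can be derived as follows. This is a worked calculation rather than a reported value: it assumes the suspension is injected at the stated 20 million cells/ml without further dilution, and the function name is a hypothetical helper.

```python
def injection_volume_nl(cells, cells_per_ml):
    """Volume in nanoliters that contains the given number of cells."""
    volume_ml = cells / cells_per_ml
    return volume_ml * 1e6  # 1 ml = 1e6 nl

CELLS_PER_ML = 20_000_000  # cell density stated in the protocol

low_nl = injection_volume_nl(100, CELLS_PER_ML)
high_nl = injection_volume_nl(300, CELLS_PER_ML)
print(round(low_nl, 2), round(high_nl, 2))
```

Under this assumption, 100 to 300 cells correspond to roughly 5 to 15 nl per embryo.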

Method Details

Western blotting of lung tissue samples

Immunoblotting was performed as described (Francavilla et al., 2013). Briefly, proteins were resolved by SDS-PAGE and transferred to nitrocellulose membranes (Protran, Biosciences). Proteins of interest were visualized using specific antibodies, followed by peroxidase-conjugated secondary antibodies and by an enhanced chemiluminescence kit (Amersham Biosciences). Antibodies were as follows: rabbit anti-EGFR (Upstate); mouse anti-tubulin (Sigma-Aldrich); rabbit anti-phospho-EGFR Y1068, mouse anti-phospho-ERK1/2 (E10), rabbit anti-ERK1/2 (137F5), rabbit anti-SHC and rabbit anti-phospho-SHC (Cell Signaling Technology).

Peptide preparation

Protein samples were acetone precipitated and resuspended in urea (6 M urea, 2 M thiourea). Following reduction (1 mM DTT, 30 min) and alkylation (5.5 mM chloroacetamide, 20 min in dark), the proteins were digested in-solution with endoproteinase Lys-C (1 µg/200 µg protein, 3 h), and following a 1:4 dilution of urea, digestion was continued with trypsin (1 µg/200 µg protein, overnight). Enzymatic activity was quenched by reducing pH to ~2 with trifluoroacetic acid (TFA). Samples were centrifuged (20 min, 16,000 x g) and supernatants were desalted and concentrated on Sep-Pak C18 cartridges (Waters). Peptides were eluted from the cartridges with acetonitrile (MeCN), and after evaporating the organic solvents using a vacuum centrifuge, the peptides were resuspended in MOPS buffer (50 mM MOPS, pH7.2, 10 mM sodium phosphate, 50 mM NaCl). Peptides were redissolved by agitation at room temperature for 2 h and cleared by centrifugation (5 min, 16,000 x g).
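The enzyme-to-protein ratio above translates into absolute protease amounts once the protein input is known. As a worked example only, the sketch below assumes that the full 30 mg of protein taken from each lung extract is digested; the function name is hypothetical and the resulting amounts are derived here, not reported in the protocol.

```python
def protease_amount_ug(protein_ug, enzyme_ug=1.0, per_protein_ug=200.0):
    """Protease mass (µg) for a given protein mass at a fixed mass ratio."""
    return protein_ug * enzyme_ug / per_protein_ug

protein_input_ug = 30_000.0  # 30 mg per lung extract (assumed fully digested)

lysc_ug = protease_amount_ug(protein_input_ug)     # 1 µg per 200 µg protein
trypsin_ug = protease_amount_ug(protein_input_ug)  # same ratio for trypsin
print(lysc_ug, trypsin_ug)
```

Under that assumption, each 30 mg sample would call for about 150 µg of Lys-C and 150 µg of trypsin.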

Proteome samples

From each lung digest 10 µg peptides was acidified in 0.1% TFA and loaded onto in-house packed C18 STAGE tips preconditioned with 20 µl methanol (MeOH), 20 µl 80% MeCN, 0.5% acetic acid (AcOH), 2x20 µl 1% TFA, 3% MeCN. All STAGE tips were washed with 2x20 µl 8% MeCN, 0.5% AcOH.

Enrichment of tyrosine phosphorylated peptides

Peptide samples were cooled (4 °C) and tyrosine phosphorylated peptides were enriched by incubation with a mix of agarose-conjugated anti-phosphotyrosine antibodies (PTMScan Phospho-Tyrosine Mouse mAb (P-Tyr100) and PTMScan Phospho-Tyrosine Rabbit mAb (P-Tyr-1000), Cell Signaling Technology, Danvers, MA, USA) for 2 h at 4°C. Following incubation, the samples were centrifuged and the supernatants retrieved for subsequent titanium dioxide (TiO2) enrichment. The beads were washed several times in MOPS buffer and low salt buffer followed by elution of bound peptides in 0.1% TFA. The peptides were loaded onto in-house packed C18 STAGE tips preconditioned with 20 µl MeOH, 20 µl 80% MeCN, 0.5% AcOH, 2x20 µl 1% TFA, 3% MeCN and subsequently washed with 2x20 µl 8% MeCN, 0.5% AcOH.

Phosphopeptide enrichment by TiO2

The supernatants from the phosphotyrosine enrichments were acidified to a final TFA concentration of 1%. The peptides were then concentrated on Sep-Pak C18 cartridges (Waters). Peptides were eluted from the cartridges with 2 x 0.75 ml 60% MeCN in 1% TFA. Phosphopeptides were enriched using TiO2 beads. 1 mg TiO2 beads (GL Sciences Inc, Japan) per sample were suspended in 5 µl 2,5-dihydroxybenzoic acid (DHB) (0.02 g DHB/ml 80% MeCN, 0.5% AcOH), mixed for 15 min and added to the samples, which were then incubated with gentle rotation for 15 min. The beads were washed with 100 µl 5 mM KH2PO4, 30% MeCN, 350 mM KCl followed by 100 µl 40% MeCN, 0.5% AcOH, 0.05% TFA and then resuspended in 50 µl 80% MeCN, 0.5% AcOH. The beads were loaded onto in-house packed C8 STAGE tips in 200 µl pipette tips preconditioned with 80% MeCN, 0.5% AcOH, washed once with the same buffer, and eluted with 2x10 µl 5% ammonia and 2x10 µl 10% ammonia, 25% MeCN. Ammonia and organic solvents were evaporated using a vacuum centrifuge, and the peptides were acidified in 1% TFA, 5% MeCN and loaded onto in-house packed C18 STAGE tips preconditioned with 20 µl MeOH, 20 µl 80% MeCN, 0.5% AcOH, 2x20 µl 1% TFA, 3% MeCN and subsequently washed with 2x20 µl 8% MeCN, 0.5% AcOH.

LC-MS/MS

Peptide mixtures were eluted into 96 well microtiter plates with 20 µl 40% MeCN, 0.5% AcOH and 20 µl 60% MeCN, 0.5% AcOH, organic solvents were evaporated, and the peptides were reconstituted in 2% MeCN, 0.1% TFA. The eluate was analyzed by online reversed-phase C18 nanoscale liquid chromatography tandem mass spectrometry on a Q-Exactive Plus quadrupole Orbitrap mass spectrometer (Thermo Electron, Bremen, Germany) using a top10 higher-energy collisional dissociation (HCD) fragmentation method. The LC-MS analysis was performed with a nanoflow Easy-nLC system (Proxeon Biosystems, Odense, Denmark) connected through a nano-electrospray ion source to the mass spectrometer. Peptides were autosampled and separated by a linear gradient of increasing acetonitrile in 0.5% formic acid for 180 min in a 15 cm fused-silica emitter in-house packed with reversed-phase ReproSil-Pur C18-AQ 1.9 μm resin (Dr. Maisch GmbH, Ammerbuch-Entringen, Germany). Full-scan MS spectra (m/z 300−1750) were acquired with 70,000 resolution after accumulation of ions to a predictive automatic gain control (AGC) target value of 1 × 10^6. The 10 most intense ions were sequentially isolated and fragmented in the octopole collision cell by HCD with 35,000 resolution for phosphopeptide-enriched samples and resolution 17,500 for proteome samples. For phosphopeptide enriched samples the gradient was 135 min and for proteomes the gradient was 90 min.

Identification and quantification of phosphorylated peptides

Raw MS files were processed using the MaxQuant software (ver.1.0.14.7 for phosphopeptide enriched samples and ver.1.4.0.6 for proteome samples, www.maxquant.org), by which the precursor MS signal intensities were determined and HCD MS/MS spectra were deisotoped and filtered such that only the ten most abundant fragments per 100-m/z range were retained. Phosphopeptides were identified using the Mascot search algorithm (www.matrixscience.com) by searching all MS/MS spectra against a concatenated forward/reversed version of rat and mouse International Protein Index v.3.37 protein sequence database supplemented with protein sequences of commonly observed contaminants, such as human keratins and porcine trypsin. Peptides from the proteome samples were identified using the Andromeda search algorithm and a concatenated sequence database of all mouse and rat protein sequences deposited in the Uniprot database. The HCD-MS/MS spectra were searched with fixed modification of carbamidomethyl-cysteine and we allowed for variable modifications of oxidation (M), acetylation (protein N-term) and Gln->pyro-Glu. For the phosphopeptide-enriched samples phosphorylation (STY) was also a variable modification. Search parameters were set to an initial precursor ion tolerance of 7 ppm, MS/MS tolerance at 0.02 Da and requiring strict tryptic specificity with a maximum of two missed cleavages. Label-free peptide quantitation and validation were performed in the MaxQuant software suite. Phosphorylated peptides were filtered based on Mascot score, PTM (Andromeda) score, precursor mass accuracy, peptide length, and summed protein score to achieve an estimated FDR<0.01 based on the forward and reversed identifications. The minimum required peptide length was set to six amino acids. We required a minimum Mascot score of 10, a minimum Andromeda score of 25 and a delta score to the next best match of at least 5.
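The forward/reversed filtering described above can be illustrated with a minimal sketch. It assumes each peptide-spectrum match is reduced to one score plus a flag marking hits to the reversed database; the function name `filter_by_fdr` and the example scores are hypothetical. The actual filtering combined several properties (Mascot score, PTM score, mass accuracy, peptide length), so this shows only the principle of choosing a cutoff from decoy counts.

```python
def filter_by_fdr(psms, max_fdr=0.01):
    """Keep the largest top-scoring set whose estimated FDR is <= max_fdr.

    psms: list of (score, is_decoy) tuples, where is_decoy marks a hit
    to the reversed database. FDR is estimated as decoys / targets.
    Returns the accepted target (forward) PSMs, best score first.
    """
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)
    targets = 0
    decoys = 0
    cutoff = 0  # number of ranked PSMs to keep
    for i, (_score, is_decoy) in enumerate(ranked, start=1):
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        if targets and decoys / targets <= max_fdr:
            cutoff = i
    return [p for p in ranked[:cutoff] if not p[1]]


# Hypothetical scores, for illustration only.
example = [(100, False), (90, False), (80, True), (70, False)]
accepted = filter_by_fdr(example, max_fdr=0.0)
```

With the hypothetical list above and a cutoff of zero, only the two matches ranked above the first decoy are retained.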

Peptide pulldowns

88 peptides comprising 13 amino acids, one of the regulated phosphotyrosines listed in Table 1 and the 6 amino acids flanking the phosphotyrosine on each side, and a biotinylated PEG2 linker in the N-terminus and an amide group in the C-terminus were synthesized by Biosyntan GmbH, Germany. 750 µg of purified peptide was used for three independent pulldowns. All peptides were coupled to sepharose streptavidin beads (GE Healthcare) by incubating beads with an excess of peptide at room temperature for 2h in incubation buffer (150 mM NaCl, 50 mM Tris pH 8.0, 0.1 % NP40). After extensive washing, the beads were transferred to 96 well multiscreen filter plates (Millipore, MSBVN1210) and liquid was removed by slow centrifugation (30 s, 60 g).
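The bait layout described above (a central tyrosine with six flanking residues on each side) can be sketched as a simple sequence window. The function name `bait_window` and the padded toy sequence are hypothetical; how sites closer than six residues to a protein terminus were handled is not stated, so this sketch simply rejects them.

```python
def bait_window(protein_seq, site, flank=6):
    """Return the 2*flank+1 residue window centred on a tyrosine.

    protein_seq: one-letter amino acid sequence.
    site: 1-based position of the tyrosine in protein_seq.
    """
    idx = site - 1
    if idx < 0 or idx >= len(protein_seq):
        raise ValueError("site lies outside the sequence")
    if protein_seq[idx] != "Y":
        raise ValueError("residue at the given site is not tyrosine")
    if idx - flank < 0 or idx + flank >= len(protein_seq):
        # Assumption: windows truncated by a terminus are not handled here.
        raise ValueError("site is too close to a terminus")
    return protein_seq[idx - flank: idx + flank + 1]


# Toy sequence: alanine padding around the EGFR 1010-1022 peptide.
toy = "AAA" + "VVDADEYLIPQQG" + "AAA"
window = bait_window(toy, 10)
```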

The lung tissue lysates used as input material for the peptide pulldowns were prepared from homogenized tissue from different rats treated with saline. Prior to the experiments, the tissue lysate was filtered at 0.2 µm and pre-cleared by incubation at 4 °C for 30 min with washed sepharose streptavidin beads (GE Healthcare) using 25 µl beads per mg lysate. For each peptide pulldown 2 mg of input material was used. Tissue lysate and peptides were incubated for 45 min at 4 °C while gently shaking in incubation buffer (50 mM Tris pH 8.5, 5 mM EDTA, 150 mM NaCl, 10 mM KCl, 0.1% Triton X-100, 0.5 mM DTT, Complete protease inhibitor cocktail tablet from Roche, 5 mM NaF, 5 mM beta-glycerophosphate, 1 mM sodium-orthovanadate). Beads were washed eight times, twice in 50 mM NaCl, twice in 150 mM NaCl and four times in water. Bound proteins were digested directly on the beads, as previously described (Eberl et al., 2013). 25 μl digestion buffer (2 M urea, 1 mM DTT, 120 ng trypsin) was added and the samples incubated for 30 min at room temperature with gentle agitation before elution and collection in a new plate. The beads were next incubated twice in 50 μl buffer (2 M urea, 5 mM iodoacetamide) for 10 min each and these eluates were collected in the same plate as above. Proteins were digested overnight at room temperature. The digest was terminated by quenching with TFA and the peptides were concentrated and desalted on C18 STAGE tips. An equivalent approach was applied for all other peptide pulldown experiments described in Figures 3 and 4.

Samples were eluted into 96 well plates and analyzed by LC-MS/MS on a Q-Exactive quadrupole Orbitrap mass spectrometer as explained above in the LC-MS/MS section. The method used for the peptide pulldown samples applied a linear gradient of increasing acetonitrile in 0.5% formic acid for 60 min; full-scan MS spectra (m/z 300−1750) were acquired with 70,000 resolution after accumulation of ions to a predictive AGC target value of 1 × 10^6, and the 10 most intense ions were sequentially isolated and fragmented in the octopole collision cell by HCD with 17,500 resolution.

Production and purification of SH2 Domains

The SH2 domains of CRK, CRKL, SYK-N, SYK-C, and SH2B1 were produced and purified as previously described (Francavilla et al., 2013). Briefly, the SH2 domain constructs cloned into pNIC28-Bsa4 vectors were prepared for Isothermal Titration Calorimetry (ITC) experiments as follows. The E. coli BL21(DE3) R3 T1 cells expressing the 6His-tagged proteins were transformed and grown in Terrific Broth media supplemented with both 50 µg/ml kanamycin and 50 µg/ml chloramphenicol. The cell cultures were grown at 37 °C and when the OD600 reached a value of 1.0-1.5 cells were harvested by centrifugation and the cell pellets resuspended in lysis buffer, consisting of buffer A (50 mM NaP, pH 7.5; 300 mM NaCl; 10 mM imidazole; 10% glycerol; 0.5 mM TCEP (tris(2-carboxyethyl)phosphine)) supplemented with Complete Mini EDTA-free protease inhibitor (Roche) and 50 U/ml Benzonase. The cells were lysed using a high pressure homogenizer (Avestin), followed by centrifugation and the cell lysate was filtered through a 0.22 µm PES bottle top filter and purified on a ÄKTA Xpress system at 4°C. The proteins were purified by loading the clarified cell lysates onto 5 ml HiTrap chelating columns charged with nickel. Bound proteins were washed with buffer A containing 30 mM imidazole and eluted with buffer A containing 500 mM imidazole. The eluted proteins were further purified by size exclusion chromatography, i.e., loaded onto a HiLoad 16/60 Superdex 75 gel filtration column equilibrated with buffer B (50 mM NaP pH 7.5, 150 mM NaCl; 10% glycerol; 0.5 mM TCEP) and peak fractions pooled. All purified domains were analyzed by SDS-PAGE and the correct mass verified by ESI mass spectra recorded on a micrOTOF-Q II (Bruker) operated in positive mode.

Isothermal Titration Calorimetry (ITC)

Synthetic peptides were purchased from Peptide 2.0 Inc (Chantilly, VA, USA). The purity obtained in the synthesis was 95 – 98 % as determined by HPLC and subsequent analysis by mass spectrometry. Peptide C-termini were capped with amide groups. The synthesized peptides used for the ITC experiments were EGFR1010-1022 (VVDADEYLIPQQG), its mutant EGFR1010-1022 P1019L (VVDADEYLILQQG) together with their respective phosphorylated peptides, EGFR1010-1022 pY1016 (VVDADEpYLIPQQG) and EGFR1010-1022 pY1016/P1019L (VVDADEpYLILQQG). Prior to ITC experiments both the protein and the peptides were extensively dialyzed against 20 mM sodium phosphate pH 7.2, 150 mM NaCl, 0.5 mM TCEP. All ITC experiments were performed on an Auto-iTC200 instrument (Microcal, Malvern Instruments Ltd.) at 25 °C. Concentrations of the SH2 domains were determined using a spectrophotometer by measuring the absorbance at 280 nm and applying values for the extinction coefficients computed from the corresponding sequences by the ProtParam program (http://web.expasy.org/protparam/). Peptides were also quantitated spectroscopically using extinction coefficients of 2330 M^-1 cm^-1 (at 293 nm for tyrosine in 0.1 M NaOH) and 652 M^-1 cm^-1 (at 267 nm for phosphotyrosine in water) (Apostol et al., 1985; Cousins-Wasti et al., 1996). The non-phosphorylated peptides at approximately 500 μM concentration were loaded into the syringe and titrated into the calorimetric cell containing the SH2 domains at ~ 30 μM. The phosphorylated peptides were also titrated into the sample cell containing the SH2 domains (at the concentrations stated in the table below). The reference cell was filled with distilled water. In all assays, the titration sequence consisted of a single 0.4 μl injection followed by 19 injections, 2 μl each, with 150 s spacing between injections to ensure the return of the thermal power to the baseline before the next injection. The stirring speed was 750 rpm.
Control experiments with the peptides injected in the sample cell filled with buffer were carried out under the same experimental conditions. These control experiments showed heats of dilution negligible in all cases. The heats per injection normalized per mole of injectant versus the molar ratio [peptide]/[SH2 domain] were fitted to a single-site model. Data were analyzed with MicroCal PEAQ-ITC (version 1.1.0.1262) analysis software (Malvern Instruments Ltd.).

Affinities and thermodynamic values of SH2 domains-peptide binding events inferred from ITC measurements performed at 25 °C. Gibbs free energy (ΔG), enthalpy (ΔH), entropy (-TΔS), equilibrium dissociation constant (KD), reaction stoichiometry (n) and concentration of both species (SH2 domains and peptides) used in the ITC experiments are shown. The protein-peptide interaction affinity is defined by the Gibbs energy for binding ΔG = RT lnKD.
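As a worked illustration of the relation ΔG = RT lnKD at the experimental temperature of 25 °C, the following sketch converts a dissociation constant into a binding free energy. The function name `delta_g_kj_per_mol` and the 1 μM example value are hypothetical and are not measurements from this study; KD is taken in molar units relative to a 1 M standard state.

```python
import math

R_GAS = 8.314462618  # molar gas constant in J/(mol*K)


def delta_g_kj_per_mol(kd_molar, temp_celsius=25.0):
    """Binding free energy, dG = R*T*ln(KD), returned in kJ/mol.

    kd_molar: equilibrium dissociation constant in mol/L.
    """
    if kd_molar <= 0:
        raise ValueError("KD must be positive")
    temp_kelvin = temp_celsius + 273.15
    return R_GAS * temp_kelvin * math.log(kd_molar) / 1000.0


# Hypothetical KD of 1 micromolar, for illustration only.
dg = delta_g_kj_per_mol(1e-6)
```

A tighter interaction (smaller KD) gives a more negative ΔG; the entropic term then follows from the fitted enthalpy as -TΔS = ΔG - ΔH.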

Generation of EGFR knock-out cells

The CRISPR-Cas9 mediated knockout of EGFR protein in A549 cells was performed as previously described (Ran et al., 2013). Briefly, sgRNAs from exon1 and exon10 of the EGFR gene were designed using the online CRISPR design tool (http://crispr.mit.edu/) and the highest scoring two sgRNA sequences were selected for cloning into pSpCas9(BB)-2A-GFP (PX458) (Addgene # 48138) to avoid off-target effects. The final sequences of sgRNA oligos were:

Oligo Sequence (Forward) Score
sg-exon1-1 CACCGCTGCGCTCTGCCCGGCGAGT 90
sg-exon1-2 CACCGTCCTCCAGAGCCCGACTCGC 90
sg-exon10-1 CACCGGATATTCTGAAAACCGTAA 74
sg-exon10-2 CACCGTACTCCTCCTCTGGATCCAC 73

All four constructs (i.e., 2 each from each exon) were confirmed by sequencing. A549 cells (10^6) were electroporated with four combinations of 50 ng each of exon1 and exon10 targeted CRISPR constructs (e.g., exon1-1 + exon10-1, exon1-1 + exon10-2, etc.) using the Neon transfection system (Thermo Fisher Scientific) as per manufacturer’s instructions. After 48 hours of transfection cells were FACS sorted for GFP expression and plated with a density of ~0.5 cells per well of 96-well plates. Cells were regularly inspected for clonal expansion and mixed population wells were discarded. EGFR knockout clones were pre-screened using immunofluorescence microscopy against EGFR and selected clones were subsequently confirmed using immunoblotting.

siRNA silencing

Cells were transfected with 25 nM siRNA anti-INPPL1 (Dharmacon) and the corresponding control scrambled siRNA (Dharmacon) using Lipofectamine RNAiMAX reagent (Invitrogen) following the manufacturer’s recommendations in antibiotic-free media. Cells were analyzed 48-72 hours after transfection.

Plasmids, insertional mutagenesis and cell transfection

GFP-EGFR expression plasmid was purchased from Addgene. Insertional mutagenesis was performed using QuikChange II Site-Directed Mutagenesis Kit (Agilent), following the manufacturer’s instructions. The mutation in the EGFR construct was confirmed by DNA sequencing using the “EGFP-N reverse primers” listed above.

Stable cell transfection was performed using Lipofectamine LTX Kit with Plus reagent (Invitrogen), according to the manufacturer’s instructions.

Stably transfected single cell clones selection and screening

Cells were grown in selection medium containing G418 (Geneticin, Thermo Fisher Scientific), starting 24 hours after cell transfection. After three weeks of selection, single GFP-positive cells were plated in 96-well plates by FACS sorting (BD FACS Aria III sorter), after DAPI staining to exclude dead cells. After two weeks, GFP-positive clones were screened by flow cytometry (BD LSR Fortessa analyzer) for GFP expression and analyzed by western blot for EGFR and GFP expression.

Stably transfected cells were maintained at 70% minimum GFP expression to perform all the experiments.

Genomic DNA sequencing

Genomic DNA was extracted using DNeasy Blood & Tissue Kit (Qiagen). The region surrounding the mutation was amplified by PCR. The PCR product was cleaned up and sequenced using the “EGFP-N reverse primer”.

Immunoprecipitation and western blot

GFP pulldown was performed using GFP-Trap beads (ChromoTek) following the manufacturer’s instructions. Proteins were eluted from beads by boiling in LDS sample buffer (Novex) supplied with DTT to perform western blot analysis. Protein extracts preparation, immunoprecipitation and western blot experiments were performed as described elsewhere (Francavilla et al., 2016b). Briefly, proteins were resolved by SDS-PAGE and transferred to nitrocellulose membranes (Protran, Biosciences). Proteins of interest were visualized using specific antibodies, followed by peroxidase-conjugated secondary antibodies and by an enhanced chemiluminescence kit (Amersham Biosciences). Quantification analysis of western blot was performed with ImageJ software.

Cell proliferation assay

5000 cells were seeded in a 96-well plate, starved overnight in serum-free medium and stimulated with EGF for 72 hours. Cell viability was measured using Cell Counting Kit-8 (Tebu-bio) following the manufacturer’s instructions. Cell viability is presented as mean [OD450nm] test wells (EGF stimulated)/[OD450nm] control wells (no EGF).

Cell migration assay (wound healing scratch assay)

Cells were seeded into 6-well plates at 95% confluence and starved overnight in serum-free or 1% FBS containing medium. Scratch wounds were made through the cell monolayer with a 20-200 μl pipette tip in each well and were washed with fresh medium to remove floating cells. Cells were then stimulated with EGF for 24 and 48 hours. Wound distance was measured in four different locations for each well at 0, 24, and 48 hours. A no ligand control was kept until 48 hours and no migration was observed, so the time 0 hours EGF was used as a reference for scratch measurement. Migration rate [1 – (scratch area after 24 or 48 hours of EGF stimulation/scratch area at 0 hours EGF)] was quantified using the wound-healing tool of ImageJ (http://dev.mri.cnrs.fr/projects/imagej-macros/wiki/Wound_Healing_Tool).
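The migration rate defined above reduces to a one-line calculation, sketched here with hypothetical scratch areas. The function name `migration_rate` is illustrative; in the study the areas were obtained with the ImageJ wound-healing tool.

```python
def migration_rate(area_t0, area_t):
    """Fraction of the scratch closed: 1 - (area at time t / area at 0 h).

    Both areas must be in the same units (for example pixels).
    """
    if area_t0 <= 0:
        raise ValueError("the scratch area at 0 h must be positive")
    return 1.0 - area_t / area_t0


# Hypothetical areas: the scratch shrinks from 100 to 25 units after EGF.
rate = migration_rate(100.0, 25.0)
```

A value of 0 means no closure relative to the 0 hour reference and a value of 1 means the scratch has closed completely.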

Cell invasion assay

The Transwell Matrigel-based invasion assay was performed using Corning BioCoat Matrigel invasion chambers (ref number 354480) following the manufacturer’s instructions. Briefly, cells were transfected with either siRNA against SHIP2 or a control sequence for 48 hours in medium containing 10% FBS but no antibiotics. Cells were trypsinized, counted by trypan blue exclusion and resuspended in serum-free medium plus antibiotics at a concentration of 1x10^5/ml. 5x10^4 cells in 500 µl were plated on top of the Matrigel transwell, prior to incubation of the transwells in serum-free medium plus antibiotics at 37 °C for 2 hours. EGF (or PBS as negative control) was added to the cells at 100 ng/ml. The Matrigel transwells were placed on top of 750 µl of medium containing 10% FBS (chemoattractant), EGF (100 ng/ml) plus antibiotics and incubated at 37 °C for 24 hours. Cells that did not invade the Matrigel were removed with PBS using a cotton swab and the invading cells were stained with the Shandon Kwik-Diff Stain kit (Thermo Fisher Scientific). Pictures were taken on an inverted light microscope in 3 random fields for each chamber. Invading cells were subsequently counted using ImageJ software. The invasion rate was calculated by dividing the number of invading cells in the EGF condition by the number of invading cells without EGF.

Sample preparation for mass spectrometry A549 GFP-EGFR experiments

GFP pulldown was performed as mentioned above. Proteins were eluted in 100 μl of GndCl buffer for 10 min at 99 °C, as previously described (Poulsen et al., 2013). Proteins were digested with 1 μl Lys-C (Wako) for 1 h followed by a threefold dilution with 25 mM Tris, pH 8.5, to 2 M GndCl and further digested overnight with 1 μl trypsin (Sigma Aldrich). Protease activity was quenched by acidification with trifluoroacetic acid (TFA) to a final concentration of approximately 1%, and the resulting peptide mixture was concentrated using reversed-phase Sep-Pak C18 stage tips (Rappsilber et al., 2007).

Sample preparation for proteome analysis was performed as previously described (Bekker-Jensen et al., 2017). Briefly, after cell lysis, protein digestion and peptide desalting, 120 μg of peptides were fractionated in 12 fractions through offline high pH reversed-Phase HPLC fractionation. Samples were acidified with formic acid to a final concentration of approximately 0.1% prior to concentration using vacuum centrifugation.

Liquid chromatography-tandem mass spectrometry (LC-MS/MS) A549 GFP-EGFR experiments

Peptides were reconstituted in 5% acetonitrile, 0.1% TFA and 1 µg from all samples was analyzed with an EASY-nLC system (Thermo Fisher Scientific) connected to a Q Exactive HF (for proteome analysis) or a Q-Exactive HFX (for EGFR interactome analysis) mass spectrometers (Thermo Fisher Scientific), through a nanoelectrospray ion source. Peptides were separated on a 15-cm (for proteome) or 50-cm (for EGFR interactome analysis) analytical column (75-μm inner diameter) in-house packed with 1.9 μm reversed-phase C18 beads (Reprosil-Pur AQ, Dr. Maisch), with 60 min gradients (for A549 proteome analysis) or 145-min gradients (for EGFR interactome analysis). The instruments were operated in data-dependent acquisition mode, with settings as previously described (Bekker-Jensen et al., 2017).

Raw MS data analysis for A549 GFP-EGFR experiments

Raw data were analyzed with the MaxQuant software suite (Cox and Mann, 2008), either version 1.5.7.5 (for A549 proteome analysis) or 1.5.8.4 (for EGFR interactome analysis). Proteins were identified with parameters previously described (Bekker-Jensen et al., 2017) and quantified using the LFQ algorithm integrated in MaxQuant (Cox et al., 2014). Briefly, all raw LC–MS/MS data were searched using the Andromeda search engine against the rat and mouse UniProt database including all Swiss-Prot and TrEMBL entries as well as all isoforms. In addition, the default contaminant protein database was included and any hits to it were excluded from further analysis. Carbamidomethylation of cysteine was specified as fixed modification for all groups. Variable modifications considered were oxidation of methionine, protein N-terminal acetylation, pyro-glutamate formation from glutamine and phosphorylation of serine, threonine, and tyrosine residues. The match between runs function was used for the EGFR interactome analysis. The false discovery rate (FDR) was set to 1% at the peptide spectrum match (PSM), PTM site and protein levels. MaxQuant makes use of the target-decoy search strategy to estimate and control the extent of false-positive identifications, using the concept of posterior error probability (PEP) to integrate multiple peptide properties, such as length, charge, number of modifications, and Andromeda score, into a single quantity reflecting the quality of a PSM. A second level of FDR control is set on the list of reported protein groups by calculating a protein group score. This is the product of the individual PEPs of the peptides of a protein group, and includes a factor to take into account the number of peptides per protein group. The protein group score is similar to the PEP, in that it provides a measure of the certainty of protein identification.

Data analysis for A549 GFP-EGFR experiments

Data analysis was performed using Perseus software (Tyanova et al., 2016). For the analysis of the A549 proteome, significance B test was performed using correction for multiple testing by the Benjamini–Hochberg method (FDR < 0.05). Missing values were imputed in all samples using Perseus default settings. For the analysis of the EGFR interactome, LFQ protein intensity values were normalized on EGFR expression, to account for uneven efficiency during individual pulldowns performed in parallel. Missing values were imputed in control samples using Perseus default settings. To find differential interactors between wildtype and mutant EGFR receptors, we first calculated EGF/Ctrl ratio for each replicate. We then performed significance B analysis comparing mutant versus wt EGFR average EGF/Ctrl ratio, using correction for multiple testing by the Benjamini–Hochberg method (FDR < 0.05).
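The Benjamini–Hochberg correction mentioned above can be sketched in a few lines. This is a generic implementation of the step-up procedure and not the Perseus code; the significance B statistic itself is not reproduced here, and the function name and p-values are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order.

    Each p-value is scaled by m / rank (rank 1 = smallest p-value) and
    monotonicity is enforced from the largest p-value downwards.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical p-values; entries with an adjusted value below 0.05 pass.
adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.5])
significant = [q < 0.05 for q in adjusted]
```

In this hypothetical example only the first entry remains below the 0.05 threshold after correction.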

Data independent analysis (DIA)

For DIA analysis of peptide pulldowns, iRT peptides (Biognosys AB) were spiked into the tryptic digests prior to Evotip loading and LC-MS analysis according to the manufacturer’s protocol. All samples were analyzed on the Evosep One using an in-house packed 12 cm, 150 μm i.d. capillary column with 1.9 μm Reprosil-Pur C18 beads (Dr. Maisch, Ammerbuch, Germany) using the 60 samples per day preprogrammed gradient. The column temperature was maintained at 40 °C using an integrated column oven (PRSO-V1, Sonation, Biberach, Germany) and interfaced online with the Q Exactive HF-X mass spectrometer. The mass spectrometer was operated in data-independent acquisition mode using a 2 second scan cycle that consisted of a full-scan MS recorded with 120,000 resolution using 3e6 ions followed by 48 MS/MS scans at 15,000 resolution with 3e6 ions or maximum injection time of 22 ms. The MS/MS isolation windows were set to 15 m/z units with 1 m/z unit overlap. All data-independent analysis (DIA) raw files were processed with Spectronaut version 11 (Biognosys, Zurich, Switzerland). Project-specific spectral libraries were imported from the separate MaxQuant analysis of the combined 46 pre-fractionated fractions of each of the liver and lung lysates, and DIA files were analyzed using default settings.

Isothermal Titration Calorimetry (ITC) of Gab1 Y317 peptides

Peptides were purchased from Peptide 2.0 Inc (Chantilly). The purity obtained in the synthesis was 95 – 98 % as determined by high performance liquid chromatography (HPLC) and subsequent analysis by mass spectrometry. Peptide C-termini were capped with amide groups. A W residue was added to the N-terminus for UV quantification of the peptide concentration. The synthesized peptides used for the ITC experiments were hGab1311-323 I319V, pY317 (WPTPGNT(pY)QVPRTF), hGab1311-323 I319V, Y317 (WPTPGNTYQVPRTF), hGab1311-323 I319V, pY317, P320S (WPTPGNT(pY)QVSRTF) and hGab1311-323 I319V, Y317, P320S (WPTPGNTYQVSRTF). Prior to ITC experiments both the protein and the peptides were extensively dialyzed against 50 mM sodium phosphate pH 7.5, 150 mM NaCl, 0.5 mM TCEP. All ITC experiments were performed on an Auto-iTC200 instrument (Microcal, Malvern Instruments Ltd.) at 25 °C. Concentrations of both the SH2 domains and the hGab1311-323 peptides were determined using a spectrophotometer by measuring the absorbance at 280 nm and applying values for the extinction coefficients computed from the corresponding amino acid sequences by the ProtParam program (http://web.expasy.org/protparam/). hGab1311-323 peptides at approximately 300 μM concentration were loaded into the syringe and titrated into the calorimetric cell containing the CRK5-125 or CRKL6-112 SH2 domains at ~ 25 μM. The reference cell was filled with distilled water. In all assays, the titration sequence consisted of a single 0.4 μl injection followed by 19 injections, 2 μl each, with 150 s spacing between injections to ensure that the thermal power returns to the baseline before the next injection. The stirring speed was 750 rpm. Control experiments with the hGab1311-323 peptides injected in the sample cell filled with buffer were carried out under the same experimental conditions. These control experiments showed heats of dilution negligible in all cases.
The heats per injection normalized per mole of injectant versus the molar ratio [hGab1311-323 peptide]/[SH2 domain] were fitted to a single-site model. Data were analysed with MicroCal PEAQ-ITC (version 1.1.0.1262) analysis software (Malvern Instruments Ltd.) and AFFINImeter (version 2.1892.3).

Limitation of experimental approach

A potential limitation of our experimental design is that we make use of rat lung tissue but the molecular components of the biology of rat and human EGFR-signaling networks might differ. Similarly, the high blood concentration of EGF supplied is likely not observed under normal physiological conditions, but only in specific pathophysiological conditions such as local concentration of EGF in some tumors. Another limitation of our study design is that we have only investigated interactions caused by short linear motifs (SLiMs), whereby we excluded protein-protein interactions guided by tertiary folded protein conformation structures. Our experimental approach is based on the use of tissue lysates which also does not allow us to distinguish between the various cell types present in lung tissue or the subcellular localization of the proteins identified, and accordingly we cannot make specific claims about these. For this reason, we used a variety of validation experiments and designed the experiments as close to physiological conditions as experimentally possible with current state-of-the-art technology. We have generally used stringent and conservative cutoffs to deem phosphorylation sites and interactions significant, so the biology is likely even more complex. Moreover, for the functional validation experiments performed in the A549 lung cancer cells not all EGF-dependent interactions could be validated. This is likely due to differences in protein expression profiles between rat lung tissue and the lung cancer cell line. For instance, for the VAV-CRK complex we could validate Crkl and Vav3 as EGF-dependent interactors to EGFR, but not Crk, Vav1 and Vav2. Likewise, since we found that SYK, SHIP1 and Zap70 are not expressed in A549 cells based on our deep proteome profiling, we confirmed SHIP2 and SH2B1 as EGF-dependent interactors of EGFR P1019L in these cells. 
Although there are limitations to our analysis, our results clearly show that this does not preclude the identification of novel phosphotyrosine-dependent protein-protein interaction networks and genetic variants relevant to EGF signaling in-vivo.

Quantification and Statistical Analysis

All statistical and bioinformatics analyses were done using the freely available software Perseus (Tyanova et al., 2016), MaxQuant (Cox and Mann, 2008), Andromeda (Cox et al., 2011), Mascot (Perkins et al., 1999), Spectronaut (Bruderer et al., 2015), Uniprot Knowledgebase (UniProt, 2015), R framework, Bioconductor R-package LIMMA (Bolstad et al., 2003), IceLogo (Colaert et al., 2009), Cytoscape (Shannon et al., 2003), STRING (Szklarczyk et al., 2019), ImageJ (Schneider et al., 2012), Microsoft Office, innateDB (Lynn et al., 2008), ProtParam program (Gasteiger et al., 2003), MicroCal PEAQ-ITC (Linkuviene et al., 2016), and AFFINImeter (Pineiro et al., 2019). All measured peptide intensities were normalized using the ‘normalizeQuantiles’ function from the Bioconductor R package LIMMA (Bolstad et al., 2003), which normalizes the peptide intensities such that each quantile for each sample is set to the mean of that quantile across the dataset, resulting in peptide intensity distributions that are empirically identical. The proteome samples, the phosphotyrosine enriched samples, and the phosphopeptide-enriched samples were normalized individually. Subsequent data analysis was performed using Microsoft Office Excel and Perseus (Cox and Mann, 2012).
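The idea behind quantile normalization can be sketched in plain Python. This is an illustrative re-implementation, not the LIMMA code used in the study: it assumes complete data with one list of intensities per sample and does not treat ties or missing values specially, which the R function does handle. The function name `normalize_quantiles` and the numbers are hypothetical.

```python
def normalize_quantiles(columns):
    """Quantile-normalize samples given as equal-length lists.

    Each value is replaced by the mean, across samples, of the values
    sharing its rank, so all samples end up with the same distribution.
    """
    n = len(columns[0])
    if any(len(col) != n for col in columns):
        raise ValueError("all samples must have the same number of values")
    sorted_cols = [sorted(col) for col in columns]
    quantile_means = [
        sum(sc[r] for sc in sorted_cols) / len(columns) for r in range(n)
    ]
    normalized = []
    for col in columns:
        order = sorted(range(n), key=lambda i: col[i])
        out = [0.0] * n
        for rank, i in enumerate(order):
            out[i] = quantile_means[rank]
        normalized.append(out)
    return normalized


# Two hypothetical samples with three peptide intensities each.
result = normalize_quantiles([[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]])
```

After normalization both hypothetical samples contain the same set of values (1.5, 3.5 and 5.5), assigned according to the original rank order within each sample.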

The proteome data were used to estimate the false discovery rate (FDR) for falsely identifying a peptide as being up-regulated between the two groups of rats, as global protein expression changes are not expected after brief EGFR stimulation. The duplicate animal experiments were analyzed individually. The normalized data for all modification specific peptides identified in the proteome samples were used, the peptide intensities were log2-transformed and all contaminants were removed. Only unmodified peptides identified in at least three of the four rats were included in the analysis. Intensity-based ratios and t-test based p-values were calculated for all peptides between the two groups of rats. For the first experiment, 14 peptides out of 12,162 up-regulated peptides had a ratio > 4 and a p-value < 0.05, which gives an estimated FDR of 0.0012. For the second experiment, 14 peptides out of 8,937 up-regulated peptides had a ratio > 4 and a p-value < 0.05, which gives an estimated FDR of 0.0016. Based on this analysis we deem phosphorylated peptides with intensity ratios > 4 and p-values < 0.05 in the comparison between EGF stimulated rats versus control rats to be statistically significantly up-regulated.
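The arithmetic behind these FDR estimates and the resulting cutoff can be reproduced with a short sketch. The function names are hypothetical; the counts are the ones reported above.

```python
def empirical_fdr(n_passing, n_total):
    """Fraction of unmodified proteome peptides that pass the cutoffs."""
    return n_passing / n_total


def is_upregulated(ratio, p_value, min_ratio=4.0, alpha=0.05):
    """Apply the ratio > 4 and p < 0.05 criterion to one phosphopeptide."""
    return ratio > min_ratio and p_value < alpha


fdr_experiment_1 = empirical_fdr(14, 12162)  # rounds to 0.0012
fdr_experiment_2 = empirical_fdr(14, 8937)   # rounds to 0.0016
```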

Measured peptide intensity ratios between EGF stimulated rats and control rats were compared for all phosphotyrosine peptides identified in both the phosphotyrosine-enriched samples and the phosphopeptide-enriched samples. This set of overlapping peptides was used to normalize the peptide intensity ratios measured in the phosphotyrosine-enriched samples relative to the phosphopeptide-enriched samples. Only phosphopeptides identified in at least half of the EGF stimulated rats were included in the subsequent analysis. For the phosphotyrosine-enriched samples, missing data were imputed, if there were fewer than three identifications in the control animals, using normally distributed intensities (i.e. a normal distribution with width 0.3 and left-shifted 1.8 compared with the distribution of all measured intensities). The duplicate animal experiments were analyzed individually. For all phosphotyrosine-enriched samples and for all phosphopeptide-enriched samples, intensity ratios and t-test based p-values were calculated for all phosphopeptides for the EGF stimulated rats relative to the control rats. Phosphopeptides with intensity ratios > 4 and p-values < 0.05 in the comparison between EGF stimulated rats versus control rats were considered significantly up-regulated.

Fractional stoichiometry in peptide pulldowns

The median iBAQ intensities from each set of triplicate pulldowns were used to estimate the relative protein abundance (Schwanhausser et al., 2011). To calculate the fractional stoichiometry of the protein interactors in each pulldown experiment, the iBAQ intensity of each significant interactor was divided by the iBAQ intensity of the significant interactor with the highest iBAQ intensity in that pulldown experiment. The fractional stoichiometries of the protein interactors were plotted next to the volcano plots as provided in the online webtool: http://pulldown.jensenlab.org/static/examples/sup_fig_4.pdf.
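As an illustration of the fractional stoichiometry calculation, the sketch below takes the median iBAQ intensity per significant interactor and scales it to the most abundant one. The function name, the protein names and the iBAQ values are hypothetical and chosen only to make the arithmetic easy to follow.

```python
import statistics


def fractional_stoichiometry(ibaq_replicates):
    """Scale median iBAQ intensities to the most abundant significant interactor.

    ibaq_replicates maps each significant interactor to its iBAQ intensities
    across the replicate pulldowns (linear scale, not log-transformed).
    """
    medians = {protein: statistics.median(values)
               for protein, values in ibaq_replicates.items()}
    top = max(medians.values())
    return {protein: median / top for protein, median in medians.items()}


# Hypothetical triplicate iBAQ intensities for two significant interactors.
example = {
    "GRB2": [8.0e6, 1.0e7, 1.2e7],
    "SHC1": [4.0e6, 5.0e6, 6.0e6],
}
stoichiometry = fractional_stoichiometry(example)
```

The interactor with a fractional stoichiometry of 1 is the most abundant significant interactor in that pulldown.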

Phosphosite bait-sequence motif analysis

We performed sequence motif analysis of the phosphopeptide sequences recruiting the 11 core complexes (Suppl. Fig. 5A). The phosphopeptide sequences were analyzed against the human proteome and visualized as logo plots using the IceLogo software.

Computational analysis of peptide pulldown MS data

Imputation strategy for peptide pulldown data

A general challenge for quantitative proteomics approaches is the inherent problem of missing values, which is in part due to the stochastic nature of MS measurements. However, in general the problem of missing values is greater for low abundant proteins than for highly abundant proteins. The motivation for imputation in a dataset like ours is simple, as there will be cases when the protein intensity is below the detection limit in the control samples but above the detection limit in a specific peptide pulldown experiment. This suggests that there is a significant difference between the two samples, but few statistical methods can quantify this difference. We initially imputed missing values by the default imputation scheme used by the software package Perseus (Tyanova et al., 2016). However, as described below, this did not work well for this particular type of data. We therefore set out to develop another imputation scheme. To evaluate the imputation strategy, we first correlated the protein intensities measured in three control pulldown replicates from rat lung tissue against each other to visualize the distributions before imputation.

The default imputation scheme for the proteomics software package Perseus is to calculate the sample mean and standard deviation from the measured protein intensities and then impute values that are drawn randomly from a normal distribution with a mean that is left-shifted by 1.8 standard deviations and has a width of 0.3 compared to the experimental data.
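A minimal sketch of this left-shifted normal imputation is given below. It is not the Perseus code: it assumes log-transformed intensities for a single sample with missing values encoded as `None`, and the function name and the fixed random seed are hypothetical choices made for reproducibility.

```python
import random
import statistics


def impute_left_shifted(intensities, shift=1.8, width=0.3, seed=0):
    """Replace missing values (None) with draws from a down-shifted normal.

    The normal distribution is centred at mean - shift * sd of the observed
    values and has a standard deviation of width * sd.
    """
    observed = [v for v in intensities if v is not None]
    mu = statistics.mean(observed)
    sd = statistics.stdev(observed)
    rng = random.Random(seed)
    return [
        v if v is not None else rng.gauss(mu - shift * sd, width * sd)
        for v in intensities
    ]


# Hypothetical log2 intensities with one missing value.
values = [20.0, 22.0, None, 24.0, 26.0]
filled = impute_left_shifted(values)
```

Because the imputed values are drawn from the low end of the measured distribution regardless of the protein, this scheme ignores how abundant the protein actually is in the tissue, which is the limitation addressed by the proteome-based scheme.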

Based on our empirical data we inferred that a potentially better imputation strategy would be based on protein intensities measured in matched proteome controls. The rationale is that we observed a strong correlation between protein intensities of background binders and protein intensities in matched proteomes.

The protein intensities of unspecific background binders in control pulldown experiments are correlated with the protein intensities measured in matched proteome samples. Based on empirical data we inferred that it is more appropriate to impute missing values based on matched proteome measurements. Thus, when a protein measurement is missing in a control pulldown experiment the missing value will be imputed. Using the protein intensity offset between the control pulldowns and the proteome, i.e. the parameter b in y = x + b, we can apply the following imputation scheme: if a protein intensity is missing in a control pulldown but an intensity is measured in the proteome, then impute the intensity measured in the proteome minus the offset.

Missing values

We calculated the average of the log10 MS-based protein intensities from four control pulldown experiments performed with empty beads as well as for four matched lung proteome measurements. The offset between these two sets of average intensities was fitted using least squares (y=x+b, where b is the offset) to estimate the scaling factor between empty beads pulldowns and proteome measurements. This fit was subsequently used to infer control intensities from reference proteome intensities for proteins with missing values in the control experiments. For a full explanation of the imputation scheme including empirical data see http://pulldown.jensenlab.org.
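The offset fit and the resulting imputation can be sketched as follows. For a line with slope fixed at 1, the least-squares estimate of b is simply the mean difference between the paired values. The sketch assumes that y is the proteome intensity and x the control pulldown intensity, so that a missing control value is imputed as the proteome intensity minus b; the function names and the example values are hypothetical.

```python
def fit_offset(proteome, control):
    """Least-squares offset b in proteome = control + b (slope fixed at 1).

    Both arguments map protein names to average log10 intensities; only
    proteins measured in both are used for the fit.
    """
    shared = [protein for protein in control if protein in proteome]
    return sum(proteome[p] - control[p] for p in shared) / len(shared)


def control_or_imputed(protein, control, proteome, b):
    """Return the measured control intensity, else the scaled proteome value."""
    if protein in control:
        return control[protein]
    if protein in proteome:
        return proteome[protein] - b
    return None  # no information in either the control or the proteome


# Hypothetical average log10 intensities.
proteome = {"A": 9.0, "B": 8.0, "C": 7.5}
control = {"A": 7.0, "B": 6.0}
b = fit_offset(proteome, control)
imputed_c = control_or_imputed("C", control, proteome, b)
```

Proteins without a value in either dataset return `None` here; in the text these cases are handled separately by the one-sample test.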

Variance estimate

To estimate the variance of protein intensities we explored an empirical observation that the intensity of a protein and its variance in MS-based pulldown experiments appear interdependent. Accordingly, we performed four control pulldown replicates with only beads (i.e. no peptide present) in four different rat lung tissue lysates. All samples were analyzed by LC-MS in the same way as the remaining pulldowns described in our manuscript. To determine the interdependence between protein intensity and protein variance in pulldown experiments we plotted the variance between the four replicates as a function of the average measured protein intensities. The relationship between the protein intensities and their corresponding variance is approximated by a four-parameter sigmoidal function:

$$\mathrm{sigmoid}(x) = \frac{a - d}{1 + e^{b(x - c)}} + d$$

We apply this four-parameter sigmoidal function to estimate the variance.
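A minimal sketch of such a variance model is shown below, assuming the four-parameter form (a - d) / (1 + exp(b(x - c))) + d. The parameter values are hypothetical and chosen only for illustration; the fitted values are not given in this section.

```python
import math


def sigmoid(x, a, b, c, d):
    """Four-parameter sigmoid: tends to a for low x and to d for high x (b > 0)."""
    return (a - d) / (1.0 + math.exp(b * (x - c))) + d


# Hypothetical parameters: variance 1.0 at low intensity, 0.1 at high intensity,
# with the midpoint of the transition at a log10 intensity of 6.
params = dict(a=1.0, b=2.0, c=6.0, d=0.1)
variance_low = sigmoid(2.0, **params)
variance_mid = sigmoid(6.0, **params)
variance_high = sigmoid(10.0, **params)
```

With b > 0 the estimated variance decreases monotonically with intensity, and at x = c it equals the midpoint (a + d)/2.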

Significance C

Significance C is a modified Welch t-test that uses the 4-parameter sigmoid fit to estimate sample variance. Significance C uses the standard t-test variables (t and ν):

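$$t = \frac{X_1 - X_2}{\sqrt{\frac{s_1^2}{N_1} + \frac{s_2^2}{N_2}}}, \qquad \nu = \frac{\left(\frac{s_1^2}{N_1} + \frac{s_2^2}{N_2}\right)^2}{\frac{s_1^4}{N_1^3} + \frac{s_2^4}{N_2^3}}, \qquad s_1^2 = \mathrm{sigmoid}(X_1), \qquad s_2^2 = \mathrm{sigmoid}(X_2)$$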

where Xi, si and Ni are the sample mean, standard deviation and size of the i’th sample.
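The two-sample statistic can be sketched as follows. The variance of each group is taken from a variance model evaluated at the group mean rather than from the replicates themselves. The degrees of freedom are computed with N cubed in the denominator terms, as the expression in this section appears to read; this differs from the textbook Welch-Satterthwaite expression, which uses N²(N - 1). The function name is hypothetical, and the constant variance model in the example is a stand-in for the fitted sigmoid.

```python
import math


def significance_c(x1, n1, x2, n2, variance_of):
    """Welch-type t statistic with model-based variances.

    x1, x2: mean log intensities of the two groups; n1, n2: group sizes;
    variance_of: callable returning the modelled variance at a given intensity.
    Returns the t statistic and the degrees of freedom.
    """
    s1_sq = variance_of(x1)
    s2_sq = variance_of(x2)
    se_sq = s1_sq / n1 + s2_sq / n2
    t = (x1 - x2) / math.sqrt(se_sq)
    nu = se_sq ** 2 / (s1_sq ** 2 / n1 ** 3 + s2_sq ** 2 / n2 ** 3)
    return t, nu


# Hypothetical example: triplicate pulldown versus triplicate control,
# with a constant variance of 0.09 standing in for the sigmoid fit.
t_stat, dof = significance_c(8.0, 3, 6.0, 3, lambda x: 0.09)
```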

Individual protein ratios between bait pulldowns and controls were calculated as the difference between the averages of the normalized intensities in the specific bait pulldown and the corresponding median of the intensities of the same protein in all pulldowns if the protein has been detected in more than half of the pulldowns. However, protein intensities from empty bead pulldowns or scaled proteomes were used if a protein has been identified in less than half of the specific phosphopeptide pulldowns. The ‘worst case scenario’ of empty beads pulldown or the scaled protein intensity is imputed as proxy; i.e. the larger of the two.

One-sample Significance C test

In cases where protein intensities have been measured in all replicates of a peptide pulldown experiment, but no protein intensity has been measured in either the control pulldowns or the proteome measurements, we face yet another challenge. These cases are potentially of great biological interest, as they point to highly specific interactions in the peptide pulldown experiment. Statistically, however, it is a challenge to handle these identifications. It is tempting to assume that the protein intensities in the control pulldowns will be close to zero, or at least below our detection limit, resulting in a protein ratio (pulldown/control) that approaches infinity. Given the constraints we have, we find that the best approach to handle these specific proteins is to assume that a protein that is missing in all control replicas and all proteomes has a lower intensity than a protein that is missing in all but one of these. We accordingly evaluate these proteins by performing a one-sample Significance C test where μ0 is the mean intensity of all proteins that have been observed in only one proteome replica.

The one sample significance C test is as follows:

$$t = \frac{X - \mu_0}{\sqrt{\frac{s^2}{N}}}, \qquad s^2 = \mathrm{sigmoid}(X)$$

where X, s and N are the sample mean, standard deviation and size.
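A minimal sketch of the one-sample statistic follows, assuming t = (X - μ0) / sqrt(s²/N) with s² taken from a variance model evaluated at X. The function name and all numbers in the example are hypothetical, including the value used for μ0 and the constant variance that stands in for the sigmoid fit.

```python
import math


def one_sample_significance_c(x, n, mu0, variance_of):
    """One-sample t statistic against mu0 with a model-based variance.

    x: mean log intensity in the pulldown; n: number of replicates;
    mu0: reference intensity (mean of proteins seen in only one proteome replica);
    variance_of: callable returning the modelled variance at a given intensity.
    """
    return (x - mu0) / math.sqrt(variance_of(x) / n)


# Hypothetical example: triplicate pulldown at intensity 8 against mu0 = 5.
t_one_sample = one_sample_significance_c(8.0, 3, 5.0, lambda x: 0.12)
```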

Determining significance criteria

Once the experimental data have been collected and analyzed and missing values have been imputed, the data remain to be evaluated to identify the proteins that are significant interactors of the bait. From the empirical data we have two metrics to evaluate: a protein intensity ratio (the ratio of average protein intensities in the peptide pulldown versus average protein intensities in the control pulldown) and a p-value (t-test of protein intensities in the peptide pulldown versus intensities in the control pulldowns). To take both parameters into account in the data evaluation, we plot the two against one another in a volcano plot and apply a significance cut-off that depends on both. This cut-off is an s-curve defined by

$$s = (p - p_c)(r - r_c)$$

where p is -log10(p-value), r is the log2(ratio), and pc and rc are the corresponding cutoffs. In the volcano plots, we show the significance cut-off curve corresponding to s=1.

To improve comparability between pull downs, the cutoffs pc and rc are automatically adjusted in a pull down-specific manner:

$$p_c = 3.5 + p_u, \qquad r_c = 2 + r_u$$

where pu is the median -log10(p-value) of the proteins with the 50% lowest ratios, and ru is the median log2(ratio) of the proteins with 50% worst p-values.
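The adaptive cut-offs and the s-curve can be sketched as below. Two points are assumptions of this sketch rather than statements from the text: a protein is only called significant when it lies on the upper-right branch of the curve (p > pc and r > rc), and "50% worst p-values" is read as the half of the proteins with the lowest -log10(p-value). Function names and example values are hypothetical.

```python
import statistics


def adaptive_cutoffs(p, r, p_const=3.5, r_const=2.0):
    """Pulldown-specific cut-offs pc = 3.5 + pu and rc = 2 + ru.

    p: -log10(p-values); r: log2(ratios), in the same protein order.
    """
    n_half = len(p) // 2
    lowest_ratio = sorted(range(len(r)), key=lambda i: r[i])[:n_half]
    worst_pvalue = sorted(range(len(p)), key=lambda i: p[i])[:n_half]
    p_u = statistics.median(p[i] for i in lowest_ratio)
    r_u = statistics.median(r[i] for i in worst_pvalue)
    return p_const + p_u, r_const + r_u


def is_significant(p_i, r_i, p_c, r_c):
    """True if the protein lies on or above the curve s = (p - pc)(r - rc) = 1."""
    return p_i > p_c and r_i > r_c and (p_i - p_c) * (r_i - r_c) >= 1.0


# Hypothetical pulldown with three background binders and one specific interactor.
p_values = [0.2, 0.4, 0.6, 6.0]   # -log10(p-value)
ratios = [0.0, 0.2, 0.4, 5.0]     # log2(ratio)
p_c, r_c = adaptive_cutoffs(p_values, ratios)
hits = [is_significant(p, r, p_c, r_c) for p, r in zip(p_values, ratios)]
```

Shifting the cut-offs by the medians of the background proteins means that a pulldown with a noisier background gets a correspondingly stricter curve.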

The constants, 3.5 in the pc formula and 2 in the rc formula, were determined empirically from our large-scale dataset and represent a rational compromise between coverage and specificity. To determine this compromise, we analyzed how the relationship between the fraction of specific interactors containing an SH2 domain and the number of SH2 domain-containing proteins among the specific interactors was affected by the values set for rc and pc. Coverage is represented by the number of SH2 domain-containing proteins among the significant interactors, and specificity by the fraction of significant interactors that contain an SH2 domain; both depend on the values of pc and rc. We varied pc and rc between 0 and 4 in increments of 0.25 and found that the constants 3.5 and 2 represent a point in parameter space where we retain coverage without losing specificity, found at the rightmost part of the peak.

Finally, for proteins with missing values in all control samples, i.e. those that we referred to as approaching “infinite ratios”, the p-value cutoff is defined by the s-curve when log10(r) = 10, as this ratio essentially corresponds to infinity in our experiments.

Prioritization scheme for imputation of missing values

Prioritization scheme for imputation of missing values in the algorithm for analyzing pulldown datasets. (A) For each replica of a pulldown experiment, the control intensity is used if available (replica 1). If no measurement is available in the control, the scaled proteome intensity is used (replica 2). If neither a control measurement nor a scaled proteome measurement is available (replica 3), then no value is used, and the final statistical test will have one less degree of freedom (ν=2). The mean of the control is thus X = (7 + 8)/2 = 7.5. (B) To filter our large-scale pulldown dataset for promiscuous binders we handle cases where a protein binds to more than half of the 85 baits as a special case. In these cases we use the median intensity of all pulldowns, if the median is the highest value. In this example we thus apply 8 instead of 7.5. Had the median intensity been 7, then we would have kept the 7.5 from the controls.
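The prioritization in panels (A) and (B) can be sketched as a small function. The function name and the encoding of missing values as `None` are hypothetical; the numbers reproduce the example in the legend.

```python
def control_reference(control, scaled_proteome, promiscuous_median=None):
    """Pick the reference intensity for one protein in one pulldown.

    control, scaled_proteome: per-replica intensities, None where missing.
    promiscuous_median: median intensity across all pulldowns, supplied only
    for proteins that bind more than half of the baits.
    Returns the reference intensity and the number of replica values used.
    """
    used = []
    for control_value, proteome_value in zip(control, scaled_proteome):
        if control_value is not None:
            used.append(control_value)      # (A) replica 1: control available
        elif proteome_value is not None:
            used.append(proteome_value)     # (A) replica 2: scaled proteome
        # (A) replica 3: neither available, so no value is used
    mean = sum(used) / len(used) if used else None
    # (B) promiscuous binders: use the median of all pulldowns if it is higher
    if promiscuous_median is not None and (mean is None or promiscuous_median > mean):
        return promiscuous_median, len(used)
    return mean, len(used)


# Example from the legend: control = 7 in replica 1, scaled proteome = 8 in
# replica 2, nothing in replica 3.
control = [7.0, None, None]
scaled = [None, 8.0, None]
reference_a = control_reference(control, scaled)
reference_b_high = control_reference(control, scaled, promiscuous_median=8.0)
reference_b_low = control_reference(control, scaled, promiscuous_median=7.0)
```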

Data analysis of peptide pulldowns in web-based tool in summary

All pulldown experiments and proteome measurements were analyzed together in MaxQuant v.1.4.1.4 against a combined rat and mouse Uniprot database using FDR<0.01 for both the peptide and protein level. All protein intensities were log-transformed and normalized by median-subtraction in each experiment. The protein intensities observed in the proteomes were scaled to their corresponding intensities in the control pulldowns (empty beads) by calculating the global linear offset.

Next, the variance for each protein in each individual pulldown was determined based on the experimental triplicates and a sigmoidal distribution of the variance as a function of protein intensity was determined by data smoothing. Individual protein ratios between bait pulldowns and controls were calculated as the difference between the averages of the normalized intensities in the specific bait pulldown and the corresponding median of the intensities of the same protein in all pulldowns if the protein has been detected in more than half of the pulldowns. However, protein intensities from empty bead pulldowns or scaled proteomes were used if a protein has been identified in less than half of the specific phosphopeptide pulldowns. The ‘worst case scenario’ of empty beads pulldown or the scaled protein intensity is imputed as proxy; i.e. the larger of the two. Finally, if no protein value was available in either empty beads or proteome, a one-sided t-test was performed against the estimated protein abundance detection limit. To determine significant interactors Significance C was applied. P-values and protein ratios were visualized in a one-sided volcano plot in which gene names of interactors above the significance curve were highlighted. Proteins depicted in blue are based on imputed control values. Crosses indicate that the protein contains a phosphotyrosine-binding domain, and if the protein is a significant interactor these proteins are depicted as stars, see: http://pulldown.jensenlab.org/static/examples/sup_fig_4.pdf. The analytical framework for analysis of LC-MS/MS-based pulldown experiments is available at http://pulldown.jensenlab.org.

Gene Ontology and pathway analyses

We translated all identified rat proteins with regulated tyrosine phosphorylation sites to their corresponding orthologous human genes. Based on the human gene names pathway and gene ontology (GO) enrichment analyses were performed using the innateDB webtool (www.innatedb.org) (Lynn et al., 2008). Enriched REACTOME pathways and GO-terms were determined based on their p-values, which were calculated using a hypergeometric test and corrected for multiple testing with a Benjamini-Hochberg FDR-based test. We required a minimum of four identified genes and a minimum of 5% coverage.
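The enrichment statistics named above can be illustrated with a minimal sketch of a hypergeometric over-representation test followed by Benjamini-Hochberg adjustment. This is not the innateDB implementation, and the function names and example numbers are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(k, population, annotated, drawn):
    """P(X >= k) for a hypergeometric draw.

    population: number of background genes; annotated: background genes in the
    pathway or GO term; drawn: size of the query list; k: query genes in the term.
    """
    upper = min(annotated, drawn)
    favourable = sum(
        comb(annotated, i) * comb(population - annotated, drawn - i)
        for i in range(k, upper + 1)
    )
    return favourable / comb(population, drawn)


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest to the smallest p-value to enforce monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical toy numbers.
p_enrich = hypergeom_enrichment_p(k=1, population=10, annotated=3, drawn=2)
adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
```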

Sequence motif analysis

We performed sequence motif analysis using IceLogo (Colaert et al., 2009) with percentage difference as the scoring system and a p-value cut-off of 0.05. Our input dataset was sequence windows for phosphorylation sites identified to be regulated in the phosphopeptide enriched samples, and sequence windows for all other phosphorylation sites identified in the dataset were used as the background dataset.

Supplementary Material

Supp Fig 1

Evaluation of phosphotyrosine signaling dataset. Related to Figure 1. (A) Summary of MS data. Proteome samples were measured in single-shot experiments with a 90 min gradient and led to identification of >45,000 peptides. TiO2 enriched samples were also measured in single-shot experiments using a 135 min gradient leading to identification of >15,000 phosphorylated peptides. Antibody-based enrichment for tyrosine phosphorylated peptides was performed in two consecutive incubation steps, and each of the two samples was measured in single-shot experiments using a 135 min gradient leading to identification of more than 1,600 tyrosine phosphorylated peptides. (B) Fractional distribution of tyrosine class 1 phosphorylation sites reported in deep rodent tissue phosphoproteome studies. In deep phosphoproteome studies without specific enrichments for phosphotyrosines these sites make up around 1% of the identified phosphorylation sites (Humphrey et al., 2015; Huttlin et al., 2010; Lundby et al., 2012). This proportion is not a reflection of biology but a reflection of the phosphopeptide enrichment methods used. When we include antibody-based enrichment of phosphotyrosines, the fraction of phosphotyrosines is increased more than 8-fold. Thus, specific enrichment for tyrosine-phosphorylated peptides is necessary to obtain high coverage of phosphotyrosines. (C) The amino acid sequences flanking phosphorylation sites identified to be regulated in the TiO2 enriched samples (top) were analyzed relative to a background of all the non-regulated phosphorylation sites. A similar analysis was subsequently performed, where all sequences with a proline residue in the +1 position were removed from the analysis (bottom). Motifs were generated with IceLogo. (D) Overlap of proteins with tyrosine phosphorylation sites identified in deep phosphoproteomics studies investigating EGF dependent phosphotyrosine signaling.
We performed a SILAC based experiment where A549 cells were either stimulated with EGF or with saline for 5 min, similar to the rats. We enriched for tyrosine phosphorylated peptides and analyzed the data by high-resolution MS/MS. A similar experiment has been performed by Francavilla et al. in HeLa cells, where HeLa cells were stimulated with either EGF or saline for 8 min (Francavilla et al., 2016a). The numbers indicate the number of proteins with localized phosphotyrosine sites. The lung phosphotyrosine dataset is by far the largest. Yet, the absolute pTyr-protein overlap is larger between A549 cells and HeLa cells than between A549 cells and lung tissue, which shows that the A549 cell phosphotyrosine protein network resembles that of HeLa cells more than it resembles the phosphotyrosine protein network in lung tissue. (E-H) Western blot validation of EGF-dependent tyrosine phosphorylation sites. (E) Whole cell lysates from A549 cells were immunoblotted with the indicated antibodies. (F-H) Whole cell lysates from A549 cells were immunoprecipitated (IP) and immunoblotted with the indicated antibodies.

Supp Fig 2

Quantitative analysis of phosphoproteomics data. Related to Figure 1. (A) Correlation plots for proteome (left), phosphotyrosine (middle) and TiO2 enriched (right) samples. Median protein intensities measured for all EGF stimulated rats were plotted as a function of the protein intensities for each individual EGF stimulated rat. The Pearson correlation coefficient is indicated in each plot. (B) T-test based analysis of proteome (left), phosphotyrosine (middle), and TiO2 enriched (right) samples comparing EGF stimulated versus saline injected rats. The dashed lines indicate the cut-off criteria of p < 0.05 and ratio > 4. For the proteome samples this results in an estimated false discovery rate of 0.002. The FDR estimation from lung proteome data is based on the assumption that we do not expect global protein expression changes after a short 5 min stimulus. (C) Gene ontology enrichment analysis of proteins that were identified as containing a regulated tyrosine phosphorylation site. (D) A hierarchical clustering of iBAQ protein intensities was performed based on the proteome measurements from the three lysates of rat lung tissue used for the peptide pulldowns. The relative protein abundance of a selection of proteins known to be involved in the EGF dependent signaling network was estimated from the protein intensity iBAQ values of the proteomes and indicated in the figure. The color scale indicates log2-transformed iBAQ intensities with the median subtracted.

Supp Fig 3

Comparison of results from analysis of phosphopeptide pulldowns using either a classical approach comparing phosphopeptide versus matched unmodified peptide pulldown or the approach we present in this paper. Related to Figure 2. (A) Three biological replicates of peptide-based pulldowns were performed in different lung tissue lysates with a peptide corresponding to the sequence flanking tyrosine residue 921 of IRS4 either being phosphorylated or not. Analysis of the data using the classical approach (Eberl et al., 2013; Hubner et al., 2010) identifies 23 significant protein interactors (left). Using our web-based tool for the data analysis we identify 8 of these proteins as significant interactors, in addition to two other unique proteins (right). (B) A similar experiment was performed for a peptide-based pulldown using a phosphorylated or a non-phosphorylated version of a peptide corresponding to the amino acid sequence flanking SHC-1 tyrosine residue 349. The classical approach leads to identification of 14 significant interactors. Our method reduces this set of protein interactors to seven, six of which are also represented in the classical approach. (C) Similar experiment performed for pY1172 EGFR. (D) Venn diagram from 45 different phosphopeptide bait interactions in liver tissue lysates identified by data-independent acquisition. The number of significant SH2 domain containing proteins is indicated in parentheses.

Supp Fig 4

Validation of primary interaction partners in peptide pulldowns. Related to Figure 3. (A) Interaction partners of phosphorylated PTPN11 Y546. Gene names are displayed for significant protein interactors, and interactors containing a SH2 domain are indicated with stars. For proteins highlighted in blue, control protein intensities are inferred from scaled proteome measurements and proteins marked with a cross contain a phosphotyrosine-binding domain. Proteins without any intensity information in controls are plotted as infinity-ratios to the right of the ratio plot. Stoichiometry of significant interactors indicates GRB2 as likely primary interaction partner. (B) Interaction partners of phosphorylated GAB1 Y317. Stoichiometry of significant interactors indicates CRKL as likely primary interaction partner. (C) Western blot confirmation of EGF-dependent interaction between PTPN11 and GRB2 by co-immunoprecipitation in A549 lung cancer cells. (D) Western blot validation of EGF-dependent interaction between GAB1 and CRKL by co-immunoprecipitation in A549 lung cancer cells.

Supp Fig 5

Isothermal titration calorimetry measurements for tyrosine phosphorylated peptides and novel interaction partners identified. Related to Figure 4. Binding affinities were measured for the tyrosine-phosphorylated form of wildtype EGFR residues flanking pY1016 as well as for the EGFR cancer mutation EGFR P1019L. Affinities for binding of purified SH2 domains from CRK, CRKL, SH2B1 and SYK were tested. (A) EGFR wt and CRK. (B) EGFR_P1019L and CRK. (C) EGFR wt and CRKL. (D) EGFR_P1019L and CRKL. (E) EGFR wt and SYK (SH2 domain from residues 7-114). (F) EGFR_P1019L and SYK (SH2 domain from residues 7-114). (G) EGFR wt and SYK (SH2 domain from residues 159-266). (H) EGFR_P1019L and SYK (SH2 domain from residues 159-266). (I) EGFR wt and SH2B1 (SH2 domain from residues 515-632). (J) EGFR_P1019L and SH2B1 (SH2 domain from residues 515-632).

Supp Fig 6

Evaluation and validation of EGFR knockout A549 cells. Related to Figure 5. (A) Western blot confirmation of EGFR knockout. Whole cell extracts (WCE) from parental (WT) and EGFR knockout (EKO) A549 cells, after stimulation with either EGF or TGFα for 30 minutes, were immunoblotted with the indicated antibodies. (B) MS analysis of EGFR abundance. Bar plot showing normalized iBAQ EGFR intensities from parental and EGFR knockout A549 cells. MRPS10 housekeeping protein is shown as control. Values represent the mean ±SEM of two biological replicates. (C) Reproducibility of MS-based proteome analysis of parental and EGFR knockout cells. Multi-scatter plot showing Pearson correlation coefficients between the proteomes of the different replicates. (D) Comparative proteome analysis between parental and EGFR knockout cells. Ratio versus intensity plot shows proteome changes between EGFR parental (right) and knockout cells (left). Significantly regulated hits (according to their “significance B”) are shown as colored dots (orange = higher in parental cells; blue = higher in knockout cells). (E) Wound-healing scratch assay to quantify cell migration. Images of wildtype and knockout EGFR cells after EGF stimulation for 0, 24 h and 48 h. Quantification of cell migration is shown on the right panel. Data represent the mean ±SEM of two independent experiments, each one including two technical replicates. Numeric values on the top of each bar indicate the absolute migration rate, where 0 indicates no migration and 1 equals full migration.

Supp Fig 7

Signaling and cellular outcome of EGFR knockout A549 cells reconstituted with WT or P1019L EGFR-GFP mutant construct. Related to Figure 5. (A) Whole cell extracts (WCE) from parental (WT) A549 cells, EGFR knockout (EKO) cells and different stably transfected clones with either WT or P1019L EGFR-GFP constructs were immunoblotted with the indicated antibodies. (B) Different clones stably transfected with either WT or P1019L EGFR-GFP constructs were stimulated with EGF for 8 minutes and the lysates were immunoblotted with the indicated antibodies. (C) DNA sequence of the genomic DNA region surrounding the missense mutation site. Sequencing has been performed using a reverse primer. (D) Lysates from cells stimulated with EGF for 8 minutes were GFP-immunoprecipitated and analyzed through LC-MS/MS. Ratio versus intensity plot shows EGF-dependent interactor changes between WT (left) and P1019L (right) EGFR-expressing cells. Significantly regulated hits (according to their “significance B”) are shown as colored dots. Data represent two independent experiments, each one including three WT and two P1019L clones. (E) WCE from the experiment shown in D were immunoblotted with the indicated antibodies. (F) WCE from the experiment shown in Fig. 3F were immunoblotted with the indicated antibodies. (G, H) Cells were stimulated with EGF for the indicated time points and immunoblotted with the indicated antibodies. (I) Quantification of immunoblots shown in FigS10G-H, using ImageJ. Phospho-protein expression is normalized to the total protein expression. Data represent the mean ±SEM of respectively four (ERK) and two (AKT) independent experiments. (J) CCK8 proliferation assay on EGFR knockout, WT- and P1019L-expressing cells, stimulated with EGF for 72 hours. Data represent the mean ± SEM of two independent experiments, each one including three WT and two P1019L clones (three technical replicates for each). (K, L) Wound-healing scratch assay to quantify cell migration.
(K) Representative images of CTRL and SHIP2 KD EGFR P1019L expressing cells after EGF stimulation for 0 and 36h are shown. Numbers in red indicate the average absolute migration rate, where 0 indicates no migration and 1 equals full migration (two independent experiments, each including a different P1019L clone). (L) EGF-dependent migration of EGFR P1019L expressing cells was assessed after DMSO or SHIP2 inhibitor (AS1938909) treatment. Quantification is displayed as mean ±SEM of two independent experiments, each including a different P1019L clone. The p-values in K-L were evaluated by two-sided Student’s t-test.

Supplementary Tables

Acknowledgements

The authors thank all lab members for fruitful discussion, Simon Kamenov for assistance with part of the pulldown experiments and Rossana Foti for access to microscopy. We would like to thank Dr. Pia Rengtved Lundegaard for the Tg(kdrl:EGFP) strains and for technical assistance with the zebrafish experiments. Work at The Novo Nordisk Foundation Center for Protein Research (CPR) is funded in part by a generous donation from the Novo Nordisk Foundation (Grant number NNF14CC0001). The proteomics technology developments applied were part of a project that has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement: MSmed-686547, EPIC-XS-823839 and ERC synergy grant 810057-HighResCells; Manufacturer Vilhelm Pedersen & wife’s Memorial Fund Award (J.V.O.); Sapere Aude and YDUN Grant from The Danish Council for Independent Research (A.L., grant number DFF-4092-00045 and grant number DFF-6110-00166) and The Novo Nordisk Foundation (A.L., grant number NNF15OC0017586). SPG is funded by the European Union’s Horizon 2020 research and innovation program under the Marie Sklodowska-Curie grant agreement No 798716.

Footnotes

Author Contributions

AS performed animal experiments shown in Fig. 1A. GF generated mutant cell lines and performed experiments shown in Fig. 2D, 3F-J, Sup. Fig. S1G-H, S5C-D, S6C-E, S9-10. IP generated the EGFR knockout cells and P1019L mutant. BM performed ITC experiments shown in Fig. 3D-E, Fig.4G and Sup. Fig. S7-8, supervised by GM. CF performed western blots shown in Fig. 1B. DBBJ performed all peptide pulldown experiments shown in Sup. Fig. S4, KBE with SRM and DBBJ performed peptide pulldown experiments using DIA approach. CDK provided input for optimization of peptide pulldown experiments shown in Sup. Fig. S4. SPG performed zebrafish experiments in Fig. 3K-L under supervision of MK. JCR and LJJ developed statistical framework and web-interface for analysis of pulldown data. AL performed experiments and analyzed data shown in all remaining figures. AL and JVO conceived the project, designed the experiments, analyzed all MS data, critically evaluated results and wrote the manuscript. All authors provided input for the manuscript.

Declaration of Interests

Author A.S. is a current employee of Novo Nordisk.

Additional Resources

We have made our analytical framework accessible to other researchers via a web-based tool http://pulldown.jensenlab.org. On this website we also provide a step-by-step guideline for how to use the webtool as well as in-depth explanation of the analysis pipeline, and all volcano plots….

Data and Code Availability

The mass spectrometry proteomics data in Thermo Scientific’s *.raw format have been deposited to the ProteomeXchange Consortium via the PRIDE partner repository (https://www.ebi.ac.uk/pride/archive/) with the dataset identifier PXD004055. All other data supporting the findings of this study are available within the article and its supplemental information files or from the Lead Contact upon reasonable request.

References

  1. Apostol I, Kuciel R, Wasylewska E, Ostrowski WS. Phosphotyrosine as a substrate of acid and alkaline phosphatases. Acta Biochim Pol. 1985;32:187–197.
  2. Bache N, Geyer PE, Bekker-Jensen DB, Hoerning O, Falkenby L, Treit PV, Doll S, Paron I, Muller JB, Meier F, et al. A Novel LC System Embeds Analytes in Pre-formed Gradients for Rapid, Ultra-robust Proteomics. Mol Cell Proteomics. 2018;17:2284–2296. doi: 10.1074/mcp.TIR118.000853.
  3. Bae JH, Lew ED, Yuzawa S, Tome F, Lax I, Schlessinger J. The selectivity of receptor tyrosine kinase signaling is controlled by a secondary SH2 domain binding site. Cell. 2009;138:514–524. doi: 10.1016/j.cell.2009.05.028.
  4. Bamford S, Dawson E, Forbes S, Clements J, Pettett R, Dogan A, Flanagan A, Teague J, Futreal PA, Stratton MR, et al. The COSMIC (Catalogue of Somatic Mutations in Cancer) database and website. Br J Cancer. 2004;91:355–358. doi: 10.1038/sj.bjc.6601894.
  5. Bekker-Jensen DB, Kelstrup CD, Batth TS, Larsen SC, Haldrup C, Bramsen JB, Sorensen KD, Hoyer S, Orntoft TF, Andersen CL, et al. An Optimized Shotgun Strategy for the Rapid Generation of Comprehensive Human Proteomes. Cell Syst. 2017;4:587–599.e4. doi: 10.1016/j.cels.2017.05.009.
  6. Bisson N, James DA, Ivosev G, Tate SA, Bonner R, Taylor L, Pawson T. Selected reaction monitoring mass spectrometry reveals the dynamics of signaling through the GRB2 adaptor. Nat Biotechnol. 2011;29:653–658. doi: 10.1038/nbt.1905.
  7. Blagoev B, Kratchmarova I, Ong SE, Nielsen M, Foster LJ, Mann M. A proteomics strategy to elucidate functional protein-protein interactions applied to EGF signaling. Nat Biotechnol. 2003;21:315–318. doi: 10.1038/nbt790.
  8. Boersema PJ, Foong LY, Ding VM, Lemeer S, van Breukelen B, Philp R, Boekhorst J, Snel B, den Hertog J, Choo AB, et al. In-depth qualitative and quantitative profiling of tyrosine phosphorylation using a combination of phosphopeptide immunoaffinity purification and stable isotope dimethyl labeling. Mol Cell Proteomics. 2010;9:84–99. doi: 10.1074/mcp.M900291-MCP200.
  9. Bolstad BM, Irizarry RA, Astrand M, Speed TP. A comparison of normalization methods for high density oligonucleotide array data based on variance and bias. Bioinformatics. 2003;19:185–193. doi: 10.1093/bioinformatics/19.2.185.
  10. Bruderer R, Bernhardt OM, Gandhi T, Miladinovic SM, Cheng LY, Messner S, Ehrenberger T, Zanotelli V, Butscheid Y, Escher C, et al. Extending the limits of quantitative proteome profiling with data-independent acquisition and application to acetaminophen-treated three-dimensional liver microtissues. Mol Cell Proteomics. 2015;14:1400–1410. doi: 10.1074/mcp.M114.044305.
  11. Bugaj LJ, Sabnis AJ, Mitchell A, Garbarino JE, Toettcher JE, Bivona TG, Lim WA. Cancer mutations and targeted drugs can disrupt dynamic signal encoding by the Ras-Erk pathway. Science. 2018;361 doi: 10.1126/science.aao3048.
  12. Cohen P. Protein kinases--the major drug targets of the twenty-first century? Nat Rev Drug Discov. 2002;1:309–315. doi: 10.1038/nrd773.
  13. Colaert N, Helsens K, Martens L, Vandekerckhove J, Gevaert K. Improved visualization of protein consensus sequences by iceLogo. Nat Methods. 2009;6:786–787. doi: 10.1038/nmeth1109-786.
  14. Cousins-Wasti RC, Ingraham RH, Morelock MM, Grygon CA. Determination of affinities for lck SH2 binding peptides using a sensitive fluorescence assay: comparison between the pYEEIP and pYQPQP consensus sequences reveals context-dependent binding specificity. Biochemistry. 1996;35:16746–16752. doi: 10.1021/bi9620868.
  15. Cox J, Hein MY, Luber CA, Paron I, Nagaraj N, Mann M. Accurate proteome-wide label-free quantification by delayed normalization and maximal peptide ratio extraction, termed MaxLFQ. Mol Cell Proteomics. 2014;13:2513–2526. doi: 10.1074/mcp.M113.031591.
  16. Cox J, Mann M. MaxQuant enables high peptide identification rates, individualized p.p.b.-range mass accuracies and proteome-wide protein quantification. Nat Biotechnol. 2008;26:1367–1372. doi: 10.1038/nbt.1511.
  17. Cox J, Mann M. 1D and 2D annotation enrichment: a statistical method integrating quantitative proteomics with complementary high-throughput data. BMC bioinformatics. 2012;13(Suppl 16):S12. doi: 10.1186/1471-2105-13-S16-S12.
  18. Cox J, Neuhauser N, Michalski A, Scheltema RA, Olsen JV, Mann M. Andromeda: a peptide search engine integrated into the MaxQuant environment. Journal of proteome research. 2011;10:1794–1805. doi: 10.1021/pr101065j.
  19. Eberl HC, Spruijt CG, Kelstrup CD, Vermeulen M, Mann M. A map of general and specialized chromatin readers in mouse tissues generated by label-free interaction proteomics. Molecular cell. 2013;49:368–378. doi: 10.1016/j.molcel.2012.10.026.
  20. Erneux C, Edimo WE, Deneubourg L, Pirson I. SHIP2 multiple functions: a balance between a negative control of PtdIns(3,4,5)P(3) level, a positive control of PtdIns(3,4)P(2) production, and intrinsic docking properties. J Cell Biochem. 2011;112:2203–2209. doi: 10.1002/jcb.23146.
  21. Francavilla C, Papetti M, Rigbolt KT, Pedersen AK, Sigurdsson JO, Cazzamali G, Karemore G, Blagoev B, Olsen JV. Multilayered proteomics reveals molecular switches dictating ligand-dependent EGFR trafficking. Nature structural & molecular biology. 2016a doi: 10.1038/nsmb.3218.
  22. Francavilla C, Papetti M, Rigbolt KT, Pedersen AK, Sigurdsson JO, Cazzamali G, Karemore G, Blagoev B, Olsen JV. Multilayered proteomics reveals molecular switches dictating ligand-dependent EGFR trafficking. Nature structural & molecular biology. 2016b;23:608–618. doi: 10.1038/nsmb.3218.
  23. Francavilla C, Rigbolt KT, Emdal KB, Carraro G, Vernet E, Bekker-Jensen DB, Streicher W, Wikstrom M, Sundstrom M, Bellusci S, et al. Functional proteomics defines the molecular switch underlying FGF receptor trafficking and cellular outputs. Molecular cell. 2013;51:707–722. doi: 10.1016/j.molcel.2013.08.002.
  24. Gasteiger E, Gattiker A, Hoogland C, Ivanyi I, Appel RD, Bairoch A. ExPASy: The proteomics server for in-depth protein knowledge and analysis. Nucleic acids research. 2003;31:3784–3788. doi: 10.1093/nar/gkg563.
  25. Grossmann A, Benlasfer N, Birth P, Hegele A, Wachsmuth F, Apelt L, Stelzl U. Phospho-tyrosine dependent protein-protein interaction network. Molecular systems biology. 2015;11:794. doi: 10.15252/msb.20145968.
  26. Hanke S, Mann M. The phosphotyrosine interactome of the insulin receptor family and its substrates IRS-1 and IRS-2. Mol Cell Proteomics. 2009;8:519–534. doi: 10.1074/mcp.M800407-MCP200.
  27. Hein MY, Hubner NC, Poser I, Cox J, Nagaraj N, Toyoda Y, Gak IA, Weisswange I, Mansfeld J, Buchholz F, et al. A human interactome in three quantitative dimensions organized by stoichiometries and abundances. Cell. 2015;163:712–723. doi: 10.1016/j.cell.2015.09.053.
  28. Hornbeck PV, Kornhauser JM, Tkachev S, Zhang B, Skrzypek E, Murray B, Latham V, Sullivan M. PhosphoSitePlus: a comprehensive resource for investigating the structure and function of experimentally determined post-translational modifications in man and mouse. Nucleic acids research. 2012;40:D261–270. doi: 10.1093/nar/gkr1122.
  29. Hubner NC, Bird AW, Cox J, Splettstoesser B, Bandilla P, Poser I, Hyman A, Mann M. Quantitative proteomics combined with BAC TransgeneOmics reveals in vivo protein interactions. J Cell Biol. 2010;189:739–754. doi: 10.1083/jcb.200911091.
  30. Humphrey SJ, Azimifar SB, Mann M. High-throughput phosphoproteomics reveals in vivo insulin signaling dynamics. Nat Biotechnol. 2015;33:990–995. doi: 10.1038/nbt.3327.
  31. Hunter T, Sefton BM. Transforming gene product of Rous sarcoma virus phosphorylates tyrosine. Proc Natl Acad Sci U S A. 1980;77:1311–1315. doi: 10.1073/pnas.77.3.1311.
  32. Huttlin EL, Jedrychowski MP, Elias JE, Goswami T, Rad R, Beausoleil SA, Villen J, Haas W, Sowa ME, Gygi SP. A tissue-specific atlas of mouse protein phosphorylation and expression. Cell. 2010;143:1174–1189. doi: 10.1016/j.cell.2010.12.001.
  33. Janne PA, Engelman JA, Johnson BE. Epidermal growth factor receptor mutations in non-small-cell lung cancer: implications for treatment and tumor biology. J Clin Oncol. 2005;23:3227–3234. doi: 10.1200/JCO.2005.09.985.
  34. Kandoth C, McLellan MD, Vandin F, Ye K, Niu B, Lu C, Xie M, Zhang Q, McMichael JF, Wyczalkowski MA, et al. Mutational landscape and significance across 12 major cancer types. Nature. 2013;502:333–339. doi: 10.1038/nature12634.
  35. Kavanaugh WM, Turck CW, Williams LT. PTB domain binding to signaling proteins through a sequence motif containing phosphotyrosine. Science. 1995;268:1177–1179. doi: 10.1126/science.7539155.
  36. Kelstrup CD, Bekker-Jensen DB, Arrey TN, Hogrebe A, Harder A, Olsen JV. Performance Evaluation of the Q Exactive HF-X for Shotgun Proteomics. Journal of proteome research. 2018;17:727–738. doi: 10.1021/acs.jproteome.7b00602.
  37. Klaeger S, Heinzlmeir S, Wilhelm M, Polzer H, Vick B, Koenig PA, Reinecke M, Ruprecht B, Petzoldt S, Meng C, et al. The target landscape of clinical kinase drugs. Science. 2017;358 doi: 10.1126/science.aan4368.
  38. Kolch W. Coordinating ERK/MAPK signalling through scaffolds and inhibitors. Nature reviews Molecular cell biology. 2005;6:827–837. doi: 10.1038/nrm1743.
  39. Li T, Wernersson R, Hansen RB, Horn H, Mercer J, Slodkowicz G, Workman CT, Rigina O, Rapacki K, Staerfeldt HH, et al. A scored human protein-protein interaction network to catalyze genomic interpretation. Nat Methods. 2017;14:61–64. doi: 10.1038/nmeth.4083.
  40. Linkuviene V, Krainer G, Chen WY, Matulis D. Isothermal titration calorimetry for drug design: Precision of the enthalpy and binding constant measurements and comparison of the instruments. Anal Biochem. 2016;515:61–64. doi: 10.1016/j.ab.2016.10.005.
  41. Liu JJ, Sharma K, Zangrandi L, Chen C, Humphrey SJ, Chiu YT, Spetea M, Liu-Chen LY, Schwarzer C, Mann M. In vivo brain GPCR signaling elucidated by phosphoproteomics. Science. 2018;360 doi: 10.1126/science.aao4927.
  42. Lundby A, Andersen MN, Steffensen AB, Horn H, Kelstrup CD, Francavilla C, Jensen LJ, Schmitt N, Thomsen MB, Olsen JV. In vivo phosphoproteomics analysis reveals the cardiac targets of beta-adrenergic receptor signaling. Sci Signal. 2013;6:rs11. doi: 10.1126/scisignal.2003506.
  43. Lundby A, Rossin EJ, Steffensen AB, Acha MR, Newton-Cheh C, Pfeufer A, Lynch SN, Olesen SP, Brunak S, Ellinor PT, et al. Annotation of loci from genome-wide association studies using tissue-specific quantitative interaction proteomics. Nat Methods. 2014;11:868–874. doi: 10.1038/nmeth.2997.
  44. Lundby A, Secher A, Lage K, Nordsborg NB, Dmytriyev A, Lundby C, Olsen JV. Quantitative maps of protein phosphorylation sites across 14 different rat organs and tissues. Nature communications. 2012;3:876. doi: 10.1038/ncomms1871.
  45. Lynn DJ, Winsor GL, Chan C, Richard N, Laird MR, Barsky A, Gardy JL, Roche FM, Chan TH, Shah N, et al. InnateDB: facilitating systems-level analyses of the mammalian innate immune response. Mol Syst Biol. 2008;4:218. doi: 10.1038/msb.2008.55.
  46. Meyer K, Kirchner M, Uyar B, Cheng JY, Russo G, Hernandez-Miranda LR, Szymborska A, Zauber H, Rudolph IM, Willnow TE, et al. Mutations in Disordered Regions Can Cause Disease by Creating Dileucine Motifs. Cell. 2018;175:239–253.e17. doi: 10.1016/j.cell.2018.08.019.
  47. Miller ML, Hanke S, Hinsby AM, Friis C, Brunak S, Mann M, Blom N. Motif decomposition of the phosphotyrosine proteome reveals a new N-terminal binding motif for SHIP2. Mol Cell Proteomics. 2008;7:181–192. doi: 10.1074/mcp.M700241-MCP200.
  48. Olsen JV, Blagoev B, Gnad F, Macek B, Kumar C, Mortensen P, Mann M. Global, in vivo, and site-specific phosphorylation dynamics in signaling networks. Cell. 2006;127:635–648. doi: 10.1016/j.cell.2006.09.026.
  49. Paez JG, Janne PA, Lee JC, Tracy S, Greulich H, Gabriel S, Herman P, Kaye FJ, Lindeman N, Boggon TJ, et al. EGFR mutations in lung cancer: correlation with clinical response to gefitinib therapy. Science. 2004;304:1497–1500. doi: 10.1126/science.1099314.
  50. Pawson T, Scott JD. Signaling through scaffold, anchoring, and adaptor proteins. Science. 1997;278:2075–2080. doi: 10.1126/science.278.5346.2075.
  51. Perkins DN, Pappin DJ, Creasy DM, Cottrell JS. Probability-based protein identification by searching sequence databases using mass spectrometry data. Electrophoresis. 1999;20:3551–3567. doi: 10.1002/(SICI)1522-2683(19991201)20:18<3551::AID-ELPS3551>3.0.CO;2-2.
  52. Petschnigg J, Groisman B, Kotlyar M, Taipale M, Zheng Y, Kurat CF, Sayad A, Sierra JR, Mattiazzi Usaj M, Snider J, et al. The mammalian-membrane two-hybrid assay (MaMTH) for probing membrane-protein interactions in human cells. Nat Methods. 2014;11:585–592. doi: 10.1038/nmeth.2895.
  53. Pineiro A, Munoz E, Sabin J, Costas M, Bastos M, Velazquez-Campoy A, Garrido PF, Dumas P, Ennifar E, Garcia-Rio L, et al. AFFINImeter: A software to analyze molecular recognition processes from experimental data. Anal Biochem. 2019;577:117–134. doi: 10.1016/j.ab.2019.02.031.
  54. Poulsen JW, Madsen CT, Young C, Poulsen FM, Nielsen ML. Using guanidine-hydrochloride for fast and efficient protein digestion and single-step affinity-purification mass spectrometry. Journal of proteome research. 2013;12:1020–1030. doi: 10.1021/pr300883y.
  55. Prasad NK. SHIP2 phosphoinositol phosphatase positively regulates EGFR-Akt pathway, CXCR4 expression, and cell migration in MDA-MB-231 breast cancer cells. Int J Oncol. 2009;34:97–105.
  56. Ran FA, Hsu PD, Wright J, Agarwala V, Scott DA, Zhang F. Genome engineering using the CRISPR-Cas9 system. Nat Protoc. 2013;8:2281–2308. doi: 10.1038/nprot.2013.143.
  57. Rappsilber J, Mann M, Ishihama Y. Protocol for micro-purification, enrichment, pre-fractionation and storage of peptides for proteomics using StageTips. Nat Protoc. 2007;2:1896–1906. doi: 10.1038/nprot.2007.261.
  58. Rikova K, Guo A, Zeng Q, Possemato A, Yu J, Haack H, Nardone J, Lee K, Reeves C, Li Y, et al. Global survey of phosphotyrosine signaling identifies oncogenic kinases in lung cancer. Cell. 2007;131:1190–1203. doi: 10.1016/j.cell.2007.11.025.
  59. Rix U, Superti-Furga G. Target profiling of small molecules by chemical proteomics. Nat Chem Biol. 2009;5:616–624. doi: 10.1038/nchembio.216.
  60. Rouhi P, Jensen LD, Cao Z, Hosaka K, Lanne T, Wahlberg E, Steffensen JF, Cao Y. Hypoxia-induced metastasis model in embryonic zebrafish. Nat Protoc. 2010;5:1911–1918. doi: 10.1038/nprot.2010.150.
  61. Rush J, Moritz A, Lee KA, Guo A, Goss VL, Spek EJ, Zhang H, Zha XM, Polakiewicz RD, Comb MJ. Immunoaffinity profiling of tyrosine phosphorylation in cancer cells. Nat Biotechnol. 2005;23:94–101. doi: 10.1038/nbt1046.
  62. Schlessinger J, Lemmon MA. SH2 and PTB domains in tyrosine kinase signaling. Sci STKE. 2003;2003:RE12. doi: 10.1126/stke.2003.191.re12.
  63. Schneider CA, Rasband WS, Eliceiri KW. NIH Image to ImageJ: 25 years of image analysis. Nat Methods. 2012;9:671–675. doi: 10.1038/nmeth.2089.
  64. Schulze WX, Deng L, Mann M. Phosphotyrosine interactome of the ErbB-receptor kinase family. Mol Syst Biol. 2005;1:20050008. doi: 10.1038/msb4100012.
  65. Schwanhausser B, Busse D, Li N, Dittmar G, Schuchhardt J, Wolf J, Chen W, Selbach M. Global quantification of mammalian gene expression control. Nature. 2011;473:337–342. doi: 10.1038/nature10098.
  66. Sefton BM, Hunter T, Beemon K, Eckhart W. Evidence that the phosphorylation of tyrosine is essential for cellular transformation by Rous sarcoma virus. Cell. 1980;20:807–816. doi: 10.1016/0092-8674(80)90327-x.
  67. Shannon P, Markiel A, Ozier O, Baliga NS, Wang JT, Ramage D, Amin N, Schwikowski B, Ideker T. Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res. 2003;13:2498–2504. doi: 10.1101/gr.1239303.
  68. Sharma K, D’Souza RC, Tyanova S, Schaab C, Wisniewski JR, Cox J, Mann M. Ultradeep human phosphoproteome reveals a distinct regulatory nature of Tyr and Ser/Thr-based signaling. Cell reports. 2014;8:1583–1594. doi: 10.1016/j.celrep.2014.07.036.
  69. Sharma K, Weber C, Bairlein M, Greff Z, Keri G, Cox J, Olsen JV, Daub H. Proteomics strategy for quantitative protein interaction profiling in cell extracts. Nat Methods. 2009;6:741–744. doi: 10.1038/nmeth.1373.
  70. Songyang Z, Shoelson SE, Chaudhuri M, Gish G, Pawson T, Haser WG, King F, Roberts T, Ratnofsky S, Lechleider RJ, et al. SH2 domains recognize specific phosphopeptide sequences. Cell. 1993;72:767–778. doi: 10.1016/0092-8674(93)90404-e.
  71. Soria JC, Ohe Y, Vansteenkiste J, Reungwetwattana T, Chewaskulyong B, Lee KH, Dechaphunkul A, Imamura F, Nogami N, Kurata T, et al. Osimertinib in Untreated EGFR-Mutated Advanced Non-Small-Cell Lung Cancer. The New England journal of medicine. 2018;378:113–125. doi: 10.1056/NEJMoa1713137.
  72. Stark C, Breitkreutz BJ, Reguly T, Boucher L, Breitkreutz A, Tyers M. BioGRID: a general repository for interaction datasets. Nucleic acids research. 2006;34:D535–539. doi: 10.1093/nar/gkj109.
  73. Szklarczyk D, Gable AL, Lyon D, Junge A, Wyder S, Huerta-Cepas J, Simonovic M, Doncheva NT, Morris JH, Bork P, et al. STRING v11: protein-protein association networks with increased coverage, supporting functional discovery in genome-wide experimental datasets. Nucleic acids research. 2019;47:D607–D613. doi: 10.1093/nar/gky1131.
  74. Tompa P, Davey NE, Gibson TJ, Babu MM. A million peptide motifs for the molecular biologist. Molecular cell. 2014;55:161–169. doi: 10.1016/j.molcel.2014.05.032.
  75. Tyanova S, Temu T, Sinitcyn P, Carlson A, Hein MY, Geiger T, Mann M, Cox J. The Perseus computational platform for comprehensive analysis of (prote)omics data. Nat Methods. 2016;13:731–740. doi: 10.1038/nmeth.3901.
  76. Ulaganathan VK, Sperl B, Rapp UR, Ullrich A. Germline variant FGFR4 p.G388R exposes a membrane-proximal STAT3 binding site. Nature. 2015;528:570–574. doi: 10.1038/nature16449.
  77. UniProt Consortium. UniProt: a hub for protein information. Nucleic acids research. 2015;43:D204–212. doi: 10.1093/nar/gku989.
  78. Vacic V, Markwick PR, Oldfield CJ, Zhao X, Haynes C, Uversky VN, Iakoucheva LM. Disease-associated mutations disrupt functionally important regions of intrinsic protein disorder. PLoS computational biology. 2012;8:e1002709. doi: 10.1371/journal.pcbi.1002709.
  79. van Biesen T, Hawes BE, Luttrell DK, Krueger KM, Touhara K, Porfiri E, Sakaue M, Luttrell LM, Lefkowitz RJ. Receptor-tyrosine-kinase- and G beta gamma-mediated MAP kinase activation by a common signalling pathway. Nature. 1995;376:781–784. doi: 10.1038/376781a0.
  80. van der Lee R, Buljan M, Lang B, Weatheritt RJ, Daughdrill GW, Dunker AK, Fuxreiter M, Gough J, Gsponer J, Jones DT, et al. Classification of intrinsically disordered regions and proteins. Chem Rev. 2014;114:6589–6631. doi: 10.1021/cr400525m.
  81. Zhang Y, Wolf-Yadlin A, Ross PL, Pappin DJ, Rush J, Lauffenburger DA, White FM. Time-resolved mass spectrometry of tyrosine phosphorylation sites in the epidermal growth factor receptor signaling network reveals dynamic modules. Mol Cell Proteomics. 2005;4:1240–1250. doi: 10.1074/mcp.M500089-MCP200.
  82. Zhou H, Di Palma S, Preisinger C, Peng M, Polat AN, Heck AJ, Mohammed S. Toward a comprehensive characterization of a human cancer cell phosphoproteome. Journal of proteome research. 2013;12:260–271. doi: 10.1021/pr300630k.

Associated Data


Supplementary Materials

Supp Fig 1

Evaluation of phosphotyrosine signaling dataset. Related to Figure 1. (A) Summary of MS data. Proteome samples were measured in single-shot experiments with a 90 min gradient, leading to identification of >45,000 peptides. TiO2-enriched samples were also measured in single-shot experiments using a 135 min gradient, leading to identification of >15,000 phosphorylated peptides. Antibody-based enrichment for tyrosine-phosphorylated peptides was performed in two consecutive incubation steps, and each of the two samples was measured in single-shot experiments using a 135 min gradient, leading to identification of more than 1,600 tyrosine-phosphorylated peptides. (B) Fractional distribution of tyrosine class 1 phosphorylation sites reported in deep rodent tissue phosphoproteome studies. In deep phosphoproteome studies without specific enrichment for phosphotyrosines, these sites make up around 1% of the identified phosphorylation sites (Humphrey et al., 2015; Huttlin et al., 2010; Lundby et al., 2012). This proportion reflects the phosphopeptide enrichment methods used rather than biology. When we include antibody-based enrichment of phosphotyrosines, the fraction of phosphotyrosines is increased more than 8-fold. Thus, specific enrichment for tyrosine-phosphorylated peptides is necessary to obtain high coverage of phosphotyrosines. (C) The amino acid sequences flanking phosphorylation sites identified as regulated in the TiO2-enriched samples (top) were analyzed relative to a background of all non-regulated phosphorylation sites. A similar analysis was subsequently performed in which all sequences with a proline residue in the +1 position were removed (bottom). Motifs were generated with iceLogo. (D) Overlap of proteins with tyrosine phosphorylation sites identified in deep phosphoproteomics studies investigating EGF-dependent phosphotyrosine signaling. We performed a SILAC-based experiment in which A549 cells were stimulated with either EGF or saline for 5 min, similar to the rats. We enriched for tyrosine-phosphorylated peptides and analyzed the data by high-resolution MS/MS. A similar experiment was performed by Francavilla et al. in HeLa cells, which were stimulated with either EGF or saline for 8 min (Francavilla et al., 2016a). The numbers indicate the number of proteins with localized phosphotyrosine sites. The lung phosphotyrosine dataset is by far the largest. Yet the absolute pTyr-protein overlap is larger between A549 cells and HeLa cells than between A549 cells and lung tissue, which shows that the A549 cell phosphotyrosine protein network resembles that of HeLa cells more than it resembles the phosphotyrosine protein network in lung tissue. (E-H) Western blot validation of EGF-dependent tyrosine phosphorylation sites. (E) Whole cell lysates from A549 cells were immunoblotted with the indicated antibodies. (F-H) Whole cell lysates from A549 cells were immunoprecipitated (IP) and immunoblotted with the indicated antibodies.

Supp Fig 2

Quantitative analysis of phosphoproteomics data. Related to Figure 1. (A) Correlation plots for proteome (left), phosphotyrosine (middle), and TiO2-enriched (right) samples. Median protein intensities measured for all EGF-stimulated rats were plotted as a function of the protein intensities for each individual EGF-stimulated rat. The Pearson correlation coefficient is indicated in each plot. (B) t-test-based analysis of proteome (left), phosphotyrosine (middle), and TiO2-enriched (right) samples comparing EGF-stimulated versus saline-injected rats. The dashed lines indicate the cut-off criteria of p < 0.05 and ratio > 4. For the proteome samples this results in an estimated false discovery rate of 0.002. The FDR estimation from lung proteome data is based on the assumption that we do not expect global protein expression changes after a short 5 min stimulus. (C) Gene ontology enrichment analysis of proteins that were identified as containing a regulated tyrosine phosphorylation site. (D) Hierarchical clustering of iBAQ protein intensities was performed based on the proteome measurements from the three lysates of rat lung tissue used for the peptide pulldowns. The relative protein abundance of a selection of proteins known to be involved in the EGF-dependent signaling network was estimated from the iBAQ protein intensity values of the proteomes and is indicated in the figure. The color scale indicates log2-transformed iBAQ intensities with the median subtracted.
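The cut-off in panel (B) and the color scale in panel (D) can be illustrated with a minimal Python sketch. It is not the analysis pipeline used in this study. The function names and example values are hypothetical, the ratio is assumed to be a linear fold change tested in one direction only, and the median is assumed to be taken over the list of values passed in.

```python
import math
from statistics import median


def passes_cutoff(p_value, ratio, p_max=0.05, ratio_min=4.0):
    """Return True when both criteria hold: p < p_max and ratio > ratio_min.

    Assumption: `ratio` is a linear (not log-transformed) fold change.
    """
    return p_value < p_max and ratio > ratio_min


def median_centered_log2(intensities):
    """log2-transform positive intensities, then subtract the median log2 value."""
    logs = [math.log2(x) for x in intensities]
    center = median(logs)
    return [value - center for value in logs]


# Hypothetical example values, not taken from the study.
sites = {"siteA": (0.01, 6.2), "siteB": (0.20, 8.0), "siteC": (0.03, 2.5)}
regulated = [name for name, (p, r) in sites.items() if passes_cutoff(p, r)]
centered = median_centered_log2([2.0, 8.0, 32.0])
```

In this toy input only `siteA` satisfies both thresholds, since `siteB` fails the p-value criterion and `siteC` fails the ratio criterion.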

Supp Fig 3

Comparison of results from analysis of phosphopeptide pulldowns using either a classical approach, comparing phosphopeptide versus matched unmodified peptide pulldown, or the approach we present in this paper. Related to Figure 2. (A) Three biological replicates of peptide-based pulldowns were performed in different lung tissue lysates with a peptide corresponding to the sequence flanking tyrosine residue 921 of IRS4, either phosphorylated or not. Analysis of the data using the classical approach (Eberl et al., 2013; Hubner et al., 2010) identifies 23 significant protein interactors (left). Using our web-based tool for the data analysis, we identify 8 of these proteins as significant interactors, in addition to two other unique proteins (right). (B) A similar experiment was performed for a peptide-based pulldown using a phosphorylated or a non-phosphorylated version of a peptide corresponding to the amino acid sequence flanking SHC-1 tyrosine residue 349. The classical approach leads to identification of 14 significant interactors. Our method reduces this set of protein interactors to seven, six of which are also represented in the classical approach. (C) Similar experiment performed for pY1172 EGFR. (D) Venn diagram of 45 different phosphopeptide bait interactions in liver tissue lysates identified by data-independent acquisition. The number of significant SH2 domain-containing proteins is indicated in parentheses.

Supp Fig 4

Validation of primary interaction partners in peptide pulldowns. Related to Figure 3. (A) Interaction partners of phosphorylated PTPN11 Y546. Gene names are displayed for significant protein interactors, and interactors containing an SH2 domain are indicated with stars. For proteins highlighted in blue, control protein intensities are inferred from scaled proteome measurements; proteins marked with a cross contain a phosphotyrosine-binding domain. Proteins without any intensity information in controls are plotted as infinity-ratios to the right of the ratio plot. Stoichiometry of significant interactors indicates GRB2 as the likely primary interaction partner. (B) Interaction partners of phosphorylated GAB1 Y317. Stoichiometry of significant interactors indicates CRKL as the likely primary interaction partner. (C) Western blot confirmation of EGF-dependent interaction between PTPN11 and GRB2 by co-immunoprecipitation in A549 lung cancer cells. (D) Western blot validation of EGF-dependent interaction between GAB1 and CRKL by co-immunoprecipitation in A549 lung cancer cells.

Supp Fig 5

Isothermal titration calorimetry measurements for tyrosine-phosphorylated peptides and the novel interaction partners identified. Related to Figure 4. Binding affinities were measured for the tyrosine-phosphorylated form of wildtype EGFR residues flanking pY1016 as well as for the EGFR cancer mutation EGFR P1019L. Affinities for binding of purified SH2 domains from CRK, CRKL, SH2B1 and SYK were tested. (A) EGFR wt and CRK. (B) EGFR_P1019L and CRK. (C) EGFR wt and CRKL. (D) EGFR_P1019L and CRKL. (E) EGFR wt and SYK (SH2 domain from residues 7-114). (F) EGFR_P1019L and SYK (SH2 domain from residues 7-114). (G) EGFR wt and SYK (SH2 domain from residues 159-266). (H) EGFR_P1019L and SYK (SH2 domain from residues 159-266). (I) EGFR wt and SH2B1 (SH2 domain from residues 515-632). (J) EGFR_P1019L and SH2B1 (SH2 domain from residues 515-632).

Supp Fig 6

Evaluation and validation of EGFR knockout A549 cells. Related to Figure 5. (A) Western blot confirmation of EGFR knockout. Whole cell extracts (WCE) from parental (WT) and EGFR knockout (EKO) A549 cells, after stimulation with either EGF or TGFα for 30 minutes, were immunoblotted with the indicated antibodies. (B) MS analysis of EGFR abundance. Bar plot showing normalized iBAQ EGFR intensities from parental and EGFR knockout A549 cells. The housekeeping protein MRPS10 is shown as a control. Values represent the mean ±SEM of two biological replicates. (C) Reproducibility of MS-based proteome analysis of parental and EGFR knockout cells. Multi-scatter plot showing Pearson correlation coefficients between the proteomes of the different replicates. (D) Comparative proteome analysis between parental and EGFR knockout cells. Ratio versus intensity plot shows proteome changes between EGFR parental (right) and knockout (left) cells. Significantly regulated hits (according to their “significance B”) are shown as colored dots (orange = higher in parental cells; blue = higher in knockout cells). (E) Wound-healing scratch assay to quantify cell migration. Images of wildtype and knockout EGFR cells after EGF stimulation for 0, 24 and 48 h. Quantification of cell migration is shown in the right panel. Data represent the mean ±SEM of two independent experiments, each including two technical replicates. Numeric values at the top of each bar indicate the absolute migration rate, where 0 indicates no migration and 1 equals full migration.

Supp Fig 7

Signaling and cellular outcome of EGFR knockout A549 cells reconstituted with WT or P1019L EGFR-GFP mutant construct. Related to Figure 5. (A) Whole cell extracts (WCE) from parental (WT) A549 cells, EGFR knockout (EKO) cells and different stably transfected clones with either WT or P1019L EGFR-GFP constructs were immunoblotted with the indicated antibodies. (B) Different clones stably transfected with either WT or P1019L EGFR-GFP constructs were stimulated with EGF for 8 minutes and the lysates were immunoblotted with the indicated antibodies. (C) DNA sequence of the genomic DNA region surrounding the missense mutation site. Sequencing was performed using a reverse primer. (D) Lysates from cells stimulated with EGF for 8 minutes were GFP-immunoprecipitated and analyzed by LC-MS/MS. Ratio versus intensity plot shows changes in EGF-dependent interactors between WT (left) and P1019L (right) EGFR-expressing cells. Significantly regulated hits (according to their “significance B”) are shown as colored dots. Data represent two independent experiments, each including three WT and two P1019L clones. (E) WCE from the experiment shown in D were immunoblotted with the indicated antibodies. (F) WCE from the experiment shown in Fig. 3F were immunoblotted with the indicated antibodies. (G, H) Cells were stimulated with EGF for the indicated time points and immunoblotted with the indicated antibodies. (I) Quantification of the immunoblots shown in Fig. S10G-H, using ImageJ. Phosphoprotein expression is normalized to total protein expression. Data represent the mean ±SEM of four (ERK) and two (AKT) independent experiments, respectively. (J) CCK8 proliferation assay on EGFR knockout, WT- and P1019L-expressing cells, stimulated with EGF for 72 hours. Data represent the mean ± SEM of two independent experiments, each including three WT and two P1019L clones (three technical replicates for each). (K, L) Wound-healing scratch assay to quantify cell migration. (K) Representative images of CTRL and SHIP2 KD EGFR P1019L-expressing cells after EGF stimulation for 0 and 36 h are shown. Numbers in red indicate the average absolute migration rate, where 0 indicates no migration and 1 equals full migration (two independent experiments, each including a different P1019L clone). (L) EGF-dependent migration of EGFR P1019L-expressing cells was assessed after DMSO or SHIP2 inhibitor (AS1938909) treatment. Quantification is displayed as mean ±SEM of two independent experiments, each including a different P1019L clone. The p-values in K-L were evaluated by two-sided Student’s t-test.

Supplementary Tables

