Keywords: Protein tyrosine phosphatases, Computational prediction, Peptide substrates, PTP1B
Abstract
Phosphotyrosine peptides are useful starting points for inhibitor design and for the search for protein tyrosine phosphatase (PTP) phosphoprotein substrates. To identify novel phosphopeptide substrates of PTP1B, we developed a computational prediction protocol based on a virtual library of protein sequences with known phosphotyrosine sites. To these sequences we applied sequence-based methods, biologically meaningful filters and molecular docking. Five peptides were selected for biochemical testing of their potential as PTP1B substrates. All five were as good substrates for PTP1B as a known peptide substrate, whereas appropriate control peptides were not recognized, showing that our protocol can be used to identify novel peptide substrates of PTP1B.
1. Introduction
To date, far less is known about phosphatase substrates than about kinase substrates, and the identification of substrates is one of the key challenges in phosphatase research.1 Peptides can play a crucial role in finding new substrates.2 Furthermore, they can be used as starting points for chemical tool development, for example for inhibitors or pull-down baits.3 New peptide substrates can be identified, for example, by using phosphopeptide microarrays2, 4 or peptide libraries.5 Computational methods are a cheap alternative to these approaches. They have so far been applied to find protein substrates for phosphatases,6 and for analysis in combination with peptide microarrays.2 Here, we have developed a computational protocol for identifying human protein-derived peptide substrates of protein-tyrosine phosphatase 1B (PTP1B) as a model phosphatase. PTP1B is a well-studied phosphatase involved in cancer and diabetes.7 Our method led to the discovery of five biochemically confirmed novel peptide substrates of PTP1B.
2. Results and discussion
As a starting point, we constructed a virtual library composed of human proteins with known phosphotyrosine (pY) sites, excluding artificial pY-containing sequences. Phospho.ELM,8 PhosphoSitePlus9 and HPRD10 were used for this purpose. The library contains 3799 non-redundant protein sequences, from which 4931 non-redundant 11-mer peptides (with pY in the middle) were extracted. The length of 11 amino acids was chosen following previous studies testing PTP phosphopeptide substrate specificity.(b), 11
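The peptide extraction step described above can be illustrated with a minimal sketch. It assumes 1-based pY positions and pads windows near a terminus with a blank character; the function names `extract_windows` and `non_redundant` are hypothetical and are not taken from the original workflow.

```python
def extract_windows(sequence, py_positions, flank=5, pad="-"):
    """Return one (2*flank + 1)-mer per annotated pY position (1-based)."""
    windows = []
    for pos in py_positions:
        idx = pos - 1  # convert to a 0-based index
        if sequence[idx] != "Y":
            raise ValueError(f"position {pos} is not a tyrosine")
        left = sequence[max(0, idx - flank):idx]
        right = sequence[idx + 1:idx + 1 + flank]
        # Pad with blanks when the site lies close to a terminus
        # (an assumption of this sketch).
        windows.append(left.rjust(flank, pad) + "Y" + right.ljust(flank, pad))
    return windows


def non_redundant(windows):
    """Drop duplicate windows while keeping first-seen order."""
    return list(dict.fromkeys(windows))


# Toy sequence embedding the Src-derived 11-mer from Table 1.
toy = "AAASTEPQYQPGENAAA"
peptides = non_redundant(extract_windows(toy, [9]))
```

With flank=5 each window is an 11-mer with the (phospho)tyrosine at the central position, matching the peptide length used in the text.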
A sequence-based prediction that employs three methods (Fig. 1) was then carried out on these sequences. These methods are based on the dephosphorylation data from the human DEPhOsphorylation Database DEPOD.1, 12 While method 2 has been used before for PTP1B substrate identification,2b the other two methods have not, and they employ different algorithms than method 2. We reasoned that the consensus prediction result from all three methods would give us the best result for finding PTP1B peptide substrates.
The 4931 extracted pY-containing peptides were assigned information content-based scores (method 1) and position-specific scoring matrix (PSSM) scores2b (method 2) and ranked accordingly. The 3799 protein sequences, which contain 89874 tyrosines in total, were submitted to the pre-trained prediction model customized for PTP1B in Musite.13 Of the 89874 tyrosines, 5825 were predicted to be dephosphorylation sites for PTP1B (specificity ≥ 95%) (method 3). The top 10% of peptides were taken from each method. The three predictions were then intersected to obtain a common set of 139 non-redundant pY-containing peptides. These correspond to 231 pY sites on 191 original protein sequences (122 genes). Of these genes, 15 encode known protein substrates of PTP1B, and for 12 of them our predicted dephosphorylation sites match the reported ones (Supplementary Table S1). For the remaining 3 protein substrates the dephosphorylation sites are unknown. DEPOD contains 39 substrates, and we did not identify 24 of those. The reason lies in the strict cut-off that we applied to obtain the consensus result from the three sequence-based methods. For example, the PTP1B substrate Src14 was predicted as a good substrate only by method 3 (Musite cut-off 95%, sequence score for STEPQpYQPGEN 98.48%), but not by methods 1 (rank 803/4931) and 2 (rank 985/4931). Since a strict cut-off generally decreases the chance of finding false positives, and since we did identify dephosphorylation sites of 13 proteins as well as 107 novel substrate candidates, we judged the result of our cut-off setting to be acceptable.
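The consensus step can be sketched as a set intersection of the top-ranked fraction from each method. This is an illustrative sketch only: it assumes each method yields one numeric score per peptide ID (higher is better), whereas method 3 in the text operates on tyrosine sites with a specificity cut-off. The names `top_fraction` and `consensus` are hypothetical.

```python
import math


def top_fraction(scores, fraction=0.10):
    """Return the IDs of the highest-scoring `fraction` of entries."""
    n_keep = max(1, math.ceil(len(scores) * fraction))
    ranked = sorted(scores, key=scores.get, reverse=True)
    return set(ranked[:n_keep])


def consensus(*score_tables, fraction=0.10):
    """Intersect the top fractions of several {peptide_id: score} tables."""
    top_sets = [top_fraction(table, fraction) for table in score_tables]
    return set.intersection(*top_sets)


# Toy scores for four hypothetical peptides under three methods.
method1 = {"a": 9, "b": 8, "c": 1, "d": 0}
method2 = {"a": 5, "b": 1, "c": 7, "d": 0}
method3 = {"a": 3, "b": 0, "c": 1, "d": 9}
common = consensus(method1, method2, method3, fraction=0.5)
```

Only peptide "a" ranks in the top half under all three toy methods, which mirrors how a strict consensus can exclude a peptide such as the Src sequence that scores well in just one method.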
Next, we applied biologically meaningful filters derived from DEPOD to the predicted substrates. A candidate protein had to be either a known substrate of a PTP that shares known substrates with PTP1B, or mapped to the same KEGG15 and NCI-Nature PID16 pathways as PTP1B. The filtering resulted in 44 candidate protein substrates, including the 15 known protein substrates of PTP1B.
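The two filter criteria can be expressed as a simple predicate. The sketch below is a toy illustration under stated assumptions: the data structures, the function name `passes_filters`, and all protein and pathway names other than PTP1B are hypothetical, and sharing any one annotated pathway is treated as sufficient, which is one possible reading of the text.

```python
def passes_filters(candidate, ptp_substrates, pathways, target="PTP1B"):
    """Return True if the candidate meets either biological criterion.

    ptp_substrates: {phosphatase: set of known substrates}
    pathways:       {pathway name: set of member proteins}
    """
    target_subs = ptp_substrates.get(target, set())
    # Phosphatases sharing at least one known substrate with the target.
    related = {
        ptp for ptp, subs in ptp_substrates.items()
        if ptp != target and subs & target_subs
    }
    is_substrate_of_related = any(
        candidate in ptp_substrates[ptp] for ptp in related
    )
    # Candidate annotated in a pathway that also contains the target.
    shares_pathway = any(
        target in members and candidate in members
        for members in pathways.values()
    )
    return is_substrate_of_related or shares_pathway


# Hypothetical toy annotations.
toy_substrates = {
    "PTP1B": {"S1", "S2"},
    "PTP_X": {"S2", "C1"},  # shares S2 with PTP1B
    "PTP_Y": {"C2"},        # shares nothing with PTP1B
}
toy_pathways = {"path1": {"PTP1B", "C3"}, "path2": {"C2", "C4"}}
kept = [c for c in ["C1", "C2", "C3", "C4"]
        if passes_filters(c, toy_substrates, toy_pathways)]
```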
To refine the prediction result, 35 non-redundant 11-mer peptides with single pY sites and 7 with double pY sites were extracted from the original sequences of the 29 predicted novel PTP1B substrates and converted to 3D chemical structures. These peptide structures were then docked into the PTP1B catalytic site with GOLD,17 and the docking solutions were further refined with FlexPepDock in Rosetta.18 The top docking solutions were then inspected manually to check whether they satisfied the following criteria: (i) the phosphotyrosine points into the catalytic site; (ii) the N-to-C terminal orientation agrees with reported ones; (iii) at least two of the three key hydrogen bonds19 are formed (Fig. 2). The docking solutions of 8 peptides satisfied these criteria (see Table 1, peptides 1–5, plus two peptides from JAK1 and one from Fyn listed in Supplementary Table S1).
Table 1.
ID | Source (gene name) | pY site | Sequence | Km [μM] | kcat/Km [×10⁻⁵ M⁻¹ s⁻¹] |
---|---|---|---|---|---|
1 | ARHGAP5 | 1090 | DPSDNpYAEPID | 28.5 ± 5.0 | 11.6 ± 3.4 |
2 | SKAP1 | 271 | EEEDIpYEVLPD | 14.7 ± 3.2 | 27.9 ± 13.9 |
3 | GAB2 | 293 | DNEDVpYTFKTP | 17.1 ± 4.1 | 9.2 ± 1.6 |
4 | ACP1 | 133 | IEDPpYpYGNDSD | 20.1 ± 4.6 | 8.9 ± 1.1 |
5 | ITGB1 | 783 | QENPIpYKSPIN | 21.7 ± 8.9 | 7.1 ± 3.0 |
6 | Src | 530 | STEPQpYQPGEN | 29.1 ± 5.3 | 9.0 ± 3.7 |
7 | — | — | KKKKpYPKK | Inactive | Inactive |
8 | NOS3 | 657 | LGSRApYPHFCA | >300 | n.d. |
n.d. = not determined. Peptides are acetylated at the N-terminus and contain an amide at the C-terminus.
To test whether the predicted phosphopeptides are indeed in vitro substrates of PTP1B, we synthesized six of these peptides. The two peptides from JAK1 were not synthesized and tested because their sequences are highly similar to those of the known substrates JAK220 and MET21, which were also detected by our procedure. Kinetic analysis of the corresponding JAK2 sequence KEpYpYKVK yielded values similar to those obtained for the peptides tested here (Km = 12.7 μM; kcat/Km = 23.2 × 10⁻⁵ M⁻¹ s⁻¹),22 showing that this type of peptide sequence is dephosphorylated by PTP1B in vitro. With the exception of the Fyn peptide, which showed poor solubility, we then tested the enzymatic activity of PTP1B against the synthesized peptides using the in vitro EnzChek phosphate assay23 (Table 1). We also synthesized and tested a positive control peptide from Src, a known substrate of PTP1B,3, 14 and a negative control peptide that PTP1B does not dephosphorylate.5e We found that the five peptides from the prediction were as good substrates for PTP1B as the Src peptide, whereas, as expected, the negative control was not recognized. We also tested a peptide from our prediction (peptide 8 in Table 1) that was assigned low scores by all three sequence-based methods, and found that PTP1B recognized it poorly. These results demonstrate that our protocol can successfully identify novel peptide substrates of PTP1B, and that it can differentiate between good and poor peptide substrates.
3. Conclusion
The computational approach described here led to the discovery of five novel phosphopeptide substrates of PTP1B. These substrates can serve to develop substrate-based PTP1B inhibitors and other peptide-based tools.3 In this case the step of applying biologically meaningful filters could potentially be skipped. The new peptide substrates identified here are derived from known phosphorylation sites on human proteins, and biologically meaningful filters were applied in the procedure. Through this procedure we found 15 known protein substrates of PTP1B. Together, this suggests that the peptides can be used to search for potential physiological substrates of PTP1B. However, since we also detected JAK1, which is not a natural substrate of PTP1B, the candidates must be studied carefully to establish whether they are natural substrates. The example of Src, which was not identified by our procedure, shows that our procedure is not exhaustive, meaning that further substrate candidates can potentially be found among the hits listed in Supplementary Table S1. Furthermore, we identified correct dephosphorylation sites for 12 of 15 protein substrates, and for the remaining 3 proteins the dephosphorylation sites are unknown, suggesting that the dephosphorylation sites on these proteins could be predicted from the phosphopeptides identified by our method. In addition, we expect that our protocol can be applied to other protein tyrosine phosphatases with known substrate dephosphorylation sites and protein structure data as a cheap alternative to peptide libraries and microarrays.
Finally, our approach should also be applicable to other enzymatic posttranslational protein modifications (PTMs), provided that (1) known PTM site information (sequence and position) is available, together with adequate data to construct the positive and negative data sets used for the sequence-based prediction model; (2) biologically meaningful filters, such as enzyme substrate scope or pathway involvement, are applicable; and (3) experimental structures or reliable structure models of the enzymes or enzyme-substrate complexes are available for the molecular docking study.
4. Experimental section
4.1. Computational methods
For method 1 (information content-based score), the sequence logo was produced with WebLogo24, 25 from the multiple alignment of the known dephosphorylation sites of PTP1B (5 amino acids before and after the pY site). Scores were then assigned to query pY-containing peptides by adding up the height values of amino acid i at the jth position (−5 to −1 and 1 to 5) in the sequence logo.
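The scoring idea of method 1 can be sketched as follows. The sketch assumes the usual sequence-logo definition, in which the height of a letter is its column frequency multiplied by the column information content (log2 20 minus the Shannon entropy). It omits the small-sample correction that logo software may apply, so the values are illustrative and not a reproduction of WebLogo output. The names `logo_heights` and `ic_score` are hypothetical.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# Columns of an 11-mer that flank the central pY (positions -5..-1, +1..+5).
FLANK_COLUMNS = [0, 1, 2, 3, 4, 6, 7, 8, 9, 10]


def logo_heights(aligned_sites):
    """Per-column {amino acid: letter height in bits} for aligned 11-mers."""
    heights = []
    for col in FLANK_COLUMNS:
        counts = Counter(
            site[col] for site in aligned_sites if site[col] in AMINO_ACIDS
        )
        total = sum(counts.values())
        column = {}
        if total:
            freqs = {aa: n / total for aa, n in counts.items()}
            entropy = -sum(f * math.log2(f) for f in freqs.values())
            info = math.log2(len(AMINO_ACIDS)) - entropy
            column = {aa: f * info for aa, f in freqs.items()}
        heights.append(column)
    return heights


def ic_score(peptide, heights):
    """Sum the letter heights of the query residues over the ten positions."""
    return sum(
        column.get(peptide[col], 0.0)
        for col, column in zip(FLANK_COLUMNS, heights)
    )


# Toy alignment: two identical sites give maximal information per column.
toy_heights = logo_heights(["AAAAAYAAAAA", "AAAAAYAAAAA"])
score_match = ic_score("AAAAAYAAAAA", toy_heights)
score_mismatch = ic_score("CCCCCYCCCCC", toy_heights)
```

Residues that never occur at a position in the alignment contribute zero, so a query peptide is rewarded only for matching conserved positions.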
For the PSSM score method (method 2), three amino acid frequency matrices (21 rows and 10 columns each) were generated: (i) a positive matrix, in which the value at row i and column j is the frequency of amino acid i (including blank where applicable) at the jth position (−5 to −1 and 1 to 5) in the multiple alignment of the known dephosphorylation sites of PTP1B (11-mers with 5 amino acids before and after the pY site); (ii) an analogous negative matrix based on the alignment of all tyrosine sites (11-mers) from human proteins without any pY site annotation; (iii) a total matrix based on the alignment of all 11-mer peptides in the positive and negative matrices. PSSM scores were then assigned to query pY-containing peptides by summing over all peptide positions according to the formula in Figure 1.
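The construction of the three frequency matrices can be sketched as below. Because the scoring formula of Figure 1 is not reproduced in the text, the function `log_odds_score` is only a stand-in: it uses a generic log-odds of positive versus total frequency with a small pseudocount, which is an assumption and may differ from the published formula. All names and the toy peptides are hypothetical.

```python
import math

ALPHABET = "ACDEFGHIKLMNPQRSTVWY-"  # 20 amino acids plus blank: 21 rows
# The ten columns flanking the central pY of an 11-mer.
FLANK_COLUMNS = [0, 1, 2, 3, 4, 6, 7, 8, 9, 10]


def frequency_matrix(peptides):
    """Return {symbol: [frequency at each of the 10 flanking positions]}.

    Peptides may contain only symbols from ALPHABET.
    """
    matrix = {aa: [0.0] * len(FLANK_COLUMNS) for aa in ALPHABET}
    for pep in peptides:
        for j, col in enumerate(FLANK_COLUMNS):
            matrix[pep[col]][j] += 1.0 / len(peptides)
    return matrix


def log_odds_score(peptide, positive, total, pseudo=1e-3):
    """Stand-in score (assumption): sum of log2(positive / total) terms."""
    score = 0.0
    for j, col in enumerate(FLANK_COLUMNS):
        aa = peptide[col]
        score += math.log2((positive[aa][j] + pseudo) / (total[aa][j] + pseudo))
    return score


# Toy positive and negative sets (hypothetical sequences).
positives = ["AAAAAYAAAAA"]
negatives = ["CCCCCYCCCCC"]
pos_matrix = frequency_matrix(positives)
neg_matrix = frequency_matrix(negatives)
tot_matrix = frequency_matrix(positives + negatives)
good = log_odds_score("AAAAAYAAAAA", pos_matrix, tot_matrix)
poor = log_odds_score("CCCCCYCCCCC", pos_matrix, tot_matrix)
```

Under this stand-in, residues enriched in the positive set relative to the pooled background raise the score and depleted residues lower it.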
Musite, which is a stand-alone application for phosphorylation site prediction, was adapted to predict dephosphorylation sites via customized prediction model training. We combined the protein sequences containing known dephosphorylation sites of PTP1B and human proteins without any pY site annotation together as input to train the prediction model. The positive and negative training sets are known dephosphorylation sites and the remaining tyrosine sites, respectively. A query protein sequence is given to this prediction model and the possible dephosphorylation sites on the protein are predicted.
For the molecular docking procedure, the PTP1B structure was taken from PDB entry 1EEO.19 For each peptide, 200 docking solutions were generated with GOLD. During the docking process, the following residues in the catalytic site were treated as flexible: Gln-21, Arg-24, Lys-41, Arg-47, Asp-48, Ser-118, Lys-120, Asp-181, and Phe-182. The top 10 docking solutions were sent to FlexPepDock for further refinement, and 10 × 200 docking solutions were generated for each peptide.
4.2. Peptide synthesis
All chemicals were obtained from Sigma–Aldrich (Steinheim, Germany) and VWR (Darmstadt, Germany) and used without further purification. Fmoc-protected amino acids and Rink amide AM resin (200–400 mesh, 0.7 mmol/g) were purchased from Novabiochem (Darmstadt, Germany). The following Fmoc-protected L-amino acids were used: Fmoc-Asn(Trt)-OH, Fmoc-Gln(Trt)-OH, Fmoc-Glu(tBu)-OH, Fmoc-Gly-OH, Fmoc-Leu-OH, Fmoc-Lys(Boc)-OH, Fmoc-Pro-OH, Fmoc-Tyr(PO(OBzl)OH)-OH, Fmoc-Ala-OH, Fmoc-Phe-OH, Fmoc-Ile-OH, Fmoc-Arg(Pbf)-OH, Fmoc-Ser(tBu)-OH, Fmoc-Thr(tBu)-OH, Fmoc-Val-OH, Fmoc-His(Trt)-OH, Fmoc-Cys(Trt)-OH and Fmoc-Asp(tBu)-OH. 2-(1H-Benzotriazole-1-yl)-1,1,3,3-tetramethyluronium hexafluorophosphate (HBTU) and triisopropylsilane (TIS) were also purchased from Novabiochem. HPLC–MS analysis and HPLC purifications were carried out on a Shimadzu High Performance Liquid Chromatograph/Mass Spectrometer LCMS-2010EV with a UV/Vis Photodiode array detector SPD-M20A Prominence. The solvent mixtures were H2O and MeCN with 0.05% TFA added. RP-HPLC analytical runs were carried out with a Macherey Nagel C18 EC 250/4.0 NUCLEODUR 100-5 C18 ec column and a pump rate of 1.5 ml/min. For semi-preparative separations a Macherey Nagel C18 VP 250/10 NUCLEODUR 110-5 C18 ec column and a pump rate of 5 ml/min were used. Mass spectra were recorded on a MALDI micro MX mass spectrometer (Waters, Manchester, UK) equipped with a reflectron analyzer, used in positive ion mode with delayed extraction activated.
Solid-phase peptide synthesis (SPPS) was performed as described before3b with standard Fmoc chemistry on Rink amide resin using an automated peptide synthesizer (Syro I, Multisyntech). Acetylation, cleavage and ether precipitation were carried out as described before3b and the peptides were purified by HPLC. Analytical data are presented in Table 2.
Table 2.
Peptide | Calculated MW | Observed MW | HPLC gradient (% MeCN in H2O incl. 0.05% TFA) | Retention time (min) |
---|---|---|---|---|
1 | 1355 | 1378.0 [M+Na]+ | 10 → 50 | 9.8 |
2 | 1470 | 1493.0 [M+Na]+ | 10 → 50 | 13.6 |
3 | 1448 | 1449.0 [M+H]+ | 10 → 50 | 11.2 |
4 | 1485 | 1509.9 [M+Na]+ | 10 → 50 | 8.5 |
5 | 1422 | 1423.2 [M+H]+ | 10 → 50 | 10.5 |
6 | 1369 | 1392.9 [M+Na]+ | 10 → 50 | 3.0 |
7 | 1296 | 1296.3 [M+H]+ | 10 → 50 | 1.8 |
8 | 1341 | 1342.1 [M+H]+ | 10 → 50 | 8.8 |
4.3. Protein expression and purification
PTP1B was expressed and purified as described previously.3
4.4. Phosphatase assay
The in vitro EnzChek phosphate assay was carried out with the commercial kit from Molecular Probes as described previously.23
Acknowledgment
X.L. thanks EMBL/Marie Curie Actions for an EIPOD postdoctoral fellowship.
Footnotes
Supplementary data associated with this article can be found, in the online version, at http://dx.doi.org/10.1016/j.bmc.2016.03.030.
References and notes
- 1. Li X., Wilmanns M., Thornton J., Köhn M. Sci. Signal. 2013;6:rs10. doi: 10.1126/scisignal.2003203.
- 2. (a) Sacco F., Tinti M., Palma A., Ferrari E., Nardozza A.P., Hooft van Huijsduijnen R., Takahashi T., Castagnoli L., Cesareni G. J. Biol. Chem. 2009;284:22048. doi: 10.1074/jbc.M109.002758; (b) Ferrari E., Tinti M., Costa S., Corallino S., Nardozza A.P., Chatraryamontri A., Ceol A., Cesareni G., Castagnoli L. J. Biol. Chem. 2011;286:4173. doi: 10.1074/jbc.M110.157420.
- 3. (a) Meyer C., Hoeger B., Temmerman K., Tatarek-Nossol M., Pogenberg V., Bernhagen J., Wilmanns M., Kapurniotu A., Köhn M. ACS Chem. Biol. 2014;9:769. doi: 10.1021/cb400903u; (b) Meyer C., Hoeger B., Chatterjee J., Köhn M. Bioorg. Med. Chem. 2015;23:2848. doi: 10.1016/j.bmc.2015.03.015.
- 4. (a) Köhn M., Gutierrez-Rodriguez M., Jonkheijm P., Wetzel S., Wacker R., Schroeder H., Prinz H., Niemeyer C.M., Breinbauer R., Szedlacsek S.E., Waldmann H. Angew. Chem., Int. Ed. 2007;46:7700. doi: 10.1002/anie.200701601; (b) Gao L., Sun H., Yao S.Q. Biopolymers. 2010;94:810. doi: 10.1002/bip.21533; (c) Sun H., Tan L.P., Gao L., Yao S.Q. Chem. Commun. (Camb) 2009:677. doi: 10.1039/b817853d; (d) Sun H., Lu C.H., Uttamchandani M., Xia Y., Liou Y.C., Yao S.Q. Angew. Chem., Int. Ed. 2008;47:1698. doi: 10.1002/anie.200703473; (e) Espanel X., Hooft van Huijsduijnen R. Methods. 2005;35:64. doi: 10.1016/j.ymeth.2004.07.009.
- 5. (a) Huyer G., Kelly J., Moffat J., Zamboni R., Jia R., Gresser M.J., Ramachandran C. Anal. Biochem. 1998;258:19. doi: 10.1006/abio.1997.2541; (b) Pellegrini M.C., Liang H., Mandiyan S., Wang K., Yuryev A., Vlattas I., Sytwu T., Li Y.C., Wennogle L.P. Biochemistry. 1998;37:15598. doi: 10.1021/bi981427+; (c) Wälchli S., Espanel X., Harrenga A., Rossi M., Cesareni G., Hooft van Huijsduijnen R. J. Biol. Chem. 2004;279:311. doi: 10.1074/jbc.M307617200; (d) Ren L., Chen X., Luechapanichku R., Selner N.G., Meyer T.M., Wavreille A., Chan R., Lorio C., Zhou X., Neel B.G., Pei D. Biochemistry. 2011;50:2339. doi: 10.1021/bi1014453; (e) Vetter S.W., Keng Y.F., Lawrence D.S., Zhang Z.-Y. J. Biol. Chem. 2000;275:2265. doi: 10.1074/jbc.275.4.2265.
- 6. (a) Sacco F., Boldt K., Calderone A., Panni S., Paoluzi S., Castagnoli L., Ueffing M., Cesareni G. Front Genet. 2014;5:115. doi: 10.3389/fgene.2014.00115; (b) Forti F.L. Integr. Biol. (Camb). 2015;7:73. doi: 10.1039/c4ib00186a.
- 7. He R., Zeng L.-F., He Y., Zhang S., Zhang Z.-Y. FEBS J. 2013;280:731. doi: 10.1111/j.1742-4658.2012.08718.x.
- 8. Diella F., Gould C.M., Chica C., Via A., Gibson T.J. Nucleic Acids Res. 2008;36:D240. doi: 10.1093/nar/gkm772.
- 9. Hornbeck P.V., Chabra I., Kornhauser J.M., Skrzypek E., Zhang B. Proteomics. 2004;4:1551. doi: 10.1002/pmic.200300772.
- 10. Keshava Prasad T.S., Goel R., Kandasamy K., Keerthikumar S., Kumar S., Mathivanan S., Telikicherla D., Raju R., Shafreen B., Venugopal A., Balakrishnan L., Marimuthu A., Banerjee S., Somanathan D.S., Sebastian A., Rani S., Ray S., Harrys Kishore C.J., Kanth S., Ahmed M., Kashyap M.K., Mohmood R., Ramachandra Y.L., Krishna V., Rahiman B.A., Mohan S., Ranganathan P., Ramabadran S., Chaerkady R., Pandey A. Nucleic Acids Res. 2009;37:D767. doi: 10.1093/nar/gkn892.
- 11. Barr A.J., Ugochukwu E., Lee W.H., King O.N., Filippakopoulos P., Alfano I., Savitsky P., Burgess-Brown N.A., Müller S., Knapp S. Cell. 2009;136:352. doi: 10.1016/j.cell.2008.11.038.
- 12. Duan G., Li X., Köhn M. Nucleic Acids Res. 2015;43:D531. doi: 10.1093/nar/gku1009.
- 13. Gao J., Thelen J.J., Dunker A.K., Xu D. Mol. Cell Proteomics. 2010;9:2586. doi: 10.1074/mcp.M110.001388.
- 14. Bjorge J.D., Pang A., Fujita D.J. J. Biol. Chem. 2000;275:41439. doi: 10.1074/jbc.M004852200.
- 15. Kanehisa M., Goto S., Sato Y., Furumichi M., Tanabe M. Nucleic Acids Res. 2012;40:D109. doi: 10.1093/nar/gkr988.
- 16. Schaefer C.F., Anthony K., Krupa S., Buchoff J., Day M., Hannay T., Buetow K.H. Nucleic Acids Res. 2009;37:D674. doi: 10.1093/nar/gkn653.
- 17. Verdonk M.L., Cole J.C., Hartshorn M.J., Murray C.W., Taylor R.D. Proteins. 2003;52:609. doi: 10.1002/prot.10465.
- 18. London N., Raveh B., Cohen E., Fathi G., Schueler-Furman O. Nucleic Acids Res. 2011;39:W249. doi: 10.1093/nar/gkr431.
- 19. Sarmiento M., Puius Y.A., Vetter S.W., Keng Y.F., Wu L., Zhao Y., Lawrence D.S., Almo S.C., Zhang Z.Y. Biochemistry. 2000;39:8171. doi: 10.1021/bi000319w.
- 20. Myers M.P., Andersen J.N., Cheng A., Tremblay M.L., Horvath C.M., Parisien J.P., Salmeen A., Barford D., Tonks N.K. J. Biol. Chem. 2001;276:47771. doi: 10.1074/jbc.C100583200.
- 21. Sangwan V., Paliouras G.N., Abella J.V., Dubé N., Monast A., Tremblay M.L., Park M. J. Biol. Chem. 2008;283:34374. doi: 10.1074/jbc.M805916200.
- 22. Meyer, C. (PhD Thesis), September 2012, Technical University Munich.
- 23. McParland V., Varsano G., Li X., Thornton J., Baby J., Aravind A., Meyer C., Pavic K., Rios P., Köhn M. Biochemistry. 2011;50:7579. doi: 10.1021/bi201095z.
- 24. Crooks G.E., Hon G., Chandonia J.M., Brenner S.E. Genome Res. 2004;14:1188. doi: 10.1101/gr.849004.
- 25. Schneider T.D., Stephens R.M. Nucleic Acids Res. 1990;18:6097. doi: 10.1093/nar/18.20.6097.