Abstract
In prokaryotes, protein phosphorylation plays a critical role in regulating a broad spectrum of biological processes and occurs on various amino acids, including serine (S), threonine (T), tyrosine (Y), arginine (R), aspartic acid (D), histidine (H) and cysteine (C) residues of protein substrates. Through literature curation and public database integration, here we report an updated database of phosphorylation sites (p-sites) in prokaryotes (dbPSP 2.0) that contains 19,296 experimentally identified p-sites in 8,586 proteins from 200 prokaryotic organisms, which belong to 12 phyla of two kingdoms, bacteria and archaea. To carefully annotate these phosphoproteins and p-sites, we integrated the knowledge from 88 publicly available resources that cover 9 aspects, namely, taxonomy annotation, genome annotation, function annotation, transcriptional regulation, sequence and structure information, family and domain annotation, interaction, orthologous information and biological pathway. In contrast to version 1.0 (~30 MB), dbPSP 2.0 contains ~9 GB of data, a 300-fold increase in volume. We anticipate that dbPSP 2.0 can serve as a useful data resource for further investigating phosphorylation events in prokaryotes. dbPSP 2.0 is free for all users to access at: http://dbpsp.biocuckoo.cn.
Subject terms: Sequence annotation, Protein databases
Introduction
As one of the most well-characterized and important post-translational modifications (PTMs), protein phosphorylation plays an essential role in almost all signalling pathways and biological processes, from eukaryotes to prokaryotes1–5. This reversible and dynamic process is precisely modulated by protein kinases (PKs) and protein phosphatases (PPs), which attach or remove a phosphate group at specific residues of protein substrates1–5. The first eukaryotic phosphoprotein was discovered in 1883 by Olof Hammarsten, a Swedish biochemist, who detected phosphorus in a secreted protein, casein, from milk6. Although later studies demonstrated that many proteins can be phosphorylated in eukaryotes, it was long debated whether protein phosphorylation also exists in prokaryotes until the discovery of isocitrate dehydrogenase in Escherichia coli, the first identified prokaryotic phosphoprotein, in 19797,8. In contrast with eukaryotic phosphorylation, which occurs mainly at specific serine (S), threonine (T) and tyrosine (Y) residues of proteins5, prokaryotic protein phosphorylation can occur at additional types of amino acids, such as arginine (R), aspartic acid (D), histidine (H) and cysteine (C)1,9–13. Given the importance of phosphorylation in the regulation of protein functions11–13, the identification of novel phosphorylation sites (p-sites) in proteins is fundamental for understanding the molecular mechanisms and regulatory roles of prokaryotic phosphorylation.
Previously, experimental identification of p-sites with conventional biochemical assays was usually labour-intensive, time-consuming and expensive and was accomplished in a low-throughput (LTP) manner. The LTP methods mainly included site-directed mutagenesis (SDM) of candidate p-sites14, in vitro kinase assay (IKA) to identify potential kinase-specific p-sites15, detection of p-sites in purified proteins with LTP mass spectrometry (LTP-MS)16, and N-terminal sequencing of phosphopeptides (NSP)17. The quality of p-sites identified in LTP studies is generally higher because multiple assays were usually performed and the biological functions of the p-sites were carefully analyzed. Recently, advances in the development of proteomic techniques using high-throughput MS (HTP-MS) have enabled the large-scale phosphoproteomic identification of p-sites in prokaryotic proteins18–21. For example, Macek et al. conducted phosphoproteomic profiling to detect 54 phosphoserine (pS), 16 phosphothreonine (pT) and 8 phosphotyrosine (pY) residues of 78 proteins in Bacillus subtilis, as well as 81 pS/pT/pY sites of 79 E. coli phosphoproteins18,19. For arginine phosphorylation, Elsholz et al. systematically identified 121 phosphoarginine (pR) residues in 87 B. subtilis proteins20, whereas Schmidt et al. later quantitatively characterized 134 phosphoproteins with 217 pR sites in B. subtilis21. More recently, Lai et al. detected 159 phosphohistidine (pH) and 69 phosphoaspartic acid (pD) sites of 197 phosphopeptides in nine prokaryotic organisms13. Because an increasing number of LTP and HTP p-site investigations have been reported, the collection, curation, integration and annotation of known phosphoproteins and p-sites in prokaryotes will provide invaluable information for better understanding host-pathogen interactions and developing antimicrobial agents.
In 2015, we developed a new database of phosphorylation sites in prokaryotes (dbPSP) 1.0, which contained 7,391 experimentally identified p-sites, including 2,709 pS, 2,174 pT, 2,187 pY, 142 pR, 84 pD, 90 pH and 5 phosphocysteine (pC) sites, in 3,750 phosphoproteins of 96 prokaryotes22. Compared with the second largest resource, the Phosphorylation Site Database, which curated approximately 1,400 prokaryotic p-sites23, dbPSP 1.0 had a >4-fold greater data volume. At that time, few annotations were provided beyond limited information on p-sites. Because a large number of prokaryotic p-sites have been found in recent studies, here we created dbPSP 2.0, which contains 19,296 known p-sites in 8,586 proteins from 200 prokaryotic organisms, through literature curation and public database integration (Fig. 1a, Supplementary Table 1). Furthermore, we carefully annotated these phosphoproteins and p-sites by integrating the knowledge from 88 publicly accessible databases, covering 9 aspects. In contrast with dbPSP 1.0 (~30 MB), this updated database holds ~9 GB of data, a 300-fold increase in volume. We anticipate that dbPSP 2.0 will be continuously updated and can provide a much more useful resource for exploring protein phosphorylation in prokaryotes.
Results
dbPSP update
Entries of newly reported p-sites
Compared with version 1.0, version 2.0 contains 11,905 new entries (Fig. 1b). Through literature curation and public database integration, dbPSP 2.0 contains 19,296 non-redundant p-sites on seven different types of amino acid residues in 8,586 substrates from 200 prokaryotic species (Supplementary Table 1). In our dataset, there are 18,576 and 671 p-sites derived from HTP and LTP studies, respectively. That 96.27% of known p-sites were derived from HTP studies indicates the importance and usefulness of MS-based phosphoproteomic profiling for studying prokaryotic phosphorylation. In addition to version 1.0, we also compared dbPSP 2.0 to other existing databases, including the Phosphorylation Site Database23, UniProt24, dbPTM 201925, SysPTM 2.026 and PHOSIDA27, and our database contained a much higher number of known phosphoproteins and p-sites in prokaryotes (Fig. 1b). For each p-site, its corresponding gene name, UniProt accession number, organism, phylum, phosphorylated position, residue type, flanking peptide, data type, experimental method and original reference(s) are provided (Supplementary Table 1).
Distribution of phosphoproteins and p-sites for different residue types and different phyla
In dbPSP 1.0, known p-sites were taken from 96 prokaryotic organisms belonging to 11 phyla: Crenarchaeota, Euryarchaeota, Proteobacteria, Actinobacteria, Firmicutes, Cyanobacteria, Deinococcus-Thermus, Tenericutes, Spirochaetes, Chlamydiae and Thermotogae22. With the accumulation of new data, known p-sites have been extended to 200 prokaryotic species in 12 phyla by adding a new phylum, Bacteroidetes (Fig. 2a). We analyzed the distribution of p-sites among different phyla and observed that more p-sites were identified in Proteobacteria and Actinobacteria than in other phyla, with proportions of 27.95% and 23.13%, respectively (Fig. 2a). The Proteobacteria phylum comprises a number of extensively studied microorganisms, such as E. coli, the most widely used model organism in microbiological studies7,8, and the human pathogen Shigella flexneri, which causes bacillary dysentery mainly in children and results in 14,000 deaths per year28. In the Actinobacteria phylum, one of the most notorious species is Mycobacterium tuberculosis, which is the causative agent of tuberculosis (TB) and annually causes 1.5 million deaths29. Due to the high virulence of M. tuberculosis, two related species, the slow-growing Mycobacterium bovis30 and the fast-growing Mycobacterium smegmatis30, were established as models to study mycobacterial physiology. Additionally, we analyzed the distribution of p-sites on different types of amino acid residues and found that pS, pT and pY sites appear more frequently than other types of residues and occupy proportions of 39.67%, 31.55% and 19.87%, respectively (Fig. 2b). Moreover, the distribution of different types of p-sites among the 12 phyla was evaluated (Fig. 2c). The most pR sites were detected in Firmicutes, whereas Proteobacteria had the highest number of pD and pH sites (Fig. 2c). Additional detailed data statistics can be viewed at http://dbpsp.biocuckoo.cn/Statistics.php.
Coverage of phosphoproteins in different species
Due to data limitations, here we only calculated the coverage values of phosphoproteins in 50 species with ≥10 phosphorylated substrates (Supplementary Table 2). For each prokaryote, its proteome set was downloaded from UniProt24 by searching the corresponding Proteome ID, e.g., UP000001018 for Sulfolobus acidocaldarius (strain ATCC 33909/DSM 639/JCM 8929/NBRC 15157/NCIMB 11770) (https://www.uniprot.org/proteomes/?query=taxonomy:330779). Then the proportion of phosphoproteins among all protein products was calculated, and the top 10 species with the highest coverage values are shown. The coverage values of these 10 prokaryotes ranged from 8.47% (Staphylococcus aureus) to 36.06% (S. acidocaldarius) (Fig. 3a). Previously, it was estimated that about 30% of human proteins might be phosphorylated31, and a later study demonstrated that at least 75% of human proteins are phosphorylated in vivo32. Thus, as more phosphoproteomic studies are performed for prokaryotes, the coverage values of their phosphoproteins are expected to increase.
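The coverage calculation described above can be illustrated with a minimal Python sketch. It assumes that coverage is the number of distinct phosphoproteins found in the reference proteome divided by the proteome size, expressed as a percentage; the function name and the toy accessions are hypothetical and are not taken from dbPSP.

```python
def phosphoprotein_coverage(phosphoprotein_ids, proteome_ids):
    """Return the percentage of proteome entries that are known phosphoproteins."""
    proteome = set(proteome_ids)
    if not proteome:
        raise ValueError("proteome must contain at least one protein")
    # Count each phosphoprotein once, and only if it belongs to the proteome.
    covered = set(phosphoprotein_ids) & proteome
    return 100.0 * len(covered) / len(proteome)


# Hypothetical toy proteome of 8 proteins, 3 of which are phosphorylated
# (one accession is listed twice to show that duplicates are ignored).
toy_proteome = [f"P{i:05d}" for i in range(8)]
toy_phospho = ["P00001", "P00003", "P00003", "P00007"]
coverage = phosphoprotein_coverage(toy_phospho, toy_proteome)
print(f"{coverage:.2f}%")
```

Using sets keeps the calculation robust to proteins that carry several p-sites and therefore appear more than once in a site-level table.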
New annotations
Multiple-layer annotation of prokaryotic phosphoproteins
For convenience, dbPSP 2.0 was organized as a phosphoprotein-centred database. To provide an integrative annotation of known phosphoproteins and p-sites, we provided a variety of cross-references to public data sources. For example, gene and protein names were taken mainly from UniProt24, whereas corresponding accession numbers were integrated from UniProt24, Ensembl33, EMBL34, KEGG35 and NCBI GenBank36. Moreover, functional descriptions, protein/nucleotide sequences, and keywords were derived from UniProt24 to provide the basic information for each phosphoprotein entry, while the primary references with PMIDs were provided for each p-site. The gene ontology (GO) annotations in the Gene Ontology resource37 were also included if available. Furthermore, the knowledge from 88 additional public resources, such as ChEMBL38, BacDive39, PDB40, IUPred2A41, InterPro42, BioGRID43, EggNOG 5.044 and Reactome45, was integrated to comprehensively annotate the prokaryotic phosphoproteins. These resources covered 9 aspects, namely, taxonomy annotation, genome annotation, function annotation, transcriptional regulation, sequence and structure information, family and domain annotation, interaction, orthologous information and biological pathway (Fig. 1a). A brief summary of all public resources integrated in dbPSP 2.0 can be accessed at: http://dbpsp.biocuckoo.cn/Links.php. For these resources, the annotation datasets can be downloaded at http://dbpsp.biocuckoo.cn/Download.php.
Dynamic 3D structure details for phosphoproteins
For each phosphoprotein with available 3D structures characterized by X-ray crystallography or NMR spectroscopy, a representative 3D structure was selected for intuitive visualization. Users can select all or specific p-sites for visualizing their locations on protein structures.
HTP p-site classification
In phosphoproteomic studies, phosphopeptides are derived from mass spectrometry (MS) spectral datasets, usually with a false discovery rate (FDR) of 0.01 at the peptide-spectrum match (PSM), peptide and protein levels for quality control. To pinpoint the exact p-site in a phosphopeptide, a localization probability (LP) score can be calculated by a variety of tools, such as MaxQuant46. LP scores range from 0 to 1, and a higher LP score represents a higher probability that a detected site is a real p-site. Since HTP p-sites were identified in different studies with different confidence levels, we classified all collected HTP p-sites into four classes based on their LP scores, if available, namely, class I (LP > 0.75), class II (0.5 < LP ≤ 0.75), class III (0.25 ≤ LP ≤ 0.5) and class IV (LP < 0.25), as previously described46. In most of these HTP studies, different reference databases, distinct search engines and/or diverse parameter configurations were adopted for phosphopeptide detection in different organisms. Thus, the aggregation of false positive identifications might result in a considerably higher FDR in the cumulative dataset. A re-analysis of all raw MS datasets under a unified platform would likely generate phosphopeptides of much higher quality, although such an effort is beyond the scope of dbPSP 2.0, which directly collected known p-sites from the published literature.
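The four LP classes defined above translate directly into a small function. The following Python sketch only encodes the thresholds stated in the text; the function name and the convention of returning None for a missing score are assumptions made for illustration and do not describe the dbPSP implementation.

```python
def classify_lp(lp):
    """Map a localization probability (LP) score to class I-IV.

    Returns None when no LP score is available for the p-site.
    """
    if lp is None:
        return None
    if not 0.0 <= lp <= 1.0:
        raise ValueError("LP scores must lie between 0 and 1")
    if lp > 0.75:
        return "I"    # LP > 0.75
    if lp > 0.5:
        return "II"   # 0.5 < LP <= 0.75
    if lp >= 0.25:
        return "III"  # 0.25 <= LP <= 0.5
    return "IV"       # LP < 0.25


# Boundary values fall into the lower-confidence class, as in the text.
for score in (0.99, 0.75, 0.5, 0.25, 0.1, None):
    print(score, classify_lp(score))
```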
Multiple sequence alignment (MSA) of orthologues
Here, potential orthologues of known phosphoproteins were obtained from the Clusters of Orthologous Groups of proteins (COG) database47. For each orthologous group, all protein sequences were aligned using MUSCLE48, and a conservation ratio was calculated for each p-site as the number of sequences containing the same type of phosphorylatable residue divided by the number of all sequences in the group. The distribution of the conservation ratio, which ranges from 0 to 1, is illustrated for all p-sites in the orthologous groups (Fig. 3b), and we detected only 227 p-sites with a conservation ratio > 0.9 (Supplementary Table 3). These highly conserved p-sites might be useful for investigating conserved functions of phosphorylation in prokaryotes.
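A conservation ratio of this kind can be sketched in Python as follows. The sketch assumes that the sequences have already been aligned (for example with MUSCLE) into equal-length strings with '-' as the gap character, that the ratio is evaluated at the alignment column of the p-site, and that the phosphoprotein itself is counted among the sequences of the group. These choices, like the function names and the toy alignment, are illustrative assumptions rather than the exact procedure used for dbPSP 2.0.

```python
def alignment_column(aligned_seq, position):
    """Return the 0-based alignment column of a 1-based ungapped position."""
    seen = 0
    for col, char in enumerate(aligned_seq):
        if char != "-":
            seen += 1
            if seen == position:
                return col
    raise ValueError("position exceeds the ungapped sequence length")


def conservation_ratio(aligned_seqs, ref_index, position):
    """Fraction of aligned sequences sharing the reference p-site residue."""
    ref = aligned_seqs[ref_index]
    col = alignment_column(ref, position)
    residue = ref[col]
    # Gaps and other residue types at this column count as non-conserved.
    matches = sum(1 for seq in aligned_seqs if seq[col] == residue)
    return matches / len(aligned_seqs)


# Hypothetical toy alignment; the p-site is S3 of the first sequence (MKSAR).
toy_alignment = ["MK-SAR", "MKTSAR", "M--TAR", "MKASGR"]
ratio = conservation_ratio(toy_alignment, ref_index=0, position=3)
print(ratio)
```

Mapping the ungapped p-site position to an alignment column first is what allows gapped orthologues to be compared at the equivalent residue.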
Browse lists and detailed phosphoprotein information page
dbPSP 2.0 was developed with a user-friendly website interface, and multiple browse and search options were implemented to conveniently query the data. Here, we chose B. subtilis ClpP, an ATP-dependent Clp protease proteolytic subunit, as an example to introduce the usage of dbPSP 2.0. Two browse options, ‘Browse by phyla’ (Fig. 4a) and ‘Browse by residue types’ (Fig. 4b), are available. In the option ‘Browse by phyla’, 12 representative diagrams for all phyla are listed. The user can click a phylum to view its taxonomic categories (Fig. 4a) and then select ‘Bacillus subtilis (strain 168)’ to retrieve a list of phosphoproteins in a tabular format with ‘dbPSP ID’, ‘UniProt Accession’, ‘Gene Name’, ‘Protein Name’ and ‘Organism’ (Fig. 4a). In the option ‘Browse by residue types’, the user can choose one of the 7 residue types to browse all phosphoproteins with the given phosphorylation residue type. For example, by clicking the diagram of arginine, all proteins with pR sites are listed (Fig. 4b). By selecting ‘PP04832’, the dbPSP ID of ClpP (Fig. 4a,b), the user can open the detailed phosphoprotein page for ClpP (Fig. 4c,d). For a brief overview, the dbPSP ID, protein/gene names, organism, and dynamic structure details are presented (Fig. 4c). The ‘Sites’ part provides detailed information on p-sites, and the original peptide and primary reference can be shown by clicking the ‘View’ button of each p-site (Fig. 4c). To access additional information on the phosphoprotein, users can click the label ‘Annotation’ on the left menu and select the aspect of interest to access the corresponding resources (Fig. 4d). For each resource, the annotation details are presented on a new page after clicking the ‘More’ icon (Fig. 4d).
In addition to the browse options, multiple search options, including ‘Substrate Search’, ‘Peptide Search’, ‘Advanced Search’, ‘Batch Search’ and ‘BLAST Search’, were also developed for users to easily access the database.
Sequence preferences of different types of p-sites
Due to the limited number of pC sites, here we only analyzed the sequence preferences of pS, pT, pY, pR, pD and pH sites by using pLogo49 for bacteria and archaea (Fig. 5). For prokaryotic pS, pT and pY sites, we also compared their sequence preferences to those of eukaryotic phosphorylation, using 382,105 pS, 123,247 pT and 59,824 pY sites obtained by integrating two previously developed databases, dbPAF50 and dbPPT51. For pS and pT sites in archaea, R or lysine (K) residues most frequently occur at the +1 position and, to a lesser extent, at the +2 position (Fig. 5a). In bacteria, K residues are over-represented at the −1 position for pS sites, whereas S, D, glycine (G) and proline (P) are enriched at the −2, −1, +1 and +2 positions for pT sites, respectively (Fig. 5a). For pY sites, S residues frequently appear at the +1 position for eukaryotic phosphorylation, whereas K residues preferentially appear at the −2 position for bacteria and the −1 and −2 positions for archaea (Fig. 5a). For prokaryotic pD sites, methionine (M) and P residues are over-represented at the +3 and +4 positions around p-sites in bacteria but not archaea (Fig. 5b). For pH sites, S residues preferentially appear at the −1 position for bacteria (Fig. 5b). Due to data limitations, the sequence preference of pR sites was analyzed only in bacteria, where asparagine (N) residues are enriched at the −1 position (Fig. 5b).
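To make the notion of position-specific preference concrete, the following Python sketch tallies residues at each offset around the central p-site of a set of flanking peptides. It is a simplified illustration only: pLogo evaluates over- and under-representation statistically against a background set, whereas this sketch reports raw counts. The function name, the '-' padding character and the toy peptides are hypothetical.

```python
from collections import Counter


def position_counts(peptides, flank):
    """Count residues at each offset (-flank..+flank, excluding 0).

    Each peptide must have length 2 * flank + 1 with the p-site at the centre.
    """
    offsets = [o for o in range(-flank, flank + 1) if o != 0]
    counts = {offset: Counter() for offset in offsets}
    for peptide in peptides:
        if len(peptide) != 2 * flank + 1:
            raise ValueError("peptide length does not match the flank size")
        for offset in offsets:
            residue = peptide[flank + offset]
            if residue != "-":  # skip padding near the protein termini
                counts[offset][residue] += 1
    return counts


# Hypothetical toy peptides centred on a phosphoserine, flank of 2 residues.
toy_peptides = ["AKSGP", "LKSRA", "GDSKP"]
counts = position_counts(toy_peptides, flank=2)
for offset in sorted(counts):
    print(offset, counts[offset].most_common(1))
```

Raw counts like these would still need to be compared with background frequencies before any enrichment can be claimed.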
Application of dbPSP
Since its publication, dbPSP 1.0 has been visited more than 180,000 times and has served as a highly useful resource for studying prokaryotic phosphorylation50,52–56. For example, Garcia-Garcia et al. re-analyzed the phosphoproteomic datasets in dbPSP and found that phosphoproteins are essential for the regulation of the cell cycle and DNA-mediated processes in bacteria52. With the help of dbPSP, Venkat et al. experimentally validated that phosphorylation of S280 decreases the enzyme activity of malate dehydrogenase (MDH) in E. coli53. Additionally, Lin et al. utilized p-site information in dbPSP to analyze phosphoproteomic data and dissected the dynamic alteration of phosphorylation in various phosphoproteins during antibiotic treatment and resistance54. Moreover, Hasan et al. adopted pS and pT sites in dbPSP as training datasets and developed a useful tool, Microbial Phosphorylation Site predictor (MPSite), for predicting microbial p-sites55. In addition, the phosphorylation data of representative prokaryotes from dbPSP were utilized for kinase motif enrichment analysis, and the results demonstrated that most eukaryotic phosphorylation motifs could not be recovered in prokaryotes56.
In dbPSP 2.0, we collected and curated newly identified p-sites in prokaryotic phosphoproteins, which presents more complete information on phosphorylation in prokaryotes. Furthermore, dbPSP 2.0 has rich annotations for phosphoproteins and p-sites, which are critical for exploring the functions and mechanisms of phosphorylation events. In addition, the MSA results of orthologues are provided in this database and will be important for discovering conserved functional p-sites in prokaryotic cells. As previous studies show, dbPSP can serve as a well-curated data resource of prokaryotic phosphoproteins to support phosphoproteomic analysis, tool development, and the investigation of prokaryotic phosphorylation events. We anticipate that the updated dbPSP 2.0 could be a comprehensive data resource for better understanding the importance of protein phosphorylation in prokaryotes.
Discussion
Protein phosphorylation is one of the most well-studied PTMs and is reported to be involved in regulating numerous cellular processes in prokaryotic cells8,57. In 2015, we collected 7,391 known p-sites of 3,750 proteins in 96 prokaryotes from the published literature and developed dbPSP 1.022 to contain these datasets. With the accumulation of phosphorylation information, here we released dbPSP 2.0 by adding 11,905 new entries to include newly discovered phosphoproteins and p-sites in prokaryotes. Furthermore, rich annotations derived from 88 public databases were integrated. In total, dbPSP 2.0 contains 19,296 known p-sites in 8,586 phosphoproteins and occupies ~9 GB, a 300-fold increase compared to version 1.0.
In this study, to cover the diverse biological roles of prokaryotic phosphoproteins, we included multiple-layer knowledge from other databases to comprehensively annotate phosphoproteins. For example, the prokaryotic ClpP enzyme plays an important role in modulating various biological processes, such as cellular stress response, pathogenesis and homeostasis58. Inhibiting the function of ClpP was reported to affect the infectivity and virulence of microbial pathogens59. Moreover, the arginine phosphorylation of ClpP was essential for maintaining its function20,21,60. As shown in Fig. 6, the B. subtilis protease ClpP is annotated as a serine peptidase and participates in eliminating damaged proteins during heat shock, and its activity can be repressed by CtsR as well as by 20,697 compounds. Meanwhile, ClpP might interact with 9 partners and self-assemble into heptameric ring structures (Fig. 6). In particular, we found nearly 15,700 records from 6 orthologous databases indicating that ClpP is a highly conserved subunit in prokaryotes, which is consistent with previous studies. In addition, the functional domain and p-site information of ClpP are also provided. In dbPSP 2.0, the curated data resources of p-sites and phosphoproteins as well as annotation information are downloadable at http://dbpsp.biocuckoo.cn/Download.php.
In summary, the dbPSP 2.0 database will be continuously maintained and updated when new p-sites in prokaryotes are identified. Besides adding further annotations from other public databases, we will develop computational tools for the prediction of prokaryotic p-sites. We anticipate that this database can provide helpful support for better understanding the regulatory mechanisms and functions of phosphorylation in prokaryotes.
Methods
Data collection and update
In dbPSP 1.0, we manually collected 7,391 p-sites in 3,750 non-redundant prokaryotic phosphoproteins from the literature22. In this study, the phosphorylation events in prokaryotes newly reported since 2014 were considered and collected. To obtain known p-sites from the literature, we searched the PubMed database with multiple general keywords, such as ‘bacteria phosphoproteomics’, ‘archaea phosphorylation’, ‘archaebacteria phospho-site’. All the retrieved 39,997 articles were manually curated to collect the experimentally identified prokaryotic p-sites, and collected p-sites were then mapped to protein sequences obtained from UniProt (release 2019_05)24 (Fig. 1a). We also integrated the prokaryotic p-sites from other public databases, with 1,400, 427, 419, 345 and 317 p-sites from Phosphorylation Site Database23, UniProt24, dbPTM 201925, SysPTM 2.026 and PHOSIDA27, respectively (Fig. 1b). These datasets were cross-checked with our manually collected dataset and then integrated into the dbPSP 2.0 database.
Structure data collection and prediction
The 3D structures of phosphoproteins for intuitive visualization were obtained from the PDB40 if available. A JavaScript molecular visualization library, 3Dmol.js61, was used to support the dynamic structure chart in the browser interface. In addition, the probabilities of disordered binding regions and disorder propensity values were predicted by using ANCHOR241 and IUPred241, respectively. The details are provided on the phosphoprotein page.
Web interface construction
HTML, PHP and JavaScript were applied to develop the web interface as the front-end. The MySQL server was applied to manage the data as the back-end. The backlog and cache data will be cleared regularly, and the dbPSP database will be maintained and optimized continuously.
Acknowledgements
This work was supported by grants from the Special Project on Precision Medicine under the National Key R&D Program (2017YFC0906600 and 2018YFC0910500), Natural Science Foundation of China (81701567, 31930021, 31970633 and 31671360), China Postdoctoral Science Foundation (2018M642816 and 2019T120648), Fundamental Research Funds for the Central Universities (2017KFXKJC001 and 2019kfyRCPY043), Changjiang Scholars Program of China, and program for HUST Academic Frontier Youth Team. The manuscript has been edited by American Journal Experts (AJE) prior to submission.
Author contributions
Y.X. and D.P. conceived and supervised this study. Y.S. and Y.Z. collected known p-sites, integrated various data resources and developed this updated database. S.L., C.W., J.Z. and H.X. participated in processing data resources. Y.X., D.P. and Y.S. wrote the manuscript. All authors read and approved the final manuscript.
Data availability
All the collected phosphoproteins, p-sites and various annotations are freely available at http://dbpsp.biocuckoo.cn/Download.php. For convenience, phosphorylation datasets can be downloaded in three data types, including the total dataset, the phylum-specific datasets, and the residue-specific datasets. The datasets of phosphoproteins in prokaryotes have been uploaded to figshare62, 10.6084/m9.figshare.11436879. The annotation datasets were classified by their functional categories, and users can choose the corresponding options based on their own purposes. All datasets in dbPSP are made available under a Creative Commons CC 3.0 BY license (https://creativecommons.org/licenses/by/3.0/cn/).
Code availability
The source code of dbPSP 2.0 database has been uploaded to GitHub: https://github.com/BioCUCKOO/dbPSP2.0.
Competing interests
The authors declare no competing interests.
Footnotes
These authors contributed equally: Ying Shi, Ying Zhang.
Contributor Information
Di Peng, Email: pengdi@hust.edu.cn.
Yu Xue, Email: xueyu@hust.edu.cn.
Supplementary information is available for this paper at 10.1038/s41597-020-0506-7.
References
- 1.Mijakovic I, Grangeasse C, Turgay K. Exploring the diversity of protein modifications: special bacterial phosphorylation systems. FEMS Microbiol Rev. 2016;40:398–417. doi: 10.1093/femsre/fuw003. [DOI] [PubMed] [Google Scholar]
- 2.Esser Dominik, Hoffmann Lena, Pham Trong Khoa, Bräsen Christopher, Qiu Wen, Wright Phillip C., Albers Sonja-Verena, Siebers Bettina. Protein phosphorylation and its role in archaeal signal transduction. FEMS Microbiology Reviews. 2016;40(5):625–647. doi: 10.1093/femsre/fuw020. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 3.Stock AM, Robinson VL, Goudreau PN. Two-component signal transduction. Annu Rev Biochem. 2000;69:183–215. doi: 10.1146/annurev.biochem.69.1.183. [DOI] [PubMed] [Google Scholar]
- 4.Möglich Andreas. Signal transduction in photoreceptor histidine kinases. Protein Science. 2019;28(11):1923–1946. doi: 10.1002/pro.3705. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 5.Guo Yaping, Peng Di, Zhou Jiaqi, Lin Shaofeng, Wang Chenwei, Ning Wanshan, Xu Haodong, Deng Wankun, Xue Yu. iEKPD 2.0: an update with rich annotations for eukaryotic protein kinases, protein phosphatases and proteins containing phosphoprotein-binding domains. Nucleic Acids Research. 2018;47(D1):D344–D350. doi: 10.1093/nar/gky1063. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 6.Tagliabracci VS, Pinna LA, Dixon JE. Secreted protein kinases. Trends in biochemical sciences. 2013;38:121–130. doi: 10.1016/j.tibs.2012.11.008. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 7.Garnak M, Reeves HC. Phosphorylation of Isocitrate dehydrogenase of Escherichia coli. Science. 1979;203:1111–1112. doi: 10.1126/science.34215. [DOI] [PubMed] [Google Scholar]
- 8.Cozzone AJ. Protein phosphorylation in prokaryotes. Annu Rev Microbiol. 1988;42:97–125. doi: 10.1146/annurev.mi.42.100188.000525. [DOI] [PubMed] [Google Scholar]
- 9.Matthews Harry R. Protein kinases and phosphatases that act on histidine, lysine, or arginine residues in eukaryotic proteins: A possible regulator of the mitogen-activated protein kinase cascade. Pharmacology & Therapeutics. 1995;67(3):323–350. doi: 10.1016/0163-7258(95)00020-8. [DOI] [PubMed] [Google Scholar]
- 10.Khoury, G. A., Baliban, R. C. & Floudas, C. A. Proteome-wide post-translational modification statistics: frequency analysis and curation of the swiss-prot database. Sci Rep1, 90 (2011). [DOI] [PMC free article] [PubMed]
- 11.Trentini DB, et al. Arginine phosphorylation marks proteins for degradation by a Clp protease. Nature. 2016;539:48–53. doi: 10.1038/nature20122. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 12.Fuhs SR, Hunter T. pHisphorylation: the emergence of histidine phosphorylation as a reversible regulatory modification. Curr Opin Cell Biol. 2017;45:8–16. doi: 10.1016/j.ceb.2016.12.010. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 13.Lai SJ, et al. Site-specific His/Asp phosphoproteomic analysis of prokaryotes reveals putative targets for drug resistance. BMC Microbiol. 2017;17:123. doi: 10.1186/s12866-017-1034-2. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 14. Kitanishi K, et al. Identification and functional and spectral characterization of a globin-coupled histidine kinase from Anaeromyxobacter sp. Fw109-5. J Biol Chem. 2011;286:35522–35534. doi: 10.1074/jbc.M111.274811.
- 15. Yadav GS, Ravala SK, Malhotra N, Chakraborti PK. Phosphorylation Modulates Catalytic Activity of Mycobacterial Sirtuins. Front Microbiol. 2016;7:677. doi: 10.3389/fmicb.2016.00677.
- 16. Villarino A, et al. Proteomic identification of M. tuberculosis protein kinase substrates: PknB recruits GarA, a FHA domain-containing protein, through activation loop-mediated interactions. J Mol Biol. 2005;350:953–963. doi: 10.1016/j.jmb.2005.05.049.
- 17. Forest KT, Dunham SA, Koomey M, Tainer JA. Crystallographic structure reveals phosphorylated pilin from Neisseria: phosphoserine sites modify type IV pilus surface chemistry and fibre morphology. Mol Microbiol. 1999;31:743–752. doi: 10.1046/j.1365-2958.1999.01184.x.
- 18. Macek B, et al. The serine/threonine/tyrosine phosphoproteome of the model bacterium Bacillus subtilis. Mol Cell Proteomics. 2007;6:697–707. doi: 10.1074/mcp.M600464-MCP200.
- 19. Macek B, et al. Phosphoproteome analysis of E. coli reveals evolutionary conservation of bacterial Ser/Thr/Tyr phosphorylation. Mol Cell Proteomics. 2008;7:299–307. doi: 10.1074/mcp.M700311-MCP200.
- 20. Elsholz AK, et al. Global impact of protein arginine phosphorylation on the physiology of Bacillus subtilis. Proc Natl Acad Sci USA. 2012;109:7451–7456. doi: 10.1073/pnas.1117483109.
- 21. Schmidt A, et al. Quantitative phosphoproteomics reveals the role of protein arginine phosphorylation in the bacterial stress response. Mol Cell Proteomics. 2014;13:537–550. doi: 10.1074/mcp.M113.032292.
- 22. Pan Z, et al. dbPSP: a curated database for protein phosphorylation sites in prokaryotes. Database (Oxford). 2015;2015:bav031. doi: 10.1093/database/bav031.
- 23. Wurgler-Murphy SM, King DM, Kennelly PJ. The Phosphorylation Site Database: A guide to the serine-, threonine-, and/or tyrosine-phosphorylated proteins in prokaryotic organisms. Proteomics. 2004;4:1562–1570. doi: 10.1002/pmic.200300711.
- 24. UniProt Consortium. UniProt: a worldwide hub of protein knowledge. Nucleic Acids Res. 2019;47:D506–D515. doi: 10.1093/nar/gky1049.
- 25. Huang KY, et al. dbPTM in 2019: exploring disease association and cross-talk of post-translational modifications. Nucleic Acids Res. 2019;47:D298–D308. doi: 10.1093/nar/gky1074.
- 26. Li J, et al. SysPTM 2.0: an updated systematic resource for post-translational modification. Database (Oxford). 2014;2014:bau025. doi: 10.1093/database/bau025.
- 27. Gnad F, Gunawardena J, Mann M. PHOSIDA 2011: the posttranslational modification database. Nucleic Acids Res. 2011;39:D253–260. doi: 10.1093/nar/gkq1159.
- 28. Standish AJ, et al. Unprecedented Abundance of Protein Tyrosine Phosphorylation Modulates Shigella flexneri Virulence. J Mol Biol. 2016;428:4197–4208. doi: 10.1016/j.jmb.2016.06.016.
- 29. de Keijzer J, et al. Mechanisms of Phenotypic Rifampicin Tolerance in Mycobacterium tuberculosis Beijing Genotype Strain B0/W148 Revealed by Proteomics. J Proteome Res. 2016;15:1194–1204.
- 30. Nakedi KC, Nel AJ, Garnett S, Blackburn JM, Soares NC. Comparative Ser/Thr/Tyr phosphoproteomics between two mycobacterial species: the fast growing Mycobacterium smegmatis and the slow growing Mycobacterium bovis BCG. Front Microbiol. 2015;6:237. doi: 10.3389/fmicb.2015.00237.
- 31. Cohen P. The origins of protein phosphorylation. Nat Cell Biol. 2002;4:E127–130. doi: 10.1038/ncb0502-e127.
- 32. Sharma K, et al. Ultradeep human phosphoproteome reveals a distinct regulatory nature of Tyr and Ser/Thr-based signaling. Cell Rep. 2014;8:1583–1594. doi: 10.1016/j.celrep.2014.07.036.
- 33. Kersey PJ, et al. Ensembl Genomes 2018: an integrated omics infrastructure for non-vertebrate species. Nucleic Acids Res. 2018;46:D802–D808. doi: 10.1093/nar/gkx1011.
- 34. Madeira F, Madhusoodanan N, Lee J, Tivey ARN, Lopez R. Using EMBL-EBI Services via Web Interface and Programmatically via Web Services. Curr Protoc Bioinformatics. 2019;66:e74. doi: 10.1002/cpbi.74.
- 35. Kanehisa M, Sato Y, Kawashima M, Furumichi M, Tanabe M. KEGG as a reference resource for gene and protein annotation. Nucleic Acids Res. 2016;44:D457–462. doi: 10.1093/nar/gkv1070.
- 36. Sayers EW, et al. GenBank. Nucleic Acids Res. 2019;48:D84–D86.
- 37. The Gene Ontology Consortium. The Gene Ontology Resource: 20 years and still GOing strong. Nucleic Acids Res. 2019;47:D330–D338. doi: 10.1093/nar/gky1055.
- 38. Gaulton A, et al. The ChEMBL database in 2017. Nucleic Acids Res. 2017;45:D945–D954. doi: 10.1093/nar/gkw1074.
- 39. Reimer LC, et al. BacDive in 2019: bacterial phenotypic data for High-throughput biodiversity analysis. Nucleic Acids Res. 2019;47:D631–D636. doi: 10.1093/nar/gky879.
- 40. Burley SK, et al. RCSB Protein Data Bank: biological macromolecular structures enabling research and education in fundamental biology, biomedicine, biotechnology and energy. Nucleic Acids Res. 2019;47:D464–D474. doi: 10.1093/nar/gky1004.
- 41. Meszaros B, Erdos G, Dosztanyi Z. IUPred2A: context-dependent prediction of protein disorder as a function of redox state and protein binding. Nucleic Acids Res. 2018;46:W329–W337. doi: 10.1093/nar/gky384.
- 42. Mitchell AL, et al. InterPro in 2019: improving coverage, classification and access to protein sequence annotations. Nucleic Acids Res. 2019;47:D351–D360. doi: 10.1093/nar/gky1100.
- 43. Oughtred R, et al. The BioGRID interaction database: 2019 update. Nucleic Acids Res. 2019;47:D529–D541. doi: 10.1093/nar/gky1079.
- 44. Huerta-Cepas J, et al. eggNOG 5.0: a hierarchical, functionally and phylogenetically annotated orthology resource based on 5090 organisms and 2502 viruses. Nucleic Acids Res. 2019;47:D309–D314. doi: 10.1093/nar/gky1085.
- 45. Jupe S, et al. Interleukins and their signaling pathways in the Reactome biological pathway database. J Allergy Clin Immunol. 2018;141:1411–1416. doi: 10.1016/j.jaci.2017.12.992.
- 46. Humphrey SJ, et al. Dynamic adipocyte phosphoproteome reveals that Akt directly regulates mTORC2. Cell Metab. 2013;17:1009–1020. doi: 10.1016/j.cmet.2013.04.010.
- 47. Galperin MY, Makarova KS, Wolf YI, Koonin EV. Expanded microbial genome coverage and improved protein family annotation in the COG database. Nucleic Acids Res. 2015;43:D261–269. doi: 10.1093/nar/gku1223.
- 48. Edgar RC. MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res. 2004;32:1792–1797. doi: 10.1093/nar/gkh340.
- 49. O’Shea JP, et al. pLogo: a probabilistic approach to visualizing sequence motifs. Nat Methods. 2013;10:1211–1212. doi: 10.1038/nmeth.2646.
- 50. Ullah S, et al. dbPAF: an integrative database of protein phosphorylation in animals and fungi. Sci Rep. 2016;6:23534. doi: 10.1038/srep23534.
- 51. Cheng H, et al. dbPPT: a comprehensive database of protein phosphorylation in plants. Database (Oxford). 2014;2014:bau121. doi: 10.1093/database/bau121.
- 52. Garcia-Garcia T, et al. Role of Protein Phosphorylation in the Regulation of Cell Cycle and DNA-Related Processes in Bacteria. Front Microbiol. 2016;7:184. doi: 10.3389/fmicb.2016.00184.
- 53. Venkat S, et al. Genetically Incorporating Two Distinct Post-translational Modifications into One Protein Simultaneously. ACS Synth Biol. 2018;7:689–695. doi: 10.1021/acssynbio.7b00408.
- 54. Lin MH, et al. A New Tool to Reveal Bacterial Signaling Mechanisms in Antibiotic Treatment and Resistance. Mol Cell Proteomics. 2018;17:2496–2507. doi: 10.1074/mcp.RA118.000880.
- 55. Hasan MM, Rashid MM, Khatun MS, Kurata H. Computational identification of microbial phosphorylation sites by the enhanced characteristics of sequence information. Sci Rep. 2019;9:8258. doi: 10.1038/s41598-019-44548-x.
- 56. Bradley D, Beltrao P. Evolution of protein kinase substrate recognition at the active site. PLoS Biol. 2019;17:e3000341. doi: 10.1371/journal.pbio.3000341.
- 57. Bourret RB, Borkovich KA, Simon MI. Signal transduction pathways involving protein phosphorylation in prokaryotes. Annu Rev Biochem. 1991;60:401–441. doi: 10.1146/annurev.bi.60.070191.002153.
- 58. Vahidi S, et al. Reversible inhibition of the ClpP protease via an N-terminal conformational switch. Proc Natl Acad Sci USA. 2018;115:E6447–E6456. doi: 10.1073/pnas.1805125115.
- 59. Bhandari V, et al. The Role of ClpP Protease in Bacterial Pathogenesis and Human Diseases. ACS Chem Biol. 2018;13:1413–1425. doi: 10.1021/acschembio.8b00124.
- 60. Trentini DB, Fuhrmann J, Mechtler K, Clausen T. Chasing Phosphoarginine Proteins: Development of a Selective Enrichment Method Using a Phosphatase Trap. Mol Cell Proteomics. 2014;13:1953–1964. doi: 10.1074/mcp.O113.035790.
- 61. Rego N, Koes D. 3Dmol.js: molecular visualization with WebGL. Bioinformatics. 2015;31:1322–1324. doi: 10.1093/bioinformatics/btu829.
- 62. Shi Y. dbPSP 2.0, an updated database of protein phosphorylation sites in prokaryotes. Figshare. 2020. doi: 10.6084/m9.figshare.11436879.
Data Availability Statement
All the collected phosphoproteins, p-sites and various annotations are freely available at http://dbpsp.biocuckoo.cn/Download.php. For convenience, the phosphorylation datasets can be downloaded in three forms: the total dataset, the phylum-specific datasets and the residue-specific datasets. The datasets of phosphoproteins in prokaryotes have been uploaded to figshare62 (doi: 10.6084/m9.figshare.11436879). The annotation datasets are classified by functional category, so users can select the options that suit their purposes. All datasets in dbPSP are made available under a Creative Commons Attribution 3.0 (CC BY 3.0) license (https://creativecommons.org/licenses/by/3.0/cn/).
The source code of the dbPSP 2.0 database has been uploaded to GitHub: https://github.com/BioCUCKOO/dbPSP2.0.