Molecular & Cellular Proteomics : MCP. 2009 Apr 14;8(8):1839–1849. doi: 10.1074/mcp.M900030-MCP200

SysPTM: A Systematic Resource for Proteomic Research on Post-translational Modifications

Hong Li, Xiaobin Xing, Guohui Ding, Qingrun Li, Chuan Wang, Lu Xie, Rong Zeng, Yixue Li
PMCID: PMC2722767  PMID: 19366988

Abstract

With the rapid expansion of protein post-translational modification (PTM) research based on large-scale proteomic work, there is an increasing demand for a suitable repository to analyze PTM data. Here we present a curated, web-accessible PTM data base, SysPTM. SysPTM provides a systematic and sophisticated platform for proteomic PTM research, equipped not only with a knowledge base of manually curated multi-type modification data but also with four fully developed, in-depth data mining tools. Currently, SysPTM contains data detailing 117,349 experimentally determined PTM sites on 33,421 proteins involving nearly 50 PTM types, curated from public resources, including five data bases, four web servers, and more than one hundred peer-reviewed mass spectrometry papers. Protein annotations including Pfam domains, KEGG pathways, GO functional classification, and ortholog groups are integrated into the data base. Four online tools have been developed and incorporated: PTMBlast, to compare a user's PTM dataset with PTM data in SysPTM; PTMPathway, to map PTM proteins to KEGG pathways; PTMPhylog, to discover potentially conserved PTM sites; and PTMCluster, to find clusters of multi-site modifications. The workflow of SysPTM was demonstrated by analyzing an in-house phosphorylation dataset identified by MS/MS. The analysis shows that, in SysPTM, the roles of single-type and multi-type modifications can be systematically investigated in a full biological context. SysPTM could be an important contribution to modificomics research. SysPTM is freely available online at www.sysbio.ac.cn/SysPTM.


Post-translational modifications (PTMs) are various processing events that change the maturity, activity, and/or turnover of proteins. More than 200 different types of PTMs have been found, with new ones still being reported (1). PTMs not only change the physicochemical properties of proteins (2) but also dynamically regulate various biological events such as protein degradation, subcellular localization, conformational change, protein-protein interaction, and signal transduction (3–5). Previous studies have revealed the central roles of PTMs in human health and disease. For example, phosphorylation of pRB1 has been associated with tumorigenesis through its control of cell division (6); S-nitrosylation of parkin regulates its E3 ligase activity, resulting in protein accumulation in sporadic Parkinson disease (7); and defects in protein glycosylation have been related to several forms of congenital muscular dystrophy (8). Given these important roles in health and disease, PTMs have been regarded as potential disease biomarkers or therapeutic targets. For example, erlotinib (Tarceva), an inhibitor of epidermal growth factor receptor tyrosine kinase, has been approved by the Food and Drug Administration to treat non-small cell lung cancer (9); and histone deacetylase inhibitors have been shown to have a potential therapeutic role in Huntington disease (10). The broad range of important roles played by PTMs in physiological and pathological processes has made PTM research an active field in recent years. Yet our knowledge of the full scope of PTM distribution on proteins and of the precise locations of PTM sites remains limited.

There are two major kinds of experimental methods to identify PTMs: 1) traditional biological experiments such as radiolabeling of PTM proteins (11), Western analysis with antibodies against specific modifications (12), and site-directed mutagenesis of potential modification sites (13); and 2) large-scale proteomic experiments, especially multidimensional liquid chromatography tandem mass spectrometry. Traditional experiments are laborious and time-consuming, so data accumulate slowly. By contrast, more recent MS/MS experiments have led to the discovery of thousands of new phosphorylation (14), glycosylation (15), acetylation (16), sumoylation (17), S-nitrosylation (18), and other modification sites. For example, based on MS/MS data, more than 6,000 phosphopeptides have been reported in HeLa cells (14), and 159 candidate sumoylated proteins have been found in yeast (17). Although advanced technologies have allowed PTM data to accumulate rapidly, no single experiment can identify all PTM sites for a set of proteins, because of biased modification enrichment related to the experimental protocol, the limited sensitivity of mass spectrometers, and failures in spectrum matching. Data bases are therefore needed to amass PTM data from various experiments for a comprehensive understanding of PTMs.

Most data bases for storing PTM information fall into two general classes. One class focuses on a single modification type, such as Phospho.ELM (19) for phosphorylation or O-GLYCBASE (20) for glycosylation. Although these data bases have been widely used, their utility is limited because they record only a single modification type. The other class is the primary protein data base; these data bases collect PTM information on multiple modification types but are more broadly focused on providing diverse information about proteins, rather than PTM information specifically. Swiss-Prot (21) and HPRD (22) are examples of such data bases. Integrated PTM data bases are more desirable than either of these two classes. One example is dbPTM (23), which integrated experimentally determined PTM information from four external data bases. PhosphoSite began by harvesting phosphorylation sites from the published literature, with a focus on in vivo mammalian phosphorylation data (24), but has recently expanded to integrate nine other modification types. Even integrated data bases, however, have not fully incorporated the rapidly accumulating PTM data from MS/MS experiments mentioned above. Many of these data are reported in the published literature but not collected in any data base, and their volume continues to grow rapidly as new experiments are published. Such a wealth of information should be incorporated more comprehensively into the current PTM knowledge domain.

At the same time, the high-throughput nature and complexity of MS/MS data pose computational challenges for proteome-scale PTM analyses in a biological context. A pure data repository is insufficient for such tasks. Powerful computational tools must accompany data repositories to allow knowledge extraction.

To address these needs, we developed a systematic resource for PTM research, SysPTM, consisting of a PTM data base and four analysis tools. The SysPTM data base incorporates the existing features of numerous previous data bases, with an emphasis on collecting modification datasets from MS/MS experiments reported in the literature. The current release of SysPTM (v1.1) contains data detailing 117,349 PTM sites on 33,421 proteins involving nearly 50 modification types. The four analysis tools are PTMBlast, PTMPathway, PTMPhylog, and PTMCluster, which, respectively, can compare user PTM datasets with PTM data stored in SysPTM, map PTM proteins to KEGG pathways, discover potentially conserved PTM sites, and find significant clusters of multi-site modifications.

In this work, an in-house MS/MS phosphorylation dataset from mouse embryonic stem cells was analyzed to demonstrate the SysPTM workflow. SysPTM is freely available online at www.sysbio.ac.cn/SysPTM.

EXPERIMENTAL PROCEDURES

System Configuration

SysPTM consists of a relational data base and a dynamic web interface. A simplified entity-relationship diagram of the SysPTM data base is shown in supplemental Fig. S1. The SysPTM data base is implemented in MySQL Server Edition 5.0 and runs on a Red Hat Linux server. The SysPTM website is publicly available. The web interface is implemented with JavaServer Pages technology on the Apache Tomcat 5.5 server. All functions are programmed in Java and Perl.

Data Collection

PTM Data Collection

For comprehensive PTM data coverage and timely updates, semi-automatic methods were used to collect PTM sites from public data resources and the peer-reviewed MS/MS literature (see Supplemental Methods). In the current version of SysPTM, modification information was automatically retrieved from five data bases (Swiss-Prot version 56.2 (21), Phospho.ELM version 8.0 (19), HPRD release 7 (22), O-GLYCBASE version 6.0 (20), and Ubiprot version 1.0 (25)) and four web servers (SUMOsp version 1.0 (26), Memo version 2.0 (27), NetAcet version 1.0 (28), and LysAcet version 1.1 (55)). These data were integrated and stored as SysPTM-A (see Supplemental Methods). SysPTM-A will be updated every time a new major data base version is released. In addition, numerous modification sites scattered throughout the MS/MS literature and rarely collected by existing data bases were integrated into our data base as SysPTM-B. A Perl program was used to search PubMed with the following limits: MS-related keywords (mass spectrometry, proteomics), seven modification types (phosphorylation, acetylation, methylation, sumoylation, ubiquitination, glycosylation, S-nitrosylation), and a publication period of January 2005 to October 2008. This search retrieved 1118 MS/MS papers. Data quality was controlled by manually checking whether the original datasets had been validated through manual checking, a score cutoff, a false discovery rate threshold, or other methods (see Supplemental Methods), which resulted in stringent literature filtering. Of the 1118 papers retrieved, 104 were selected for further data collection. (Data quality control for these 104 papers is shown in supplemental Table S1.) Data on unambiguous PTM sites and peptides were manually extracted from the Supplemental Material of these candidate papers. PTM sites on peptides were automatically mapped to protein sequences. Information on modification type, site location, peptide sequence, and peptide score was then stored in SysPTM-B (supplemental Fig. S1). SysPTM-B will be updated every six months.
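The PubMed search described above can be sketched with NCBI's E-utilities esearch endpoint. This is a minimal illustration, not the authors' Perl program: how they combined the keywords is not stated, so the boolean structure (any MS term AND any PTM term), the function name build_pubmed_query and the retmax value are assumptions. Running the query today would not necessarily return the 1118 papers reported here.

```python
from urllib.parse import urlencode

# Keyword lists taken from the text; the exact query logic of the original Perl program is an assumption.
MS_TERMS = ["mass spectrometry", "proteomics"]
PTM_TERMS = ["phosphorylation", "acetylation", "methylation", "sumoylation",
             "ubiquitination", "glycosylation", "S-nitrosylation"]

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_pubmed_query(mindate="2005/01", maxdate="2008/10", retmax=5000):
    """Return an esearch URL for (any MS term) AND (any PTM term), limited by publication date."""
    ms_clause = " OR ".join(f'"{t}"' for t in MS_TERMS)
    ptm_clause = " OR ".join(f'"{t}"' for t in PTM_TERMS)
    params = {
        "db": "pubmed",
        "term": f"({ms_clause}) AND ({ptm_clause})",
        "datetype": "pdat",  # restrict by publication date
        "mindate": mindate,
        "maxdate": maxdate,
        "retmax": retmax,    # maximum number of PMIDs returned in one response
    }
    return f"{ESEARCH}?{urlencode(params)}"


print(build_pubmed_query())
```

The returned URL can be fetched with any HTTP client; the PMIDs in the XML response would then still need the manual quality screening described above.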

PTM Data Integration

As mentioned above, SysPTM data were collected from diverse resources and literature, resulting in protein identifiers from many different data bases. To integrate these heterogeneous data and avoid redundancy, a two-step integration process was performed (see Supplemental Methods): 1) protein identifiers from older releases of each data base were mapped to identifiers in that data base's newest release, and the corresponding protein sequences were retrieved; and 2) proteins with different identifiers but the same sequence (100% identity) in the same species were regarded as the same protein.
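The two integration steps can be sketched as follows. This is an illustrative sketch only: the record layout (identifier, species, sequence), the id_map dictionary and both function names are hypothetical stand-ins for the mapping tables described in the Supplemental Methods.

```python
from collections import defaultdict


def update_identifiers(records, id_map):
    """Step 1: replace identifiers from older releases with current ones.

    records: list of (identifier, species, sequence); id_map: old -> current identifier.
    Both structures are hypothetical stand-ins for the real mapping tables."""
    return [(id_map.get(ident, ident), species, seq) for ident, species, seq in records]


def merge_identical_proteins(records):
    """Step 2: entries with the same species and an identical sequence become one protein."""
    groups = defaultdict(set)
    for ident, species, seq in records:
        groups[(species, seq.upper())].add(ident)
    return {key: sorted(ids) for key, ids in groups.items()}


records = [  # hypothetical entries
    ("OLD_P1", "Homo sapiens", "MKTAYIAK"),
    ("IPI_X", "Homo sapiens", "mktayiak"),  # same sequence, different identifier
    ("Q_M1", "Mus musculus", "MKTAYIAK"),   # same sequence, but another species
]
merged = merge_identical_proteins(update_identifiers(records, {"OLD_P1": "P1"}))
for (species, seq), ids in merged.items():
    print(species, ids)
```

Keying on (species, sequence) enforces both conditions of step 2 at once: identical sequences from different species stay separate proteins.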

PTM Protein Annotation

In addition to PTM-related information, detailed functional annotations from external data bases were integrated into SysPTM to help users access other protein-related knowledge, including domains identified by HMMER (29) based on the models in the Pfam data base (30), pathways from KEGG (31), gene ontology from the GO database (32), relationships between genes and disease from Online Mendelian Inheritance in Man (OMIM) (33), and ortholog groups from HomoloGene (34).

Analysis Tools

1) PTMBlast is a PTM site similarity search program designed to find similar PTM sites between a user dataset (query) and SysPTM data (subject). The program provides three sequence alignment methods with different sensitivity and specificity: in Method 1, query and subject protein sequences must be identical; in Method 2, query and subject protein sequences are aligned by the protein-protein BLAST (BLASTP) program (35); and in Method 3, modified peptides in the query are aligned with subject protein sequences by the Smith-Waterman algorithm, which is more sensitive than BLAST for short peptides (36). After sequence alignment, residues modified at the same aligned position in both query and subject are identified as overlapped PTM sites; all other modified residues are considered unique to the query or to the subject. Overlapped and unique PTM sites are highlighted in color on the aligned sequences, which can be downloaded.
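For Method 1, where query and subject sequences are identical, the overlapped/unique classification reduces to set operations on modified positions. A minimal sketch, assuming sites are given as 1-based residue positions of a single modification type (the function name compare_sites is illustrative, not part of SysPTM):

```python
def compare_sites(query_sites, subject_sites):
    """Classify modified positions on identical query/subject sequences (Method 1).

    Sites are 1-based residue positions; a position present in both sets is 'overlapped'."""
    query, subject = set(query_sites), set(subject_sites)
    return {
        "overlapped": sorted(query & subject),
        "query_unique": sorted(query - subject),
        "subject_unique": sorted(subject - query),
    }


result = compare_sites([15, 22, 140], [22, 140, 301])
print(result)
# {'overlapped': [22, 140], 'query_unique': [15], 'subject_unique': [301]}
```

For Methods 2 and 3, query positions would first have to be translated into subject coordinates through the BLASTP or Smith-Waterman alignment before the same comparison applies.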

2) PTMPathway is a tool to associate PTM proteins with KEGG pathways. The Perl SOAP::Lite package and KEGG Application Programming Interface (API) are used to access the KEGG pathway database (31).

3) PTMPhylog maps and colors PTM sites on multiple sequence alignments of orthologous proteins. Ortholog groups are retrieved from the NCBI HomoloGene data base (34), and ClustalW (37) is used for multiple sequence alignment.
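Mapping PTM sites onto a multiple sequence alignment requires translating each residue position of an ungapped sequence into its column in the gapped alignment row. The sketch below shows this step under stated assumptions: the alignment rows and site positions are hypothetical, and lowercase letters stand in for the color highlighting used on the web page.

```python
def residue_columns(aligned_seq):
    """Map 1-based residue positions of the ungapped sequence to 0-based alignment columns."""
    columns, pos = {}, 0
    for col, ch in enumerate(aligned_seq):
        if ch != "-":
            pos += 1
            columns[pos] = col
    return columns


def mark_sites(aligned_seq, sites):
    """Return the aligned row with modified residues in lowercase (plain-text stand-in for color)."""
    cols = residue_columns(aligned_seq)
    chars = list(aligned_seq.upper())
    for site in sites:
        if site in cols:
            chars[cols[site]] = chars[cols[site]].lower()
    return "".join(chars)


alignment = {  # hypothetical two-row alignment (e.g. ClustalW output) and site positions
    "human": "MKS-PTS",
    "mouse": "MKSAPT-",
}
sites = {"human": [3, 6], "mouse": [3]}
for species, row in alignment.items():
    print(f"{species:6s} {mark_sites(row, sites[species])}")
```

Once sites are placed in alignment columns, a site is a candidate conserved PTM site when the same column is modified, or carries the same residue, in several orthologs.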

4) PTMCluster is used to find statistically significant PTM site clusters on the same protein. The PTMCluster algorithm is an improvement on a neighborhood model proposed by Li et al. (38). Candidate PTM clusters are searched for along a positional distance tree of PTM sites, and a p value P is calculated for each cluster to assess whether its PTM sites lie closer together than randomly distributed sites would (see supplemental Fig. S2 for a detailed illustration of this method). Significant PTM clusters are selected if they satisfy p value and site number cutoffs (P ≤ 0.01; N ≥ 3).
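The general idea (build a tree over site positions, score each node, keep nodes with P ≤ 0.01 and N ≥ 3) can be sketched as below. This is not the PTMCluster algorithm or the neighborhood model of Li et al.: the single-linkage tree and the binomial p value are swapped-in simplifications, and all function names are hypothetical.

```python
from math import comb


def candidate_clusters(positions):
    """Single-linkage merging of 1-D site positions, closest neighbours first.

    Every merge creates a tree node, which in one dimension is a run of consecutive sites."""
    pos = sorted(positions)
    n = len(pos)
    start = list(range(n))  # start[i]: first index of the run that ends at i
    end = list(range(n))    # end[i]: last index of the run that starts at i
    nodes = []
    for g in sorted(range(n - 1), key=lambda i: pos[i + 1] - pos[i]):
        a, b = start[g], end[g + 1]  # join the run ending at g with the run starting at g+1
        start[b], end[a] = a, b
        nodes.append(pos[a:b + 1])
    return nodes


def cluster_p_value(cluster, n_sites, protein_length):
    """Naive binomial tail: chance that >= k of n uniformly placed sites fall in a window of this span."""
    k = len(cluster)
    q = (cluster[-1] - cluster[0] + 1) / protein_length
    return sum(comb(n_sites, x) * q**x * (1 - q) ** (n_sites - x) for x in range(k, n_sites + 1))


def significant_clusters(positions, protein_length, p_cut=0.01, n_min=3):
    """Keep tree nodes that satisfy the cutoffs quoted in the text (P <= 0.01, N >= 3)."""
    n = len(positions)
    hits = []
    for node in candidate_clusters(positions):
        if len(node) >= n_min:
            p = cluster_p_value(node, n, protein_length)
            if p <= p_cut:
                hits.append((node, p))
    return hits


print(candidate_clusters([100, 13, 10, 12]))  # [[12, 13], [10, 12, 13], [10, 12, 13, 100]]
```

Because the window is chosen after looking at the data, this binomial p value is optimistic; a scan statistic or a permutation test over random site placements would be a more rigorous choice.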

Case Study

An in-house phosphorylation dataset generated by an MS/MS experiment on mouse embryonic stem (mES) cell proteins was used as a case study to demonstrate the use of SysPTM (see Supplemental Methods). mES cells were lysed in lysis buffer containing phosphatase inhibitors. The peptide mixtures were separated by strong anion exchange and reversed-phase high performance liquid chromatography. LC-MS/MS was performed using an LTQ-Orbitrap mass spectrometer. SEQUEST was used to match MS/MS spectra against the International Protein Index (IPI) mouse data base (version 3.22). All output results were merged by in-house software (BuildSummary), and the false discovery rate was controlled to below 1%. In total, 1152 phosphosites on 526 distinct phosphoproteins were identified (supplemental Table S5). The spectra of phosphopeptides used for the case study were manually checked (Supplemental Methods).
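The text does not describe how BuildSummary controls the false discovery rate. A common approach is the target-decoy strategy, sketched here only as an illustration of that general technique: the input format (score, is_decoy) and the function name are assumptions, and ties in score are not handled specially.

```python
def score_cutoff_at_fdr(psms, max_fdr=0.01):
    """Target-decoy estimate: FDR(t) = decoys / targets among PSMs scoring >= t.

    psms: list of (score, is_decoy). Returns the lowest cutoff whose estimated FDR <= max_fdr,
    or None if no cutoff qualifies."""
    targets = decoys = 0
    cutoff = None
    for score, is_decoy in sorted(psms, key=lambda x: x[0], reverse=True):
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        if targets and decoys / targets <= max_fdr:
            cutoff = score
    return cutoff


psms = [(3.5, False)] * 99 + [(2.9, True), (2.7, False), (2.1, True)]  # hypothetical scores
print(score_cutoff_at_fdr(psms))  # 2.7
```

Peptide-spectrum matches scoring at or above the returned cutoff would be accepted; with a 1% threshold, roughly one accepted match in a hundred is expected to be false.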

RESULTS

Contents and Statistics

SysPTM is a systematic resource integrating a PTM data base and four analysis tools. The structure of SysPTM is depicted in Fig. 1A. The SysPTM data base currently houses information on 117,349 experimentally determined PTM sites, covering nearly 50 modification types, on 33,421 proteins. Most modification types are amino acid-specific, and different modification types may target the same amino acid (supplemental Fig. S3). The number of PTM sites collected in SysPTM for several common modification types, along with the number of corresponding proteins, is shown in Fig. 1B (supplemental Table S4). Phosphorylation is the most frequent PTM, with 87,068 sites on 24,705 proteins; other abundant modifications include glycosylation (7564 sites) and acetylation (3001 sites).

Fig. 1.


SysPTM content and statistics. A, SysPTM road map, with overview of database construction and online tools. B, number of unique proteins and unique PTM sites stored in SysPTM, in total and for seven common modification types. C, distribution of PTM site number: 59% of proteins have more than one modification site. D, distribution of PTM type number: 13% of proteins have more than one modification type.

Modification sites in SysPTM are categorized into two groups based on data source: SysPTM-A, with information culled from public resources, contains data on 75,047 unique PTM sites on 21,971 proteins (supplemental Table S2); and SysPTM-B, with information culled from 104 peer-reviewed MS/MS papers, contains information on 67,596 unique PTM sites on 20,675 proteins (supplemental Table S3). SysPTM-A makes up the major portion of the SysPTM data base and covers nearly 50 modification types. These data include “gold standard” sites culled from public data bases (Swiss-Prot, Phospho.ELM, HPRD, O-GLYCBASE, Ubiprot), as well as sites used as positive training sets for prediction web servers (SUMOsp, Memo, NetAcet, and LysAcet). These data were integrated into SysPTM-A after removing ambiguous items (e.g. sites listed as “potential”, “probable”, or “by similarity” in Swiss-Prot) or citing “HTP” for curated data from high-throughput technologies (e.g. “HTP” data from Phospho.ELM). SysPTM-B makes up a significant portion of SysPTM and covers seven modification types: phosphorylation, acetylation, methylation, sumoylation, ubiquitination, glycosylation, and S-nitrosylation, with phosphorylation (64,025 sites) and glycosylation (1970 sites) being the most abundant. SysPTM-B was collected from peer-reviewed mass spectrometry literature providing complete PTM datasets and detailed PTM identification procedures. Each paper applied its own data quality controls to modified peptides, such as manual checking of spectra, filtering by a software score threshold or false discovery rate, validation by an algorithm (e.g. Ascore), or extensive identification of a single protein with its modifications. For inclusion in SysPTM-B, we collected only PTM sites that had passed at least one data quality check. Ambiguous PTM sites that could not be assigned to a specific amino acid were not added to SysPTM-B.
Of the PTM sites collected in SysPTM-B, 62.6% are not covered by SysPTM-A. For example, SysPTM-B contains 115 S-nitrosylation sites, whereas only 48 S-nitrosylation sites are collected in SysPTM-A.
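The reported totals are internally consistent with the 62.6% figure, assuming that the 117,349 sites in SysPTM are the union of the unique sites in SysPTM-A and SysPTM-B counted the same way. Inclusion-exclusion gives the number of sites shared by both parts; the helper name overlap_from_totals is illustrative.

```python
def overlap_from_totals(union, size_a, size_b):
    """Inclusion-exclusion: |A ∩ B| = |A| + |B| - |A ∪ B|."""
    shared = size_a + size_b - union
    b_only = size_b - shared
    return shared, b_only, b_only / size_b


shared, b_only, frac = overlap_from_totals(union=117_349, size_a=75_047, size_b=67_596)
print(shared, b_only, f"{frac:.1%}")  # 25294 42302 62.6%
```

That is, about 25,294 sites appear in both parts, and 42,302 of the 67,596 SysPTM-B sites are found only in the literature-derived portion.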

In addition to functioning alone, modifications can “crosstalk,” working together to regulate biological functions (39). To investigate this phenomenon, the numbers of modification sites and modification types on all proteins were analyzed (Fig. 1, C and D). Fifty-nine percent of proteins collected in SysPTM have more than one modification site (Fig. 1C), and 13% of proteins have more than one modification type (Fig. 1D).
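The two percentages in Fig. 1, C and D, are per-protein counts over site records. A minimal sketch, assuming records of the form (protein_id, position, ptm_type); whether SysPTM counts a residue carrying two PTM types as one site or two is not stated, and this sketch counts it twice.

```python
from collections import defaultdict


def multiplicity_fractions(sites):
    """sites: iterable of (protein_id, position, ptm_type) records.

    Returns the fractions of proteins with more than one site and with more than one PTM type.
    Here a 'site' is a (position, type) pair, so one residue carrying two PTM types counts twice."""
    site_sets, type_sets = defaultdict(set), defaultdict(set)
    for protein, position, ptm_type in sites:
        site_sets[protein].add((position, ptm_type))
        type_sets[protein].add(ptm_type)
    n = len(site_sets)
    multi_site = sum(len(s) > 1 for s in site_sets.values()) / n
    multi_type = sum(len(t) > 1 for t in type_sets.values()) / n
    return multi_site, multi_type


toy = [  # hypothetical records
    ("P1", 10, "phosphorylation"), ("P1", 20, "phosphorylation"),
    ("P2", 5, "phosphorylation"),
    ("P3", 7, "acetylation"), ("P3", 9, "phosphorylation"),
]
print(multiplicity_fractions(toy))  # (0.6666666666666666, 0.3333333333333333)
```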

Moreover, the abundant modification data and diverse functional annotations in SysPTM provide multifaceted views of PTM-associated features. Supplemental Table S6 lists functions associated with six frequent PTMs that were shown to be significantly enriched (p ≤ 0.01) by the chi-square test. Interestingly, proteins modified by phosphorylation and by sumoylation are enriched in similar GO-defined functions, such as nucleus (GO category: cellular component), transcription (GO category: biological process), and protein binding (GO category: molecular function), which might support the hypothesis of cross-talk between these two modification types (40).
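A GO enrichment test of this kind is typically a chi-square test on a 2x2 contingency table. The sketch below is a generic Pearson chi-square with 1 degree of freedom; whether the original analysis used a continuity correction, and how the background set was defined, are not stated, so both are assumptions here. The function raises ZeroDivisionError if any row or column total is zero.

```python
from math import erfc, sqrt


def chi_square_enrichment(a, b, c, d):
    """Pearson chi-square test (1 df, no continuity correction) on a 2x2 table:

                         annotated with term   not annotated
        PTM proteins              a                  b
        other proteins            c                  d

    Returns (chi2, p, enriched); 'enriched' is True when PTM proteins carry the term more often."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p = erfc(sqrt(chi2 / 2))  # upper tail of the chi-square distribution with 1 df
    return chi2, p, a * d > b * c


print(chi_square_enrichment(30, 10, 10, 30))  # hypothetical counts; chi2 = 20.0
```

The chi-square statistic is two-sided, so the enriched flag (equivalent to a/(a+b) > c/(c+d)) separates over-representation from depletion.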

Data Accessibility

An online version of SysPTM is available, where users can query protein and PTM information through search and browse pages, retrieve batch files through a statistics page, upload their own modification datasets through a submit page, and get help through a “FAQ” page. Supplemental Fig. S4A shows the search page interface. Users may query the database by gene name, protein description, Swiss-Prot ID/AC, IPI ID, or NCBI protein GI. A BLAST search engine is also included to allow users to perform sequence similarity searches. Supplemental Fig. S4B illustrates the browse page, through which users can browse PTM proteins by PTM type, data source, or KEGG pathway. Search and browse results are returned as SysPTM entries; each entry contains eight sections (supplemental Fig. S4C), each of which can be expanded by clicking to show details (supplemental Fig. S5). The sections provide the following information: “Summary” gives basic protein information, such as species, protein description, gene name, protein identifier, and three-dimensional structure; “PTMsite-Statistics” lists the number of PTM sites of each modification type, along with the SysPTM data source (SysPTM-A and/or B) for these sites; “PTMsite-Map” is an interactive interface to view PTM sites, Pfam domains, and PTM position clusters along the protein sequence; “PTMsite-Table” lists each PTM residue, position, type, and data source (SysPTM-A and/or B); “PTMsite-Source” provides more detailed information on data source (protein data base or literature reference) and original experimental evidence; “PTMsite-Cluster” gives predicted PTM position clusters with their statistical p value; “PTMprotein-Sequence” highlights PTM sites on the protein sequence; and “PTMprotein-Annotation” shows protein annotation information and links to public data bases.

Accessibility and Application of Four Modification Data Mining Tools

PTMBlast

PTMBlast accepts PTM target sites/peptides and protein sequences as input and provides three methods for comparing these sequences with different target datasets in SysPTM. Flexible parameters such as comparison method, similarity cutoff, species, modification type, and data source can be selected by the user, and the results of PTMBlast, with aligned modification sites and statistics tables, can be viewed directly on a webpage or returned by e-mail (Fig. 2A). The three PTMBlast comparison methods vary in sensitivity and specificity. Method 1 has the greatest stringency and lowest sensitivity, requiring the compared protein sequences to be exactly identical. Method 2 has medium stringency and sensitivity and is suitable for general use, whereas Method 3 has the least stringency and greatest sensitivity because it considers only the similarity of modified peptides without regard to the full protein sequence. Users are advised to retrieve results by e-mail because Methods 2 and 3 can take a long time to compute. Species is another important parameter of PTMBlast. When data are compared within the same species, PTMBlast returns modification sites that overlap with previously discovered sites as well as potentially novel modification sites. When data are compared across species, the results of PTMBlast indicate modification sites conserved among species.
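Method 3 aligns a modified peptide to a protein with the Smith-Waterman algorithm and then transfers the modified residues to protein coordinates. The sketch below shows that idea with simple match/mismatch scores and a linear gap penalty; PTMBlast's actual scoring scheme (for example, a substitution matrix such as BLOSUM62 and affine gaps) is not described, so these parameters and function names are assumptions.

```python
def smith_waterman(peptide, protein, match=2, mismatch=-1, gap=-2):
    """Local alignment with a linear gap penalty.

    Returns (best_score, pairs), where pairs maps 0-based peptide indices to 0-based protein indices."""
    rows, cols = len(peptide) + 1, len(protein) + 1
    H = [[0] * cols for _ in range(rows)]
    best, best_cell = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if peptide[i - 1] == protein[j - 1] else mismatch
            H[i][j] = max(0, H[i - 1][j - 1] + s, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_cell = H[i][j], (i, j)
    pairs = {}
    i, j = best_cell
    while i > 0 and j > 0 and H[i][j] > 0:  # trace back until the local alignment starts
        s = match if peptide[i - 1] == protein[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + s:
            pairs[i - 1] = j - 1
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    return best, pairs


def map_peptide_sites(peptide, protein, modified_offsets):
    """Translate 0-based modified offsets in a peptide to 1-based protein positions."""
    _, pairs = smith_waterman(peptide, protein)
    return {off: pairs[off] + 1 for off in modified_offsets if off in pairs}


print(map_peptide_sites("SPTR", "MAKSPTRSLEEK", [0, 2]))  # {0: 4, 2: 6}
```

Modified residues that fall in a gap of the local alignment are dropped by map_peptide_sites, which is one reason a peptide-level method can report fewer mapped sites than it receives.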

Fig. 2.


Sample pages showing the interface for SysPTM tools. A, PTMBlast results. Unique (aqua for SysPTM data and orange for query data) and overlapped PTM sites (magenta) are highlighted in color on the protein sequence for display and download. B, PTMPhylog results. PTM sites are mapped to aligned sequences of ortholog proteins in different species. Red residues are from SysPTM-A; blue residues are uniquely from SysPTM-B. C, PTMPathway results. Rectangles with yellow background area denote PTM proteins found in SysPTM; pink text in rectangles denotes user-defined proteins. D, PTMCluster results. The protein is plotted as a horizontal box. In this box, predicted PTM clusters are represented as yellow bars, and Pfam domains are shown as gray bars. PTM sites from SysPTM-A and SysPTM-B are labeled above and below the protein box, respectively.

PTMPathway

PTMPathway implements two main functions: 1) to browse PTM proteins based on a given KEGG pathway name and PTM type; and 2) to map user-defined proteins to KEGG pathways and highlight all PTM proteins. Results are returned in the form of a pathway graph: rectangles with yellow background area denote PTM proteins found in SysPTM; pink text in rectangles denotes user-defined proteins (Fig. 2C). Clicking a protein within a yellow rectangle returns the protein's SysPTM entry page, with a detailed view of all information. Human Wnt signaling pathway was queried as an example, and 406 modification sites on 64 proteins involving about 10 modification types were found by PTMPathway, out of which five frequent modification types were manually labeled with different shapes and are shown in Fig. 3A.
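Both PTMPathway functions reduce to intersecting pathway membership with PTM records and with a user's protein list. The sketch below assumes the pathway membership has already been retrieved from KEGG; the data structures and the function name pathway_view are hypothetical, and the example proteins and sites are invented.

```python
from collections import Counter


def pathway_view(pathway_members, ptm_sites, user_proteins=()):
    """Summarise PTM data for one pathway.

    pathway_members: protein IDs in the pathway; ptm_sites: (protein_id, position, ptm_type) records;
    user_proteins: IDs supplied by the user. Returns per-protein flags plus summary counts."""
    members = set(pathway_members)
    user = set(user_proteins)
    sites = {s for s in ptm_sites if s[0] in members}  # de-duplicated sites on pathway members
    modified = {protein for protein, _, _ in sites}
    flags = {p: {"ptm": p in modified, "user": p in user} for p in sorted(members)}
    summary = {
        "proteins": len(modified),
        "sites": len(sites),
        "by_type": Counter(ptm_type for _, _, ptm_type in sites),
    }
    return flags, summary


flags, summary = pathway_view(
    ["A", "B", "C"],
    [("A", 33, "phosphorylation"), ("A", 33, "phosphorylation"), ("A", 49, "acetylation"),
     ("B", 9, "phosphorylation"), ("Z", 15, "phosphorylation")],  # Z is outside the pathway
    user_proteins=["C"],
)
print(summary)
```

In the pathway graph, the "ptm" flag corresponds to the yellow rectangles and the "user" flag to the pink text; the summary counts are the kind of figures reported for the Wnt example (sites, proteins, and types).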

Fig. 3.


Application of SysPTM tools. A, PTM pathway diagram showing five types of frequent modifications on proteins in the human Wnt signaling pathway. B, evolutionary conservation of human phosphoproteins and non-phosphoproteins, based on percent identities with orthologs from 19 other species. C, PTM position cluster on human RPB1 predicted by PTMCluster. This cluster ranges from amino acid 1836 to 1924, with red characters representing phosphosites recorded in SysPTM-A and blue characters representing phosphosites uniquely recorded in SysPTM-B. Consensus heptapeptide YSPTSPS repeats are underlined. Note that many phosphosites occur at Ser-2 or Ser-5 of these repeats.

PTMPhylog

PTMPhylog was developed to study the conservation of PTM sites and associated proteins. It searches by protein description, identifier, or sequence and returns ortholog protein sequences aligned across multiple species, with PTM residues highlighted in color (Fig. 2B). The evolutionary conservation of human phosphoproteins was estimated based on the percent identities with ortholog proteins from 19 species in the HomoloGene data base (41), and the results are shown in Fig. 3B. Generally phosphoproteins are significantly more conserved than their non-phosphorylated counterparts, suggesting functional importance of phosphoproteins. Given that this tendency may be true for other modifications as well, PTMPhylog provides an important tool for identifying conserved modification sites among different species.
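The conservation estimate in Fig. 3B rests on percent identity between a human protein and its aligned orthologs. Percent identity has several definitions; the sketch below uses one common choice (identical columns over ungapped columns) and averages over orthologs. Which definition HomoloGene or the authors used is not stated, so this is an assumption, and the function names are illustrative.

```python
def percent_identity(row_a, row_b):
    """Percent identity between two aligned rows, over columns where neither has a gap."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have equal length")
    columns = [(x, y) for x, y in zip(row_a, row_b) if x != "-" and y != "-"]
    if not columns:
        return 0.0
    return 100.0 * sum(x == y for x, y in columns) / len(columns)


def mean_identity_to_orthologs(reference_row, ortholog_rows):
    """Average percent identity of a reference protein to each of its aligned orthologs."""
    scores = [percent_identity(reference_row, row) for row in ortholog_rows]
    return sum(scores) / len(scores)


print(percent_identity("MKLST", "MRAS-"))  # 50.0
```

Computing this mean for each phosphoprotein and each non-phosphoprotein gives the two distributions that can then be compared with a standard two-sample test.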

PTMCluster

Genes located close together along the genome have been reported to be co-expressed (42) or to exhibit tissue-specific features (43); similarly, multiple modification sites on a protein may cluster within a small region, with their physical proximity mediating the protein's biological activity (44). PTMCluster was developed to mine proteome-scale PTM position clusters. The statistical significance of each predicted cluster is given by a p value, and regions of predicted clusters are plotted as yellow rectangles along the protein sequence (Fig. 2D). Almost 10,000 clusters were found in protein sequences stored in SysPTM; these clusters covered most PTM types and appeared in many proteins. Fig. 3C shows an example of a PTM cluster in the C-terminal domain of the human RNA polymerase II largest subunit (RPB1). This cluster contains 18 phosphosites, most located on the Ser-2 or Ser-5 residue of the repeated consensus heptapeptide YSPTSPS. A previous study reported that Ser-2 and Ser-5 in the consensus sequence were both required to be phosphorylated simultaneously during M phase of the cell cycle (45). Phosphosites in this PTM position cluster may therefore work together as a functional region.

Workflow and Case Study

Here we propose a workflow for analyzing an experimental PTM dataset with SysPTM, using the example of an MS-identified phosphorylation dataset from mES cells. The experimental aim was to identify phosphoproteins in stem cells and explore their functions. At the data overview level of the workflow (Fig. 4A), PTMBlast was used to compare phosphosites in mES cells with phosphosites recorded in SysPTM. The proportion of phosphosites identified in our experiment that were not already recorded in SysPTM was large (59.2% using Method 1, 48.9% using Method 2, and 47.6% using Method 3). These unique phosphosites might reveal potential stem cell-specific characteristics of mES cells. GO functional analysis of newly identified and overlapped phosphoproteins based on PTMBlast Method 2 was performed, and the results are shown in Fig. 4B. A significant number of the unique phosphoproteins are involved in DNA binding, transcription factor activity, regulation of transcription, and multicellular organism development, supporting the idea that phosphorylation is an important regulatory process in embryonic stem cells (46).

Fig. 4.


Example workflow outlining analysis of an mES cell protein phosphorylation dataset. Identified phosphopeptides and phosphoproteins were taken as input for multi-level analysis in SysPTM. A, PTMBlast results comparing phosphosites in mES cell proteins and phosphosites in SysPTM by three methods. B, GO functional analysis of newly identified and overlapped phosphoproteins based on PTMBlast Method 2. C, F, and P are abbreviations for GO categories: C, cellular component; F, molecular function; P, biological process. C, phosphoproteins in an mES-related protein interaction network. D, PTMPathway results showing all phosphoproteins in the mouse focal adhesion pathway. E, PTMCluster results for the protein CTNA1_MOUSE. The yellow bar is a PTM position cluster predicted by PTMCluster. F, PTMPhylog results for the protein CTNA1_MOUSE. Residues shown in red are recorded in SysPTM-A; residues in blue are uniquely from SysPTM-B. Ser-641 and Ser-652 are identified phosphosites in the mES case.

We next generated an overview of phosphorylation at the network level (Fig. 4C) by mapping phosphoproteins recorded in SysPTM and/or identified in our experiment onto a published mES-related protein interaction network (47). In total, 16 (47.1%) of the proteins in the network have experimentally identified phosphosites; 10 of these 16 proteins were identified in our experiment. As seen in Fig. 4C, these phosphoproteins interact with mES cell markers (Nanog, Oct4, and Rex1), suggesting that phosphorylation may play an important role in the stem cell-specific characteristics of mES cells.

Next, the association between biological pathways and phosphorylation was explored using the PTMPathway tool. Focal adhesion kinase (FAK) signaling has been reported as an important pathway in mES cells, regulating cardiogenesis (48). Using PTMPathway, we obtained an updated picture of the FAK pathway that highlights 25 potential phosphoproteins (Fig. 4D). These phosphorylated components may help researchers further explore the association between the FAK pathway and stem cells.

In addition to global analysis, SysPTM is also well suited to the analysis of individual modified proteins. PTMCluster and PTMPhylog were used, respectively, to analyze PTM site clusters and the evolutionary conservation of phosphosites on the potential stem cell marker α-catenin (CTNA1_MOUSE). PTMCluster found one PTM position cluster in α-catenin (Fig. 4E). For phosphosite Ser-641, PTMPhylog showed that the aligned site is conserved across multiple orthologs and may also be phosphorylated in human (Fig. 4F).

In short, the multi-level analysis run by SysPTM on an in-house proteomic phosphorylation dataset from mouse embryonic stem cells demonstrates that the roles of post-translational modification in a biological organism can be analyzed systematically by combining high-throughput experiments and powerful computational analysis.

DISCUSSION AND CONCLUSION

The progress of post-translational modification research has accelerated since the introduction of mass spectrometry as a primary technology in the field of proteomics. In 2007, modificomics was advocated as a prominent and independent extension of proteomics (49), with the aim of exploring the language of post-translational modifications at the “omics” level and interpreting phenotype in the context of PTMs (39). To aid modificomics research, we have developed SysPTM, which integrates a data repository with powerful information extraction tools.

The SysPTM database (v1.1) is made up of SysPTM-A and SysPTM-B, together containing data on 117,349 experimentally determined PTM sites on 33,421 proteins covering nearly 50 PTM types. SysPTM-A contains 75,047 unique PTM sites on 21,971 proteins integrated from five databases (Swiss-Prot, Phospho.ELM, HPRD, O-GLYCBASE, and Ubiprot) and four web servers (SUMOsp, Memo, NetAcet, and LysAcet); this portion of the data base will be updated every time a new major data base version is released. SysPTM-B contains 67,596 unique PTM sites on 20,675 proteins manually collected from 104 papers selected from 1118 papers published from January 2005 to October 2008; this portion of the data base will be updated every six months with data from new publications. SysPTM-B is unique in its emphasis on collecting modification datasets from MS/MS experiments, an emphasis lacking in previous PTM data bases (19–22, 25). SysPTM is the most comprehensive PTM data repository available today, with the capacity to store and analyze PTM site/protein data for multiple modification types, from multiple data sources and multiple organisms.

Data quality is critical to the reliable use of SysPTM. Currently we rely on data quality checks performed primarily at the level of the original data sources, as described in “Experimental Procedures”, “Results”, and Supplemental Methods. In addition, for each dataset in SysPTM-B we record as much of the original identification information as possible, including modification sites, peptide sequences, peptide scores, search engines, and mass spectrometry instruments. Experienced users can apply their own quality filters by setting their own score thresholds. For each PTM site in SysPTM-B, confidence can also be gauged to some degree from the number of listed papers that have identified the site. In the future, we plan to develop a statistical score function that integrates these simple measures (e.g. modified peptide score, false discovery rate cutoff, and publication count) and assigns a quality score to each PTM site, so that users can check quality more readily. Looking further ahead, as more proteomics datasets comprising raw mass spectra are deposited in public data bases such as PRIDE (50), analogous to the storage of microarray data in GEO (51), a uniform algorithm for peptide identification (e.g. PeptideProphet) and PTM site localization (e.g. Ascore) may be applied or developed; a consistent quality check system for all PTM sites will then become available.
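As a minimal sketch of the user-side filtering described above (a score threshold combined with a count of supporting publications), the code below assumes a hypothetical record layout; SysPTM does not expose these fields as code, and `SiteEvidence` and `filter_sites` are illustrative names only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SiteEvidence:
    """One identification of a PTM site (hypothetical field names)."""
    protein: str          # protein accession
    position: int         # 1-based residue position
    residue: str          # modified amino acid, e.g. "S"
    peptide_score: float  # search-engine score of the modified peptide
    paper_id: str         # publication reporting the identification


def filter_sites(records, min_score: float, min_papers: int):
    """Return the set of (protein, position, residue) sites supported by at
    least `min_papers` distinct papers, counting only identifications whose
    peptide_score >= min_score."""
    papers_per_site = {}
    for r in records:
        if r.peptide_score < min_score:
            continue  # below the user's threshold: ignore this evidence
        key = (r.protein, r.position, r.residue)
        papers_per_site.setdefault(key, set()).add(r.paper_id)
    return {site for site, papers in papers_per_site.items()
            if len(papers) >= min_papers}


# Invented toy records, for illustration only.
records = [
    SiteEvidence("PROT_A", 15, "S", 42.0, "paper1"),
    SiteEvidence("PROT_A", 15, "S", 31.5, "paper2"),
    SiteEvidence("PROT_A", 88, "T", 12.0, "paper1"),
]
print(filter_sites(records, min_score=20.0, min_papers=2))
```

Scores from different search engines are not on a common scale, so in practice a single `min_score` would likely need to be replaced by engine-specific thresholds.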

Rich data in SysPTM make it a good resource for examining general PTM features. Abundant PTM sites provide valuable information for detecting PTM sequence properties such as flanking sequence motifs and developing PTM prediction algorithms. Schwartz et al. (52) have proposed a new general strategy to predict organism-specific PTMs; however, due to limited data, their strategy has been applied only to phosphorylation in yeast, fly, mouse, and human and acetylation in human. Using the abundant data in SysPTM, their method could now be generalized to other modification types or species.
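A common first step in detecting flanking sequence motifs is to cut a fixed-width window around every modified residue and count residues at each offset. The sketch below shows only that generic preprocessing step under our own assumptions (1-based positions, `_` padding at the termini, an arbitrary ±7 default width); it is not the method of Schwartz et al. (52), and the function names are hypothetical.

```python
from collections import Counter


def flanking_windows(sequence: str, positions, half_width: int = 7, pad: str = "_"):
    """Return the (2 * half_width + 1)-residue window centred on each 1-based
    modified position, padding beyond either terminus with `pad`."""
    padded = pad * half_width + sequence + pad * half_width
    windows = []
    for pos in positions:
        if not 1 <= pos <= len(sequence):
            raise ValueError(f"position {pos} is outside a sequence of length {len(sequence)}")
        # Residue `pos` sits at padded index half_width + pos - 1, so the
        # window starts half_width residues earlier, at index pos - 1.
        start = pos - 1
        windows.append(padded[start:start + 2 * half_width + 1])
    return windows


def column_counts(windows):
    """Count residues at each window offset (all windows share one length)."""
    return [Counter(column) for column in zip(*windows)]


# Toy sequence with modified serines at positions 4 and 6 (invented data).
wins = flanking_windows("MKRSPSEEK", [4, 6], half_width=3)
print(wins)
print(column_counts(wins)[3])  # the centre column: the modified residues
```

Comparing these per-offset counts against a background residue distribution is what turns them into candidate motifs; that statistical step is omitted here.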

The multi-organism feature of SysPTM makes it extremely useful for comparative modificomics. Boekhorst et al. (53) compared phosphoproteomics datasets of six eukaryotes and revealed evolutionary conservation of phosphorylation. In SysPTM, comparison across multiple species also showed that phosphoproteins are significantly more conserved than their non-phosphorylated counterparts, suggesting functional importance of phosphorylation. Conservation of other modifications also can be analyzed using SysPTM.

Additionally, SysPTM allows multi-site and multi-type modification to be investigated. Our results show that 59% of proteins in SysPTM have multiple modification sites, and 13% have multiple modification types. For instance, human epidermal growth factor receptor has 83 modification sites of 4 types: phosphorylation, ubiquitination, glycosylation, and disulfide bridges. Such multi-site and multi-type modification may reflect complex regulation that enables epidermal growth factor receptor to exert its broad range of functions in biological processes (54). In this regard, SysPTM is also a valuable resource for studying potential co-regulation of multiple modification sites and cross-talk between different PTM types.
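The 59% and 13% figures are proportions over proteins. The minimal sketch below computes proportions of that kind from a flat site table. It assumes each record is a `(protein, position, type)` tuple and that a "site" means a distinct residue position. This is not SysPTM's own code, and the toy input is invented.

```python
from collections import defaultdict


def multi_site_type_fractions(sites):
    """sites: iterable of (protein_id, position, ptm_type) tuples.
    Returns (fraction of proteins with >1 distinct modified position,
             fraction of proteins with >1 distinct PTM type)."""
    positions = defaultdict(set)
    types = defaultdict(set)
    for protein, position, ptm_type in sites:
        positions[protein].add(position)
        types[protein].add(ptm_type)
    n = len(positions)
    if n == 0:
        return 0.0, 0.0
    multi_site = sum(1 for p in positions.values() if len(p) > 1)
    multi_type = sum(1 for t in types.values() if len(t) > 1)
    return multi_site / n, multi_type / n


toy_sites = [
    ("P1", 10, "phosphorylation"), ("P1", 20, "phosphorylation"),
    ("P2", 5, "phosphorylation"),
    ("P3", 7, "acetylation"), ("P3", 9, "ubiquitination"),
]
print(multi_site_type_fractions(toy_sites))
```

Counting distinct positions, rather than distinct (position, type) pairs, means a residue carrying two alternative modifications is counted as one site. Either definition is defensible, and the choice affects the multi-site fraction.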

A vast data source without proper data mining tools may be data rich but information poor. Experimental PTM researchers need tools to access background knowledge, compare their data with existing discoveries, and anticipate future research directions. With these needs in mind, we developed four in-depth analysis tools alongside the SysPTM data base. PTMBlast is a similarity comparison tool supporting multi-type and multi-organism modification datasets. It takes advantage of the rich modification information in SysPTM and lets researchers conveniently compare their own datasets with it, distinguishing confirmations of previously discovered PTM sites from novel site identifications. The second tool, PTMPathway, takes the data analysis one step further. Many proteins in biological pathways may be modified, and modifications may trigger, mediate, or terminate pathway activity. Elucidating the complexity and diversity of PTMs in the context of pathways, however, is a formidable task. PTMPathway provides a quick view of all modified proteins in a given pathway; in combination with PTMBlast, it also allows newly identified modified proteins to be visualized on pathways. PTMPathway therefore helps extend our understanding of modifications to biological pathways at a systematic level.
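The core ideas behind the two tools can be illustrated in a greatly simplified form. PTMBlast uses sequence similarity to relate a user's sites to SysPTM data. The sketch below swaps that for exact matching on shared protein identifiers and positions, so it only holds when both datasets use the same identifiers and residue numbering. The pathway lookup uses a hypothetical in-memory membership table rather than KEGG itself. All function names are ours.

```python
def classify_sites(user_sites, known_sites):
    """Split user sites into previously reported and novel.
    Sites are (protein_id, position, ptm_type) tuples; matching is exact,
    so no sequence alignment is performed (unlike PTMBlast)."""
    known = set(known_sites)
    unique_user = set(user_sites)
    confirmed = sorted(s for s in unique_user if s in known)
    novel = sorted(s for s in unique_user if s not in known)
    return confirmed, novel


def proteins_by_pathway(modified_proteins, pathway_members):
    """Map each pathway to the modified proteins it contains.
    pathway_members: dict pathway_id -> set of protein_ids (a hypothetical
    stand-in for a KEGG pathway membership table)."""
    modified = set(modified_proteins)
    hits = {}
    for pathway, members in pathway_members.items():
        overlap = modified & set(members)
        if overlap:
            hits[pathway] = sorted(overlap)
    return hits


# Invented toy data.
known = [("P1", 10, "phosphorylation"), ("P2", 5, "phosphorylation")]
user = [("P1", 10, "phosphorylation"), ("P1", 12, "phosphorylation")]
confirmed, novel = classify_sites(user, known)
pathways = {"path:A": {"P1", "P9"}, "path:B": {"P7"}}
print(confirmed, novel)
print(proteins_by_pathway({p for p, _, _ in user}, pathways))
```

A similarity-based comparison would first align each user protein to its SysPTM counterparts and translate positions through the alignment before this matching step.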

As stated earlier, highly conserved protein residues tend to correlate with structural or functional importance, and multi-site modification has been reported to be a common regulatory mechanism. The PTMPhylog tool was developed to enable analysis of residue conservation for all PTM types. PTMCluster is, to our knowledge, the first algorithm to search proteins for multi-site PTM position clusters. Almost 10,000 clusters were found in protein sequences stored in SysPTM, providing a new starting point for studying the co-regulation of multi-site modifications.
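To make the notion of a "position cluster" concrete, here is a generic sketch: a greedy scan groups modified positions that fall within a fixed-width window and attaches a naive p-value. The p-value assumes the n sites are placed independently and uniformly along a protein of length L, so a window of width w captures each site with probability w/L. This is not PTMCluster's algorithm or statistic. The window width, the minimum cluster size, and the binomial model are our own illustrative choices, and no multiple-testing correction is applied.

```python
from math import comb


def binomial_tail(n: int, k: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def position_clusters(positions, length: int, window: int, min_sites: int = 3):
    """Greedy scan over sorted positions: extend a group while its span stays
    within `window` residues, keep groups with >= min_sites members, and
    report (start, end, size, naive_p_value) for each."""
    pos = sorted(set(positions))
    n = len(pos)
    p_in_window = min(1.0, window / length)
    clusters = []
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pos[j + 1] - pos[i] < window:
            j += 1
        k = j - i + 1
        if k >= min_sites:
            clusters.append((pos[i], pos[j], k, binomial_tail(n, k, p_in_window)))
            i = j + 1  # continue after this cluster (non-overlapping clusters)
        else:
            i += 1
    return clusters


# Invented example: three sites packed near the N-terminus of a 500-residue protein.
for start, end, size, p in position_clusters([10, 12, 15, 200, 400], 500, 10):
    print(f"cluster {start}-{end}: {size} sites, naive p = {p:.2e}")
```

A real analysis would also need to account for the many windows scanned per protein and across thousands of proteins, for example with a scan statistic or permutation test.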

The overall process of using SysPTM was demonstrated through a case study. By analyzing an in-house phosphorylation dataset identified by MS/MS, we showed that in SysPTM a modificomics dataset can be mapped to other sequences, to biological pathways, and to existing networks for a data overview, and that the conservation and clustering status of individual residues in important modified proteins can be studied. More generally, the roles of single-type and multi-type modifications can be investigated in a full biological context; this is SysPTM's contribution to modificomics research. SysPTM may therefore become an important aid to both experimental and computational post-translational modification researchers, helping to advance proteomics in this important and challenging field.

Future Development

Plans are underway to: 1) develop a statistical tool for scoring the confidence of MS/MS-identified modifications; 2) allow users to submit datasets via a temporary data base and store them in SysPTM after manual check; and 3) design an algorithm to predict modification sites.

Acknowledgments

We acknowledge Lynne Berry from the Vanderbilt Cancer Biostatistics Center for her great editing work.

Footnotes

* This work was supported by funding provided by National High-Tech R&D Program (863) 2006AA02Z334; National Basic Research Program of China (2006CB910700, 2009CB918404) (to Y. L.); National Natural Science Foundation of China (30621091); Ministry of Science and Technology (2006CB943900, 2007CB947904, 2007CB947100, 2007CB948000), Key Research Program (CAS) KSCX2-YW-R-112 (to R. Z.); key project for prevention and treatment of major infectious diseases (2008ZX10002-021); and Shanghai Natural Science Fund 08ZR1415800 (to L. X.).

The on-line version of this article (available at http://www.mcp.org) contains supplemental material.

1 The abbreviations used are:

PTM: post-translational modification
PTM protein: protein experimentally identified as having one or more post-translational modifications
MS/MS: tandem mass spectrometry
GO: Gene Ontology
SysPTM-A: PTM sites culled from public resources
SysPTM-B: PTM sites culled from peer-reviewed tandem mass spectrometry literature
PTMBlast: a tool for comparing user PTM datasets with data in SysPTM
PTMPathway: a tool for mapping PTM proteins to KEGG pathways
PTMPhylog: a tool for analyzing potentially conserved PTM sites
PTMCluster: a tool for finding clusters of multi-site modifications
mES: mouse embryonic stem
E3: ubiquitin-protein isopeptide ligase
FAK: focal adhesion kinase
KEGG: Kyoto Encyclopedia of Genes and Genomes
HPRD: Human Protein Reference Database

REFERENCES

  • 1. de Hoog C. L., Mann M. (2004) Proteomics. Annu. Rev. Genomics Hum. Genet. 5, 267–293
  • 2. Mann M., Jensen O. N. (2003) Proteomic analysis of post-translational modifications. Nat. Biotechnol. 21, 255–261
  • 3. Virshup D. M., Eide E. J., Forger D. B., Gallego M., Harnish E. V. (2007) Reversible protein phosphorylation regulates circadian rhythms. Cold Spring Harb. Symp. Quant. Biol. 72, 413–420
  • 4. Yang X. J. (2004) Lysine acetylation and the bromodomain: a new partnership for signaling. Bioessays 26, 1076–1087
  • 5. Gill G. (2004) SUMO and ubiquitin in the nucleus: different functions, similar mechanisms? Genes Dev. 18, 2046–2059
  • 6. Krueger K. E., Srivastava S. (2006) Posttranslational protein modifications: current implications for cancer detection, prevention, and therapeutics. Mol. Cell. Proteomics 5, 1799–1810
  • 7. Yao D., Gu Z., Nakamura T., Shi Z. Q., Ma Y., Gaston B., Palmer L. A., Rockenstein E. M., Zhang Z., Masliah E., Uehara T., Lipton S. A. (2004) Nitrosative stress linked to sporadic Parkinson's disease: S-nitrosylation of parkin regulates its E3 ubiquitin ligase activity. Proc. Natl. Acad. Sci. U.S.A. 101, 10810–10814
  • 8. Martin-Rendon E., Blake D. J. (2003) Protein glycosylation in disease: new insights into the congenital muscular dystrophies. Trends Pharmacol. Sci. 24, 178–183
  • 9. Johnson B. E., Jänne P. A. (2005) Epidermal growth factor receptor mutations in patients with non-small cell lung cancer. Cancer Res. 65, 7525–7529
  • 10. Sadri-Vakili G., Cha J. H. (2006) Mechanisms of disease: Histone modifications in Huntington's disease. Nat. Clin. Pract. Neurol. 2, 330–338
  • 11. Zhi Y., Sandri-Goldin R. M. (1999) Analysis of the phosphorylation sites of herpes simplex virus type 1 regulatory protein ICP27. J. Virol. 73, 3246–3257
  • 12. Brunet A., Sweeney L. B., Sturgill J. F., Chua K. F., Greer P. L., Lin Y., Tran H., Ross S. E., Mostoslavsky R., Cohen H. Y., Hu L. S., Cheng H. L., Jedrychowski M. P., Gygi S. P., Sinclair D. A., Alt F. W., Greenberg M. E. (2004) Stress-dependent regulation of FOXO transcription factors by the SIRT1 deacetylase. Science 303, 2011–2015
  • 13. Yuan Z. L., Guan Y. J., Chatterjee D., Chin Y. E. (2005) Stat3 dimerization regulated by reversible acetylation of a single lysine residue. Science 307, 269–273
  • 14. Olsen J. V., Blagoev B., Gnad F., Macek B., Kumar C., Mortensen P., Mann M. (2006) Global, in vivo, and site-specific phosphorylation dynamics in signaling networks. Cell 127, 635–648
  • 15. Elortza F., Nühse T. S., Foster L. J., Stensballe A., Peck S. C., Jensen O. N. (2003) Proteomic analysis of glycosylphosphatidylinositol-anchored membrane proteins. Mol. Cell. Proteomics 2, 1261–1270
  • 16. Kim S. C., Sprung R., Chen Y., Xu Y., Ball H., Pei J., Cheng T., Kho Y., Xiao H., Xiao L., Grishin N. V., White M., Yang X. J., Zhao Y. (2006) Substrate and functional diversity of lysine acetylation revealed by a proteomics survey. Mol. Cell 23, 607–618
  • 17. Denison C., Rudner A. D., Gerber S. A., Bakalarski C. E., Moazed D., Gygi S. P. (2005) A proteomic strategy for gaining insights into protein sumoylation in yeast. Mol. Cell. Proteomics 4, 246–254
  • 18. Martínez-Ruiz A., Lamas S. (2007) Proteomic identification of S-nitrosylated proteins in endothelial cells. Methods Mol. Biol. 357, 215–223
  • 19. Diella F., Cameron S., Gemünd C., Linding R., Via A., Kuster B., Sicheritz-Pontén T., Blom N., Gibson T. J. (2004) Phospho.ELM: a database of experimentally verified phosphorylation sites in eukaryotic proteins. BMC Bioinformatics 5, 79
  • 20. Gupta R., Birch H., Rapacki K., Brunak S., Hansen J. E. (1999) O-GLYCBASE version 4.0: a revised database of O-glycosylated proteins. Nucleic Acids Res. 27, 370–372
  • 21. Watanabe K., Harayama S. (2001) SWISS-PROT: the curated protein sequence database on internet. Tanpakushitsu Kakusan Koso 46, 80–86
  • 22. Mishra G. R., Suresh M., Kumaran K., Kannabiran N., Suresh S., Bala P., Shivakumar K., Anuradha N., Reddy R., Raghavan T. M., Menon S., Hanumanthu G., Gupta M., Upendran S., Gupta S., Mahesh M., Jacob B., Mathew P., Chatterjee P., Arun K. S., Sharma S., Chandrika K. N., Deshpande N., Palvankar K., Raghavnath R., Krishnakanth R., Karathia H., Rekha B., Nayak R., Vishnupriya G., Kumar H. G., Nagini M., Kumar G. S., Jose R., Deepthi P., Mohan S. S., Gandhi T. K., Harsha H. C., Deshpande K. S., Sarker M., Prasad T. S., Pandey A. (2006) Human protein reference database–2006 update. Nucleic Acids Res. 34, D411–414
  • 23. Lee T. Y., Huang H. D., Hung J. H., Huang H. Y., Yang Y. S., Wang T. H. (2006) dbPTM: an information repository of protein post-translational modification. Nucleic Acids Res. 34, D622–627
  • 24. Hornbeck P. V., Chabra I., Kornhauser J. M., Skrzypek E., Zhang B. (2004) PhosphoSite: A bioinformatics resource dedicated to physiological protein phosphorylation. Proteomics 4, 1551–1561
  • 25. Chernorudskiy A. L., Garcia A., Eremin E. V., Shorina A. S., Kondratieva E. V., Gainullin M. R. (2007) UbiProt: a database of ubiquitylated proteins. BMC Bioinformatics 8, 126
  • 26. Xue Y., Zhou F., Fu C., Xu Y., Yao X. (2006) SUMOsp: a web server for sumoylation site prediction. Nucleic Acids Res. 34, W254–257
  • 27. Chen H., Xue Y., Huang N., Yao X., Sun Z. (2006) MeMo: a web tool for prediction of protein methylation modifications. Nucleic Acids Res. 34, W249–253
  • 28. Kiemer L., Bendtsen J. D., Blom N. (2005) NetAcet: prediction of N-terminal acetylation sites. Bioinformatics 21, 1269–1270
  • 29. Eddy S. R. (1998) Profile hidden Markov models. Bioinformatics 14, 755–763
  • 30. Bateman A., Coin L., Durbin R., Finn R. D., Hollich V., Griffiths-Jones S., Khanna A., Marshall M., Moxon S., Sonnhammer E. L., Studholme D. J., Yeats C., Eddy S. R. (2004) The Pfam protein families database. Nucleic Acids Res. 32, D138–141
  • 31. Kanehisa M., Goto S. (2000) KEGG: kyoto encyclopedia of genes and genomes. Nucleic Acids Res. 28, 27–30
  • 32. Ashburner M., Ball C. A., Blake J. A., Botstein D., Butler H., Cherry J. M., Davis A. P., Dolinski K., Dwight S. S., Eppig J. T., Harris M. A., Hill D. P., Issel-Tarver L., Kasarskis A., Lewis S., Matese J. C., Richardson J. E., Ringwald M., Rubin G. M., Sherlock G. (2000) Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat. Genet. 25, 25–29
  • 33. Hamosh A., Scott A. F., Amberger J. S., Bocchini C. A., McKusick V. A. (2005) Online Mendelian Inheritance in Man (OMIM), a knowledgebase of human genes and genetic disorders. Nucleic Acids Res. 33, D514–517
  • 34. Wheeler D. L., Barrett T., Benson D. A., Bryant S. H., Canese K., Chetvernin V., Church D. M., DiCuccio M., Edgar R., Federhen S., Geer L. Y., Kapustin Y., Khovayko O., Landsman D., Lipman D. J., Madden T. L., Maglott D. R., Ostell J., Miller V., Pruitt K. D., Schuler G. D., Sequeira E., Sherry S. T., Sirotkin K., Souvorov A., Starchenko G., Tatusov R. L., Tatusova T. A., Wagner L., Yaschenko E. (2007) Database resources of the National Center for Biotechnology Information. Nucleic Acids Res. 35, D5–12
  • 35. Altschul S. F., Madden T. L., Schäffer A. A., Zhang J., Zhang Z., Miller W., Lipman D. J. (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 25, 3389–3402
  • 36. Smith T. F., Waterman M. S. (1981) Identification of common molecular subsequences. J. Mol. Biol. 147, 195–197
  • 37. Thompson J. D., Higgins D. G., Gibson T. J. (1994) CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res. 22, 4673–4680
  • 38. Li Q., Lee B. T., Zhang L. (2005) Genome-scale analysis of positional clustering of mouse testis-specific genes. BMC Genomics 6, 7
  • 39. Jensen O. N. (2006) Interpreting the protein language using proteomics. Nat. Rev. Mol. Cell Biol. 7, 391–403
  • 40. Yang X. J., Grégoire S. (2006) A recurrent phospho-sumoyl switch in transcriptional repression and beyond. Mol. Cell 23, 779–786
  • 41. Macek B., Gnad F., Soufi B., Kumar C., Olsen J. V., Mijakovic I., Mann M. (2008) Phosphoproteome analysis of E. coli reveals evolutionary conservation of bacterial Ser/Thr/Tyr phosphorylation. Mol. Cell. Proteomics 7, 299–307
  • 42. Boutanaev A. M., Kalmykova A. I., Shevelyov Y. Y., Nurminsky D. I. (2002) Large clusters of co-expressed genes in the Drosophila genome. Nature 420, 666–669
  • 43. Caron H., van Schaik B., van der Mee M., Baas F., Riggins G., van Sluis P., Hermus M. C., van Asperen R., Boon K., Voûte P. A., Heisterkamp S., van Kampen A., Versteeg R. (2001) The human transcriptome map: clustering of highly expressed genes in chromosomal domains. Science 291, 1289–1292
  • 44. Yang X. J. (2005) Multisite protein modification and intramolecular signaling. Oncogene 24, 1653–1662
  • 45. Xu Y. X., Hirose Y., Zhou X. Z., Lu K. P., Manley J. L. (2003) Pin1 modulates the structure and function of human RNA polymerase II. Genes Dev. 17, 2765–2776
  • 46. Prudhomme W., Daley G. Q., Zandstra P., Lauffenburger D. A. (2004) Multivariate proteomic analysis of murine embryonic stem cell self-renewal versus differentiation signaling. Proc. Natl. Acad. Sci. U.S.A. 101, 2900–2905
  • 47. Wang J., Rao S., Chu J., Shen X., Levasseur D. N., Theunissen T. W., Orkin S. H. (2006) A protein interaction network for pluripotency of embryonic stem cells. Nature 444, 364–368
  • 48. Hakuno D., Takahashi T., Lammerding J., Lee R. T. (2005) Focal adhesion kinase signaling regulates cardiogenesis of embryonic stem cells. J. Biol. Chem. 280, 39534–39544
  • 49. Reinders J., Sickmann A. (2007) Modificomics: posttranslational modifications beyond protein phosphorylation and glycosylation. Biomol. Eng. 24, 169–177
  • 50. Jones P., Côté R. G., Cho S. Y., Klie S., Martens L., Quinn A. F., Thorneycroft D., Hermjakob H. (2008) PRIDE: new developments and new datasets. Nucleic Acids Res. 36, D878–D883
  • 51. Barrett T., Troup D. B., Wilhite S. E., Ledoux P., Rudnev D., Evangelista C., Kim I. F., Soboleva A., Tomashevsky M., Edgar R. (2007) NCBI GEO: mining tens of millions of expression profiles–database and tools update. Nucleic Acids Res. 35, D760–765
  • 52. Schwartz D., Chou M. F., Church G. M. (2008) Predicting protein post-translational modifications using meta-analysis of proteome-scale data sets. Mol. Cell. Proteomics 2, 365–379
  • 53. Boekhorst J., van Breukelen B., Heck A. J., Snel B. (2008) Comparative phosphoproteomics reveals evolutionary and functional conservation of phosphorylation across eukaryotes. Genome Biol. 9, R144
  • 54. Seet B. T., Dikic I., Zhou M. M., Pawson T. (2006) Reading protein modifications with interaction domains. Nat. Rev. Mol. Cell Biol. 7, 473–483
  • 55. Li S. L., Li H., Li M. F., Shyr Y., Xie L., Li Y. X. (2009) Improved prediction of lysine acetylation by support vector machines. Protein Pept. Lett., in press


Data Availability Statement

An online version of SysPTM is available, where users can query protein and PTM information through search and browse pages, retrieve batch files through a statistics page, upload their own modification datasets through a submit page, and get help through a “FAQ” page. Supplemental Fig. S4A shows the search page interface. Users may query the data base by gene name, protein description, Swiss-Prot ID/AC, IPI ID, or NCBI protein GI. A BLAST search engine is also included so that users can perform sequence similarity searches. Supplemental Fig. S4B illustrates the browse page, through which users can browse PTM proteins by PTM type, data source, or KEGG pathway. Search and browse results are returned as SysPTM entries; each entry contains eight sections (supplemental Fig. S4C), each of which can be expanded by clicking to show details (supplemental Fig. S5). The sections provide the following information: “Summary” gives basic protein information, such as species, protein description, gene name, protein identifier, and three-dimensional structure; “PTMsite-Statistics” lists the number of PTM sites of each modification type, along with the SysPTM data source (SysPTM-A and/or B) for these sites; “PTMsite-Map” is an interactive interface for viewing PTM sites, Pfam domains, and PTM position clusters along the protein sequence; “PTMsite-Table” lists each PTM residue, position, type, and data source (SysPTM-A and/or B); “PTMsite-Source” provides more detailed information on the data source (protein data base or literature reference) and the original experimental evidence; “PTMsite-Cluster” gives predicted PTM position clusters with their statistical p values; “PTMprotein-Sequence” highlights PTM sites on the protein sequence; and “PTMprotein-Annotation” shows protein annotation information and links to public data bases.


Articles from Molecular & Cellular Proteomics : MCP are provided here courtesy of American Society for Biochemistry and Molecular Biology
