Abstract
Phosphorylation is one of the most prevalent post-translational modifications and plays a key role in regulating cellular processes. We carried out a bioinformatics analysis of pre-existing phosphoproteomics data to profile two model species representing the largest subclasses in flowering plants, the dicot Arabidopsis thaliana and the monocot Oryza sativa, to understand the extent to which phosphorylation signaling and function are conserved across evolutionarily divergent plants. We identified 6537 phosphopeptides from 3189 phosphoproteins in Arabidopsis and 2307 phosphopeptides from 1613 phosphoproteins in rice. We identified phosphorylation motifs, finding nineteen pS motifs and two pT motifs shared in rice and Arabidopsis. The majority of shared motif-containing proteins were mapped to the same biological processes with similar patterns of fold enrichment, indicating high functional conservation. We also identified shared patterns of crosstalk between phosphoserines with enrichment for motifs pSXpS, pSXXpS and pSXXXpS, where X is any amino acid. Lastly, our results identified several pairs of motifs that co-occur significantly more often than expected in Arabidopsis proteins, indicating crosstalk between different sites, but this was not observed in rice.
Significance
Our results demonstrate that there are evolutionarily conserved mechanisms of phosphorylation-mediated signaling in plants, via analysis of high-throughput phosphorylation proteomics data from key monocot and dicot species: rice and Arabidopsis thaliana. The results also suggest that there is increased crosstalk between phosphorylation sites in A. thaliana compared with rice. The results are important for our general understanding of cell signaling in plants, and the ability to use A. thaliana as a general model for plant biology.
Abbreviations: GO, Gene ontology; pProtein, Phosphorylated protein; pS, Phospho-serine; pT, Phospho-threonine; pY, Phospho-tyrosine
Keywords: Bioinformatics, Phosphoproteomics, Motif identification, Evolutionary conservation, Pathway analysis
Highlights
• The evolutionary conservation of phosphorylation signalling in plants was explored.
• Nineteen pS motifs and two pT motifs shared in rice and Arabidopsis were identified.
• Shared motif-containing proteins are similarly enriched in the same pathways.
• Dual site, shared motifs pSXpS, pSXXpS and pSXXXpS were identified.
• There is greater co-occurrence of crosstalk between pSites in Arabidopsis.
1. Introduction
Phosphorylation is regarded as one of the most prevalent post-translational modifications [1]. The reaction is catalyzed by a protein kinase, which transfers the γ-phosphoryl group from adenosine triphosphate (ATP) or guanosine triphosphate (GTP), most commonly via a covalent bond to the hydroxyl group of a specific serine, threonine, or tyrosine amino acid within the target protein [2]. One feature of protein phosphorylation that makes it an ideal participant in signal transduction pathways is its reversibility: a protein phosphatase can subsequently remove the phosphoryl group from the phosphorylated amino acid, allowing signal transduction cascades to maintain a prompt cellular response to stimuli coming from outside or within the cells [3].
Phosphorylation regulates protein function and cell signaling by triggering a change in the three-dimensional structure of the protein, which in turn influences the protein into behaving differently by activating or deactivating its catalytic function. A change in the structure of a phosphorylated protein (pProtein) can also recruit other proteins that contain structurally conserved domains to recognize and bind to specific motifs [4,5]. Phosphorylation events are crucial to understanding the functional biology of plants, since they control essential biological processes including seed germination, stomatal movement, pistil development and pollination, the innate immune response, defense and stress tolerance [[6], [7], [8], [9], [10]].
Monocots and dicots are the largest subclasses in flowering plants (Angiosperms) [11]. The monocot lineage branched off from dicots approximately 140–150 million years ago [12], yet many key mechanisms and transcription factors present in both dicots and monocots regulate the expression of biotic and abiotic stress response genes [[13], [14], [15], [16], [17], [18]]. In this work, we explore the extent to which phosphorylation-mediated signaling events are conserved, or have diverged, between two key model species – rice and Arabidopsis. We thus aim to understand the extent to which findings about signaling in these model organisms are likely to be transferable to other non-model plants. A past study by Nakagami et al. identified some conserved phosphorylation sites in plants based on analyzing orthologous phosphoproteins in rice and Arabidopsis [19]. In their analysis, around 50% of pProteins in either species had an ortholog that was also phosphorylated (50.4% rice pProteins to Arabidopsis orthologue; 56.2% Arabidopsis pProtein to rice orthologue), and around 20% phosphorylated at the same site (18% rice phospho-site; 25% Arabidopsis phospho-site). Our work extends from this study by examining the extent to which phosphorylation motifs are conserved in the two species, the functional associations of those motifs, and the evidence for crosstalk between proximal phospho-sites.
Rice (Oryza sativa) is a primary monocot model plant for cereal research due to its compact genome and evolutionary relationships with other cereals that have larger genomes [20]. According to the Ensembl plants website (Assembly: IRGSP-1.0, INSDC Assembly GCA_001433935.1) O. sativa Japonica has a 374 MB genome, 35,679 coding genes and 48,950 protein sequences recorded in UniProt (March 2017, UniProt Proteome ID: UP000059680). O. sativa Indica has a 412 MB genome, 40,745 coding genes in Ensembl (database version 88.2, ASM465v1, INSDC Assembly GCA_000004655.2, Jan 2005) with 37,385 protein sequences in UniProt (UP000007015). Arabidopsis thaliana has a 136 MB genome, 27,655 coding genes recorded in Ensembl plants (Database version 88.11, TAIR10, INSDC Assembly GCA_000001735.1, Sep 2010) and 32,113 protein sequences in UniProt (Proteome ID UP000006548). Discrepancies between counts of coding genes and protein sequences are likely due to different genome annotations present in different resources. However, for the purposes of comparison we can conclude that the rice genome (both sub-species) is around three times larger than that of A. thaliana, and likely has a 30-50% higher gene count. At present, there are 9 wild rice varieties (other Oryza species) with genomes in Ensembl, with gene counts ranging from ~29K (Oryza meridionalis) up to 37K (Oryza rufipogon), indicating that the higher gene count is relatively stable and not a result of recent domestication as in the case of the wheat genome (104K genes).
The rice and Arabidopsis genomes contain 1512 (in Japonica; 1403 in Indica) and 1052 protein kinases, respectively, more than twice the number in humans (516) [21]. Unique protein kinases that are only found in higher plants, such as RLKs, are positive regulators for plant tolerance to salt and cold stresses [[22], [23], [24]]. However, not all duplicates have been retained: for example, plant protein kinases that are involved in housekeeping functions (such as metabolism, mitosis, and other primary functions that all living organisms need to survive) are kept in low copy numbers, while other plant protein kinases, in particular kinases that are expressed only at specific developmental stages, have a high copy number of duplicates. It has previously been suggested that many of these protein kinases were duplicated and retained owing to their roles in plant-specific processes, while housekeeping genes might be under strong purifying selection that acts to preserve their normal function [25].
In terms of experimental data on plant species, Arabidopsis has the most extensive proteomic data available in plants. It is one of the top ten species in the number of submitted data sets in the ProteomeXchange database [26]. Compared with Arabidopsis, fewer data are available in other plant species, and thus it is important to understand the extent to which findings in this model organism can be transferred to other plant species, in particular crops that are important for food security, such as rice.
In this study, we carried out a comparative qualitative phosphoproteomics analysis in dicot A. thaliana and monocot O. sativa, using pre-existing mass spectrometry (MS) phosphoproteomics datasets and several bioinformatic tools and resources. The aims of our study were to identify and compare features of rice and Arabidopsis phosphoproteomes, and to identify shared phosphorylation motifs and associated biological processes across flowering plants. This could enhance our understanding of regulatory systems, which in the longer term could potentially lead to improving the productivity of agronomically important plant species.
2. Materials and methods
2.1. Datasets
Two rice O. sativa Indica (PXD002222 and PXD000923) and two mouse-ear cress A. thaliana (PXD000033 and PXD000421) data sets were chosen from experiments that include phospho-enrichment steps [[27], [28], [29], [30]] (Table 1). From the available data sets in ProteomeXchange we selected those with the largest numbers of phosphopeptides reported in the original publications. MGF files for each data set were downloaded from the ProteomeXchange central website (http://www.proteomexchange.org). If MGF files were not available, raw files were converted to MGF using MSConvertGUI [31].
Table 1.
Species (tissue) | Data set identifier | Phospho-peptide enrichment | Precursor mass error tol. | Fragment mass error tol. | Missed cleavages | Fixed modifications | Variable modifications | −10lgP threshold (a) | Instrument/fragmentation |
---|---|---|---|---|---|---|---|---|---|
Arabidopsis thaliana (aerial parts) | PXD000033 | IMAC and TiO2 | ±5.0 ppm | ±0.5 Da | Three | Beta-methylthiolation (+45.99 Da), iTRAQ 4plex (K, N-term) (+144.10 Da) | Oxidation (M) (+15.99 Da), Phosphorylation (STY) (+79.97 Da), Acetylation (N-term) (+42.01 Da), Deamidation (NQ) (+0.98 Da), iTRAQ 4plex (Y) (+144.10 Da) | 30 | Orbitrap (Orbi-Trap) CID/HCD |
Arabidopsis thaliana (leaves) | PXD000421 | MOAC | ±7.0 ppm | ±0.8 Da | Two | Carbamidomethylation (+57.02 Da) | Oxidation (M) (+15.99 Da), Phosphorylation (STY) (+79.97 Da), Acetylation (N-term) (+42.01 Da), Deamidation (NQ) (+0.98 Da) | 24.2 | Orbitrap (Orbi-Trap) CID |
Oryza sativa Indica (leaves) | PXD002222 | TiO2 | ±20.0 ppm | ±0.05 Da | One | Carbamidomethylation (+57.02 Da) | Oxidation (M) (+15.99 Da), Phosphorylation (STY) (+79.97 Da), Acetylation (N-term) (+42.01 Da) | 19 | Orbitrap (Orbi-Trap) HCD |
Oryza sativa Indica (pistil) | PXD000923 | IMAC | ±15 ppm | ±0.05 Da | One | Carbamidomethylation (+57.02 Da) | Oxidation (M) (+15.99 Da), Phosphorylation (STY) (+79.97 Da) | 21 | Triple TOF HCD |
MOAC = Metal Oxide Affinity Chromatography; IMAC = Immobilized Metal Affinity Chromatography.
(a) PEAKS DB’s −10lgP threshold score was set as a cutoff filter to achieve a false discovery rate (FDR) of 1.0% at the peptide sequence level.
2.2. Proteins and peptides identification and modification localization
PEAKS software version 7.5 [32] was used to search the spectra obtained from datasets PXD000033 and PXD000421 against the A. thaliana protein database and spectra obtained from datasets PXD002222 and PXD000923 against O. sativa Indica database. Databases were downloaded from UniProt version 59 including canonical and isoform protein sequences [33]. Trypsin was specified as the proteolytic enzyme of choice, with one non-specific cleavage and three maximum variable modifications per peptide for all data sets. Other search parameters were selected to match as closely as possible the parameters from the original analysis - a summary of search parameters and cutoff filter used in PEAKS database searching is shown in Table 1.
2.3. Extraction of confidently identified phosphopeptides and phosphoproteins
Code was written in Python (version 3.5.2) to merge exported PEAKS results (protein-peptides.csv file) for each species and extract confidently identified phosphopeptides and phosphoproteins. For the purposes of downstream analysis of sites of phosphorylation (phospho-sites), we merged peptide-spectrum matches (PSMs) identifying the same peptide. To ensure confident site analysis, we selected phosphorylation sites with Ascore ≥ 20, a score that localizes the site of modification with 99% certainty (p = 0.01) [34].
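The merging and filtering step can be sketched as follows. This is a minimal illustration under stated assumptions, not the code used in the study: the record keys `protein`, `peptide` and `ascores` are hypothetical stand-ins for fields parsed from the PEAKS export, and keeping the best Ascore per site across merged PSMs is one plausible merging rule rather than a documented one.

```python
from collections import defaultdict

ASCORE_CUTOFF = 20.0  # localization threshold stated in the text


def confident_sites(psms, cutoff=ASCORE_CUTOFF):
    """Merge PSMs per (protein, peptide) and keep confidently localized sites.

    `psms` is an iterable of dicts with hypothetical keys:
      protein : accession string
      peptide : unmodified peptide sequence
      ascores : {0-based position in peptide: Ascore} for phosphorylated residues
    Returns {(protein, peptide): {position: best Ascore}} for sites >= cutoff.
    """
    merged = defaultdict(dict)
    for psm in psms:
        key = (psm["protein"], psm["peptide"])
        for pos, score in psm["ascores"].items():
            # Assumption: keep the best localization score seen for each site.
            if score > merged[key].get(pos, float("-inf")):
                merged[key][pos] = score

    result = {}
    for key, sites in merged.items():
        kept = {pos: s for pos, s in sites.items() if s >= cutoff}
        if kept:  # drop peptides with no confidently localized site
            result[key] = kept
    return result
```

Peptides whose sites all fall below the cutoff are dropped entirely, so the returned keys correspond to the confidently identified phosphopeptides used downstream.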
2.4. Statistical analysis of the extent of multi-phosphorylation
Two-proportion z-tests were performed in R version 3.3.3 to test whether the observed proportion of multi-phosphorylated peptides in Arabidopsis is greater than the observed proportion in rice.
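For readers who want to reproduce the test outside R, a one-sided two-proportion z-test can be sketched in Python as below. This is an illustrative version using the pooled standard error and no continuity correction; the exact R function and options used in the study are not stated, so small numerical differences from the reported values are possible. The function and argument names are our own.

```python
import math


def two_proportion_z_test(x1, n1, x2, n2):
    """One-sided test of H1: p1 > p2 using the pooled standard error.

    x1, x2 : counts of "successes" (e.g. multi-phosphorylated peptides)
    n1, n2 : group sizes (e.g. all phosphopeptides in each species)
    Returns (z statistic, upper-tail p-value).
    """
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Upper-tail probability of the standard normal distribution.
    p_value = 0.5 * math.erfc(z / math.sqrt(2))
    return z, p_value
```

Here `x1` and `n1` would be the multi-phosphorylated and total phosphopeptide counts for Arabidopsis, and `x2` and `n2` the corresponding counts for rice.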
2.5. Phospho-site distance analysis
Proteins with more than two phospho-sites identified in the source data sets were extracted from our list of confidently identified phosphoproteins. Distances between phospho-sites were measured using a sliding window of 30 amino acids, starting from the first phosphorylated residue from the N-terminus. The distance was calculated by counting the number of amino acids until the next adjacent phosphorylated residue towards the C-terminus. Each pair of sites was assigned to a group based on the type of amino acid and the direction from the N-to-C termini. A background distribution, used to determine whether distances between phosphoserines observed differed from the random expectation, was calculated by two methods: one in which we calculated the distance between all serine residues in the theoretical proteome (all proteins in the species’ FASTA file), and one in which a peptide digestion model was included, to model the random sampling of peptides that occurs in LC-MS/MS workflows. However, both models produced near identical distributions (data not shown), and thus the former is presented for simplicity in interpretation.
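The distance calculation can be sketched as follows. This is a simplified illustration, not the study's code: we assume that the distance is the number of residues between two adjacent sites (so that pSXpS has distance 1, matching the Results), that pairs further apart than the 30-residue window are discarded, and that the background uses consecutive serines. The function names and the input layout are hypothetical.

```python
from collections import Counter

WINDOW = 30  # maximum separation considered, as stated in the text


def adjacent_site_distances(sites, window=WINDOW):
    """Return (pair_type, distance) for adjacent phospho-sites in one protein.

    `sites` maps 1-based residue position -> residue letter ('S', 'T' or 'Y').
    Distance counts the residues between the two sites, so pSXpS gives 1.
    """
    ordered = sorted(sites.items())  # N- to C-terminal order
    pairs = []
    for (pos_a, aa_a), (pos_b, aa_b) in zip(ordered, ordered[1:]):
        distance = pos_b - pos_a - 1
        if distance <= window:
            pairs.append((f"p{aa_a}-p{aa_b}", distance))
    return pairs


def serine_background(sequence, window=WINDOW):
    """Count distances between consecutive serines in a protein sequence."""
    positions = [i for i, aa in enumerate(sequence) if aa == "S"]
    counts = Counter()
    for a, b in zip(positions, positions[1:]):
        if b - a - 1 <= window:
            counts[b - a - 1] += 1
    return counts
```

Summing `serine_background` over every sequence in a species' FASTA file and rescaling the counts would give a background curve comparable with the observed phosphoserine pair distances.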
2.6. Phosphorylation motif prediction
Phosphopeptide sequences were submitted to Motif-x [35] to predict phosphorylation motifs present in our identified phosphoproteins. Pre-aligned 15-mers were constructed using Python: each 15-mer was centered on a phosphorylated serine, threonine, or tyrosine and extended seven residues towards the N-terminus and seven residues towards the C-terminus. When the site was located fewer than seven residues from the N- or C-terminus of the protein, the 15-mer was completed with the letter “X” to reach the required length of fifteen residues. For onward analysis, the occurrence threshold (i.e. number of 15-mer peptides with each motif) was ≥5, and the significance threshold was ≤0.00018, to ensure a corrected p-value of at most 0.05 by the Bonferroni global correction method. Oryza sativa Indica and Arabidopsis thaliana proteomes obtained from UniProt version 59 were used to supply the background distribution for rice and Arabidopsis analyses, respectively. Motif-x results for Arabidopsis and rice were analyzed using Python code to identify motifs shared in the two species. A motif was classified as “shared” if the same motif passed the thresholds above in both species.
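A minimal sketch of the 15-mer construction and the shared-motif comparison is given below. The function names and the 0-based site index are our own choices for illustration; the padding rule follows the description above.

```python
FLANK = 7  # residues on each side of the central phospho-site


def build_15mer(protein_seq, site_index, flank=FLANK, pad="X"):
    """Return the window centered on a 0-based phospho-site index.

    Positions that fall outside the protein are filled with `pad`.
    """
    if not 0 <= site_index < len(protein_seq):
        raise ValueError("site_index outside protein sequence")
    left = protein_seq[max(0, site_index - flank):site_index]
    right = protein_seq[site_index + 1:site_index + 1 + flank]
    return (pad * (flank - len(left)) + left
            + protein_seq[site_index]
            + right + pad * (flank - len(right)))


def shared_motifs(motifs_a, motifs_b):
    """Motifs that passed the thresholds in both species."""
    return sorted(set(motifs_a) & set(motifs_b))
```

For example, a serine at index 2 of the toy sequence `MASPKRSTLLQ` yields `XXXXXMASPKRSTLL`, with five padding characters on the N-terminal side.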
2.7. Functional classification of motif containing proteins
Python code was written to obtain accessions for proteins from Arabidopsis and rice containing shared phosphorylation motifs. Protein accessions containing each shared motif were then grouped together into a list. The PANTHER statistical overrepresentation test [36] was used to classify each shared motif list into inferred biological processes, using the Bonferroni correction to apply global correction for multiple tests.
PANTHER uses Oryza sativa Japonica as a reference list for rice. To counteract the problem of using a different cultivar, OrthoMCL [37] was first used to obtain Japonica accessions, then OrthoMCL results were analyzed (via Python code) to extract accessions orthologous to rice Indica. Biological processes with fewer than five mapped proteins were excluded from further analyses. PANTHER results were analyzed (via Python code) to extract biological processes that were inferred in both species by PANTHER for each shared motif list.
2.8. Shared motif pair analysis
We next examined the “shared” phosphorylation motifs in both species independently, to examine whether particular pairs of motifs co-occurred within the same protein more often than would be expected by chance. All possible pair combinations of the twenty-one motifs were considered. Motif pairs observed together in fewer than five proteins were excluded from statistical analysis. The enrichment factor was calculated as the ratio of the observed count of proteins that contain a shared motif pair to the expected count across confidently identified phosphoproteins. A one-tailed Fisher’s exact test was used to test whether the observed counts of proteins containing a motif pair were significantly higher than the expected values, using the 2×2 contingency table shown in Table 2. The p-values were corrected for multiple testing by the Benjamini-Hochberg procedure.
Table 2.
 | Observed count | Expected count |
---|---|---|
Proteins with motif a and b | q | mn/N |
Proteins without motif a and b | N – m – n + q | ((N − m)(N − n))/N |
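The motif pair statistics can be sketched as below. The symbols are not defined explicitly in the text, so we assume that N is the number of confidently identified phosphoproteins, m and n are the numbers of proteins containing motif a and motif b, and q is the number containing both, which is consistent with the expected counts in Table 2. The one-tailed Fisher's exact test is written as the upper tail of the hypergeometric distribution, and the function names are our own.

```python
from math import comb


def enrichment_factor(q, m, n, N):
    """Observed co-occurrence count divided by the expected count m*n/N."""
    return q / (m * n / N)


def fisher_upper_tail(q, m, n, N):
    """One-tailed Fisher's exact test: P(X >= q) under the hypergeometric null.

    N: all phosphoproteins, m: proteins with motif a,
    n: proteins with motif b, q: proteins with both (assumed meanings).
    """
    total = comb(N, n)
    upper = min(m, n)
    tail = sum(comb(m, k) * comb(N - m, n - k) for k in range(q, upper + 1))
    return tail / total


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    count = len(p_values)
    order = sorted(range(count), key=lambda i: p_values[i])
    adjusted = [0.0] * count
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(count, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * count / rank)
        adjusted[idx] = running_min
    return adjusted
```

In use, `fisher_upper_tail` would be applied to every motif pair with q ≥ 5, and the resulting list of p-values passed to `benjamini_hochberg` once per species.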
3. Results and discussion
3.1. Identification of phosphorylation sites, peptides, and proteins
By combining the results from two datasets for each species and only selecting phosphopeptides with confidently identified phosphorylation sites, a total of 6537 unique phosphopeptides from 3189 unique phosphoproteins were identified in Arabidopsis (~50% of the total proteins we identified). In rice, we identified 2307 unique phosphopeptides from 1613 unique phosphoproteins (56% of the total identified proteins in rice) – see Supplementary File 1 for all identifications.
To explore whether there are apparent global differences in the extent of phosphorylation across the two species, we profiled the numbers of phospho-sites per peptide and per protein. We identified a significantly greater proportion of singly phosphorylated peptides in rice (89.7%) than in Arabidopsis (78.9%) (Table 3). The majority of multi-phosphorylated peptides have two phosphates in both species. Only 2.3% of the phosphopeptides in Arabidopsis and 1.1% of the phosphopeptides in rice were suggested to have three phosphates. We found no phosphopeptides with four or more phosphates in this study, although this may be limited by the overall length of the peptides identified.
Table 3.
 | Arabidopsis thaliana | Oryza sativa Indica |
---|---|---|
Identified proteins | 6332 | 2878 |
Identified peptides | 52,430 | 10,117 |
Identified phosphoproteins | 3189 | 1613 |
Identified phosphopeptides | 6537 | 2307 |
Total phospho-sites (a) | 9249 | 2580 |
Singly phosphorylated proteins | 1342 (42%) | 1025 (63.5%) |
Multi-phosphorylated proteins | 1847 (58%) | 588 (36.5%) |
Shared motif containing proteins | 791 (24.80%) | 1012 (62.74%) |
pS%:pT%:pY% (b) | 88.3: 11.4: 0.4 | 86.7: 12.8: 0.5 |
1Pi%:2Pi%:3Pi% (c) | 78.9: 18.7: 2.3 | 89.7: 9.2: 1.1 |
(1Pi%:2Pi%:3Pi%) in serine containing phosphopeptides (d) | 78.7: 18.9: 2.4 | 90.9: 8.0: 1.1 |
(1Pi%:2Pi%:3Pi%) in threonine containing phosphopeptides (e) | 71.7: 23.8: 4.5 | 87.1: 12.9: 0 |
(1Pi%:2Pi%:3Pi%) in tyrosine containing phosphopeptides (f) | 58.3: 41.7: 0 | 91.7: 8.3: 0 |
(a) Phosphorylation sites assignment with 99% certainty based on a p-value of 0.01 (Ascore ≥ 20).
(b) Relative abundance of serine, threonine, and tyrosine phosphorylation sites based on analyzing 9249 phosphorylation sites in Arabidopsis and 2580 phosphorylation sites in rice.
(c) Relative frequency of singly, doubly, and triply phosphorylated peptides based on analyzing a total of 6537 phosphopeptides in Arabidopsis and 2307 phosphopeptides in rice; no phosphopeptides with four or more phosphates were found in this study.
(d) Relative frequency of singly, doubly, and triply phosphorylated peptides in serine containing phosphopeptides based on analyzing a total of 7489 serine containing phosphopeptides in Arabidopsis and 2148 serine containing phosphopeptides in rice.
(e) Relative frequency of singly, doubly, and triply phosphorylated peptides in threonine containing phosphopeptides based on analyzing a total of 1022 threonine containing phosphopeptides in Arabidopsis and 325 threonine containing phosphopeptides in rice.
(f) Relative frequency of singly, doubly, and triply phosphorylated peptides in tyrosine containing phosphopeptides based on analyzing a total of 36 tyrosine containing phosphopeptides in Arabidopsis and 12 tyrosine containing phosphopeptides in rice.
Our results showed that the percentage of multi-phosphorylated peptides is significantly greater (by two fold) in Arabidopsis (21%, i.e. summing values in 2Pi% and 3Pi% in Table 3) compared to rice (10.3%), with a p-value = 2.2e−16. The likelihood of observing multi-phosphorylated peptides has the potential to be influenced by experimental factors, including differences in the digestion of proteins into peptides [38] or the enrichment protocol [39]. If proteins undergo less complete tryptic digestion, we would observe longer peptides, and thus the potential for more multi-phosphorylated peptides. To investigate whether peptide length is biasing the number of multi-phosphorylated peptides, we analyzed the length distribution for phosphorylated, non-phosphorylated and all peptides for rice and Arabidopsis (box plots for the distributions are presented in Supplementary Fig. 1). The analysis shows that there is no overall difference between all peptides (rice versus Arabidopsis), with a median length of 14 amino acids in both cases (quartiles 11–19 amino acids). Phosphopeptides from rice (median: 13 amino acids; quartiles: 10–17) are slightly longer than those from Arabidopsis (median: 12; quartiles: 10–17), suggesting that peptide length does not explain why significantly more multi-phosphorylated peptides are observed in Arabidopsis. Exploring whether differences in the enrichment protocols have caused more multi-phosphorylated peptides to be identified in Arabidopsis is not straightforward. However, one might hypothesize that a stronger enrichment for phosphopeptides over unmodified peptides might give rise to a higher proportion of multi-phosphorylated peptides. In Arabidopsis, 12% (6537/52,430) of the total peptides we identified were phosphorylated compared with 23% in rice (2307/10,117). As such, it does not appear that there is a stronger enrichment for phosphopeptides in Arabidopsis that could explain why multi-phosphorylated peptides appear more common.
However, we acknowledge that a comparison using matched protocols would be preferable to rule out all possible sources of experimental bias, especially given that the methods of phospho-peptide enrichment are not identical in all data sets (Table 1). In summary, the data are suggestive of a difference in the rate of multi-phosphorylation between the two species, but this requires further examination in future studies on larger data sets.
Sugiyama et al. reported that phospho-tyrosine containing peptides in Arabidopsis are frequently (75%) multi-phosphorylated [40]. Our results showed that 41.7% of all observed pY-containing peptides are multi-phosphorylated, whereas 28.3% of pT-containing peptides and 21.3% of pS-containing peptides are multi-phosphorylated, as shown in Table 3. Whilst the extent of multi-phosphorylation on pY containing peptides is not as high as previously reported, it appears evident that phosphorylation on tyrosine is coordinated with other nearby phosphorylation events as an additional regulatory mechanism.
When examining the rice data, we do not see evidence for a higher extent of co-modification associated with tyrosine phosphorylation. The multi-phosphorylation percentages in serine-, threonine-, and tyrosine-containing phosphopeptides are 9.1, 12.9 and 8.3% respectively. However, given the overall low counts of tyrosine containing phosphopeptides in rice, we cannot conclude that there is a significant difference.
3.2. Coordination in proximal phosphorylation sites in Arabidopsis and rice
We examined the distribution of distances between phosphorylated residues in multi-phosphorylated proteins from the N-to-C terminal direction (Fig. 1). We have also scaled the frequency of serine to serine residues in all proteins, to act as a background distribution (for comparison with phosphoserine pair distances). Most of the paired phospho-sites follow the “randomly expected” trend from the background distribution with the exception of “pSXpS” (distance = 1), “pSXXpS” (distance = 2) and “pSXXXpS” (distance = 3), where there is evidently a strong enrichment in the paired phosphoserine sites in both species. There is no evidence for enrichment of a pSpS motif (distance = 0). The data thus indicate that in both rice and Arabidopsis there is functional crosstalk between phosphoserine sites separated by 1, 2 or 3 amino acids.
3.3. Shared motifs in Arabidopsis and rice
We identified in total 76 pS, 7 pT, and one pY motifs in Arabidopsis and 51 pS and 6 pT motifs in rice; no pY motifs were found in rice due to the limited number of phosphotyrosines in this data set (Supplementary File 2). The most abundant pS motifs are RS in Arabidopsis and SP in rice (Fig. 2), while TP is by far the most common pT motif. Nineteen pS motifs and two pT motifs are shared in rice and Arabidopsis (making them candidate motifs likely to be present in flowering plants in general), of which four motifs (SP, TP, SxD, and SPR) are among the ten most abundant motifs predicted in rice and Arabidopsis, as shown in Fig. 2. We grouped the pS motifs into types or families similar to van Wijk et al. [41], for example, TP-, SP-, SD-, GS-, T-, and S-types. Most of the pS shared motifs belong to the basic SP-type, four of which (RxxSP, SPK, SPR, SPxR) are found to be highly enriched in the Arabidopsis nucleus, while SPR was found to be highly enriched in pistil tissue (the female reproductive part of a flower) in rice [28]. Motifs from the SD- and SP-types (SDxE, SP, SD) are very commonly targeted by Calcium-Dependent Protein Kinases (CDPKs). These Ser/Thr protein kinases are only found in plants, green algae and protozoa [42] and have an essential role in plant defense response [43].
3.4. Unique and novel motifs
Fifty-one pS, five pT, and one pY motifs are unique to Arabidopsis (i.e. we did not identify them in our rice data). Unique motifs such as RS, RSxS, and RxxxS are among the ten most abundant motifs in Arabidopsis. In rice, 29 pS and 4 pT motifs are classed as unique. It should be noted that our data give incomplete coverage of the total phospho-proteome and are subject to random sampling effects, so we cannot conclude that a motif is biologically unique; we use the term solely to denote that it was found in only one of the two species. Unique rice motifs such as SF and LxRxxS, together with shared motifs SP, TP, and RxxS, have been linked to the rice response to bacterial blight [27]. Many of the identified Arabidopsis motifs in our study overlap with motifs found in a recent meta-analysis of phosphoproteome datasets in Arabidopsis by van Wijk et al. [41]. We were also able to identify the RxxSF motif, which was predicted by Wang et al. [44] but was not apparent in van Wijk et al. Fifty-five of the identified Arabidopsis motifs and forty-nine of the identified rice motifs in our study are “novel motifs”, i.e. to our knowledge they have not previously been identified in another study. These are motifs identified by Motif-x as significantly enriched in the data. However, we use the term “novel” with caution, since these motifs are often highly similar to well recognized motifs: for example, “KxxxSP” and “KxxxxSP” are classed as novel, but “SP” is one of the most commonly seen motifs, hence previous studies may have grouped similar motifs together. Three of the novel motifs (KxxxSP, KxxxxSP, LxRQxS) were identified in both species and thus classed as “shared”; see Supplementary File 2 for a full list of shared, unique, and novel motifs.
3.5. Gene ontology classification by biological process
The majority of shared motif containing proteins are mapped to shared biological processes in rice and Arabidopsis, as shown in Fig. 3. Similar patterns of fold enrichment for similar biological processes in rice and Arabidopsis were observed for each class of phosphorylation motif. Fold enrichment indicates how many more (or fewer) proteins in our test list map to a particular biological process than would be expected by chance [45].
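As a worked illustration of this quantity, fold enrichment in an over-representation test is commonly computed as the observed count divided by the count expected from the reference list. The sketch below uses our own argument names and is not taken from the PANTHER code.

```python
def fold_enrichment(hits_in_list, list_size, hits_in_reference, reference_size):
    """Observed count over the count expected from the reference proportion."""
    expected = list_size * hits_in_reference / reference_size
    return hits_in_list / expected
```

A value above 1 indicates over-representation of the process in the test list, and a value below 1 indicates under-representation.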
Proteins containing motifs of type RSP mapped in both species were enriched (statistically significant for rice only) for three related process category terms: “mRNA processing”, “RNA metabolism” and “nucleobase-containing compound metabolic process” – a high level process term including nucleobases, nucleosides, nucleotides and nucleic acid synthesis or metabolism. Proteins containing the related SP motif were mapped to the same processes, as well as “nitrogen compound metabolic process” (significantly enriched in rice only) – a term including proteins involved in nitrogen fixing, nitrification, denitrification and so on, and vesicle related transport (not significantly enriched in either species).
RxxS containing proteins were mapped to a wider set of biological processes including “intracellular signal transduction” and a variety of other fairly high-level terms relating to metabolic processes. Since the total number of proteins carrying each motif is comparatively low, high-level process categories have generally been extracted. It would only be possible to map to more specific process terms for very large counts of proteins. In general, the results indicate that shared motifs between rice and Arabidopsis map to the same biological processes, indicating that the associations between kinase signaling and biological processes are likely to be ancient, shared mechanisms in plants.
3.6. Motif pair analysis
We examined the co-occurrence of shared motifs in Arabidopsis and rice (Fig. 4). Proteins with at least two shared phosphorylated motifs were included in this analysis. Among shared motif-containing proteins (1012 in rice and 791 in Arabidopsis), more proteins with at least two shared phosphorylated motifs are found in Arabidopsis (340, 43%) compared to rice (268, 26.5%). Statistically significant enrichment was seen for several pairs of motifs in Arabidopsis including i) “RxxSP” with “SP”; ii) “RxxSP” with “SPxR”; iii) “SD” with “SxD”; iv) “KxxxxSP” with “SP”; v) “RxxSP” with “TP”; vi) “RxxSP” with “SPR”; vii) “SP” with “TP”; viii) “SDxD” with “SDxE”; ix) “GS” with “RxxS”; x) “RSP” with “RxxSP”; xi) “SP” with “SPR”; xii) “SPK” with “TP”; xiii) “RSP” with “SPxR”; xiv) “SP” with “SxD”; xv) “KxxxSP” with “SP”; xvi) “SxD” with “TP”; xvii) “GS” with “SxD”; xviii) “GS” with “SP”; xix) “SP” with “SPxR”. It should be noted that in each case different sites have contributed to the motifs: for example, where one motif is a subset of another (“RxxSP” with “SP”), the same sites did not contribute to both motifs. There is a trend for motifs of the same general class (e.g. SP type; SD or SxD) to be most strongly enriched together. We interpret this to mean that the same kinase is likely phosphorylating multiple sites in the same protein, with similar motif types, rather than two kinases working in a coordinated manner. Cases where motifs of different classes are found to co-occur, e.g. SP with SxD and GS with SxD, would point to protein sets likely under coordinated control from more than one kinase. As discussed above, we observe considerably fewer sites of multi-phosphorylation in rice than Arabidopsis, and this result is further exemplified in Fig. 4. No motif pairs that co-occur in rice pass statistical significance.
It has previously been demonstrated in the human kinome that proteins phosphorylated by a single kinase on a single site participate in fewer pathways than those phosphorylated by multiple kinases [46]. It is thus possible to speculate that the expansion in both overall gene/protein count and kinase count in rice compared to Arabidopsis could lead to less crosstalk between phospho-sites in rice, and overall higher specificity in signaling through kinases. Further work is needed to explore the phenomenon across other plant species, so that research groups using Arabidopsis as a general model for plant biology and signaling can understand the extent to which findings are transferable.
4. Conclusion
This study provided a thorough comparative analysis of the Arabidopsis and rice phosphoproteomes. By employing several datasets from previous phosphoproteomics studies, we confidently identified 6537 phosphopeptides from 3189 phosphoproteins in Arabidopsis, and 2307 phosphopeptides from 1613 phosphoproteins in rice, with the site of phosphorylation localized with 99% certainty. We identified 21 phosphorylation motifs shared between rice and Arabidopsis, and demonstrated further that they are associated with the same biological processes. It is reasonable to assume that phosphorylation-related studies on Arabidopsis, as a key model species, can be used to make inferences about economically important crops such as rice.
For both species, there is an enrichment for phospho-serine multi-phosphorylation sites separated by one, two or three amino acids. However, there are some differences between the two species (separated by ~150 million years) in terms of overall gene and kinase count. We demonstrated that multi-phosphorylation was significantly higher in the Arabidopsis data than in the rice data, and we cannot find a confounding experimental factor that would explain the difference. Given the higher gene and kinase count in rice, it appears plausible that the difference is explained by higher specificity in signaling mechanisms in rice. Future studies on a wider range of model plant species should explore the functional implications of higher specificity and the extent to which this phenomenon extends to other plant species.
The following are the supplementary data related to this article.
Acknowledgements
We are pleased to acknowledge funding from the BBSRC and the Newton Fund that supported this work [BB/N013743/1, BB/L005239/1].
References
- 1. Lemeer S., Heck A.J.R. The phosphoproteomics data explosion. Curr. Opin. Chem. Biol. 2009;13:414–420. doi: 10.1016/j.cbpa.2009.06.022.
- 2. Krebs E.G. The enzymology of control by phosphorylation. Enzymes. 1986;17:3–20.
- 3. Mishra N.S., Tuteja R., Tuteja N. Signaling through MAP kinase networks in plants. Arch. Biochem. Biophys. 2006;452:55–68. doi: 10.1016/j.abb.2006.05.001.
- 4. Pawson T., Scott J.D. Signaling through scaffold, anchoring, and adaptor proteins. Science. 1997;278:2075–2080. doi: 10.1126/science.278.5346.2075.
- 5. Seet B.T., Dikic I., Zhou M.-M., Pawson T. Reading protein modifications with interaction domains. Nat. Rev. Mol. Cell Biol. 2006;7:473–483. doi: 10.1038/nrm1960.
- 6. Dong K., Zhen S., Cheng Z., Cao H., Ge P., Yan Y. Proteomic analysis reveals key proteins and phosphoproteins upon seed germination of wheat (Triticum aestivum L.). Front. Plant Sci. 2015;6:1017. doi: 10.3389/fpls.2015.01017.
- 7. Zhang T., Chen S., Harmon A.C. Protein phosphorylation in stomatal movement. Plant Signal. Behav. 2014;9:e972845. doi: 10.4161/15592316.2014.972845.
- 8. Li L., Li M., Yu L., Zhou Z., Liang X., Liu Z. The FLS2-associated kinase BIK1 directly phosphorylates the NADPH oxidase RbohD to control plant immunity. Cell Host Microbe. 2014;15:329–338. doi: 10.1016/j.chom.2014.02.009.
- 9. Singh A., Jha S.K., Bagri J., Pandey G.K. ABA inducible rice protein phosphatase 2C confers ABA insensitivity and abiotic stress tolerance in Arabidopsis. PLoS One. 2015;10:e0125168. doi: 10.1371/journal.pone.0125168.
- 10. Silva-Sanchez C., Li H., Chen S. Recent advances and challenges in plant phosphoproteomics. Proteomics. 2015;15:1127–1141. doi: 10.1002/pmic.201400410.
- 11. Paterson A.H., Bowers J.E., Chapman B.A., Peterson D.G., Rong J., Wicker T.M. Comparative genome analysis of monocots and dicots, toward characterization of angiosperm diversity. Curr. Opin. Biotechnol. 2004;15:120–125. doi: 10.1016/j.copbio.2004.03.001.
- 12. Chaw S.-M., Chang C.-C., Chen H.-L., Li W.-H. Dating the monocot-dicot divergence and the origin of core eudicots using whole chloroplast genomes. J. Mol. Evol. 2004;58:424–441. doi: 10.1007/s00239-003-2564-9.
- 13. Ito Y., Katsura K., Maruyama K., Taji T., Kobayashi M., Seki M. Functional analysis of rice DREB1/CBF-type transcription factors involved in cold-responsive gene expression in transgenic rice. Plant Cell Physiol. 2006;47:141–153. doi: 10.1093/pcp/pci230.
- 14. Kole C. Genomics and Breeding for Climate-Resilient Crops. Springer; 2013.
- 15. Molla K.A., Karmakar S., Chanda P.K., Sarkar S.N., Datta S.K., Datta K. Tissue-specific expression of Arabidopsis NPR1 gene in rice for sheath blight resistance without compromising phenotypic cost. Plant Sci. 2016;250:105–114. doi: 10.1016/j.plantsci.2016.06.005.
- 16. Sato H., Todaka D., Kudo M., Mizoi J., Kidokoro S., Zhao Y. The Arabidopsis transcriptional regulator DPB3-1 enhances heat stress tolerance without growth retardation in rice. Plant Biotechnol. J. 2016;14(8):1756–1767. doi: 10.1111/pbi.12535.
- 17. Yu L., Chen X., Wang Z., Wang S., Wang Y., Zhu Q. Arabidopsis enhanced drought tolerance1/HOMEODOMAIN GLABROUS11 confers drought tolerance in transgenic rice without yield penalty. Plant Physiol. 2013;162:1378–1391. doi: 10.1104/pp.113.217596.
- 18. Zhao F.-Y., Zhang X.-J., Li P.-H., Zhao Y.-X., Zhang H. Co-expression of the Suaeda salsa SsNHX1 and Arabidopsis AVP1 confer greater salt tolerance to transgenic rice than the single SsNHX1. Mol. Breeding. 2006;17:341–353.
- 19. Nakagami H., Sugiyama N., Mochida K., Daudi A., Yoshida Y., Toyoda T. Large-scale comparative phosphoproteomics identifies conserved phosphorylation sites in plants. Plant Physiol. 2010;153:1161–1174. doi: 10.1104/pp.110.157347.
- 20. Goff S.A. Rice as a model for cereal genomics. Curr. Opin. Plant Biol. 1999;2:86–89. doi: 10.1016/S1369-5266(99)80018-1.
- 21. Wang Y., Liu Z., Cheng H., Gao T., Pan Z., Yang Q. EKPD: a hierarchical database of eukaryotic protein kinases and protein phosphatases. Nucleic Acids Res. 2014;42:D496–D502. doi: 10.1093/nar/gkt1121.
- 22. Shi C.-C., Feng C.-C., Yang M.-M., Li J.-L., Li X.-X., Zhao B.-C. Overexpression of the receptor-like protein kinase genes AtRPK1 and OsRPK1 reduces the salt tolerance of Arabidopsis thaliana. Plant Sci. 2014;217:63–70. doi: 10.1016/j.plantsci.2013.12.002.
- 23. Sun X.-L., Yu Q.-Y., Tang L.-L., Ji W., Bai X., Cai H. GsSRK, a G-type lectin S-receptor-like serine/threonine protein kinase, is a positive regulator of plant tolerance to salt stress. J. Plant Physiol. 2013;170:505–515. doi: 10.1016/j.jplph.2012.11.017.
- 24. Stone J.M., Walker J.C. Plant protein kinase families and signal transduction. Plant Physiol. 1995;108:451–457. doi: 10.1104/pp.108.2.451.
- 25. Lehti-Shiu M.D., Shiu S.-H. Diversity, classification and function of the plant protein kinase superfamily. Philos. Trans. R. Soc. B. 2012;367:2619–2639. doi: 10.1098/rstb.2012.0003.
- 26. Vizcaino J.A., Deutsch E.W., Wang R., Csordas A., Reisinger F., Rios D. ProteomeXchange provides globally coordinated proteomics data submission and dissemination. Nat. Biotechnol. 2014;32:223–226. doi: 10.1038/nbt.2839.
- 27. Hou Y., Qiu J., Tong X., Wei X., Nallamilli B.R., Wu W. A comprehensive quantitative phosphoproteome analysis of rice in response to bacterial blight. BMC Plant Biol. 2015;15(1). doi: 10.1186/s12870-015-0541-2.
- 28. Wang K., Zhao Y., Li M., Gao F., Yang M.-K., Wang X. Analysis of phosphoproteome in rice pistil. Proteomics. 2014;14:2319–2334. doi: 10.1002/pmic.201400004.
- 29. Roitinger E., Hofer M., Köcher T., Pichler P., Novatchkova M., Yang J. Quantitative phosphoproteomics of the ataxia telangiectasia-mutated (ATM) and ataxia telangiectasia-mutated and rad3-related (ATR) dependent DNA damage response in Arabidopsis thaliana. Mol. Cell. Proteomics. 2015;14:556–571. doi: 10.1074/mcp.M114.040352.
- 30. Lassowskat I., Naumann K., Lee J., Scheel D. PAPE (Prefractionation-Assisted Phosphoprotein Enrichment): a novel approach for phosphoproteomic analysis of green tissues from plants. Proteomes. 2013;1:254–274. doi: 10.3390/proteomes1030254.
- 31. Chambers M.C., Maclean B., Burke R., Amodei D., Ruderman D.L., Neumann S. A cross-platform toolkit for mass spectrometry and proteomics. Nat. Biotechnol. 2012;30:918–920. doi: 10.1038/nbt.2377.
- 32. Zhang J., Xin L., Shan B., Chen W., Xie M., Yuen D. PEAKS DB: de novo sequencing assisted database search for sensitive and accurate peptide identification. Mol. Cell. Proteomics. 2012;11:M111.010587. doi: 10.1074/mcp.M111.010587.
- 33. The UniProt Consortium. UniProt: the universal protein knowledgebase. Nucleic Acids Res. 2017;45:D158–D169. doi: 10.1093/nar/gkw1099.
- 34. Beausoleil S.A., Villén J., Gerber S.A., Rush J., Gygi S.P. A probability-based approach for high-throughput protein phosphorylation analysis and site localization. Nat. Biotechnol. 2006;24:1285–1292. doi: 10.1038/nbt1240.
- 35. Schwartz D., Gygi S.P. An iterative statistical approach to the identification of protein phosphorylation motifs from large-scale data sets. Nat. Biotechnol. 2005;23:1391–1398. doi: 10.1038/nbt1146.
- 36. Mi H., Poudel S., Muruganujan A., Casagrande J.T., Thomas P.D. PANTHER version 10: expanded protein families and functions, and analysis tools. Nucleic Acids Res. 2016;44:D336–D342. doi: 10.1093/nar/gkv1194.
- 37. Li L., Stoeckert C.J., Roos D.S. OrthoMCL: identification of ortholog groups for eukaryotic genomes. Genome Res. 2003;13:2178–2189. doi: 10.1101/gr.1224503.
- 38. Burkhart J.M., Schumbrutzki C., Wortelkamp S., Sickmann A., Zahedi R.P. Systematic and quantitative comparison of digest efficiency and specificity reveals the impact of trypsin quality on MS-based proteomics. J. Proteomics. 2012;75:1454–1462. doi: 10.1016/j.jprot.2011.11.016.
- 39. Yue X., Schunter A., Hummon A.B. Comparing multistep immobilized metal affinity chromatography and multistep TiO2 methods for phosphopeptide enrichment. Anal. Chem. 2015;87:8837–8844. doi: 10.1021/acs.analchem.5b01833.
- 40. Sugiyama N., Nakagami H., Mochida K., Daudi A., Tomita M., Shirasu K. Large-scale phosphorylation mapping reveals the extent of tyrosine phosphorylation in Arabidopsis. Mol. Syst. Biol. 2008;4:193. doi: 10.1038/msb.2008.32.
- 41. van Wijk K.J., Friso G., Walther D., Schulze W.X. Meta-analysis of Arabidopsis thaliana phospho-proteomics data reveals compartmentalization of phosphorylation motifs. Plant Cell. 2014;26:2367–2389. doi: 10.1105/tpc.114.125815.
- 42. Hrabak E.M., Chan C.W.M., Gribskov M., Harper J.F., Choi J.H., Halford N. The Arabidopsis CDPK-SnRK superfamily of protein kinases. Plant Physiol. 2003;132:666–680. doi: 10.1104/pp.102.011999.
- 43. Boudsocq M., Sheen J. CDPKs in immune and stress signaling. Trends Plant Sci. 2013;18:30–40. doi: 10.1016/j.tplants.2012.08.008.
- 44. Wang X., Bian Y., Cheng K., Gu L.-F., Ye M., Zou H. A large-scale protein phosphorylation analysis reveals novel phosphorylation motifs and phosphoregulatory networks in Arabidopsis. J. Proteomics. 2013;78:486–498. doi: 10.1016/j.jprot.2012.10.018.
- 45. Huang D.W., Sherman B.T., Lempicki R.A. Bioinformatics enrichment tools: paths toward the comprehensive functional analysis of large gene lists. Nucleic Acids Res. 2009;37:1–13. doi: 10.1093/nar/gkn923.
- 46. Nishi H., Demir E., Panchenko A.R. Crosstalk between signaling pathways provided by single and multiple protein phosphorylation sites. J. Mol. Biol. 2015;427:511–520. doi: 10.1016/j.jmb.2014.11.001.