Abstract
Bioinformatics is used at three different steps of proteomic studies of sub-cellular compartments. First one is protein identification from mass spectrometry data. Second one is prediction of sub-cellular localization, and third one is the search of functional domains to predict the function of identified proteins in order to answer biological questions. The aim of the work was to get a new tool for improving the quality of proteomics of sub-cellular compartments. Starting from the analysis of problems found in databases, we designed a new Arabidopsis database named ProtAnnDB (http://www.polebio.scsv.ups-tlse.fr/ProtAnnDB/). It collects in one page predictions of sub-cellular localization and of functional domains made by available software. Using this database allows not only improvement of interpretation of proteomic data (top-down analysis), but also of procedures to isolate sub-cellular compartments (bottom-up quality control).
Keywords: bioinformatics, cell wall, plant, proteomics
Introduction
Bioinformatics is of paramount importance for protein analysis in proteomic studies. Proteomics generates huge amounts of data that must be interpreted to answer biological questions. It is used at three steps of proteomic studies (Fig. 1): (i) identification of proteins by peptide mass mapping or peptide sequencing; (ii) prediction of their sub-cellular localization, (iii) prediction of their function. Identification of proteins requires measurement of mass/charge ratios of tryptic peptides or of ions resulting from peptide fragmentation and comparison to sequence databases. This part of the work is done with devoted software such as ProteinProspector (http://prospector.ucsf.edu/), Mascot (http://www.matrixscience.com/search_form_select.html), and Profound (http://prowl.rockefeller.edu/prowl-cgi/profound.exe) that can be used online. Of course, reliability of such identifications depends on the quality of both sequences deposited in databases and of structural annotation of genomes. But many efforts have been done to improve it, especially with the setting of Unigene at NCBI which is defined as “An Organized View of the Transcriptome” (http://www.ncbi.nlm.nih.gov/sites/entrez?db=unigene). Analysis of protein sub-cellular localization and function is more puzzling. For Arabidopsis proteins, information on sub-cellular localization can only be found at MIPS (http://mips.gsf.de/proj/plant/jsf/athal/searchjsp/index.jsp). Information on prediction of functional domains is given in most databases. However, scores are not always given and the names of proteins may not be related to these predictions. The availability of all this information in a reliable and friendly way appeared critical when we obtained loads of data from proteomics. We wanted to use bioinformatics not only as a tool to interpret our experimental data in a “top-down analysis”, but also as “bottom-quality control” of our procedure for preparation of plant cell walls (Fig. 1)1,2. Starting from the analysis of problems found in databases, we designed a new Arabidopsis database named ProtAnnDB for Protein Annotation DataBase. It collects predictions made by available software. It allows the user to see the results in one page without any need to run their query against all of them.
Prediction of Sub-Cellular Localization of Proteins as a Valuable Tool to Assess the Quality of Sub-Cellular Proteomics
The creation and maintenance of cellular structures relies on the regulated expression and spatial targeting of proteins. In addition, determining the sub-cellular localization of a protein is an important first step towards understanding its function. Proteomics is one of the large-scale methods used to identify proteins, made possible by the sequencing of whole genomes. The homogenized cells are fractionated, most commonly through centrifugation. With proper purification and fractionation techniques, the contents of a particular fraction will correspond to a particular organelle. However, the approach is sensitive to contaminations. Purity control mainly relies on tests that should be positive for proteins expected in the purified fraction, and negative for proteins of all other cell compartments.
Plant cell wall is a particular difficult compartment since it is not surrounded by a membrane, and contains anionic carbohydrates that can bind basic proteins from other compartments. The reliability of protein profiling for a compartment like the cell wall thus strongly depends on the quality of the extraction protocol. There are two ways to obtain proteins from the cell wall compartment:4–5 (i) non-destructive methods using the culture medium of cell cultures, or the extraction of extra-cellular fluids by an infiltration/centrifugation procedure; (ii) destructive methods involving the rupture of the cells and the separation of the insoluble cell wall fraction. Since plant cell walls are mainly built up with highly dense polysaccharides, this property can be used to purify them by centrifugation through high density solutions. Several methods have been used to prepare enriched cell wall fractions and to verify their purity (Table 1). On the one hand, enzymology, immunology, and microscopy have been used to estimate the purity of the isolated fraction. On the other hand, bioinformatic analyses of the sequences of identified proteins allowed the prediction of their sub-cellular localization using PSORT (http://psort.ims.u-tokyo.ac.jp/form.html)6 and TargetP (http://www.cbs.dtu.dk/services/TargetP/).7,8 In a few cases, sub-cellular localization could not be reliably predicted. The ratio of the number of predicted secreted proteins to the total number of proteins identified has been calculated. This ratio can also be used as a purity control. It can be seen that classical methods of control support a high purity for most of the fractions. However, the concrete results of the bioinformatic predictions show a different picture with much lower degree of purity in most cases, suggesting that many fractions were not pure enough. On the contrary, although such controls have not been performed in the case of preparation of proteins from the hypocotyl cell wall fraction,1 it is remarkable that the level of purity of the cell wall protein fraction is one of the highest as calculated from bioinformatic predictions. It should be noted that the use of biochemical or immunological markers as purity criteria have produced a number of publications claiming that many well-known intracellular proteins are also secreted without a signal peptide.9 It is true that not all secreted proteins contain a signal peptide and alternative secretion pathways exist in both prokaryotes and eukaryotes. However, only a reduced number of proteins are secreted without a signal peptide in eukaryotes.10
Table 1.
Plant material | Preparation of the cell wall fraction | Method for purity control | Estimated fraction purity | Number of predicted secreted proteins | Total number of proteins | Number of proteins with uncertain sub-cellular localization | Cell wall protein fraction purity | Reference |
---|---|---|---|---|---|---|---|---|
Non-destructive methods | ||||||||
A. thaliana cell suspension cultures | washings with salt solutions | Microscopy | >60% | 51 | 96 | 0 | 53.1% | Borderies et al.23 |
A. thaliana cell suspension cultures | culture medium |
Enzymology (G-6-PDH: − ; ADH: −) Immunology (OEE, AnnAt1, cyclin D, GST: −; myrosinase: +) |
>99% | 9 | 13 | 1 | 69.2% | Oh et al.24 |
A. thaliana and O. sativa leaves | intercellular fluids | Enzymology (G-6-PDH: −) | >99% | 6 | 13 | 3 | 46.1% | Haslam et al.25 |
A. thaliana leaves | intercellular fluids | Enzymology (MDH: −) | >90% | 87 | 93 | 0 | 93.5% | Boudart et al.26 |
Destructive methods | ||||||||
A. thaliana cell suspension cultures | water extraction; 10% glycerol sedimentation |
Enzymology (callose synthase: −) Immunology (plasma membrane and chloroplasts: −; pectins: +) |
>99% | 24 | 75 | 2 | 32.0% | Chivasa et al.27 |
A. thaliana cell suspension cultures | salt solution containing 10% glycerol; extensive washings; CaCl2 final washing | Microscopy | >90% | 89 | 792 | 0 | 12.6% | Bayer et al.28 |
M. sativa stems | filtration and extensive washing | different protein patterns after 1D-E analysis of different fractions during the purification procedure | qualitative | 25 | 74 | 9 | 33.8% | Watson et al.29 |
A. thaliana etiolated hypocotyls | low salt buffer; increasing sucrose density sedimentation; extensive washing | none | not determined | 73 | 99 | 4 | 73.7% | Feiz et al.1 |
These comparisons show that the classical methods used to test for the purity of sub-cellular compartments are not conclusive for proteomic studies. Indeed, the sensitivity of mass spectrometry is much higher than that of enzymatic or immunological tests using specific markers. As a consequence, the characterization and prediction of the intrinsic signals that target proteins to the correct subcellular compartment has become a major task in bioinformatics. Although not all signals for protein sorting in cell compartments are described, bioinformatics can help in predicting subcellular localization of proteins thus contributing to the quality control of proteomic strategies (Fig. 1). In particular, sorting signals for vacuoles are of several types and probably not all are known.11 In addition, non classical pathway for protein secretion should be taken into account.10
Using Functional Domains as Efficient Tools for Annotation of Proteins
With regard to protein function and due to automatic annotation of proteins on the basis of BLAST searches (http://blast.ncbi.nlm.nih.gov/Blast.cgi),12 there are many mistakes in databases on the principle of the children game called the Chinese whispers. Even if functional domains such as InterPro, PFAM or PROSITE are now indicated in the description of protein sequences in most databases, the names proposed for proteins are often incorrect because they result from BLAST searches rather than from the presence of functional domains. Actually, BLAST results can rely on partial sequence homology as shown in the case of the family of 11 leucine-rich repeat extensins (LRXs)13 as LRXs and PEXs. Query of the NCBI Entrez Protein database (http://www.ncbi.nlm.nih.gov/sites/entrez?db=Protein) results in 14 accession numbers using the following key words: leucine-rich repeat AND extensin AND Arabidopsis. The same functional annotation was found at TAIR (http://arabidopsis.org/index.jsp) and TIGR (http://www.tigr.org/tdb/e2k1/ath1/) whereas only 6 proteins were given related names such as leucine-rich repeat/extensin or extensin-like at MIPS (http://mips.gsf.de/proj/plant/jsf/index.jsp) (Table 2). A detailed analysis of the information available in databases shows that the appropriate functional domains are listed in the description of the proteins (Table 1, supplementary data). However, the names assigned to the proteins are not correct at NCBI, TAIR, and TIGR in three cases (At2g19780, At4g06744, and At4g29240) since these names were given according to BLAST results. As shown for At2g19780 in Figure 2, significant identity was found with an LRX protein encoded by At3g24480, but only in its leucine-rich repeat (LRR) domain. All proteins that are bona fide LRXs according to Baumberger et al.13 should have at least one LRR domain and one proline-rich domain (Table 3). Annotation of At2g19780, At4g06744, and At4g29240 should be revised. On the contrary, At2g19780 and At3g24480 are annotated as “disease resistance proteins” at MIPS since many of such proteins have LRR domains. But there is no experimental evidence that these two proteins play any role in plant defense. At present, an annotation mentioning only the presence of structural LRR domains would be more relevant.
Table 2.
LRX | Extensin-like | Hypothetical protein | Unknown protein | Disease resistance protein | Not found | |
---|---|---|---|---|---|---|
NCBI | 14 | |||||
TAIR | 14 | |||||
TIGR | 14 | |||||
MIPS | 1 | 6 | 3 | 1 | 2 | 1 |
Baumberger et al.13 | 11 |
Table 3.
AGI accession number | Annotation by Baumberger et al.13 |
Leucine-rich repeat domains |
Proline-rich domains |
||||
---|---|---|---|---|---|---|---|
IPR001611 PF00560 | IPR013210 PF08263 | PS50099 | IPR003882 PR01218 | IPR003883 PF02095 | PR01217 | ||
At1g12040 | AtLRX1 | 1 | 1 | 1 | 1 | ||
At1g49490 | AtPEX2 | 1 | 1 | 1 | 1 | ||
At1g62440 | AtLRX2 | 1 | 1 | 1 | |||
At2g15880 | AtPEX3 | 1 | 1 | 1 | 1 | ||
At2g19780 | 1 | 1 | |||||
At3g19020 | AtPEX1 | 1 | 1 | 1 | 1 | 1 | |
At3g22800 | AtLRX6 | 1 | 1 | 1 | 1 | 1 | |
At3g24480 | AtLRX4 | 1 | 1 | 1 | 1 | ||
At4g06744 | 1 | ||||||
At4g13340 | AtLRX3 | 1 | 1 | 1 | 1 | 1 | |
At4g18670 | AtLRX5 | 1 | 1 | 1 | 1 | ||
At4g29240 | 1 | 1 | |||||
At4g33970 | AtPEX4 | 1 | 1 | 1 | 1 | ||
At5g25550 | AtLRX7 | 1 | 1 | 1 |
Other problems result from different annotations despite the presence of identical functional domains. This is the case for the extensin gene family which comprises 19 members.14 Three examples are given in Table 2, supplementary data. At1g21310 gets the name of its mutant (RSH for Root Shoot Hypocotyl Defective) whereas At1g26240 is annotated as “proline-rich extensin-like family protein” and At1g26250 as “proline-rich extensin, putative.” On the contrary, the same name can be attributed to proteins that belong to different families. The “Pollen Ole e1 allergen and extensin family” contains two types of proteins (Table 2, supplementary data): proteins that only contain the IPR006041 domain (Pollen Ole e1 allergen and extensin), and proteins that contain both this domain and a proline-rich region profile (PS50099). Only the MIPS database names At3g33790 a “putative proline-rich protein.”
Although functional domains are correctly listed in description of proteins, annotation may only take into account a profile with a high probability of occurrence, i.e. of poor significance. This is the case of At2g15770 annotated as “glycine-rich protein” although the score obtained for PS50315 (glycine-rich region) is very low (Table 2, supplementary data). A higher score was obtained for IPR008972 (cupredoxin) and the gene has been annotated as AtEn23 by experts.15 On the contrary, At1g15825 has been annotated as “hydroxyproline-rich glycoprotein family protein” although the only identified structural domain is PS50099 (proline-rich region). A “BLAST 2 sequences” reveals a very poor matching to the true extensin mentioned in databases (At3g19020) (data not shown). Moreover, the prediction for subcellular localization is “cytoplasm” using PSORT with a score of 0.450 and “other” using TargetP with a score of 0.602. A detailed analysis of the protein sequence reveals that at present the only possible annotation is “proline-rich protein”.
The last case will concern a protein that has been annotated as an “extensin-like protein” as inferred from sequence comparison to a protein sequence deduced from a cDNA of Brassica napus (AAK30571) (Fig. 3A). Again, the relevant functional domain is indicated in databases, i.e. IPR003612 (plant lipid transfer protein/seed storage/trypsin-alpha amylase inhibitor). The “BLAST 2 sequences” against AAK30571 gives 94% identities (Fig. 3B). However, since the annotation of the B. napus sequence is wrong, this mistake has been spread over other sequences that are also wrongly annotated.
Building of the ProtAnnDB Dedicated to Collection of Predictions of Sub-Cellular Localization and Functional Domains
The Arabidopsis protein sequences present in ProtAnnDB (http://www.polebio.scsv.ups-tlse.fr/ProtAnnDB/) are taken from the latest version of the Arabidopsis genome annotation (TAIR8, http://www.arabidopsis.org/help/helppages/BLAST_help.jsp#datasets) (32825 proteins).16 ProtAnnDB provides a link to the NCBI reference protein sequences (RefSeq) which are curated sequences (http://www.ncbi.nlm.nih.gov/RefSeq/).17 Amino acid sequences have been run against different software predicting either their sub-cellular localization or functional domains. Different programs were used because each of them has its own specificity and it is necessary to compare the results to increase confidence in the prediction. There are feature-based methods, global sequence properties-based methods, and machine learning methods such as neural networks, hidden Markov models and support vector machines.8
Sub-cellular localization of proteins can be predicted using several programs on line. TargetP, SignalP (http://www.cbs.dtu.dk/services/SignalP/),10 and Predotar (http://urgi.versailles.inra.fr/predotar/predotar.html)18 predict N-terminal targeting sequences. TMHMM (http://www.cbs.dtu.dk/services/TMHMM-2.0/) predicts transmembrane domains. Since all these programs use different algorithms to make predictions, it is necessary to compare the results. There is no such tool available on line apart from Aramemnon (http://aramemnon.botanik.uni-koeln.de/), a database describing plant putative membrane proteins based on the comparison of results from different programs predicting transmembrane spanning domains and membrane-anchoring through GPI anchors, prenylation, and myristoylation.19 Several of these programs were selected in ProtAnnDB: SignalP, TargetP, Predotar, TMHMM and Aramemnon (Fig. 4).
Prediction of functional domains can be achieved using different programs. InterProScan (http://www.ebi.ac.uk/Tools/InterProScan/) collects information from different prediction programs and proposes classification of proteins in superfamilies.20 Pfam (http://pfam.sanger.ac.uk/) comprises 9318 protein families.21 PROSITE (http://www.expasy.org/prosite/) comprises 717 patterns and 795 profiles.22 PANTHER (http://www.pantherdb.org/) classifies genes by their functions, using published experimental evidence and evolutionary relationships. Three of these programs were chosen for building ProtAnnDB: InterProScan, PROSITE and PFAM (Fig. 5).
In all cases, use of several programs based on different rationales is required to improve the quality of the prediction of sub-cellular location or function of a protein. However, some doubtful cases cannot be resolved. Finally, relevant literature, especially if it includes experimental data, is the best source of information. Databases annotated by experts in the field also provide checked information and often allow use of unified nomenclature. This information is available either in research articles or on line. It has been included in ProtAnnDB for 1565 sequences of cell wall-related proteins, such as (i) proteins involved in cell wall biogenesis or metabolism, and (ii) proteins predicted to be at the cell wall/plasma membrane interface like receptor-like kinases (Table 4).
Table 4.
Two different formats have been designed in ProtAnnDB: a tab delimited text format that can be imported in the Microsoft Office Excel format (http://www.microsoft.com/france/office/2007/programs/excel/overview.mspx) (Fig. 6) and a friendly user web interface format for each protein entry (Fig. 7). This presentation allows (i) checking the annotation of each protein separately and (ii) working with several proteins at the same time. In future versions of ProtAnnDB, it is planned to introduce other plant proteins such as those from rice and poplar since their genomes will been soon fully annotated. In addition, it is planned to allow querying the database with keywords related to sub-cellular localization and/or functional domains.
Conclusion
Proteomic studies generate huge amounts of data consisting in lists of genes. Answering biological questions using these results is a great challenge. It is usually done collecting protein annotations from databases. However, protein annotation in databases can be incomplete or misleading since they are mainly inherited from results of sequence comparisons and do not usually take into account the presence of functional domains. Moreover, the available databases do not allow finding predictions of sub-cellular localization of proteins easily. ProtAnnDB has been designed as a new tool to facilitate the processing of proteomic data. Of course, it also allows processing of transcriptomic data which encounters the same type of problem. ProtAnnDB allows collecting results of prediction of sub-cellular localization and functional domains of Arabidopsis proteins by existing bioinformatics software. It should allow improving the biological interpretation of proteomic and transcriptomic data.
Methods
ProtAnnDB contains all the AGI codes included in the TAIR8 database (http://www.arabidopsis.org/help/helppages/BLAST_help.jsp#datasets). Each AGI code is associated with the results of the predictions of sub-cellular localization and functional domains. ProtAnnDB has been filled using Perl scripts by a two-step procedure. As a first step, all the Arabidopsis protein sequences were submitted to the following programs: TargetP, SignalP, TMHMM, and Predotar for sub-cellular localization; PROSITE (ps-scan.pl) for functional domains. InterPro and PFAM domains were imported from the TAIR8_all.domain file available at TAIR (ftp://ftp.arabidopsis.org/home/tair/Proteins/Domains/). Predictions of GPI anchors have been imported from the Aramemnon database. As a second step, all the results were organized in a MySQL database, which can be queried with a friendly user web interface written in PHP.
Supplementary Materials
Acknowledgments
This work was supported by the Université Paul Sabatier in Toulouse and Centre National de la Recherche Scientifique (France).
Footnotes
Authors’ Contributions
HSC made the scripts to build up ProtAnnDB. EJ selected programs available online and provided examples of cell wall proteins misannotated in databases. RPL reviewed literature about plant cell wall proteomic studies. EJ and RPL wrote the manuscript. All authors read and approved the final manuscript.
Disclosure
The authors report no conflicts of interest.
References
- 1.Feiz L, Irshad M, Pont-Lezica RF, et al. Evaluation of cell wall preparations for proteomics: a new procedure for purifying cell walls from Arabidopsis hypocotyls. Plant Methods. 2006;2:10. doi: 10.1186/1746-4811-2-10. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 2.Irshad M, Canut H, Borderies G, et al. A new picture of cell wall protein dynamics in elongating cells of Arabidopsis: confirmed actors and newcomers. BMC Plant Biology. 2008;8:94. doi: 10.1186/1471-2229-8-94. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 3.Lee SJ, Saravanan RS, Damasceno CM. Digging deeper into the plant cell wall proteome. Plant Physiol Biochem. 2004;42:979–988. doi: 10.1016/j.plaphy.2004.10.014. [DOI] [PubMed] [Google Scholar]
- 4.Jamet E, Canut H, Boudart G, et al. Cell wall proteins: a new insight through proteomics. Trends in Plant Sci. 2006;11:33–9. doi: 10.1016/j.tplants.2005.11.006. [DOI] [PubMed] [Google Scholar]
- 5.Jamet E, Albenne C, Boudart G, et al. Recent advances in plant cell wall proteomics. Proteomics. 2008;8:893–908. doi: 10.1002/pmic.200700938. [DOI] [PubMed] [Google Scholar]
- 6.Nakai K, Horton P. PSORT: a program for detecting the sorting signals of proteins and predicting their subcellular localization. Trends Biochem Sci. 1999;24:34–5. doi: 10.1016/s0968-0004(98)01336-x. [DOI] [PubMed] [Google Scholar]
- 7.Nielsen H, Engelbrecht J, Brunak S, et al. Identification of prokaryotic and eukaryotic signal peptides and prediction of their cleavage sites. Prot Eng. 1997:10. doi: 10.1093/protein/10.1.1. [DOI] [PubMed] [Google Scholar]
- 8.Emanuelsson O, Brunak S, Von Heijne G, et al. Locating proteins in the cell using TargetP, SignalP and related tools. Nat Protoc. 2007;2:953–71. doi: 10.1038/nprot.2007.131. [DOI] [PubMed] [Google Scholar]
- 9.Slabas AR, Ndimba B, Simon WJ, et al. Proteomic analysis of the Arabidopsis cell wall reveals unexpected proteins with new cellular locations. Biochem Soc Trans. 2004;32:524–528. doi: 10.1042/BST0320524. [DOI] [PubMed] [Google Scholar]
- 10.Bendtsen J, Jensen L, Blom N, et al. Feature-based prediction of nonclassical and leaderless protein secretion. Protein Eng Des Sel. 2004;17:349–56. doi: 10.1093/protein/gzh037. [DOI] [PubMed] [Google Scholar]
- 11.Hadlington J, Denecke J. Sorting of soluble proteins in the secretory pathway of plants. Curr Opin Plant Biol. 2000;3:461–468. doi: 10.1016/s1369-5266(00)00114-x. [DOI] [PubMed] [Google Scholar]
- 12.Altschul SF, Gish W, Miller W, et al. Basic local alignement search tool. J Mol Biol. 1990;215:403–10. doi: 10.1016/S0022-2836(05)80360-2. [DOI] [PubMed] [Google Scholar]
- 13.Baumberger N, Doesseger B, Guyot R, et al. Whole-genome comparison of leucine-rich repeat extensins in Arabidopsis and rice. A conserved family of cell wall proteins form a vegetative and a reproductive clade. Plant Physiol. 2003;131:1313–26. doi: 10.1104/pp.102.014928. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 14.Johnson KL, Jones BJ, Schultz CJ, et al. Non enzymic cell wall (glyco)proteins Rose JK.The plant cell wall Boca Raton: Blackwell Publishing; CRC Press 2003111–54. [Google Scholar]
- 15.Nersissian AM, Shipp EL. Blue copper-binding domains. Adv Protein Chem. 2002;60:271–340. doi: 10.1016/s0065-3233(02)60056-7. [DOI] [PubMed] [Google Scholar]
- 16.Reiser L, Rhee S. Using the Arabidopsis Information Resource (TAIR) to find information about Arabidopsis genes. Curr Protoc Bioinformatics. 2007;11 doi: 10.1002/0471250953.bi0111s9. [DOI] [PubMed] [Google Scholar]
- 17.Pruitt K, Tatusova T, Maglott D. NCBI reference sequences (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins. Nucleic Acids Res. 2007;34 doi: 10.1093/nar/gkl842. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 18.Small I, Peeters N, Legeai F, et al. Predotar: A tool for rapidly screen proteomes for N-terminal targeting sequences. Proteomics. 2004;4:1581–90. doi: 10.1002/pmic.200300776. [DOI] [PubMed] [Google Scholar]
- 19.Schwacke R, Schneider A, Van der Graaff E, et al. ARAMEMNON, a novel database for Arabidopsis integral membrane proteins. Plant Physiol. 2003;131:16–26. doi: 10.1104/pp.011577. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 20.Quevillon E, Silventoinen V, Pillai S, et al. InterProScan: protein domains identifier. Nucleic Acids Res. 2005;33:W116–120. doi: 10.1093/nar/gki442. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 21.Finn R, Tate J, Mistry J, et al. The PFAM protein families dabase. Nucleic Acids Res. 2008;36:D281–D288. doi: 10.1093/nar/gkm960. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 22.Hulo N, Bairoch A, Bulliard V, et al. The 20 years of PROSITE. Nucleic Acids Res. 2008;36:D245–D249. doi: 10.1093/nar/gkm977. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 23.Borderies G, Jamet E, Lafitte C, et al. Proteomics of loosely bound cell wall proteins of Arabidopsis thaliana cell suspension cultures: A critical analysis. Electrophoresis. 2003;24:3421–32. doi: 10.1002/elps.200305608. [DOI] [PubMed] [Google Scholar]
- 24.Oh IS, Park AR, Bae MS, et al. Secretome analysis reveals an Arabidopsis lipase involved in defense against Alternaria brassicicola. Plant Cell. 2005;17:2832–47. doi: 10.1105/tpc.105.034819. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 25.Haslam RP, Downie AL, Raventon M, et al. The assessment of enriched apoplastic extracts using proteomic approaches. Ann Appl Biol. 2003;143:81–91. [Google Scholar]
- 26.Boudart G, Jamet E, Rossignol M, et al. Cell wall proteins in apoplastic fluids of Arabidopsis thaliana rosettes: Identification by mass spectrometry and bioinformatics. Proteomics. 2005;5:212–21. doi: 10.1002/pmic.200400882. [DOI] [PubMed] [Google Scholar]
- 27.Chivasa S, Ndimba BK, Simon WJ, et al. Proteomic analysis of the Arabidopsis thaliana cell wall. Electrophoresis. 2002;23:1754–65. doi: 10.1002/1522-2683(200206)23:11<1754::AID-ELPS1754>3.0.CO;2-E. [DOI] [PubMed] [Google Scholar]
- 28.Bayer EM, Bottrill AR, Walshaw J, et al. Arabidopsis cell wall proteome defined using multidimensional protein identification technology. Proteomics. 2006;6:301–11. doi: 10.1002/pmic.200500046. [DOI] [PubMed] [Google Scholar]
- 29.Watson BS, Lei Z, Dixon RA, et al. Proteomics of Medicago sativa cell walls. Phytochemistry. 2004;65:1709–20. doi: 10.1016/j.phytochem.2004.04.026. [DOI] [PubMed] [Google Scholar]
- 30.Reiter W, Vanzin G. Molecular genetics of nucleotide sugar interconversion pathways in pl. Plant Mol Biol. 2001;47:95–113. [PubMed] [Google Scholar]
- 31.Egelund J, Skjot M, Geshi N, et al. A complementary bioinformatics approach to identify potential plant cell wall glycosyltransferaseencoding genes. Plant Physiol. 2004;136:2609–20. doi: 10.1104/pp.104.042978. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 32.Schultz CJ, Rumsewicz MP, Johnson KL, et al. Using genomic resources to guide research directions. The arabinogalactan protein gene family as a test case. Plant Physiol. 2002;129:1448–63. doi: 10.1104/pp.003459. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 33.Van Hengel AJ, Roberts K. AtAGP30, an arabinogalactan-protein in the cell walls of the primary root, plays a role in root regeneration and seed germination. Plant J. 2003;36:256–70. doi: 10.1046/j.1365-313x.2003.01874.x. [DOI] [PubMed] [Google Scholar]
- 34.Liu C, Mehdy M. A nonclassical arabinogalactan protein gene highly expressed in vascular tissues, AGP31, is transcriptionally repressed by methyl jasmonic acid in Arabidopsis. Plant Physiol. 2007;145:863–74. doi: 10.1104/pp.107.102657. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 35.Roudier F, Fernandez AG, Fujita M, et al. COBRA, an Arabidopsis extracellular glycosyl-phosphatidyl inositol-anchored protein, specifically controls highly anisotropic expansion through its involvement in cellulose microfibril orientation. Plant Cell. 2005;17:1749–63. doi: 10.1105/tpc.105.031732. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 36.Fowler TJ, Bernhardt C, Tierney ML. Characterization and expression of four proline-rich cell wall protein genes in Arabidopsis encoding two distinct subsets of multiple domain proteins. Plant Physiol. 1999;121:1081–92. doi: 10.1104/pp.121.4.1081. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 37.Raes J, Rohde A, Christensen JH, et al. Genome-wide characterization of the lignification toolbox in Arabidopsis. Plant Physiol. 2003;133:1051–71. doi: 10.1104/pp.103.026484. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 38.Pourcel L, Routaboul JM, Kerhoas L, et al. TRANSPARENT TESTA10 encodes a laccase-like enzyme involved in oxidative polymerization of flavonoids in Arabidopsis seed coat. Plant Cell. 2005;17:2966–80. doi: 10.1105/tpc.105.035154. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 39.McCaig BC, Meagher RB, Dean JF. Gene structure and molecular analysis of the laccase-like multicopper oxidase (LMCO) gene family in Arabidopsis thaliana. Planta. 2005;221:619–36. doi: 10.1007/s00425-004-1472-6. [DOI] [PubMed] [Google Scholar]
- 40.Jacobs J, Roe JL. SKS6, a multicopper oxidase-like gene, participates in cotyledon vascular patterning during Arabidopsis thaliana development. Planta. 2005;222:652–66. doi: 10.1007/s00425-005-0012-3. [DOI] [PubMed] [Google Scholar]
- 41.Shiu SH, Bleecker AB. Receptor-like kinases from Arabidopsis form a monophyletic gene family related to animal receptor kinases. Proc Natl Acad Sci U S A. 2001;98:10763–68. doi: 10.1073/pnas.181141598. [DOI] [PMC free article] [PubMed] [Google Scholar]
Associated Data
This section collects any data citations, data availability statements, or supplementary materials included in this article.