Abstract
Plant cell wall (CW) synthesizing enzymes can be divided into the glycan (i.e. cellulose and callose) synthases, which are multimembrane spanning proteins located at the plasma membrane, and the glycosyltransferases (GTs), which are Golgi localized single membrane spanning proteins, believed to participate in the synthesis of hemicellulose, pectin, mannans, and various glycoproteins. In the Carbohydrate-Active enZYmes (CAZy) database, where e.g. glycoside hydrolases and GTs are classified into gene families primarily based on amino acid sequence similarities, 415 Arabidopsis GTs have been classified. Although much is known with regard to composition and fine structures of the plant CW, only a handful of CW biosynthetic GT genes, all classified in the CAZy system, have been characterized. In an effort to identify CW GTs that have not yet been classified in the CAZy database, a simple bioinformatics approach was adopted. First, the entire Arabidopsis proteome was run through the Transmembrane Hidden Markov Model 2.0 server and proteins containing one or, more rarely, two transmembrane domains within the N-terminal 150 amino acids were collected. Second, these sequences were submitted to the SUPERFAMILY prediction server, and sequences that were predicted to belong to the superfamilies NDP-sugartransferase, UDP-glycosyltransferase/glycogen phosphorylase, carbohydrate-binding domain, Gal-binding domain, or Rossmann fold were collected, yielding a total of 191 sequences. Fifty-two accessions already classified in CAZy were discarded. The resulting 139 sequences were then analyzed using the Three-Dimensional-Position-Specific Scoring Matrix and mGenTHREADER servers, and 27 sequences with similarity to either the GT-A or the GT-B fold were obtained.
Proof of concept of the present approach has to some extent been provided by our recent demonstration that two members of this pool of 27 non-CAZy-classified putative GTs are xylosyltransferases involved in synthesis of pectin rhamnogalacturonan II (J. Egelund, B.L. Petersen, A. Faik, M.S. Motawia, C.E. Olsen, T. Ishii, H. Clausen, P. Ulvskov, and N. Geshi, unpublished data).
The plant cell wall (CW) consists of four major polysaccharide components, namely cellulose, callose, hemicellulose, and pectin. CW synthesis/formation can be divided into three major steps. (1) Initially, the various building blocks in the form of activated glycosyl residues (NDP-sugars) are synthesized via two different pathways—the nucleotide interconversion pathway or the salvage pathway (for overview, see Carpita, 1996). The synthesis of the NDP-sugars may occur in the cytosol and/or the Golgi apparatus depending on the type of NDP-sugar synthesized (Mohnen, 1999). (2) The synthesized nucleotide sugars are then assembled into higher-order polysaccharide structures. Apart from cellulose and callose, biosynthesis of CW polysaccharides occurs in the endomembrane system (Bolwell and Northcote, 1983; Zhang and Staehelin, 1992; Sherrier and VandenBosch, 1994), from which the polysaccharides are secreted into the wall where they undergo further modifications (Fry, 1995). (3) The final step, which constitutes the assembly of the various polysaccharide structures into the wall, remains in large part a mystery. However, self-assembly of wall components most likely plays a role (for discussion of a possible mechanism, see MacDougal et al., 1997), and both enzymatic and nonenzymatic mechanisms as well as arabinogalactan proteins and other wall structural proteins (Cosgrove, 1997) participate in the complex process.
The noncellulosic polymers hemicellulose and pectin are synthesized by glycosyltransferases (GTs) presumably located in the different compartments of the Golgi apparatus. These GTs are believed to be type II membrane-bound proteins with the catalytic domain (C-terminal) facing the lumen of the Golgi apparatus (Ridley et al., 2001; Sterling et al., 2001; Geshi et al., 2004).
Although the GTs for which the three-dimensional (3D) structures have been resolved exhibit insignificant or at best very low sequence similarity, they adopt one of the following folds at the 3D-structure level: the GT-A (SpsA and SpsA-like) fold or the GT-B (B-GT and B-GT-like) fold (Bourne and Henrissat, 2001; Hu and Walker, 2002; Coutinho et al., 2003; Wimmerova et al., 2003).
The Carbohydrate-Active enZYme (CAZy; http://afmb.cnrs-mrs.fr/CAZY/) database is a versatile and comprehensive database of sequence-based carbohydrate enzymes, where e.g. glycoside hydrolases and GTs are classified into families primarily based on amino acid sequence similarities (Henrissat et al., 2001). Within a given family, the 3D structure is conserved, i.e. the same 3D fold is expected to occur in each member of the family (Coutinho et al., 2003).
Although composition of the major CW polysaccharides is reasonably well described (Carpita et al., 2001), only a handful of the biosynthetic genes have been identified. All of the seven known GTs, i.e. with proven or putative function in mannan (Edwards et al., 1999), hemicellulose (Perrin et al., 1999; Faik et al., 2002; Madson et al., 2003), and pectin synthesis (Bouton et al., 2002; Iwai et al., 2002; J. Sterling and D. Mohnen, personal communication), are classified in the CAZy database. In this study, we have set up an alternative bioinformatics scheme aimed at identifying CW GTs with a predicted type II membrane topology, which are not classified in the CAZy database. Using this alternative approach, 27 non-CAZy classified accessions with a predicted N-terminal transmembrane domain (TMD) typical of type II membrane proteins and that were predicted to adopt the GT-A or the GT-B fold were identified.
RESULTS
In an effort to obtain GTs with a type II membrane protein topology, which have not been classified in the CAZy database, the following simple bioinformatics approach was adopted (for overview, see Fig. 1).
Figure 1.
Flow chart of the bioinformatics approach used to identify 27 putative GTs not classified in the CAZy database. Web sites for the various servers used in this study are listed in “Materials and Methods.”
First, using the Transmembrane Hidden Markov Model (TMHMM) 2.0 prediction server, the entire Arabidopsis proteome (26,095 proteins) was scanned for the presence of transmembrane helices, yielding a total of 5,977 sequences with any number of predicted transmembrane helices. Within this pool, potential type II membrane proteins with either one or, in rare cases, two (derived from the predicted transmembrane helix and a hydrophobic signal peptide) predicted TMDs, which resided within the first 150 amino acids from the N terminus, were identified and extracted, yielding a total of 2,248 and 363 accessions, respectively.
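The selection rule of this first filter can be illustrated with a minimal Python sketch. It assumes that each prediction is available as a TMHMM-style topology string (for example `i51-70o`), and it interprets "within the first 150 amino acids" as every predicted helix ending at or before residue 150. The function names, the `i`/`o` orientations, and the two negative examples are hypothetical; the helix positions for Q9LZ77 and Q9XEE9 are taken from Table II.

```python
import re

# One predicted helix in a TMHMM-style topology string looks like "51-70".
HELIX = re.compile(r"(\d+)-(\d+)")

def parse_helices(topology):
    """Return (start, end) residue pairs for each predicted transmembrane helix."""
    return [(int(s), int(e)) for s, e in HELIX.findall(topology)]

def is_type2_candidate(topology, window=150, max_helices=2):
    """True if one or two helices are predicted and all end within the N-terminal window."""
    helices = parse_helices(topology)
    if not 1 <= len(helices) <= max_helices:
        return False
    return all(end <= window for _, end in helices)

# Helix positions for the first two entries come from Table II; the
# inside/outside letters and the last two entries are made up for illustration.
examples = {
    "Q9LZ77": "i51-70o",
    "Q9XEE9": "o4-26i113-135o",
    "hypothetical_late": "i300-322o",
    "hypothetical_multi": "i10-32o45-67i80-102o",
}
candidates = [acc for acc, topo in examples.items() if is_type2_candidate(topo)]
print(candidates)
```

In this sketch the window size and the maximum helix count are parameters, so a stricter or looser definition of a potential type II membrane protein can be tried without changing the parsing step.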
The 2,248 plus 363 sequences were then submitted to the SUPERFAMILY prediction server, and 191 sequences predicted, irrespective of E-value, to belong to the superfamilies NDP-sugartransferase (54), UDP-glycosyltransferase/glycogen phosphorylase (33), Gal-binding domain (23), carbohydrate-binding domain (6), or the GT-B-similar Rossmann fold (75) were collected. The 191 sequences were then blasted against the CAZy database (September 10, 2003), and sequences found in the CAZy database were removed from the dataset, leaving a total of 139 sequences (24, 25, 12, 5, and 73 from the 5 superfamilies, respectively), which were not classified in the CAZy database. The 139 sequences were then run through the mGenTHREADER and 3D-Position-Specific Scoring Matrix (3D-PSSM) servers. A local set of protein IDs (Protein Data Bank [PDB]) of proteins whose 3D structures have been resolved and which adopt either the GT-A or the GT-B fold (Table I; references for the PDB IDs can be retrieved at http://www.RCSB.ORG/), together with resolved 3D structures derived from the CAZy database GT families, was used to validate the output from each of the two servers. Twenty-seven of the 139 sequences (Table II) displayed similarity to one or more of the entries in Table I, i.e. the proteins predicted to adopt either the GT-A or the GT-B fold. Recently, two highly identical accessions (Q9ZSJ2 and Q9ZSJ0; Table II; Fig. 2B) were shown to be CW-specific xylosyltransferases (J. Egelund, B.L. Petersen, A. Faik, M.S. Motawia, C.E. Olsen, T. Ishii, H. Clausen, P. Ulvskov, and N. Geshi, unpublished data), corroborating that the adopted bioinformatics strategy identifies GTs related to CW biosynthesis.
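The per-superfamily counts quoted above can be cross-checked with a few lines of Python. This is only a bookkeeping sketch: the numbers are those reported in the text, and the dictionary layout and variable names are illustrative.

```python
# Counts per superfamily as reported in the text:
# (collected from SUPERFAMILY, remaining after removal of CAZy-classified entries).
counts = {
    "NDP-sugartransferase": (54, 24),
    "UDP-glycosyltransferase/glycogen phosphorylase": (33, 25),
    "Gal-binding domain": (23, 12),
    "Carbohydrate-binding domain": (6, 5),
    "Rossmann fold": (75, 73),
}

collected = sum(before for before, _ in counts.values())
remaining = sum(after for _, after in counts.values())
removed = collected - remaining

print(collected, remaining, removed)  # 191 139 52
```

The totals agree with the 191 collected sequences, the 139 non-CAZy sequences, and the 52 discarded accessions.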
Table I.
List of PDB IDs used to screen the result of the secondary structure prediction servers mGenTHREADER and 3D-PSSM
| PDB ID | Origin | Enzyme | Function |
|---|---|---|---|
| GT-A | | | |
| 1ABB | Rabbit | Glycogen phosphorylase | Glycogen biosynthesis |
| 1EM6 | Human | Glycogen phosphorylase | Glycogen biosynthesis |
| 1eyr | N. meningitidis | Selenomethionyl cytosine-5′-monophosphate-acylneuraminate synthetase | Activation of sialic acid |
| 1fgg | Human | Glucuronyltransferase I | Heparan/chondroitin sulfate biosynthesis |
| 1foa | Rabbit | N-Acetylglucosaminyltransferase I | Decoration of glycoproteins |
| 1fr8 | Bovine | β-1,4-Galactosyltransferase | Decoration of glycoproteins |
| 1frw | E. coli | MobA | Molybdopterin guanine dinucleotide biosynthesis |
| 1g0r | Pseudomonas aeruginosa | Glucose-1-phosphate thymidylyltransferase | Bacterial cell wall biosynthesis |
| 1g93 | Bovine | α-1,3-Galactosyltransferase | Decoration of glycoproteins |
| 1g97 | Streptococcus pneumoniae | N-Acetylglucosamine-1-phosphate uridyltransferase | Synthesis of UDP-N-acetylglucosamine |
| 1ga8 | N. meningitidis | Lipopolysaccharide galactosyltransferase | Lipooligosaccharide biosynthesis |
| 1GA8 | N. meningitidis | Lipopolysaccharide galactosyltransferase implicated in | Lipooligosaccharide biosynthesis |
| 1GZ5 | E. coli | Trehalose phosphate synthase | Trehalose biosynthesis |
| 1h7g | E. coli | 3-Deoxy-manno-octulosonate cytidylyltransferase | Lipopolysaccharide biosynthesis |
| 1ini | E. coli | 4-Diphosphocytidyl-2-C-methylerythritol | Isoprenoid biosynthesis |
| 1j94 | Mouse | β-1,4-Galactosyltransferase | Lactose biosynthesis |
| 1ll2 | Rabbit | Glycogenin glucosyltransferase | Glycogen biosynthesis |
| 1lz0 | Human | Glycosyltransferase A | Blood group biosynthesis |
| 1LZJ | Human | α-1→3Galactosyltransferase | Blood group biosynthesis |
| 1OMX | Mouse | α-1,4-N-Acetylhexosaminyltransferase | Heparan sulfate biosynthesis |
| 1qgq | Bacillus subtilis | NDP-sugartransferase | Synthesis of spore coat |
| 1YGP | Saccharomyces cerevisiae | Glycogen phosphorylase | Glycogen biosynthesis |
| GT-B | | | |
| 1BGT | Bacteriophage T4 | β-Glucosyltransferase | Nucleotide synthesis |
| 1c3j | Bacteriophage T4 | β-Glucosyltransferase | Nucleotide synthesis |
| 1f0k | E. coli | Pyrophosphoryl-undecaprenol N-acetylglucosamine transferase | Peptidoglycan biosynthesis |
| 1f6d | E. coli | Udp-N-acetylglucosamine 2-epimerase | UDP-N-acetylglucosamine biosynthesis |
| 1FGG | Human | Glucuronyltransferase I | Heparan/chondroitin sulfate synthesis |
| 1FGX | Bovine | β-1,4-Galactosyltransferase | Glycoprotein and glycosphingolipid synthesis |
| 1FO9 | Rabbit | N-Acetylglucosaminyltransferase I | Decoration of glycoproteins |
| 1h5u | Rabbit | Glycogen phosphorylase | Glycogen biosynthesis |
| 1iir | Amycolatopsis orientalis | Udp-glucosyltransferase Gtfb | Synthesis of the Vancomycin group of antibiotics |
| 1NLM | E. coli | Pyrophosphoryl-undecaprenol N-acetylglucosamine transferase | Bacterial cell wall synthesis |
| 1PN3 | A. orientalis | Tdp-Epi-Vancosaminyltransferase Gtfa | Synthesis of the Vancomycin group of antibiotics |
| 1QG8 | B. subtilis | Nucleotide-diphospho-sugartransferase | Spore coat synthesis |
| 1QKJ | Bacteriophage T4 | β-Glucosyltransferase | Nucleotide synthesis |
| 1qm5 | E. coli | Maltodextrin phosphorylase | Phosphorolysis of maltodextrin |
The PDB IDs were obtained from Wimmerova et al. (2003; lowercase) as well as manually from the CAZy database. GT families (uppercase; only one PDB ID per family). Origins and functions of the enzymes were obtained from the PDB (http://www.pdb.mdc-berlin.de/pdb/; Berman et al., 2000).
Table II.
The 27 putative GTs identified as a result of filtering the Arabidopsis proteome through the TMHMM, SUPERFAMILY, mGenTHREADER, and 3D-PSSM servers as illustrated in Figure 1
| TrEMBL Protein ID | SuperF (a) | Best Fit to Known GT Fold (b) | E-Value, 3D-PSSM | E-Value, mGenTHREADER | BLAST (NCBI) | TMD |
|---|---|---|---|---|---|---|
| Q9LZ77 | U | GT-B | 0.0017 | 0.001 | Plant and bacteria | 51-70 |
| Q9M147 | U | GT-B | 0.0112 | 0.001 | Plant and bacteria | 44-66 |
| Q9FMW3 | U | GT-B | Not found | 0.023 | Plant | 53-72 |
| Q9LU22 | U | GT-B | Not found | 0.068 | Plant and animal | 27-49 |
| Q9C9Z9 | N | GT-B | Not found | 0.022 | Plant and animal | 27-49 |
| O81786 | R | GT-B | Not found | 0.979 | Plant and animal | 45-62 |
| Q9C920 | N | GT-A | 3.56e−08 | 0.005 | Plant | 13-35 |
| Q9LTZ5 | N | GT-A | 8.51e−05 | 0.004 | Plant and bacteria | 21-42 |
| Q9FM26 | N | GT-A | 0.0173 | 0.005 | Plant and bacteria | 21-43 |
| O04568 | N | GT-A | 0.00930 | 0.009 | Plant and bacteria | 22-44 |
| Q9FXA7 | N | GT-A | 0.606 | 0.112 | Plant | 21-40 |
| Q9C9Q6 | N | GT-A | 0.063 | 0.017 | Plant | 13-35 |
| Q9C9Q5 | N | GT-A | 0.0993 | 0.022 | Plant | 13-35 |
| Q9FMN8 | N | GT-A | 0.115 | 0.015 | Plant | 26-45 |
| Q9ZSJ2 | N | GT-A | 0.271 | 0.037 | Plant | 36-55 |
| Q9FF50 | N | GT-A | 0.278 | 0.013 | Plant | 38-60 |
| Q9M146 | N | GT-A | 0.0355 | 0.059 | Plant | 42-61 |
| Q9SZU2 | N | GT-A | 0.0355 | 0.012 | Plant | 44-66 |
| Q9ZSJ0 | N | GT-A | 0.174 | 0.111 | Plant | 30-52 |
| Q9SAD6 | N | GT-A | Not found | 0.046 | Plant | 20-42 |
| Q9LKU7 | R | GT-A | Not found | 1.030 | Plant | 23-45 |
| Q9LQS0 | R | GT-A | Not found | 0.774 | Plant | 45-67 |
| Q9LYF7 | R | GT-A | 4.11 | Not found | Plant | 30-52 |
| Q9LU27 | R | GT-A | 3.51 | Not found | Plant | 31-53 |
| Q9T0G0 | R | GT-A | 0.660 | Not found | Plant and bacteria | 7-29 |
| 2TMD Proteins | | | | | | |
| Q9XEE9 | U | GT-A | 2.73e−05 | 0.0008 | Plant and animal | 4-26 and 113-135 |
| Q9ZU10 | R | GT-B | Not found | 0.551 | Plant and animal | 5-27 and 47-69 |
| TrEMBL Protein ID | Length Amino Acids | SignalP | Pfam Domain (c) | Pfam E-Value (c) | DxD in Hydrophobic Pocket (d) | Isoxaben Array Up/Down-Regulated | EST |
|---|---|---|---|---|---|---|---|
| Q9LZ77 | 1091 | Nonsecretory | GT1 | 0.00017 | Yes | +37% | No |
| Q9M147 | 963 | Signal anchor | GT1 | 3.9 × 10^−7 | Yes | +35% | Yes |
| Q9FMW3 | 559 | Signal anchor | None | Not found | Not found | −22% | No |
| Q9LU22 | 419 | Signal anchor | None | Not found | Yes | −2% | Yes |
| Q9C9Z9 | 533 | Signal peptide | None | Not found | Not found | −4% | Yes |
| O81786 | 204 | Signal anchor | None | Not found | Not found | −18% | Yes |
| Q9C920 | 290 | Signal peptide | CTP-GT | 2.9 × 10^−63 | Yes | +52% | Yes |
| Q9LTZ5 | 582 | Signal anchor | GT2 | 0.0016 | Yes | +483% | Yes |
| Q9FM26 | 583 | Signal anchor | GT2 | 0.0088 | Yes | Not found | Yes |
| O04568 | 516 | Signal anchor | None | Not found | Yes | −38% | Yes |
| Q9FXA7 | 383 | Signal anchor | None | Not found | Yes | Not found | Yes |
| Q9C9Q6 | 402 | Signal anchor | GT8 | 0.017 | Yes | Not found | Yes |
| Q9C9Q5 | 428 | Signal anchor | GT8 | 0.09 | Yes | −1% | Yes |
| Q9FMN8 | 624 | Signal anchor | None | Not found | Yes | +44% | Yes |
| Q9ZSJ2 | 361 | Signal anchor | None | Not found | Yes | Not found | No |
| Q9FF50 | 932 | Signal anchor | GT2 | Not found | Yes | Not found | Yes |
| Q9M146 | 360 | Signal anchor | None | Not found | Yes | +59% | Yes |
| Q9SZU2 | 588 | Signal anchor | GT2 | 0.011 | Yes | Not found | No |
| Q9ZSJ0 | 367 | Signal anchor | GT2 | 0.011 | Yes | −64% | Yes |
| Q9SAD6 | 371 | Signal anchor | Chemotaxis phosphatase | 0.075 | Yes | −62% | Yes |
| Q9LKU7 | 156 | Nonsecretory | Zinc finger domain | 0.00044 | Not found | −25% | No |
| Q9LQS0 | 118 | Nonsecretory | None | Not found | Not found | Not found | No |
| Q9LYF7 | 386 | Signal anchor | None | Not found | Not found | −20% | Yes |
| Q9LU27 | 384 | Signal anchor | None | Not found | Yes | −28% | No |
| Q9T0G0 | 389 | Signal peptide | Dehydrogenase | 3.7 × 10^−21 | Not found | +2% | Yes |
| 2TMD Proteins | | | | | | | |
| Q9XEE9 | 474 | Signal peptide | GT1 | 2.1 × 10^−19 | Yes | −22% | Yes |
| Q9ZU10 | 200 | Signal peptide | None | Not found | Not found | Not found | Yes |
Web sites for the various servers used in this study are listed in “Materials and Methods.”
(a) N, NDP-sugartransferases; R, NAD(P)-binding Rossmann-fold domains; U, UDP-glycosyltransferase/glycogen phosphorylase; G, galactose-binding domain-like.
(b) 3D-PSSM and/or mGenTHREADER.
(c) Hits only shown for E-values < 0.1.
(d) HCA analysis.
Figure 2.
Phylogenetic tree of the 27 putative GTs. Four distinct homologous groups (A–D) consisting of two to six sequences were identified in the analysis.
Filtering of the Arabidopsis Proteome
Choice of servers, strategies applied, and theoretical and practical considerations of the filtering process are described sequentially below.
Filter I: Identification of Potential Type II Membrane Proteins
In two comparative tests, TMHMM 2.0 (Krogh et al., 2001) was found to be the best of the tested prediction servers, measured as having the lowest sum of false positives and false negatives relative to the total number of experimentally assigned transmembrane helix (TMH) segments used in the tests ([false positives + false negatives]/no. of TMH; Schwacke et al., 2003; Zhou and Zhou, 2003). TMHMM 2.0 was chosen as the initial filter because of its reliable and somewhat conservative prediction strategy and because this server supports batch submissions of up to 4,000 accessions.
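The benchmark measure quoted above, (false positives + false negatives)/no. of TMH, can be written as a small Python helper. The server names and counts below are invented purely for illustration and are not results from the cited comparisons.

```python
def tmh_error_fraction(false_positives, false_negatives, n_observed_tmh):
    """Sum of false positives and false negatives per experimentally assigned TMH segment."""
    if n_observed_tmh <= 0:
        raise ValueError("n_observed_tmh must be positive")
    return (false_positives + false_negatives) / n_observed_tmh

# Hypothetical counts (false positives, false negatives, observed TMH segments).
servers = {
    "server_A": (5, 3, 100),
    "server_B": (10, 6, 100),
}

# The server with the lowest error fraction ranks best under this measure.
best = min(servers, key=lambda name: tmh_error_fraction(*servers[name]))
print(best, tmh_error_fraction(*servers[best]))
```

A lower value means fewer helices were missed or wrongly predicted per experimentally assigned helix.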
Filters II and III: Identification of Accessions in GT-Containing Superfamilies
The SUPERFAMILY database server was chosen as the next filter because this facility incorporates an alternative approach to the seed-based PSI-BLAST approach used in the CAZy database classification scheme and supports batch submissions of up to 20 accessions. The SUPERFAMILY database contains a library of hidden Markov models (HMMs) representing all proteins of known structure (Gough et al., 2001; Gough and Chothia, 2002). The SUPERFAMILY facility is based on the Structural Classification of Proteins (SCOP) protein domain classification database, which in turn is based on multiple sequence alignments designed to represent a protein family in a structural domain-based hierarchical classification scheme with several levels, including the superfamily level (Murzin et al., 1995).
Filter IV: Identification of Putative GTs within GT-Containing Superfamilies
mGenTHREADER is based upon a multilayered neural network that was trained to combine sequence alignment score, length information, and energy potentials with PSI-BLAST searches, which have been jumpstarted with structural alignment profiles from Fold Secondary Structure Prediction, PSI-BLAST profile, and predicted secondary structure (PSIPRED), predicted secondary structure, and bidirectional scoring in order to calculate the final alignment score (Jones, 1999; McGuffin and Jones, 2003).
3D-PSSM constitutes a method for protein fold recognition using one-dimensional (1D) and 3D sequence profiles coupled with secondary structure and solvation potential information (Kelley et al., 2000).
The output of the sequence-based SUPERFAMILY server was evaluated by the mGenTHREADER (multilayered neural network) and 3D-PSSM servers, which operate at the fold level and, in addition to 1D sequence information, incorporate 3D structural information, solvation potential, etc. (see also above). The difference in the number of accessions before and after filter IV (139 and 27, respectively) indicates that a major fraction of the 139 accessions, predicted by the SUPERFAMILY server to belong to polysaccharide or CW relevant superfamilies, were most likely non-GT proteins, as e.g. evidenced by accessions containing an unusually high number of Pro and Ser residues (>50% of the total amino acid residues) or by proteins with an estimated molecular mass <20 kD. The 139 non-CAZy classified accessions resulting from the SUPERFAMILY filtering and BLAST searches against the local CAZy database are available as supplemental data (available at www.plantphysiol.org).
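The two indicators of likely non-GT proteins mentioned above (Pro plus Ser content above 50% and an estimated mass below 20 kD) can be expressed as a rough Python check. This is a sketch and not part of the filtering pipeline described in this study: the text presents these features as supporting evidence rather than as filter rules, and the mass estimate assumes an average residue mass of about 110 Da, which is a common rule of thumb and not a value taken from the source.

```python
# Assumption: roughly 110 Da per amino acid residue (rule-of-thumb average).
AVG_RESIDUE_MASS_KD = 0.110

def pro_ser_fraction(seq):
    """Fraction of residues that are Pro (P) or Ser (S)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("P") + seq.count("S")) / len(seq)

def approx_mass_kd(seq):
    """Crude molecular mass estimate in kD from sequence length alone."""
    return len(seq) * AVG_RESIDUE_MASS_KD

def looks_like_non_gt(seq, max_pro_ser=0.5, min_mass_kd=20.0):
    """Flag sequences that are very Pro/Ser-rich or that fall below the mass cutoff."""
    return pro_ser_fraction(seq) > max_pro_ser or approx_mass_kd(seq) < min_mass_kd

# Toy sequences, invented for illustration.
pro_ser_rich = "PS" * 150          # 300 residues, all Pro/Ser
short_protein = "A" * 100          # about 11 kD by the rule of thumb
ordinary = "ACDEFGHIKL" * 30       # 300 residues, no Pro/Ser
print(looks_like_non_gt(pro_ser_rich), looks_like_non_gt(short_protein), looks_like_non_gt(ordinary))
```

Both thresholds are keyword arguments, so they can be relaxed when the indicators are used only to prioritize sequences for manual inspection.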
Elimination of False (e.g. Non-GT) Hits
The ability of the filtering process to eliminate accessions that encode enzymes that have NDP-sugars as substrates but are not GTs was examined by applying the sequential filtering procedure to two quite different proteins: (1) a putative membrane-bound Arabidopsis protein involved in synthesis of UDP-d-Xyl that in plants is incorporated in glycoproteins and CW polysaccharides, including xyloglucan (XG), and (2) an Escherichia coli protein catalyzing the epimerization of UDP-N-acetyl-d-glucosamine to UDP-N-acetyl-d-mannosamine, which is involved in bacterial lipopolysaccharide biosynthesis. The Arabidopsis UDP-glucuronic acid decarboxylase (ATUXS2, At3g62830) is predicted to adopt a typical type II membrane protein topology and is thought to be located in the Golgi apparatus (Harper and Bar-Peled, 2002). When used as a negative control, ATUXS2 passes filters II and III (Rossmann fold, non-CAZy entry) but fails to pass filter IV, i.e. ATUXS2 is not predicted to adopt a GT-A or a GT-B fold structure. Furthermore, a DxD motif (as described below) is not found in ATUXS2. The E. coli UDP-N-acetyl-d-glucosamine 2-epimerase (Kiino et al., 1993; P27828), as expected, is not predicted to adopt a typical type II membrane protein structure when run through the TMHMM version 2.0 server (filter I); it is, however, predicted to belong to the UDP-glycosyltransferase/glycogen phosphorylase superfamily by the SUPERFAMILY prediction server and to adopt a GT-B fold by mGenTHREADER. A DxD motif as described below is nevertheless not found.
When the six known plant CW GTs were run through the 3D-PSSM and mGenTHREADER servers, the galactomannan-specific α(1-6)galactosyltransferase and the XG-specific α(1-6)xylosyltransferase were predicted to adopt the GT-B and the GT-A fold, respectively, although both proteins are classified in CAZy family GT34 (Table III). However, as indicated by the poor E-values, discrimination between the GT-A and GT-B fold was not feasible. In this respect it should be noted that plants in general synthesize a number of plant-specific CW polymers (not found in any other kingdom). The uniqueness of such structures may be reflected in the structure of the biosynthetic enzymes, and these may thus not be clearly related to GTs of organisms from other kingdoms. None of the few characterized plant CW GTs have had their 3D structure resolved. It is in this context that servers like mGenTHREADER and 3D-PSSM, which besides sequence similarity use various parameters such as 3D information (see above) in their prediction strategy, were chosen as validation tools. The prediction ability of the various servers will undoubtedly improve as new plant CW GTs are identified and structurally analyzed. In summary, the filtering process applied here was quite efficient in eliminating evident types of false positives. The pool of 27 accessions may still comprise non-GT accessions and was thus subjected to a post-filtering analysis.
Table III.
Known type II CW GTs and their classification in the CAZy database
| GT Function | TrEMBL Protein ID | CAZy Family | SuperF (a) | Best Fit to Known GT Fold (b) | E-Value, 3D-PSSM | E-Value, mGenTHREADER | BLAST (NCBI) |
|---|---|---|---|---|---|---|---|
| α(1-6)-d-xylT | Q9LZJ3 | GT-34 | None | GT-B | 5.08 | 0.187 | Plant and bacteria |
| α(1-6)-d-galT | Q9ST56 | GT-34 | None | GT-A | 0.934 | 0.152 | Plant and bacteria |
| α(1-2)-l-fucT | Q9LJK1 | GT-37 | None | GT-B | 4.54 | Not found | Plant and animal |
| Quasimodo | Q9LSG3 | GT-8 | N | GT-A | 1.08 × 10^−12 | 0.009 | Plant and animal |
| β(1-2)-d-galT | Q9LVB4 | GT-47 | N | None | Not found | Not found | Plant |
| β(1-4)-d-glcAT | Q8GSQ4 | GT-47 | N | GT-B | 5.04 | 0.090 | Plant and animal |
| TMD | Length Amino Acids | SignalP | Pfam Domain (c) | Pfam E-Value (c) | DxD in Hydrophobic Pocket (d) | Isoxaben Array |
|---|---|---|---|---|---|---|
| 21-40 | 460 | Signal anchor | GT-34 | 1.1 × 10^−121 | Yes | Down 38% |
| 13-35 | 435 | Signal anchor | GT-34 | 1.0 × 10^−137 | Yes | Not found |
| None | 501 | Signal anchor | GT-10 | 1.6 × 10^−17 | Yes | Not found |
| 21-43 | 599 | Signal anchor | GT-8 | 1.4 × 10^−116 | Yes | Up 118% |
| None | 549 | Nonsecretory | Exostosin (GT-47) | 1.2 × 10^−92 | Not found | Down 55% |
| None | 334 | Nonsecretory | Exostosin (GT-47) | 2.3 × 10^−31 | Yes | Down 42% |
The glycosyltransferases listed are all from Arabidopsis. From the top, α(1-6)-d-xylose transferase, transfers d-xylose on to the β(1-4)glucan chains of xyloglucan (Faik et al., 2002); α(1-6)-d-galactose transferase, transfers d-galactose on to the β(1-4)mannan backbone of galactomannan (Edwards et al., 1999); α(1-2)-l-fucose transferase, transfers the terminal l-fucose on to the galactosyl residue of the xyloglucan sidechain (Perrin et al., 1999); Quasimodo, involved in pectin biosynthesis* (Bouton et al., 2002); β(1-2)-d-galactose transferase, transfers d-galactose on to the α(1,6)-linked xylose in xyloglucan (Madson et al., 2003); β(1-4)-d-glucuronosyl transferase, transfers d-glucuronic acid on to α(1-4)-linked Fucose in RG II (Iwai et al., 2002)*.
* Function likely; activity not unequivocally demonstrated.
(a) N, NDP-sugartransferases.
(b) 3D-PSSM and/or mGenTHREADER.
(c) Hits only shown for E-values < 0.1.
(d) HCA analysis (data not shown).
Post-Filtering Evaluation of the 27 Putative GTs
Homologous Sequences within and outside of the Plant Kingdom
The presence of homologous sequences was investigated by subjecting the putative GTs to global protein-protein BLAST (blastp). The majority of the searches (22 out of 27) gave rise to plant-specific or plant- and bacteria-specific hits (Table II). Because mycobacterial CWs, for example, contain plant CW-like polysaccharides such as arabinogalactans (Crick et al., 2001), bacterial hits do not necessarily contradict a function in plant CW synthesis. When the 27 sequences were blasted against the Arabidopsis expressed sequence tag (EST) database, 21 were represented by an EST (Table II).
Phylogeny
As a consequence of the adopted overall bioinformatics approach used in this study, significant similarity throughout the 27 accessions was not anticipated. Alignment of the 27 putative GTs, however, identified four groups of clustered homologous genes, denominated A, B, C, and D, of which groups B and C (Fig. 2) display a high degree of conservation in stretches of at least 20 to 80 amino acids (data not shown). The rest of the 27 accessions constituted a heterogeneous group with extremely low or insignificant sequence identity.
The genes in group B (Q9C9Q6, Q9C9Q5, Q9FXA7, Q9M146, Q9ZSJ2, and Q9ZSJ0; Table II; Fig. 2) fall into two distinct subgroups consisting of highly identical group members (subgroup I: four sequences with 73%, 75%, and 90% identity; subgroup II: two sequences with 72% identity) but with only 11% identity between the two subgroups. The two highly identical accessions (Q9ZSJ2 and Q9ZSJ0; Table II; Fig. 2B) are the xylosyltransferases mentioned above. Genes in group C are approximately 550 amino acids long. Aside from the four GTs (accession nos. Q9LZ77, Q9M147, Q9C920, and Q9XEE9; Table II; Fig. 2C), which display significant similarity to CAZy GT-family-1 and CTP-GTs, similarity for the rest of the 27 sequences to other GTs with known function (plant or non-plant) was extremely weak or nonexistent.
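Percent identity values such as those quoted for the group B subgroups are computed from pairwise alignments. The Python sketch below shows one common convention, counting identical residues over the aligned columns in which neither sequence has a gap. The source does not state which convention or software was used, so the choice of denominator and the toy alignment are assumptions.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identical residues over columns where neither aligned sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(x, y) for x, y in zip(aligned_a, aligned_b) if x != gap and y != gap]
    if not columns:
        return 0.0
    matches = sum(x == y for x, y in columns)
    return 100.0 * matches / len(columns)

# Toy alignment, invented for illustration: five gap-free columns, four identical.
identity = percent_identity("MKV-LA", "MRVTLA")
print(identity)  # 80.0
```

Other conventions divide by the length of the shorter sequence or by the full alignment length, which yields lower values for gapped alignments.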
Prediction of Subcellular Localization
For 24 of the 27 putative GTs, the SignalP server predicted a signal anchor or signal peptide in or close to the TMD (data not shown; Table II). When the six GTs of group B (Fig. 2) were run through the TargetP server (Krogh et al., 2001), a reliable prediction of their subcellular location could not be achieved. Similar results were obtained when the six GTs with known function in CW synthesis (Edwards et al., 1999; Perrin et al., 1999; Bouton et al., 2002; Faik et al., 2002; Iwai et al., 2002; Madson et al., 2003; Table III) were run through these servers (data not shown). Although localization of these CW GTs has not been proven, sufficient evidence exists for Golgi as the place of synthesis of at least the major building blocks of the CW (for review, see Mohnen, 1999). Thus, the TargetP prediction server is not able to generate a reliable prediction of the localization of the plant CW GTs.
Expression Data
Recently, sets of CW-specific microarray data derived from suspension cultured cells, which had been exposed to the herbicide N-[3-(1-ethyl-1-methylpropyl)-1,2-oxazol-5-yl]-2,6-dimethoxybenzamide (isoxaben), have been made public (see “Materials and Methods”). Isoxaben specifically inhibits cellulose synthesis (Scheible et al., 2001), and plants adapted to grow in isoxaben compensate for the almost complete loss of the cellulose-XG load bearing structure by construction of walls made predominantly of pectin (Schedletzky et al., 1990; Encina et al., 2002). The array-derived expression data for the accessions Q9C920, Q9LTZ5 (and the highly homologous Q9FM26 not present in the array dataset), and Q9M146, which in this experiment are up-regulated 52%, 483%, and 59%, respectively, might indicate function in pectin biosynthesis. However, the lack of confirmed pectin biosynthetic GTs and expression pattern (spatial and temporal) of each putative GT should be taken into consideration when interpreting the array data.
HCA Analysis
Hydrophobic cluster analysis (HCA) identified a putative sugar-nucleotide-binding domain—the so-called DxD motif (Breton et al., 1998, 2001; Wiggins and Munro, 1998; Costa et al., 2002; Stolz and Munro, 2002)—or a degeneration of this motif [DE]-X-[DE] (compare with Tarbouriech et al., 2001) surrounded by stretches of hydrophobic amino acids in 19 of the 27 sequences as exemplified in Figure 3. It should be stressed, however, that the occurrence of a DxD motif alone is not diagnostic of a GT function (Gastinel, 2001; Coutinho et al., 2003). Parsing of the 27 sequences through the Multiple EM for Motif Elicitation (MEME) version 3.0 server identified the putative sugar-nucleotide-binding DxD motif in both subgroups of group B (Fig. 4). Genes in group C share common overall features with the group B genes, e.g. varying sequence identity (69%, 35%, and 27%) and a DxD motif flanked by hydrophobic stretches situated approximately in the middle of the proteins (Table II; Fig. 3).
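As a rough one-dimensional proxy for the search described above, the degenerate motif [DE]-X-[DE] can be located with a regular expression and kept only when hydrophobic residues occur on both sides. This Python sketch is not HCA, which evaluates two-dimensional cluster shapes and was assessed manually in this study. The flank width and the minimum number of hydrophobic residues are arbitrary illustrative parameters, and the test sequences are invented.

```python
import re

# Hydrophobic residues as used for the clusters in the HCA plots.
HYDROPHOBIC = set("VILFMWY")

# Lookahead so that overlapping [DE]-X-[DE] occurrences are all reported.
MOTIF = re.compile(r"(?=([DE].[DE]))")

def find_dxd(seq, flank=6, min_hydrophobic=3):
    """Return (1-based position, motif) for motifs with hydrophobic flanks on both sides."""
    hits = []
    for match in MOTIF.finditer(seq):
        start = match.start()
        left = seq[max(0, start - flank):start]
        right = seq[start + 3:start + 3 + flank]
        left_ok = sum(c in HYDROPHOBIC for c in left) >= min_hydrophobic
        right_ok = sum(c in HYDROPHOBIC for c in right) >= min_hydrophobic
        if left_ok and right_ok:
            hits.append((start + 1, match.group(1)))
    return hits

# Toy sequences, invented for illustration.
flanked = find_dxd("GSVLIVFDGDLLVIAGS")    # motif embedded in hydrophobic stretches
unflanked = find_dxd("GSGSGSDGDGSGSGS")    # same motif without hydrophobic flanks
print(flanked, unflanked)  # [(8, 'DGD')] []
```

As stressed above, a hit of this kind is not diagnostic of GT function on its own.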
Figure 3.
HCA analysis showing the DxD motif within a pocket of hydrophobic amino acids. The protein sequences are represented on a duplicated α-helical net, and the clusters of contiguous hydrophobic residues (V, I, L, F, M, W, and Y) are boxed. The actual assessments of the individual HCA plots were done manually in order to identify similarities between the sequences. The one-letter code is used for amino acids except for Gly, Pro, Ser, and Thr, which are represented by symbols. Vertical lines delineate hydrophobic pockets in which the DxD motif (boxed in black) can be found. Three well-known GTs were used in the analysis. A, α(1-4)galactosyltransferase LgtC (Neisseria meningitidis, TrEMBL accession no. Q8KHJ3); Persson et al. (2001). B, β(1-4)galactosyltransferase (Homo sapiens, TrEMBL accession no. Q9UBX8); Amando et al. (1999). C, Quasimodo, putatively involved in pectin biosynthesis (Arabidopsis, TrEMBL accession no. Q94BZ8); Bouton et al. (2002). Representatives from the 27 putative GTs containing an identifiable DxD motif within a hydrophobic pocket: D, Q9ZSJ2; E, Q9M146; F, Q9C9Q5; G, Q9LTZ5; H, Q9FF50; I, O04568.
Figure 4.
Conserved region of the B group accessions (compare with phylogenetic analysis Fig. 2) identified by the MEME server showing the putative DxD motif possibly involved in binding of the nucleotide sugar.
When the sequences were run through the Pfam server, a tentative CAZy GT family relationship (GT1, GT2, or GT8) could be assigned to 10 of the accessions, although the prediction power (E-value) was in most cases relatively poor (Table II).
Accessions Q9SAD6, Q9LKU7, Q9T0G0, O23479, Q9C990, and Q9C991 were predicted to contain other domains, also with varying prediction power. Of these, a putative DxD motif as defined above could not be identified for the accessions Q9LKU7, Q9T0G0, and O23479, perhaps suggesting that the proposed GT function for at least these accessions should be considered carefully. The Pfam server is based on seed alignments, including also consensus alignment sequences of the various CAZy families. The relatively low number (10) of tentative CAZy GT family relationship assignments may reflect the difference between the sequence-based prediction strategies of Pfam/CAZy and the fold-based prediction strategies used by the mGenTHREADER and 3D-PSSM servers.
DISCUSSION
In this study, we have identified 27 putative Arabidopsis GTs, which are not classified in the CAZy database. The 27 accessions have been selected as putative GTs, being typical of Golgi localized type II membrane proteins and characterized using various prediction servers, HCA analysis, and CW-specific array datasets. Recent proof of concept of the strategy used in this study has to some extent been obtained as functions in CW biosynthesis for two GT members of the phylogenetically distinct group B (Fig. 2) were established.
Although the topology of noncellulosic backbone synthesizing enzymes remains an open question, it is tempting to suggest that the enzymes responsible for e.g. the synthesis of the α(1-4)-linked homogalacturonan backbone or the β(1-4)glucan backbone of XG might resemble the multimembrane spanning and processive cellulose synthases, which synthesize homopolymers with the same linkage type. Enzymatic assays of proteinase-treated intact and detergent-disrupted Golgi vesicles suggest that the catalytic site of the α(1-4)galacturonosyltransferase activity resides in the lumen of the Golgi (Sterling et al., 2001). Recently, an Arabidopsis gene, classified within CAZy GT-family-8, was cloned, heterologously expressed, and shown to possess galacturonosyltransferase activity (J. Sterling and D. Mohnen, personal communication). The predicted topology of this enzyme is that of a typical type II membrane-spanning protein, suggesting that at least the pectin synthesizing enzymes may be type II membrane-anchored proteins. Current estimates suggest that about 1% of the open reading frames of each genome is dedicated to the task of glycosidic bond synthesis (Coutinho et al., 2003). When the 415 CAZy classified Arabidopsis GTs are used as an estimate of the total number of glycosidic bond-forming enzymes, this number is 1.6% in plants, partly due to the existence of the highly complex CW. Based on the number of Arabidopsis genes that are predicted to possess signal peptides, it has been estimated that well over 2,000 genes are likely to participate in biosynthesis, assembly, and modification of CWs during plant development (Carpita et al., 2001). If soluble enzymes that participate in generation of substrates and the integral membrane-associated biosynthetic CW proteins, such as the cellulose synthase, are included, it has been estimated that some 15% of the Arabidopsis genome is dedicated to CW biogenesis and modification (Carpita et al., 2001).
CAZy GT-family-1 consists primarily of soluble enzymes that function in secondary metabolism and have rather small molecules as acceptor substrates. If the 121 Arabidopsis sequences in GT-family-1 are subtracted from the total 415 sequences, 294 GTs are left for glycosylation of proteins and lipids, synthesis of various polysaccharides, and CW biosynthesis. In Arabidopsis-rich CAZy GT families, such as GT8, GT31, or GT47, alignments of Arabidopsis accessions reveal the existence of highly identical genes within the GT families, which are likely to have identical function but may be differentially expressed. For the synthesis of pectin alone, one of the major noncellulosic CW polysaccharides, which comprises the polysaccharides homogalacturonan and rhamnogalacturonan I and II, at least 53 distinct enzymatic activities are required (Mohnen, 1999; Ridley et al., 2001).
In this study, the use of the most conservative transmembrane span prediction server as the first filter clearly excludes an unknown number of GTs with a weak TMD profile and perhaps also GTs without a TMD, which might interact in complexes with other membrane-bound GTs. A significant number of the Arabidopsis sequences, e.g. in the CAZy database GT-family-47, do not have a predictable TMD and are therefore often referred to as soluble enzymes. Of the six noncellulosic plant CW GTs with known function, the XG-specific β(1-2)galactosyltransferase, the putative rhamnogalacturonan II-specific β(1-4)glucuronosyltransferase, and the XG-specific α(1-2)fucosyltransferase are not predicted to possess an N-terminal transmembrane helix when run through the TMHMM 2.0 server used in this study (Table III). However, when run through the various prediction servers available at the ARAMEMNON site, each of the three GTs was predicted to contain an N-terminal TMD-like structure by at least one of the programs available at this site. In conclusion, it is unresolved whether some CW GTs are soluble. Reporter gene and/or tag fusion experiments may shed some light on this.
CONCLUSION
The CAZy database serves as the most complex and rich source of carbohydrate active enzymes. Classification of GTs in the CAZy database is based primarily on PSI-BLAST searches, using GTs with known function and in some cases proteins for which the 3D structures have been resolved, as the seed (Henrissat et al., 2001). With respect to GTs, it is widely accepted that secondary structure is more conserved than primary sequence. The classification scheme used in the CAZy database may not facilitate the identification of GTs that are only similar at a level higher than the primary sequence level (e.g. the fold level). A drawback of the present alternative approach may be the use of the SUPERFAMILY prediction server, which (as e.g. also Pfam) uses HMMs generated from alignments of proteins, the vast majority being of non-plant origin. We expect that a higher number of candidate GTs will be retrieved when it is possible to screen the entire Arabidopsis proteome for proteins using servers operating at the fold level.
GTs situated in the Golgi apparatus involved in synthesis of the complex Asn-linked glycans of plant glycoproteins may be found among the accessions uncovered in this study. We expect that a significant proportion of plant proteoglycan GTs are homologous to similar enzymes from other eukaryotes due to the structural similarities that exist in these glycans across kingdoms. If this assumption is valid, many of the plant proteoglycan GTs are already in CAZy.
The sequential and parallel use of several prediction servers, albeit with relatively low stringency parameter settings, inclusion of negative and positive controls of the filtering, followed by a post-filtering evaluation warrant that a substantial number of GTs indeed are found within the 27 accessions. It is, however, also clear that e.g. the use of the conservative TMHMM server has as a consequence that relevant GTs have also been eliminated and, hence, that the 27 putative GTs are but a subset of the GTs that remain to be recognized as such.
MATERIALS AND METHODS
The Arabidopsis Proteome
The Arabidopsis proteome in a nonredundant form was downloaded from EMBL (http://www.ebi.ac.uk/proteome/index.html; 08072003), converted to FASTA format, and split into 26,095 individual proteins using the Wisconsin-package version 10.3 (http://www.biobase.dk/).
TMHMM Version 2.0 Server
Predictions of transmembrane helices were carried out using the TMHMM server version 2.0 (Krogh et al., 2001; http://www.cbs.dtu.dk/services/TMHMM). All predictions were performed using standard settings. The proteome was submitted in subfiles (FASTA file format) containing the max limit of 4,000 proteins per submission. Output format was one text line per protein. The outputs from the entire proteome were collected in a single file for further screening. Proteins containing one or two TMDs starting within the N-terminal first 150 amino acid residues were extracted using the BBEdit program version 6.1.2 for MacOSX. A Unix list file containing the resulting accession numbers was generated, and these accession numbers were then used to extract relevant protein sequences from the original proteome and stored in a FASTA file. This FASTA file was then used in the next filtering process.
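The selection itself was made in BBEdit. Purely as an illustration, the same rule could be expressed in Python as sketched below. The sketch assumes the TMHMM 2.0 short output format (one line per protein, with whitespace-separated fields ending in a `Topology=` string such as `i7-29o`) and reads the rule as requiring every predicted helix to start at or before residue 150. That reading and all function names are assumptions of this example.

```python
import re


def parse_tmhmm_short(line):
    """Parse one line of TMHMM 2.0 short (one line per protein) output."""
    fields = line.split()
    accession = fields[0]
    info = dict(field.split("=", 1) for field in fields[1:])
    # Helix spans appear in the topology string as start-end pairs, e.g. i7-29o.
    helices = [(int(start), int(end))
               for start, end in re.findall(r"(\d+)-(\d+)", info["Topology"])]
    return accession, helices


def passes_filter(helices, max_start=150):
    """True for one or two helices that all start within the first max_start residues."""
    return 1 <= len(helices) <= 2 and all(start <= max_start for start, _ in helices)


def filter_tmhmm(lines):
    """Return the accessions whose predicted topology passes the filter."""
    kept = []
    for line in lines:
        if not line.strip():
            continue  # skip blank lines between concatenated outputs
        accession, helices = parse_tmhmm_short(line)
        if passes_filter(helices):
            kept.append(accession)
    return kept
```

The returned accession list plays the role of the Unix list file described above and could be used to pull the matching records from the proteome FASTA file.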
Superfamily Version 1.63 Server
The SUPERFAMILY facility (http://supfam.mrc-lmb.cam.ac.uk/SUPERFAMILY/) implements a searchable library (covering all proteins of known structure) consisting of 1,232 SCOP superfamilies, each of which is represented by a group of HMMs, i.e. SCOP-based single sequence HMMs (Gough et al., 2001). The post-TMHMM FASTA file was divided into files containing a maximum of 20 proteins. These files were submitted using the following parameters: scoring options, Global/Global model/sequence scoring (for exact domains), BLAST pre-filter P < 1.0 × 10^−10. The output files were collected in one large file and sorted by superfamily domain. Proteins that were classified as belonging to one of the superfamilies listed below were independently collected using a Unix list file with relevant accession numbers, and a FASTA file was generated for each of the five superfamilies used in this study: NDP-sugartransferases, Gal-binding domain-like, UDP-glycosyltransferase/glycogen phosphorylase, carbohydrate-binding domain, and Rossman fold.
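Dividing a FASTA file into submissions of at most 20 proteins can be sketched as follows. This is an illustrative Python example rather than the tooling used in this study; the function names are hypothetical, and the input is assumed to be plain FASTA text with `>` header lines.

```python
def read_fasta(text):
    """Return a list of (header, sequence) tuples from FASTA-formatted text."""
    records, header, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def batches(records, size=20):
    """Yield consecutive groups of at most `size` records."""
    for i in range(0, len(records), size):
        yield records[i:i + size]


def to_fasta(records):
    """Format (header, sequence) tuples back into FASTA text."""
    return "".join(f">{header}\n{sequence}\n" for header, sequence in records)
```

Each batch can then be written to its own file with `to_fasta` before submission.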
Local CAZy Database
All the Arabidopsis protein accession numbers were collected from the CAZy database (September 10, 2003). These accessions were then used to generate a Unix list file that served as template for the generation of a FASTA file from the Arabidopsis proteome. The BLAST 2.6.6 program for powerpc-MacOSX was downloaded from ftp://ftp.ncbi.nih.gov and a BLASTable database built from the FASTA file as described by the provider.
The five independent FASTA files derived from the superfamily search were blasted against the local CAZy database using the BBEdit program (standard conditions with filtering off). Proteins in the dataset, which were found in the local CAZy database, were discarded.
3D-PSSM Server
Fast Web-based methods for protein fold recognition using 1D and 3D sequence profiles coupled with secondary structure information, i.e. SCOP-based profile HMMs, included the following: 3D-PSSM Web Server version 2.6.0 (http://www.sbg.bio.ic.ac.uk; Kelley et al., 2000) and mGenTHREADER available at the PSIPRED home page (http://bioinf.cs.ucl.ac.uk/psiform.html). All predictions were performed using standard settings. Proteins larger than 800 amino acids were submitted twice, once truncated at the N terminus and once truncated at the C terminus. The outputs were then collected and screened for known GT PDB IDs. If more than one PDB ID was present in the output from the same file, only the one with the lowest E-value was listed.
mGenTHREADER Server
mGenTHREADER (Jones, 1999; http://bioinf.cs.ucl.ac.uk/psipred/) is a fold recognition server based on fold library profiles that uses the PSI-BLAST profile and predicted secondary structure (PSIPRED). PSIPRED is a secondary structure prediction method, incorporating two feed-forward neural networks, which takes the output obtained from PSI-BLAST as input. mGenTHREADER, accessible from the PSIPRED Protein Structure Prediction Servers home page, was used with the following parameters: prediction method, Fold Recognition (mGenTHREADER); output, default. The outputs were then collected and screened for known PDB IDs of known GTs (compare with Table I). If more than one PDB ID was present in the output from the same file, only the one with the lowest E-value was listed.
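The screening rule shared by the 3D-PSSM and mGenTHREADER steps (keep only hits to known GT PDB IDs and, per query, list the one with the lowest E-value) can be sketched in Python as below. The sketch assumes the server outputs have already been parsed into (query, PDB ID, E-value) tuples; that parsing step, the function name, and the placeholder identifiers in the usage lines are hypothetical.

```python
def best_gt_hits(hits, gt_pdb_ids):
    """Return {query: (pdb_id, e_value)} with the lowest E-value GT hit per query.

    hits is an iterable of (query, pdb_id, e_value) tuples; hits to PDB IDs
    not in gt_pdb_ids are ignored. PDB IDs are compared case-insensitively.
    """
    wanted = {pdb_id.lower() for pdb_id in gt_pdb_ids}
    best = {}
    for query, pdb_id, e_value in hits:
        if pdb_id.lower() not in wanted:
            continue
        if query not in best or e_value < best[query][1]:
            best[query] = (pdb_id, e_value)
    return best


# Placeholder identifiers, not real accessions or PDB entries.
example_hits = [
    ("query1", "pdbA", 0.5),
    ("query1", "pdbB", 0.01),
    ("query1", "pdbC", 1e-9),   # not a GT structure, so ignored
    ("query2", "pdbC", 0.2),
]
print(best_gt_hits(example_hits, {"pdbA", "pdbB"}))
```

Queries with no hit to a listed GT structure are simply absent from the result.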
PDB IDs
The PDB IDs that were used for screening the output of both the 3D-PSSM and mGenTHREADER were collected from Wimmerova et al. (2003), who searched the Mycobacterium tuberculosis genome for GTs using, among other tools, fold recognition and the CAZy database (when the individual CAZy GT family contained more than one PDB ID for the particular protein, only one of the PDB IDs was used). References for the PDB IDs can be retrieved at http://www.RCSB.org/. All proteins were classified to one of the two secondary fold structures, GT-A or GT-B.
BLAST
As part of the validation process, the proteins were blasted using BLAST algorithms, which were accessible from the server at NCBI (National Center for Biotechnology Information; http://www.ncbi.nlm.nih.gov). The search included standard protein blast (blastp) and translated blast (tblastn). All searches were performed using standard settings and the BLOSUM 80 matrix. In the case of blastp, any hits, regardless of the E-value/identity, to animal, bacterial, or plant sequences were reported. Presence of ESTs was checked by blastn searches of the Arabidopsis EST database (http://www.ncbi.nlm.nih.gov).
SignalP Server
The candidate genes were scanned for the presence of signal peptides using the SignalP version 2.0.b2 World Wide Web server (Nielsen et al., 1999; http://www.cbs.dtu.dk/services/SignalP), which predicts the presence and location of signal peptide cleavage sites in amino acid sequences using HMM-based predictions. Predictions were done using the following parameters: organism group, Eukaryotes; method, HMMs; graphics, none; output format, standard. Proteins were truncated after the first 70 amino acids from the N terminus and submitted in a FASTA file format.
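The truncation step can be sketched in Python as follows. This is an illustrative example only; the function name is hypothetical and the input is assumed to be plain FASTA text.

```python
def truncate_fasta(fasta_text, n_residues=70):
    """Return FASTA text in which every sequence is cut to its first n_residues."""
    records = []  # list of [header_line, sequence] pairs
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            records.append([line, ""])
        elif records:
            # Sequence lines are joined before cutting, so line wrapping
            # in the input does not affect the result.
            records[-1][1] += line
    return "".join(f"{header}\n{sequence[:n_residues]}\n"
                   for header, sequence in records)
```

Sequences shorter than 70 residues pass through unchanged.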
MEME Version 3.0
Sequences, for which the secondary structure resembled that of known GTs, were submitted as a FASTA file to the MEME v 3.0 server (http://meme.sdsc.edu/meme/website/meme.html) in order to search for conserved domains. Standard settings were used for the search.
Alignments and Phylogenetic Analysis
All sequence alignments and calculations of sequence identities were performed by use of ClustalX version 1.81, available from Université Louis Pasteur, Strasbourg (ftp://ftp-igbmc.u-strasbg.fr/pub/ClustalX; Thompson et al., 1997). Alignments were edited and printed using the program SeqVu (SeqVu version 1.0.1; http://www.cellbiol.com/soft.htm). Trees with bootstrap values from 1,000 resampling replicates were obtained using the Njplot program, which is part of the ClustalX program package. Printed trees were modified using the TreeViewPPC version 1.6.6 software (http://taxonomy.zoology.gla.ac.uk/rod/treeview.html).
HCA Analysis
HCA plots were obtained from the drawhca server on the Internet (http://smi.snv.jussieu.fr/hca/hca-form.html). The actual assessments of the individual HCA plots were done manually as described by Breton et al. (1998).
Pfam
Proteins were analyzed for the presence of known domains using the Pfam HMM (Bateman et al., 2004) available at the St. Louis Pfam server (http://pfam.wustl.edu). The searches were performed using individual global/local search options and a cutoff E-value > 0.1. Only the best hits were reported.
ARAMEMNON
ARAMEMNON (Schwacke et al., 2003; Arabidopsis Membrane Protein Database at http://aramemnon.botanik.uni-koeln.de/) consolidates prediction of transmembrane helixes based on several TMD prediction servers. ARAMEMNON uses the following servers: Alom_v2 (http://psort.nibb.ac.jp/form2.html); HmmTop_v2 (http://www.enzim.hu/hmmtop/html/submit.html); MemSat_v1.8 (http://bioinf.cs.ucl.ac.uk/psiform.html); PHDhtm (http://cubic.bioc.columbia.edu/predictprotein/submit_exp.html#top); PHDhtm (http://npsa-pbil.ibcp.fr/cgi-bin/npsa_automat.pl?page=/NPSA/npsa_phd.html); PredTmr_v1 (http://biophysics.biol.uoa.gr/PRED-TMR/input.html); SosuiG_v1.1 (http://sosui.proteome.bio.tuat.ac.jp/cgi-bin/sosui.cgi?/sosui_submit.html); THUMBUP_v1 (http://phyyz4.med.buffalo.edu/Softwares-Services_files/thumbup.htm); Tmap (http://www.mbb.ki.se/tmap/); TMHMM_v2 (http://www.cbs.dtu.dk/services/TMHMM/); TmPred (http://www.ch.embnet.org/software/TMPRED_form.html); and TopPred_v2 (http://bioweb.pasteur.fr/seqanal/interfaces/toppred.html).
Array Data
Isoxaben array data are available at http://affymetrix.arabidopsis.info/narrays/experimentbrowse.pl.
Distribution of Materials
Upon request, all novel materials described in this publication will be made available in a timely manner for noncommercial research purposes, subject to the requisite permission from any third party owners of all or parts of the material. Obtaining any permission will be the responsibility of the requestor. Access to the novel accessions reported in this manuscript can be requested by e-mail (j.egelund@dias.kvl.dk).
Sequence data from this article have been deposited with the EMBL/GenBank data libraries under accession numbers Q8KHJ3, Q9UBX8, Q94BZ8, Q9ZSJ2, Q9M146, Q9C9Q5, A9LTZ5, Q9FF50, O045498, Q9C9Q6, Q9FXA7, Q9ZSJ0, Q9LZ77, Q9M147, Q9C920, Q9XEE9, Q9LTZ5, Q9FM26, Q9SAD6, Q9LKU7, Q9T0G0, O23479, Q9C990, and Q9C991.
Acknowledgments
Dr. Ahmed Faik, Michigan State University, is acknowledged for instigating this line of research in our lab; Dr. Christelle Breton, INRA France, for helpful discussions and initial HCA analysis; and Dr. Kristian Axelsen, Institute of Plant Biology, The Royal Veterinary and Agricultural University, Denmark, and Swiss Institute of Bioinformatics, Geneva, for helpful discussions and propositions throughout the process. Dr. Julian Gough and Ph.D. student Martin Madera are greatly appreciated for their skillful help with submission to the SUPERFAMILY server. Dr. William G.T. Willats is acknowledged for providing corrected array data.
This work was supported by the Danish National Research Foundation and The Danish Research Agency.
The online version of this article contains Web-only data.
References
- Amando M, Almeida R, Schwientek T, Clausen H (1999) Identification and characterization of large galactosyltransferase gene families: galactosyltransferases for all functions. Biochim Biophys Acta 1473: 35–53
- Bateman A, Coin L, Durbin R, Finn RD, Hollich V, Griffiths-Jones S, Khanna A, Marshall M, Moxon S, Sonnhammer ELL, et al (2004) The Pfam protein families database. Nucleic Acids Res 32: D138–D141
- Berman HM, Westbrook J, Feng Z, Gilliland G, Bhat TN, Weissig H, Shindyalov IN, Bourne PE (2000) The protein data bank. Nucleic Acids Res 28: 235–242
- Bolwell GP, Northcote DH (1983) Arabinan synthase and xylan synthase activities of Phaseolus vulgaris. Subcellular localization and possible mechanism of action. Biochem J 210: 497–507
- Bourne Y, Henrissat B (2001) Glycoside hydrolases and glycosyltransferases: families and functional modules. Curr Opin Struct Biol 11: 593–600
- Bouton S, Leboeuf E, Mouille G, Leydecker M-T, Talbotec J, Granier G, Lahaye M, Höfte H, Truong N-H (2002) QUASIMODO1 encodes a putative membrane-bound glycosyltransferase required for normal pectin synthesis and cell adhesion in Arabidopsis. Plant Cell 14: 1–14
- Breton C, Bettler E, Joziasse DH, Geremia RA, Imberty A (1998) Sequence-function relationships of prokaryotic and eukaryotic galactosyltransferases. J Biochem (Tokyo) 123: 1000–1009
- Breton C, Mucha J, Jeanneau C (2001) Structural and functional features of glycosyltransferases. Biochimie 83: 713–718
- Carpita N, Tierney M, Campbell M (2001) Molecular biology of the plant cell wall: searching for the genes that define structure, architecture and dynamics. Plant Mol Biol 47: 1–5
- Carpita NC (1996) Structure and biogenesis of the cell walls of grasses. Annu Rev Plant Physiol Plant Mol Biol 47: 445–476
- Cosgrove DJ (1997) Relaxation in a high-stress environment: the molecular bases of extensible cell walls and cell enlargement. Plant Cell 9: 1031–1041
- Costa AA, Gomez FJ, Pereira M, Felipe MS, Jesuino RS, Deepe GS, de Almeida Soares CM (2002) Characterization of a gene which encodes a mannosyltransferase homolog of Paracoccidioides brasiliensis. Microbes Infect 4: 1027–1034
- Coutinho PM, Deleury E, Davies GJ, Henrissat B (2003) An evolving hierarchical family classification for glycosyltransferases. J Mol Biol 328: 307–317
- Crick DC, Mahapatra S, Brennan PJ (2001) Biosynthesis of the arabinogalactan-peptidoglycan complex of Mycobacterium tuberculosis. Glycobiology 11: 107R–118R
- Edwards ME, Dickson CA, Chengappa S, Christopher C, Michael J, Gidley MJ, Grant Reid SJ (1999) Molecular characterisation of a membrane-bound galactosyltransferase of plant cell wall matrix polysaccharide biosynthesis. Plant J 19: 691–697
- Encina A, Sevillano JM, Acebes JL, Alvarez J (2002) Cell wall modifications of bean (Phaseolus vulgaris) cell suspensions during habituation and dehabituation to dichlobenil. Physiol Plant 114: 182–191
- Faik A, Price NC, Raikhel NV, Keegstra K (2002) An Arabidopsis gene encoding an α-xylosyltransferase involved in xyloglucan biosynthesis. Proc Natl Acad Sci USA 99: 7797–7802
- Fry SC (1995) Polysaccharide-modifying enzymes in the plant cell wall. Annu Rev Plant Physiol Plant Mol Biol 46: 497–520
- Gastinel LN (2001) Galactosyltransferases: a structural overview of their function and reaction mechanisms. Trends Glycosci Glycotechnol 13: 131–145
- Geshi N, Jørgensen B, Ulvskov P (2004) Subcellular localization and topology of β(1,4)galactosyltransferase in potato. Planta 218: 862–868
- Gough J, Chothia C (2002) SUPERFAMILY: HMMs representing all proteins of known structure. SCOP sequence searches, alignments and genome assignments. Nucleic Acids Res 30: 268–272
- Gough J, Karplus K, Hughey R, Chothia C (2001) Assignment of homology to genome sequences using a library of hidden Markov models that represent all proteins of known structure. J Mol Biol 313: 903–919
- Harper AD, Bar-Peled M (2002) Biosynthesis of UDP-xylose. Cloning and characterization of a novel Arabidopsis gene family, UXS, encoding soluble and putative membrane-bound UDP-glucuronic acid decarboxylase isoforms. Plant Physiol 130: 2188–2198
- Henrissat B, Coutinho PM, Davies J (2001) A census of carbohydrate-active enzymes in the genome of Arabidopsis thaliana. Plant Mol Biol 47: 55–72
- Hu Y, Walker S (2002) Remarkable structural similarities between diverse glycosyltransferases. Chem Biol 9: 1287–1296
- Iwai H, Masaoka N, Ishii T, Satoh S (2002) A pectin glucuronosyltransferase gene is essential for intercellular attachment in the plant meristem. Proc Natl Acad Sci USA 99: 16319–16324
- Jones DT (1999) GenTHREADER: an efficient and reliable protein fold recognition method for genomic sequences. J Mol Biol 287: 797–815
- Kelley LA, MacCallum RM, Sternberg MJE (2000) Enhanced genome annotation using structural profiles in the program 3D-PSSM. J Mol Biol 299: 499–520
- Kiino DR, Licudine R, Wilt K, Yang DH, Rothman-Denes LB (1993) A cytoplasmic protein, NfrC, is required for bacteriophage N4 adsorption. J Bacteriol 175: 7074–7080
- Krogh A, Larsson B, von Heijne G, Sonnhammer ELL (2001) Predicting transmembrane protein topology with a hidden Markov model: application to complete genomes. J Mol Biol 305: 567–580
- MacDougal AJ, Rigby NM, Ring SC (1997) Phase separation of plant cell wall polysaccharides and its implications for cell wall assembly. Plant Physiol 114: 353–362
- Madson M, Dunand C, Li X, Verma R, Vanzin GF, Caplan J, Shoue DA, Carpita NC, Reiter W-D (2003) The MUR3 gene of Arabidopsis encodes a xyloglucan galactosyltransferase that is evolutionarily related to animal exostosins. Plant Cell 15: 1662–1670
- McGuffin LJ, Jones DT (2003) Improvement of the GenTHREADER method for genomic fold recognition. Bioinformatics 19: 874–881
- Mohnen D (1999) Biosynthesis of pectins and galactomannans. In BM Pinto, ed, Comprehensive Natural Products Chemistry, Volume 3: Carbohydrates and Their Derivatives Including Tannins, Cellulose, and Related Lignins. Elsevier, Oxford, pp 497–527
- Murzin AG, Brenner SE, Hubbard T, Chothia C (1995) SCOP: a structural classification of proteins database for the investigation of sequences and structures. J Mol Biol 247: 536–540
- Nielsen H, Brunak S, von Heijne G (1999) Machine learning approaches for the prediction of signal peptides and other protein sorting signals. Protein Eng 12: 3–9
- Perrin RM, DeRocher AE, Bar-Peled M, Zeng W, Norambuena L, Orellana A, Raikhel NV, Keegstra K (1999) Xyloglucan fucosyltransferase, an enzyme involved in plant cell wall biosynthesis. Science 284: 1976–1979
- Persson K, Ly HD, Dieckelmann M, Wakarchuk WW, Withers SG, Strynadka NCJ (2001) Crystal structure of the retaining galactosyltransferase LgtC from Neisseria meningitidis in complex with donor and acceptor sugar analogs. Nat Struct Biol 8: 166–175
- Ridley BL, O'Neill MA, Mohnen D (2001) Pectins: structure, biosynthesis, and oligogalacturonide-related signaling. Phytochemistry 57: 929–967
- Schedletzky E, Shmuel M, Delmer DP, Lamport TA (1990) Adaptation and growth of tomato cells on the herbicide 2,6-dichlorobenzonitrile leads to production of unique cell walls virtually lacking a cellulose-xyloglucan network. Plant Physiol 94: 980–987
- Scheible W-R, Eshed R, Richmond T, Delmer D, Somerville C (2001) Modifications of cellulose synthase confer resistance to isoxaben and thiazolidinone herbicides in Arabidopsis Ixr1 mutants. Proc Natl Acad Sci USA 98: 10079–10084
- Schwacke R, Schneider A, van der Graaff E, Fischer K, Catoni E, Desimone M, Frommer WB, Flugge UI, Kunze R (2003) ARAMEMNON, a novel database for Arabidopsis integral membrane proteins. Plant Physiol 131: 16–26
- Sherrier DJ, VandenBosch KA (1994) Secretion of cell wall polysaccharides in Vicia root hairs. Plant J 5: 185–195
- Sterling JD, Quigley HF, Orellana A, Mohnen D (2001) The catalytic site of the pectin biosynthetic enzyme α-(1,4)-galacturonosyltransferase is located in the lumen of the Golgi. Plant Physiol 127: 360–371
- Stolz J, Munro S (2002) The components of the Saccharomyces cerevisiae mannosyltransferase complex M-Pol I have distinct functions in mannan synthesis. J Biol Chem 277: 44801–44808
- Tarbouriech N, Charnock SJ, Davies GJ (2001) Three-dimensional structures of the Mn and Mg dTDP complexes of the family GT-2 glycosyltransferase SpsA: a comparison of related NDP-sugar glycosyltransferases. J Mol Biol 324: 655–661
- Thompson JD, Gibson TJ, Plewniak F, Jeanmougin F, Higgins DG (1997) The CLUSTAL_X windows interface: flexible strategies for multiple sequence alignment aided by quality analysis tools. Nucleic Acids Res 25: 4876–4882
- Wimmerova M, Engelsen SB, Bettler E, Breton C, Imberty A (2003) Combining fold recognition and exploratory data analysis for searching for glycosyltransferases in the genome of Mycobacterium tuberculosis. Biochimie 85: 691–700
- Wiggins CA, Munro S (1998) Activity of the yeast MNN1 α-1,3-mannosyltransferase requires a motif conserved in many other families of glycosyltransferases. Proc Natl Acad Sci USA 95: 7945–7950
- Zhang GF, Staehelin LA (1992) Functional compartmentation of the Golgi apparatus of plant cells. Immunocytochemical analysis of high-pressure frozen- and freeze-substituted sycamore maple suspension culture cells. Plant Physiol 99: 1070–1083
- Zhou H, Zhou Y (2003) Predicting the topology of transmembrane helical proteins using mean burial propensity and a hidden-Markov-model-based method. Protein Sci 12: 1547–1555