Abstract
Arabinogalactan proteins (AGPs) are extracellular hydroxyproline-rich proteoglycans implicated in plant growth and development. The protein backbones of AGPs are rich in proline/hydroxyproline, serine, alanine, and threonine. Most family members have less than 40% similarity; therefore, finding family members using Basic Local Alignment Search Tool searches is difficult. As part of our systematic analysis of AGP function in Arabidopsis, we wanted to make sure that we had identified most of the members of the gene family. We used the biased amino acid composition of AGPs to identify AGPs and arabinogalactan (AG) peptides in the Arabidopsis genome. Different criteria were used to identify the fasciclin-like AGPs. In total, we have identified 13 classical AGPs, 10 AG-peptides, three basic AGPs that include a short lysine-rich region, and 21 fasciclin-like AGPs. To streamline the analysis of genomic resources to assist in the planning of targeted experimental approaches, we have adopted a flow chart to maximize the information that can be obtained about each gene. One of the key steps is the reformatting of the Arabidopsis Functional Genomics Consortium microarray data. This customized software program makes it possible to view the ratio data for all Arabidopsis Functional Genomics Consortium experiments and as many genes as desired in a single spreadsheet. The results for reciprocal experiments are grouped to simplify analysis and candidate AGPs involved in development or biotic and abiotic stress responses are readily identified. The microarray data support the suggestion that different AGPs have different functions.
With the completion of the Arabidopsis genome (Arabidopsis Genome Initiative [AGI], 2000) and the development of techniques such as microarrays, there is a wealth of information about genes and their expression. However, the precise function of only a small proportion of genes is known and around 40% of proteins are classified as unknown or hypothetical proteins (AGI, 2000). Many of the genes coding for arabinogalactan-protein (AGP) backbones are annotated as hypothetical proteins. One of the challenges facing Arabidopsis researchers working on large gene families is choosing which family members to focus their efforts on to maximize their chances of understanding gene function.
Many of the genes encoding the protein backbones of proteoglycans and glycoproteins of the extracellular matrix are encoded by large multigene families. These wall “proteins” are believed to be involved in many aspects of plant growth and development, but their roles are poorly defined. A large number of these proteins are rich in Pro and/or Hyp. The Pro-/Hyp-rich glycoproteins (P/HRGPs) were originally classified into three separate classes: the Pro-rich proteins (PRPs), the extensins, and the AGPs (Showalter, 1993; Nothnagel, 1997; Sommer-Knudsen et al., 1998; Bacic et al., 2000). Over the years, it has become apparent that this family comprises a continuum of molecules from the non- or minimally glycosylated PRPs, the moderately glycosylated extensins, through to the Pro/Hyp, Ser, Thr, and Ala-rich AGPs, where carbohydrates comprise up to 99% of the Mr of the proteoglycan (Sommer-Knudsen et al., 1998).
We have proposed the following nomenclature to provide some consistency to the naming of the P/HRGPs (Johnson et al., 2002). A protein is considered an “AGP” or an “extensin” if the entire predicted mature protein contains modules characteristic of that class of P/HRGP. When protein sequences contain several distinct regions, we use the following definitions: “chimeric” is used if one region contains a known P/HRGP motif (e.g. extensin) and the other region(s) contain unrelated motifs (e.g. Leu-rich repeat). “Hybrid” is used when there are two different P/HRGP motifs (e.g. extensin motifs and AGP motifs). Thus, the term AGP covers the “classical” AGPs. Most of the proteins previously called “nonclassical” AGPs would now be called chimeric AGPs assuming no other P/HRGP motifs were present.
For the extensins, agreement needs to be reached within the cell wall community about whether the extensin “glycomodule,” i.e. Ser-Pro3-4, is sufficient to define an extensin, or whether a larger repeat motif containing Tyr, Lys, and Val defines an extensin. Throughout this manuscript, we have adopted the latter view that a “true” extensin contains the extensin glycomodule (Kieliszewski and Shpak, 2001) within a larger motif containing Tyr, Lys, and Val. We adopted this view because it is consistent with the general view that extensins are insoluble components of the plant cell wall. P/HRGPs without Tyr, Val, and Lys, such as AGPs and the gum arabic glycoprotein (GAGP; Goodrum et al., 2000), are generally soluble.
Individual members of each P/HRGP subclass are difficult to separate biochemically. It has also been difficult to clone these genes because of the repetitive nature of their protein backbones and, in the case of AGPs, their overall low sequence similarity (Du et al., 1996). The wide-scale sequencing of Arabidopsis expressed sequence tags (ESTs) that began in the early 1990s (Höfte et al., 1993; Newman et al., 1994) made it significantly easier to identify AGP protein backbone genes (Schultz et al., 2000).
Putative orthologs of the classical AGPs originally identified in pear (Pyrus communis; Chen et al., 1994) and Nicotiana alata (Du et al., 1994) were identified from the Arabidopsis EST database. Other AGPs, for example the AGPs containing a short Lys-rich domain originally identified in tomato (Lycopersicon esculentum; Pogson and Davies, 1995; Li and Showalter, 1996), were cloned from Arabidopsis using reverse transcription-PCR (Gilson et al., 2001). Two new classes of AGP protein backbone genes were subsequently identified in Arabidopsis (Schultz et al., 2000). These were the arabinogalactan (AG)-peptides that have a predicted mature protein backbone of only 10 to 13 amino acid residues and the putative cell adhesion molecules, the fasciclin-like AGPs (FLAs) that contain one or two AGP domains, and one or two fasciclin domains (Gaspar et al., 2001).
There are no obvious orthologs of the Asn-rich chimeric (nonclassical) AGPs in Arabidopsis. There are three proteins in the Arabidopsis annotated database (encoded by At1g03820, At1g28400, and At2g33850) that contain Asn-rich C termini that are similar to NaAGP2 and PcAGP2 (Gaspar et al., 2001). Only one of these proteins, encoded by At1g03820, has a putative AGP glycomodule, and this glycomodule is only six amino acid residues in length (Ser-Pro-Ala-Pro-Ala-Pro).
AGPs are implicated in diverse roles in plant growth and development. One of the most influential findings in AGP research in the last 5 years was the biochemical evidence that certain AGPs are anchored to the plasma membrane by glycosylphosphatidylinositol (GPI) anchors (Youl et al., 1998; Svetek et al., 1999). This finding changed the conventional thinking about how AGPs might interact with other molecules at the cell surface. In Arabidopsis, there are at least 35 genes encoding AGP protein backbones and most of these are predicted to be GPI anchored (Gaspar et al., 2001). AGPs represent about 16% of the GPI-anchored proteins in plants (Borner et al., 2001). GPI-anchored proteins in animal cells are involved in cell-cell signaling by interacting with proteins in the same or neighboring cell types (Selleck, 2000). GPI anchors on AGPs provided a plausible mechanism by which AGPs may be involved in signaling pathways (Schultz et al., 1998).
One of the attractions of using Arabidopsis to study AGP function is the availability of DNA insertion mutants. Because the proposed function(s) of AGPs are diverse, it is not possible to devise a mutant screen that would specifically identify mutants in AGP genes. The publicly available pools of DNA from T-DNA-tagged lines have made it possible to identify several AGP mutants (Gaspar et al., 2001). As is the case for many other multigene families (Bouchez and Höfte, 1998; Winkler et al., 1998; Meissner et al., 1999), it has been difficult to identify phenotypes for these mutants. The only AGP mutant that has a known phenotype is agp17. Professor Stanton B. Gelvin's group (Purdue University, West Lafayette) identified the rat1 (resistant to Agrobacterium transformation) phenotype (Nam et al., 1999) in the same Feldman line in which our group identified a T-DNA insertion in the promoter of AtAGP17 (Gaspar et al., 2001).
Now that the genome of Arabidopsis is sequenced, it is theoretically possible to identify all of the AGP protein backbone genes in a single species. We have used novel software approaches based on biased amino acid composition and, in the case of the AG-peptides, their short length, to identify AGPs in Arabidopsis. We have adopted a systematic approach to obtaining and evaluating the publicly available Arabidopsis resources to help us select specific subsets of genes for targeted experimental approaches. This approach can be used to help guide research direction in all gene families and should accelerate outcomes by enabling the choice of the most appropriate experiments for each family member.
RESULTS
Finding AGPs Based on Biased Amino Acid Composition
With the completion of the Arabidopsis genome, we wanted to do a final check for AGP protein backbone genes. Finding all AGP family members using BLAST is difficult due to their low degree of amino acid similarity (Du et al., 1996; Schultz et al., 2000). We used the biased amino acid composition of AGPs to find AGP protein backbone genes in the Arabidopsis genome. The proportion of Pro, Ala, Ser, and Thr (PAST) was determined for each putative protein in the Arabidopsis genome (25,617 proteins). In a test of this approach, all the proteins above 55% PAST were placed into a file for further analysis. Only 34 PAST-rich proteins were identified (Table I). All AGP protein backbones are expected to have an N-terminal secretion signal for targeting of proteins to the endoplasmic reticulum. Therefore, each PAST-rich protein was analyzed for the presence of an N-terminal secretion signal using SignalP (Nielsen et al., 1997). Twenty-eight of the 34 PAST-rich sequences were predicted to be secreted. Of these, 11 were AGPs that we had previously identified (Schultz et al., 2000), 14 were extensins, and three did not belong to either class (Table I).
Table I.
AGPs and AG-peptides identified from the Arabidopsis genome based on biased amino acid composition and size
Searching Criteriaa | Total | Secretion Signalb | AGPc | AG-Peptidesc | Chimeric AGPc | Extensinc | Hybrid P/HRGPc | Otherc |
---|---|---|---|---|---|---|---|---|
PAST > 55% | 34 | 28 | 11 | 0 | 0 | 14 | 0 | 3 |
PAST > 50% | 62 | 49 | 15 | 1 | 4 | 19 | 1 | 9 |
Short proteins (50–75 residues) | 308 | ndd | nd | nd | nd | nd | nd | nd |
Short proteins and >35% PAST | 44 | 19 | 0 | 10 | 0 | 2 | 0 | 7 |
(a) PAST is the proportion of Pro, Ala, Ser, and Thr.
(b) N-Terminal secretion signals were determined by SignalP (Nielsen et al., 1997).
(c) See “Materials and Methods” for the criteria used to classify the proteins as AGPs, AG-peptides, and extensins.
(d) nd, Not determined.
Not all of the known AGPs were identified using this high PAST percentage, so the program was run with a lower cutoff (50% PAST). Using this cutoff, 62 proteins were identified, of which 49 were predicted to be secreted. Only one of the AGPs that we had previously identified (Schultz et al., 2000) was not found when the cutoff was reduced to 50% PAST. AGP3 was not identified because it is not one of the annotated proteins (see “Discussion”). The additional proteins identified at 50% PAST included three new putative AGPs (At5g18690, At2g47930, and At3g06360), five extensins, and a single AG-peptide (AGP15). We also identified four proteins that contained AGP-like modules (Ala-Pro and Ser-Pro) as well as regions not rich in PAST. These are classified as chimeric AGPs (Table I). The chimeric AGPs included an ENOD20-like protein (At4g27520), a nonspecific lipid transfer protein (At1g36150), and two blue copper-binding proteins (At3g60280, At5g53870). A hybrid P/HRGP was also identified (At1g62763). The proteins that were not predicted to be secreted did not look like AGPs (i.e. did not contain Ala-Pro and Ser-Pro motifs) or any of the other classes of P/HRGPs. Most of the non-secreted proteins were unknown or hypothetical proteins.
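The composition screen described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the program used in this study: the function names `past_fraction` and `past_rich` and the two toy sequences are hypothetical, and the secretion-signal step (SignalP) is not reproduced.

```python
PAST = frozenset("PAST")  # Pro, Ala, Ser, Thr in one-letter code


def past_fraction(seq):
    """Return the fraction of residues that are Pro, Ala, Ser, or Thr."""
    seq = seq.strip().upper()
    if not seq:
        return 0.0
    return sum(res in PAST for res in seq) / len(seq)


def past_rich(proteins, cutoff=0.50):
    """Return the IDs of proteins whose PAST fraction exceeds the cutoff."""
    return [pid for pid, seq in proteins.items() if past_fraction(seq) > cutoff]


# Toy sequences invented for illustration only (not real Arabidopsis proteins).
toy = {
    "toy_agp_like": "MASPAPSPTSAPTAPSPASK",
    "toy_other": "MKLVEDGHWQRNLLIKEFGY",
}

print(past_rich(toy, cutoff=0.55))  # stricter threshold
print(past_rich(toy, cutoff=0.50))  # relaxed threshold
```

Lowering `cutoff` from 0.55 to 0.50 corresponds to the relaxed search; each hit would still need to be checked for an N-terminal secretion signal and inspected for Ala-Pro and Ser-Pro motifs.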
Finding AG-Peptides Based on Size and Biased Amino Acid Composition
To identify AG-peptides, the same set of 25,617 predicted proteins from the Arabidopsis genome was searched for proteins between 50 and 75 amino acid residues in length. A total of 308 proteins were found (Table I). This number was reduced to a manageable level by first selecting all proteins with a PAST composition of >35% and then selecting the proteins that are predicted to be secreted. As with the classical AGPs, the ones that were not predicted to be secreted did not look like AGPs and tended to have most of their PAST residues in the N-terminal secretion signal or C terminus, where the GPI anchor signal would be for an AG-peptide. The 19 secreted proteins included two new AG-peptides (At3g57690 and At5g40730), as well as the known AG-peptides (Schultz et al., 2000) and others we had identified by similarity searches (data not shown).
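The two-stage filter for AG-peptides (length first, then composition) might look like the following sketch. The function name `ag_peptide_candidates` and the toy sequences are hypothetical; the defaults mirror the 50 to 75 residue window and the >35% PAST cutoff given in the text. The final secretion-signal check relies on an external predictor and is left out.

```python
def past_fraction(seq):
    """Fraction of residues that are Pro, Ala, Ser, or Thr."""
    seq = seq.upper()
    return sum(res in "PAST" for res in seq) / len(seq) if seq else 0.0


def ag_peptide_candidates(proteins, min_len=50, max_len=75, past_cutoff=0.35):
    """Keep short proteins (min_len..max_len residues) with PAST above the cutoff."""
    hits = []
    for pid, seq in proteins.items():
        if min_len <= len(seq) <= max_len and past_fraction(seq) > past_cutoff:
            hits.append(pid)
    return hits


# Toy sequences invented for illustration only.
toy = {
    "toy_short_rich": "MKL" + "SPA" * 19,   # 60 residues, PAST-rich
    "toy_short_poor": "MKLVEDGHWQ" * 6,     # 60 residues, no PAST residues
    "toy_long_rich": "SPAT" * 25,           # 100 residues, too long
}

print(ag_peptide_candidates(toy))
```

Applying the length filter before the composition filter keeps the second pass small, which is the same ordering used to reduce the 308 short proteins to a manageable set.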
Finding FLAs Using Hidden Markov Models
FLAs, along with other classes of chimeric AGPs, are not identified using the biased amino acid search at the 50% PAST threshold because the fasciclin domain(s) are long compared with the regions containing the AGP glycomodule(s). For example, the entire FLA7 protein is only 39% PAST. However, if the single fasciclin domain is ignored, the remaining protein is 52% PAST. When the program is run at 39% PAST, FLA7 is identified along with 344 other proteins, including a histone H1 (At2g30620, 46% PAST), a β-1,3-glucanase (At1g26450, 45% PAST), and several transcription factors (e.g. At1g60210, 49% PAST).
We were interested in determining whether fasciclin domains were associated with protein domains other than AGP domains, so we adopted a strategy to identify all Arabidopsis proteins containing fasciclin domains. Fasciclin domains are approximately 100 amino acids long and are not well conserved (Kawamoto et al., 1998). However, they have two relatively conserved regions at either end of the domain (approximately 10 amino acids each). Hidden Markov models, as used by PSI-BLAST (Altschul et al., 1997) and HMMer (Durbin et al., 1998), are able to identify proteins with low identity because they use multiple sequence alignments rather than pair-wise alignments. In these approaches, a multiple sequence alignment is generated, and from this alignment, a position-specific score matrix is generated. In this way, higher weightings are given to the more conserved regions of the protein and lower weightings are given to the less conserved regions of the protein.
A hidden Markov model for the 88 fasciclin domains in the Pfam database was generated using HMMbuild (Durbin et al., 1998). This position-specific score matrix (HMM) was then used to search the Arabidopsis protein database at the DeCypher Web site (http://decypher2.stanford.edu/algo-hmm/HMM_ha.html-ssi). In this way, proteins with the highest weighted scores are identified. We identified 21 Arabidopsis proteins with fasciclin domains. All of these proteins contained at least one AGP glycomodule. FLA1 through FLA17 have been described previously (Gaspar et al., 2001).
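To illustrate how a position-specific score matrix gives more weight to conserved columns, the sketch below builds a simple log-odds matrix from a toy alignment and slides it along a query. It is a deliberately simplified stand-in: it assumes a uniform background frequency and a fixed pseudocount, has no insert or delete states, and is not the profile HMM built by HMMbuild. All names and sequences are hypothetical.

```python
import math
from collections import Counter

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(alignment, pseudocount=1.0):
    """Build per-column log2-odds scores from equal-length aligned sequences.

    Assumes a uniform background (1/20 per residue), a simplification.
    """
    length = len(alignment[0])
    if any(len(s) != length for s in alignment):
        raise ValueError("aligned sequences must have equal length")
    background = 1.0 / len(ALPHABET)
    pssm = []
    for col in range(length):
        counts = Counter(s[col] for s in alignment if s[col] in ALPHABET)
        total = sum(counts.values()) + pseudocount * len(ALPHABET)
        pssm.append({
            aa: math.log2(((counts[aa] + pseudocount) / total) / background)
            for aa in ALPHABET
        })
    return pssm


def score(pssm, segment):
    """Sum the column scores for a segment as long as the matrix."""
    return sum(col.get(aa, 0.0) for col, aa in zip(pssm, segment))


def best_window(pssm, seq):
    """Return (score, start) of the highest-scoring ungapped window in seq."""
    width = len(pssm)
    return max((score(pssm, seq[i:i + width]), i)
               for i in range(len(seq) - width + 1))


# Toy alignment: columns 0 and 1 are conserved, the rest are variable.
alignment = ["TVAGKL", "TVSDRI", "TVPEQM", "TVLNHF"]
pssm = build_pssm(alignment)

print(round(pssm[0]["T"], 2), round(pssm[2]["A"], 2))  # conserved vs variable column
print(best_window(pssm, "GGGTVAGKLGGG"))
```

A residue in a conserved column earns a larger score than a residue seen only once in a variable column, so matches to the conserved ends of a domain dominate the total even when the middle of the domain diverges.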
In total, we identified 13 classical AGPs, 10 AG-peptides, three “basic” AGPs containing a Lys-rich domain, and 21 FLAs. All of these genes are listed in Table II. A flow chart of data to collect for each gene was adopted to help prioritize and guide our research (Fig. 1). A list of the Web sites used at each step is included in Figure 1 and details of the preferred options at each site are provided in “Materials and Methods.” All of the annotated proteins were checked for the presence of an N-terminal secretion signal and for the C-terminal signal required for the addition of a GPI anchor. PSORT calculates a probability for the subcellular localization of each protein, e.g. cytoplasmic, organellar (membrane or lumen), extracellular, or GPI anchored. Most of the AGP protein backbone genes are predicted to be GPI anchored (Table II). For AGPs, PSORT (Nakai and Horton, 1999) appears to be a better predictor of the potential for GPI anchoring than the big-PI predictor (Eisenhaber et al., 2000). This is based on a comparison of the predictions for the only three AGP protein backbones that have had their cleavage sites determined experimentally. These AGPs are PcAGP1 from pear (Youl et al., 1998), NaAGP1 from N. alata (Youl et al., 1998), and AtAGP10 from Arabidopsis (Schultz et al., 2000). PcAGP1 and AtAGP10 are not predicted to be GPI anchored by big-PI (Eisenhaber et al., 2000) and the cleavage site for NaAGP1 is not predicted correctly by big-PI. All three are predicted to be GPI anchored by PSORT.
Table II.
Identification of AGP protein backbone genes in the Arabidopsis genome
Name | Sa | AGI Locusb | GenBankc | Protein IDd | AtGI Contige | ESTs on AFGC Arrayf | EST Expression Summaryg: AG | R | F | S | T |
---|---|---|---|---|---|---|---|---|---|---|---|
AtAGP1 | C | At5g64310 | AF082298 | BAB09862.1 | TC109594 | 193H9T7 | – | 3 | 5 | 1 | 30 |
AtAGP2 | C | At2g22470 | AF082299 | AAD22366.1 | TC103463 | 196O14T7 | – | 3 | – | – | 14 |
AtAGP3 | C | AL161596h | AF082300 | NAi | TC115981 | 171M2T7, 212113T7 | – | 3 | – | – | 11 |
AtAGP4 | C | At5g10430 | AF082301 | CAB89400.1 | TC103487j | 177B12T7, 135F11T7, 63A4T7, 177O23T7 | – | 18 | – | – | 35 |
AtAGP5 | C | At1g35230 | AF082302 | AAG51456.1 | TC110704 | 190F4T7 | – | – | – | – | 5 |
AtAGP6 | C | At5g14380 | AJ012459 | CAB87777.1 | TC104640 | 171N22XP | – | – | 3 | – | 6 |
AtAGP7 | C | At5g65390 | AF195888 | BAB11561.1 | TC105529 | None | – | – | – | – | 3 |
AtAGP9 | C | At2g14890 | AF195890 | AAC61286.1 | TC121344 | 114A13T7 | – | 26 | 1 | 2 | 48 |
AtAGP10 | C | At4g09030 | AF195891 | CAB78027.1 | TC110607 | 193B7T7 | – | 1 | 2 | – | 6 |
AtAGP11 | C | At3g01700 | AF195892 | AAF01553.1 | TC110949 | 305B10T7 | – | 1 | 1 | – | 4 |
AtAGP25 | C | At5g18690 | – | NA | TC115221 | None | – | – | – | – | 1 |
AtAGP26 | C | At2g47930 | – | AAK60279.1 | TC105017 | 174M22XP | – | 2 | – | – | 5 |
AtAGP27 | C | At3g06360 | – | AAF08578.1 | No ESTs | None | – | – | – | – | 0 |
AtAGP12 | P | At3g13520 | AF195893 | AAL24406.1 | TC115903 | 312D10T7, 283C8T7 | – | 1 | – | – | 14 |
AtAGP13 | P | At4g26320 | AF195894 | CAB38961.1 | TC117212 | 126P21T7 | – | – | – | – | 2 |
AtAGP14 | P | At5g56540 | AF195895 | BAA97180.1 | TC122088 | 66G9XP | – | – | – | – | 8 |
AtAGP15 | P | At5g11740 | AF195896 | CAB87692.1 | TC103758 | 244F14T7 | 1 | 2 | 5 | 1 | 20 |
AtAGP16 | P | At2g46330 | AF195897 | AAK95262.1 | TC121580 | 212E21T7, 118D15T7 | – | 2 | 6 | – | 21 |
AtAGP20 | P | At3g61640 | – | CAB71094.1 | TC111543 | 250G12T7 | – | – | – | – | 3 |
AtAGP21 | P | At1g55330 | – | AAG51571.1 | TC105447 | None | – | – | – | – | 3 |
AtAGP22 | P | At5g53250 | – | BAB09787.1 | TC105025 | 178N2T7 | – | 1 | – | – | 5 |
AtAGP23 | P | At3g57690 | AF497624 | CAB41186.1 | TC124297 | None | – | – | – | – | 1 |
AtAGP24 | P | At5g40730 | – | NA | TC104813 | 109G13T7 | – | – | – | 1 | 7 |
AtAGP17 | K | At2g23130 | AF305939 | AAB87117.1 | TC117671 | None | 1 | – | – | – | 2 |
AtAGP18 | K | At4g37450 | AF305940 | CAB38212.1 | TC122923 | 205N2T7 | – | – | 1 | – | 4 |
AtAGP19 | K | At1g68725 | – | AAD49970.1 | No ESTs | None | – | – | – | – | 0 |
AtFLA1 | F | At5g55730 | AF333970 | BAB09240.1 | TC103889 | 113F6XP | 2 | 2 | – | 2 | 15 |
AtFLA2 | F | At4g12730 | AF333971 | CAB40990.1 | TC115528 | E8F4T7, 181J22T7 | 7 | 20 | 3 | 2 | 44 |
AtFLA3 | F | At2g24450 | – | AAD18116.1 | No ESTs | None | – | – | – | – | 0 |
AtFLA4 | F | At3g46550 | – | CAB62325.1 | No ESTs | None | – | – | – | – | 0 |
AtFLA5 | F | At4g31370 | – | CAA16540.1 | No ESTs | None | – | – | – | – | 0 |
AtFLA6 | F | At2g20520 | AF333972 | AAD25652.1 | TC110718 | 156A23T7 | – | – | – | – | 5 |
AtFLA7 | F | At2g04780 | AF333973 | AAD22328.1 | TC103539 | 156K12T7 | 2 | 9 | 1 | 2 | 27 |
AtFLA8 | F | At2g45470 | AF195889 | AAB82617.1 | TC103534 | 141C1T7 | 11 | 6 | 5 | 7 | 44 |
AtFLA9 | F | At1g03870 | AF333974 | AAD10681.1 | TC109608 | 190O20T7, 147L18T7, 172M22T7 | – | 10 | – | 1 | 29 |
AtFLA10 | F | At3g60900 | – | CAB82694.1 | TC123602 | None | – | 2 | – | – | 3 |
AtFLA11 | F | At5g03170 | – | CAB86084.1 | TC121587j | 23E10T7 | – | 5 | – | 2 | 10 |
AtFLA12 | F | At5g60490 | – | BAB08232.1 | TC122855 | 145O5T7 | – | 1 | – | 1 | 5 |
AtFLA13 | F | At5g44130 | – | BAB10980.1 | TC111159 | None | – | – | – | 1 | 4 |
AtFLA14 | F | At3g12660 | – | BAB02409.1 | No ESTs | None | – | – | – | – | 0 |
AtFLA15 | F | At3g52370 | – | CAC07928.1 | TC104355 | 40G10T7 | – | 1 | 1 | – | 9 |
AtFLA16 | F | At2g35860 | – | AAD21471.1 | TC116353 | 131J22T7 | 2 | 1 | – | – | 8 |
AtFLA17 | F | At5g06390 | – | BAB08961.1 | No ESTs | None | – | – | – | – | 0 |
AtFLA18 | F | At3g11700 | – | AAF02137.1 | TC123176 | 143F8T7 | – | 1 | – | – | 4 |
AtFLA19 | F | At1g15190 | – | AAD39647.1 | No ESTs | None | – | – | – | – | 0 |
AtFLA20 | F | At5g40940 | – | BAB10524.1 | No ESTs | None | – | – | – | – | 0 |
AtFLA21 | F | At5g06920 | – | BAB11150.1 | No ESTs | None | – | – | – | – | 0 |
(a) S represents the subclass of each AGP. C, Classical AGP; P, AG-peptide; K, Lys-rich AGP; F, FLA. If the subclass designation is in italics, the protein is not predicted to be GPI anchored.
(b) Locus identity no. assigned by AGI (2000). AGPs that have not been published previously (Schultz et al., 2000; Gaspar et al., 2001) are indicated in bold.
(c) GenBank accession nos. of the corresponding cDNA clones that we have sequenced.
(d) The protein identification nos. are for the proteins annotated from the genome.
(e) Contig no. assigned by The Institute for Genomic Research (TIGR) to the alignment of ESTs for each gene (see “Materials and Methods”).
(f) ESTs from Michigan State University (East Lansing) on the AFGC array were identified by searching the Stanford Microarray Database clone ID page (see “Materials and Methods”).
(g) Expression summaries were obtained from the TIGR web site (December 14, 2001). The libraries shown are: AG, above ground 2 to 6 weeks (12,267 ESTs); R, roots (17,573 ESTs); F, flowers (5,719 ESTs); S, green siliques (12,589 ESTs); and T, total.
(h) The genomic sequence in this region was generated by PCR and there are three 2-bp deletions in the sequence compared with the full-length sequence (AF082300) of the cDNA clone (EST clone ID 171M2T7).
(i) NA, Not available.
(j) There are two overlapping genes in this contig. Only the ESTs that overlap with the full-length cDNA clone were considered.
Note added in proof: A summary of the annotation of the AGP gene family can now be found at the Munich Information Center for Protein Sequences Web site on the external annotation page (http://mips.gsf.de/cgi-bin/proj/thal/framesetter?about&externalanno.html). Additional DNA insertion lines are being sequenced by the Salk Institute Genomic Analysis Laboratory as part of the Arabidopsis sequence indexed T-DNA insertion project. This project began on September 1, 2001 and will continue for 24 months. Monthly deposits of sequences in GenBank and Salk mutant lines in ABRC and NASC will be made. For more information see http://signal.salk.edu/tdna_FAQs.html. After this paper was accepted for publication, the work cited as Borner et al. (2001) on the prediction of GPI-anchored proteins in Arabidopsis was described in more detail in G.H.H. Borner, J.D. Sherrier, T.J. Stevens, I.T. Arkin, P. Dupree (2002) Plant Physiol 129: 486–499.
Figure 1.
How to learn a lot about your gene without going to the bench. The Web site addresses for each of the steps are indicated with a superscript and detailed below. The options to select at each Web site are detailed in “Materials and Methods.” 1, Genomic information, http://www.tigr.org/tdb/e2k1/ath1/LocusNameSearch.shtml; 2, Is the protein secreted? http://www.cbs.dtu.dk/services/SignalP/#submission; 3, GPI anchoring and localization, http://psort.nibb.ac.jp/form.html; 4, protein motifs, http://hits.isb-sib.ch/cgi-bin/PFSCAN; 5, Are there ESTs? http://www.ncbi.nlm.nih.gov/BLAST/; 6, How many ESTs and where are they expressed? http://www.tigr.org/tdb/agi/searching/reports.html; 7, Are ESTs on the Arabidopsis Functional Genomics Consortium (AFGC, Stanford, CA) array? http://afgc.stanford.edu/afgc_html/QueryEST.html; 8, on Affymetrix GeneChip? http://www.biology.ucsd.edu/labs/schroeder/downloads.html; 9, Genoplante flag sequence tags, http://flagdb-genoplante-info.infobiogen.fr/projects/fst/; 10, TMRI lines, http://www.tmri.org/pages/collaborations; 11, Insertwatch lines, http://nasc.nott.ac.uk/insertwatch/; 12, Institute of Molecular Agrobiology lines, http://www.plantcell.org/cgi/content/full/11/12/2263/DC1/1.
Profilescan was used to confirm the presence of fasciclin domains in the FLAs identified using hidden Markov models. Profilescan identifies many motifs, including common motifs such as potential N-linked glycosylation sites. Most of the fasciclin domains contained one or more N-linked glycosylation sites. After identifying the proteins, we were interested in determining what is known about the expression profile of each of the AGP protein backbone genes.
EST Contigs and Expression Summaries
TIGR maintains a database, the Arabidopsis Gene Index (AtGI), that combines the DNA sequence of the annotated proteins from the Arabidopsis genome and ESTs into contigs of overlapping sequence. This information is useful because it identifies expressed genes and provides an “expression summary” of each gene based on the library source of each EST. In addition, the alignment of clones can be used to check the annotation of the genomic sequences. For example, if the annotated protein is significantly shorter than the region spanned by all of the ESTs, it is possible that the protein is not correctly annotated.
To access the TIGR contigs, it is necessary to first identify a single EST by doing a translated Blast search (tBLASTn) against the EST database (Altschul et al., 1997). If an EST with high similarity (>95%) is found, the GenBank accession number is used on the AtGI search page to identify the contig (see “Materials and Methods”).
Many AGPs Are on the AFGC Microarray
Thirty-five of the 47 AGP genes (including FLAs) are represented on the current AFGC microarray (Wu et al., 2001b). Six of these genes are represented by more than one EST; for example, AGP4 is represented by four different ESTs (Table II). Currently, it is difficult to analyze all of the AFGC experiments (268 experiments on January 13, 2002) for a specific subset of genes. It is also very difficult to identify the “test” and “reciprocal” experiments because no information is provided in the experiment description and/or related experiments are not given consecutive numbers. For example, AFGC experiments 4940 and 11510 are both labeled bacterial pathogen inoculation. In this case, there is limited information about the source of the RNA samples. One RNA sample is identified as “HRP” and the other as “DC3000.”
Reformatting AFGC Microarray Data to Compare Multiple Genes in All the Experiments
To make the AFGC microarray data more accessible, a computer program (Perl script) was written that makes it possible to view the ratio data for all AFGC experiments and as many genes as desired in a single spreadsheet. A “snapshot” of the output from this program is in Table III. By reformatting the data, it is easier to check the consistency of results where there are multiple ESTs for the same gene and also to identify experiments where there is no data.
Table III.
Reformatting microarray data to compare multiple genes for each experimenta
Experiment IDb | Descriptionc | Genes on the AFGC Arrayd: FLA2 AA042764 | FLA2 H37441 | FLA7 T88134 | FLA8 T46696 | FLA11 T04708 |
---|---|---|---|---|---|---|
 | | Ratio of channel 2:channel 1e | | | | |
7203 | Whole plant to root | 0.926 | 1.22 | 0.623 | 1.081 | 1.951 |
7205 | Root to whole plant | 0.997 | 1.214 | 1.916 | 0.82 | 0.379 |
11991 | CIMf to root | 0.182 | 0.235 | 0.421 | 0.667 | 0.389 |
11992 | Root to CIM | 7.567 | 1.885 | 2.117 | 1.933 | 2.346 |
12106 | SIMg to root | 0.378 | 0.688 | 0.5 | 0.992 | 0.45 |
12107 | Root to SIM | 1.715 | 1.899 | 1.487 | 0.821 | 2.366 |
15974 | CIM to SIM | 2.3 | 1.386 | 1.577 | NAh | NA |
15977 | SIM to CIM | 0.287 | 0.938 | 0.814 | 0.835 | NA |
(a) This is a “snapshot” of a large spreadsheet containing all of the AFGC experiments (rows) and as many genes as desired (e.g. columns 3, 4, … n).
(b) The experiment ID is the one used by AFGC.
(c) Descriptions have been modified to make it easier to know what samples are being compared (see “Materials and Methods”).
(d) The gene name (e.g. FLA2) and the GenBank accession no. are provided by the user when saving the file on their local computer and are automatically included in the table when the data are reformatted (see “Materials and Methods”).
(e) Values considered significant (i.e. less than 0.5 or greater than 2) are bold.
(f) CIM, Callus-inducing media.
(g) SIM, Shoot-inducing media.
(h) NA, Not available.
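The reformatting step, in which per-gene ratio data are gathered into one experiment-by-gene spreadsheet, can be sketched as follows. This is a Python illustration, not the Perl script described in the text; the record layout `(experiment_id, description, gene, ratio)` and the function name `pivot_ratios` are assumptions, and the values are a subset of those shown in Table III.

```python
import csv
import io


def pivot_ratios(records, gene_order):
    """Arrange (experiment_id, description, gene, ratio) records into rows.

    One row per experiment, one column per gene; missing values become "NA".
    """
    table = {}
    for exp_id, description, gene, ratio in records:
        row = table.setdefault(exp_id, {"description": description, "ratios": {}})
        row["ratios"][gene] = ratio
    rows = [["Experiment ID", "Description"] + list(gene_order)]
    for exp_id in sorted(table):
        entry = table[exp_id]
        rows.append([exp_id, entry["description"]]
                    + [entry["ratios"].get(gene, "NA") for gene in gene_order])
    return rows


# A few ratios taken from Table III; the record format itself is hypothetical.
records = [
    (7203, "Whole plant to root", "FLA7 T88134", 0.623),
    (7203, "Whole plant to root", "FLA11 T04708", 1.951),
    (7205, "Root to whole plant", "FLA7 T88134", 1.916),
    (7205, "Root to whole plant", "FLA11 T04708", 0.379),
    (15974, "CIM to SIM", "FLA7 T88134", 1.577),
]

rows = pivot_ratios(records, ["FLA7 T88134", "FLA11 T04708"])
buffer = io.StringIO()
csv.writer(buffer).writerows(rows)  # CSV text that a spreadsheet can open
print(buffer.getvalue())
```

Writing "NA" for a missing gene and experiment combination makes gaps visible, which helps when checking whether a reciprocal experiment has data at all.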
The experiments that produced significant results for the AGPs are summarized in Table IV. The microarray results support the suggestion that different AGPs have different functions (Knox, 1995; Schultz et al., 2000). For example, mRNAs for AGP5, AGP9, and AGP18 are more abundant in meristems than in leaves and AGP2 appears to be up-regulated by both Al stress and bacterial infection (X. campestris).
Table IV.
AFGC microarray results for four different classes of AGPs
Experiments with Significant Reciprocal Dataa | Expt IDb | AGPs: Classical | AGPs: Lys-rich | AGPs: AG-peptides | FLAs |
---|---|---|---|---|---|
Flowers to leaves > 2.00 | 2371 | 2, 4, 6, 9, 10, 11 | 18 | 12c, 13, 14 | 1, 2c, 7, 8, 9d, 11, 12, 15, 16, 18e |
Meristem to leaf > 2.00 | 15973 | 5, 9 | 18 | – | – |
Root to whole plant > 2.00 | 7205 | 2, 3, 4f, 10 | – | 14, 22 | 11 |
Root to CIM > 2.00 | 11992 | 2, 3, 4g | 18 | 13e, 14, 22e | 2h, 7, 11, 15 |
Root to SIM > 2.00 | 12107 | 2, 3, 4i | – | 13e, 14, 22 | 11, 15 |
CIM to SIM > 2.00 | 15974 | 2, 3 | – | 14, 22 | 2i, 11k |
Indole acetic acid to water > 2.00 | 3743 | 3e | – | – | – |
abi1-1 to wild type > 2.00 | 11895 | 1l | – | – | 2j, 8 |
abi1-1 to wild type < 0.50 | 11895 | 11 | – | 22e | 1e, 6, 7, 12, 16 |
Mechanostimulation > 2.00 | 3715 | 1 | – | – | – |
Aluminum stress > 2.00 | 7304 | 1, 2 | – | – | – |
Aluminum stress < 0.50 | 7304 | 3, 5e | – | – | – |
24 to 36 h Darkness > 2.00 | 10186 | 1 | 18 | 12c | 1, 8e, 16, 18 |
Alternate oxidase, 30 min > 2.00 | 5197 | 1, 2e | 18 | 12 | 2, 18 |
Alternate oxidase, 4 h > 2.00 | 5198 | 2 | 18 | 12, 16e | 2j, 18 |
Pseudomonas syringae to HRP < 0.50 | 11510 | – | – | 12 | 9m, 18 |
Turnip crinkle virus symptom to asymptom > 2.00 | 19829 | 2n | – | – | – |
Xanthomonas campestris to wild type > 2.00 | 17455 | 2 | – | – | 1e, 2j |
axr3 Revertant to wild type > 2.00 | 11597 | 11e | – | – | 6, 12 |
axr3 Revertant to wild type < 0.50 | 11597 | 1o | – | – | 2, 8, 11e |
Chlorophyll starvation > 2.00 | 11604 | 1p | – | – | 8 |
Chlorophyll starvation < 0.50 | 11604 | 11 | – | 13 | 2j, 3, 6, 7e, 9q, 12, 18e |
Transcription inhibitor < 0.50 | 11374 | – | – | 16r, 20r | – |
(a) Genes in bold are those for which the channel 2 to channel 1 ratios in both the “test” and the “reciprocal” experiments were “significant” (i.e. the ratio for the test experiment was greater than 2 and the ratio for the reciprocal experiment was less than 0.5, or vice versa). If one experiment was significant and the reciprocal was “borderline” (i.e. less than 0.66 or greater than 1.5, as appropriate), the gene is in plain text.
(b) AFGC experiment ID no.
(c) Two independent clones and all ratios (four out of four) were significant.
(d) Three independent clones and only one EST (two of six) were significant.
(e) No data were available for the reciprocal experiment.
(f) Four independent clones and seven of eight ratios were significant.
(g) Four independent clones and six of eight ratios were significant.
(h) Two independent clones and three of four ratios were significant and the other value was 1.89.
(i) Four independent clones and five of eight ratios were significant.
(j) Two independent clones and only the ratios for one EST were significant.
(k) No data were available for either the test or the reciprocal experiment. We have included FLA11 because this gene was significant in two related experiments (experiment ID 11992 and 12107).
(l) There are two sets of data for this clone and the ratios for the test were significant for both clones, and one of the reciprocals was borderline (i.e. 3.62 and 0.56; 2.44 and 0.862).
(m) Three independent clones and one EST is considered significant; the other two are 0.60 and 0.68 in this experiment and both of the reciprocal experiments were significant (greater than 2.00).
(n) This experiment was repeated twice. In one experiment, the test and reciprocal were both significant and in the repeat experiment both ratios were borderline (1.74 and 0.54).
(o) There are two sets of data for this clone and three of four ratios were significant (the other is 0.67).
(p) There are two sets of data for this EST and the ratios for only one set of data were significant and the other two were borderline (1.68 and 0.63).
(q) Three independent clones and the ratios for two of the ESTs were significant.
(r) This experiment was repeated three times. In two experiments, the control and reciprocal were both significant; in the other replicate, the values were borderline or only one value was significant.
Unexpected Complexity in the Response of AGP2 to Acid and Al
RNA gel-blot analysis was used to experimentally verify the results of the microarray experiment for Al stress for AGP2 because data verification is key to interpreting DNA microarray results (Wu et al., 2001b). Arabidopsis plants were grown in liquid culture for 11 d. After this time, plants were transferred to low-pH liquid media, either with or without Al. Plants were sampled at 3, 8, and 24 h after the transfer to low-pH media, as was done for the microarray experiment (L. Kochian and O. Hoekenga, personal communication). In the microarray experiment, all three time points were combined. At 3 h, there is minimal difference between the Al-treated and untreated samples (Fig. 2). However, at 8 and 24 h, the expression of AGP2 is much higher in the plants grown in the presence of Al. The results of the RNA gel-blot analysis are consistent with the observed ratio of 3.3 for AGP2 expression when comparing Al-treated with non-treated samples.
Figure 2.
Confirmation of the Al stress microarray result for AGP2. A gene-specific probe for AGP2 was hybridized to an RNA gel blot containing RNA from Al-stressed and unstressed plants. Arabidopsis plants were grown in liquid culture for 11 d. After this time, plants were transferred to low-pH media, either with or without Al. Plants were sampled at 3, 8, and 24 h after the transfer to low-pH media. The staining of rRNAs with ethidium bromide shows the loading of RNA for each sample.
Affymetrix Array
The Affymetrix microarray GeneChips (Affymetrix, Sunnyvale, CA) are increasingly being used because of their in-built controls. Julian I. Schroeder's (University of California, San Diego) group has made it easier to determine if a gene of interest is on the Affymetrix chip (Ghassemian et al., 2001). AGP-encoding genes on the Affymetrix array include two AG-peptides (AGP12 and AGP16), a Lys-rich AGP (AGP18), and six of the FLAs (FLA2, FLA6, FLA7, FLA8, FLA9, and FLA16). We are now in a position to contact individual laboratories to determine how some of the AGP protein backbone genes respond in other experiments of interest. The Affymetrix and AFGC arrays are not the only microarray resources available, but they are widely used and they both contain about one-third of all the Arabidopsis genes. Other Arabidopsis microarray resources are listed in Wisman and Ohlrogge (2000) and Reymond (2001).
Checking for Insertion Mutants in Each Gene
Initially, we looked for tagged mutants using PCR from the pools of DNA from the Feldman lines that are distributed by the Arabidopsis Biological Resource Center (ABRC; Ohio State University, Columbus), using the method described by McKinney et al. (1995). Plant lines that contained insertions in the 3′ end of AGP3, the 5′ region of AGP17, and in the coding region of AGP16 were identified (Table V).
Table V.
Insertion mutants identified for AGP gene family members
Insertion Lines | AGPs | AG-peptides | Lys-rich AGPs | FLAs |
---|---|---|---|---|
Feldman linesa | 3b | 16c | 17d | 1c |
Wisconsine | Not tested | Not tested | 17b, 18b | None for FLA7 |
IMAg | None | None | None | 8c |
Insertwatchh | None | None | 18c | 2c, 6c |
Genoplante flanking sequence tags (FST) | 4d | 12b, 14d, 21bd | 18b | 3d, 10d, 16b, 18d, 19c |
Torrey Mesa Research Institute (TMRI) | 1cd, 2d, 3cd, 4d, 5d, 7d, 9cd, 10d, 11d | 13d, 14d, 15d, 16d, 20d, 22d, 23d | 18d– | 3d, 4d, 5d, 6c, 8d, 9b, 10bd, 12c, 14c, 15cd, 17cd, 18cd, 21d |
a. Samples of pooled DNA for 6,000 T-DNA-tagged lines were obtained from ABRC. Not all of the genes were checked.
b. The tag is in the 3′ region of the gene.
c. The tag is in the coding region of the gene.
d. The tag is in the 5′ region of the gene.
e. Primer pairs for a limited no. of genes were sent to AFGC service at Wisconsin (see “Materials and Methods”).
f. The tag is in the single intron of this gene.
g. Border sequences of 500 lines were sequenced from Ds insertion lines (Parinov et al., 1999).
h. Genes containing tags were identified searching the Web sites listed in Figure 1.
The sequencing of the border regions of large populations of insertion mutants makes it possible to identify mutants more efficiently (Liu et al., 1995; Parinov et al., 1999). For example, the Nottingham Arabidopsis Stock Centre (UK) has released more than 2,000 flanking sequences. These lines can be checked using the Insertwatch service of the Nottingham Arabidopsis Stock Centre. As more lines are sequenced, users will be notified by e-mail when a tag is identified in a gene of interest because the sequence (and e-mail address) is automatically registered. Genoplante have released 8,500 flanking sequences (S. Balzergue, personal communication) and TMRI has released the flanking sequences of 100,000 lines. From these various resources, we have identified insertion mutants in the coding region of 16 of the 47 AGPs and many other lines have insertions 1 to 2 kb upstream or downstream of the genes (Table V).
DISCUSSION
Finding Genes Using Biased Amino Acid Composition and Size
With the completion of the Arabidopsis genome, we wanted to be certain that most of the AGP protein backbone gene family members were identified as we are attempting a systematic analysis of AGP function. Finding AGP protein backbone genes with BLAST searches is very time consuming because for each gene, it is necessary to analyze every promising “hit” by obtaining the full-length sequence for the similar protein and determining whether the sequence matches the criteria for an AGP. By designing our own software, each of the 25,617 predicted proteins in the Arabidopsis genome was evaluated for the most prominent feature of an AGP, namely the high PAST. This analysis (>50% PAST) identified 62 candidate genes, less than the number of sequences that can be returned for a single BLAST search (Table I). Using tailored approaches for each type of AGP, we have identified three new classical AGPs and two new AG-peptides (plus three AG-peptides that we had previously identified but not published).
Only one of the AGPs (AGP3) that we had previously identified was not found using this approach (Schultz et al., 2000). AGP3 is not one of the annotated proteins, although there is a region on contig 92 (GenBank accession no. AL161596) that is almost identical to the cDNA clone for AGP3. The annotation says the genomic sequence near this region of similarity was amplified by PCR. The genomic sequence has three separate 2-bp deletions compared with the cDNA sequence (GenBank accession no. AF082300).
We are confident that we have identified most of the AGPs that have been sequenced because most of the classical AGPs and AG-peptides do not have introns and therefore their annotation is relatively straightforward. Those that do have introns have a single intron between the exon coding for the mature protein backbone and the exon coding for the C-terminal signal for addition of the GPI anchor. Therefore, even if an AGP with an intron is incorrectly annotated because the second exon is missing, it would still be detected using the biased amino acid approach.
We have not identified all of the proteins containing AGP glycomodules. Proteins with AGP glycomodules include the FLAs, the chimeric (nonclassical) AGPs (Mau et al., 1995), some of the early nodulin genes (e.g. ENOD5; Scheres et al., 1990), the somatic embryogenesis receptor kinase (Schmidt et al., 1997), certain chitinases (Yamagami and Funatsu, 1994), and the lipid transfer protein and the blue copper-binding proteins identified in this study. With the exception of FLAs we have focused on those cell surface molecules where AG glycosylation is the dominant feature of the entire protein backbone.
Finding all of the protein backbones containing AGP glycomodules could be achieved by calculating the PAST percentage in overlapping “windows” of 15 to 25 amino acid residues. A “windows” approach should pick up fewer false positives than further reducing the PAST percentage. When the PAST percentage is reduced to 39%, the level needed to identify FLA7, other proteins such as a histone H1 (At2g30620, 46% PAST), a β-1,3-glucanase (At1g26450, 45% PAST), and a transcription factor (At1g60210, 49% PAST) are also identified.
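The windowed calculation proposed above can be sketched in a few lines of Python. This is only an illustration and is not part of the protein bias program; the function names, the default window of 20 residues (within the 15 to 25 range suggested), and the 50% cut-off are assumptions chosen for the example.

```python
PAST_RESIDUES = set("PAST")  # Pro, Ala, Ser, Thr (one-letter codes)


def past_percent(seq):
    """Percentage of Pro, Ala, Ser, and Thr residues in seq."""
    if not seq:
        return 0.0
    hits = sum(1 for aa in seq.upper() if aa in PAST_RESIDUES)
    return 100.0 * hits / len(seq)


def past_windows(seq, window=20, threshold=50.0):
    """Scan overlapping windows and report those at or above the threshold.

    Returns a list of (start, end, percent) tuples, where start and end are
    0-based and end is exclusive. A sequence shorter than the window yields
    an empty list. The window size and threshold are hypothetical defaults.
    """
    results = []
    for start in range(0, len(seq) - window + 1):
        pct = past_percent(seq[start:start + window])
        if pct >= threshold:
            results.append((start, start + window, pct))
    return results
```

A protein whose overall PAST percentage is well below 50% can still contain a window that is entirely Pro, Ala, Ser, and Thr, which is the situation the windows approach is meant to catch.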
To identify all of the FLAs, we decided to use the fasciclin domains as our searching criteria. Twenty-one proteins in Arabidopsis contain one or two fasciclin domains and all of these proteins have at least one region containing AGP glycomodules. The size of the AGP region is variable (data not shown). The FLAs can be separated into three classes (Gaspar et al., 2001). The largest class of FLAs has two regions containing AGP glycomodules and two fasciclin domains. Another class has two regions containing AGP glycomodules flanking a single fasciclin domain (FLA7 and FLA9). The third class of FLAs has only a single region containing AGP glycomodules and a single fasciclin domain. There are no ESTs for any of the FLAs from this third class (FLA3, FLA5, FLA14, and FLA19), so it is not known if this class of FLAs is expressed.
In plants, all of the proteins containing fasciclin domains also contain AGP glycomodules, suggesting that regions of the protein backbone will be glycosylated with large O-linked polysaccharide chains. This is generally not the case with animal proteins containing fasciclin domains. However, the fasciclin domains in the protein AlgalCAM, involved in cell adhesion in Volvox carteri, are flanked by SPn and TPn motifs (Huber and Sumper, 1994). One of our goals is to determine whether the cell adhesion function of animal, insect and algal proteins containing fasciclin domains is conserved in Arabidopsis.
Other P/HRGPs in Arabidopsis
Our search for AGPs also identified 19 genes encoding extensins. The details of the extensins are described elsewhere (Johnson et al., 2002). Proteins are classified as extensins if they contain repeats of Ser-Pro3 and/or Ser-Pro4 motifs and these “submotifs” are predominantly separated by Tyr, Lys, and/or Val residues to form part of a larger repeat (Showalter, 1993; Kieliszewski and Lamport, 1994). None of the 19 extensins identified using the biased amino acid approach were predicted to be GPI anchored. There are two regions in the genome, near At3g54580 and At4g08390, where there has been a tandem duplication of extensin genes. No tandem duplications were observed for the AGPs.
Twelve of the 19 extensins identified here contain a Ser-Pro-Ser-Pro motif in the middle of every second Tyr-, Lys-, and Val-rich spacer [e.g. S(P)4YVYSS(P)4YYSPSPKV(D/Y)YK]. This suggests that these extensins will have both large AG-containing polysaccharides (attached to non-contiguous Hyp residues) and short arabinosyl chains (attached to contiguous Hyp residues; Goodrum et al., 2000). This is similar to the alternating large and small carbohydrate chains predicted in the twisted hairy rope model (Qi et al., 1991) for the GAGP. There is now good experimental support for the attachment of large and small carbohydrate chains to the protein backbone of the GAGP (Goodrum et al., 2000). However, GAGP is not an extensin because it contains few Tyr, Lys, and Val residues and it is highly soluble (Qi et al., 1991). The Tyr, Lys, and Val residues are believed to be important in cross-linking extensins into the plant cell wall (Kieliszewski and Lamport, 1994). It will be of interest to determine if the extensins that contain the Ser-Pro-Ser-Pro motif have different structural and functional properties than the extensins lacking this motif.
Extensins, unlike the AGPs, contain conserved repetitive motifs, so it may have been possible to identify many of the extensins using BLAST searches. One advantage of the biased amino acid approach is that the entire protein is returned in the output, rather than a small region of the protein. Therefore, it is possible to classify the protein immediately.
The software we have developed can be modified for many different classes of proteins with biased amino acid compositions, e.g. the Gly-rich proteins and the PRPs (Johnson et al., 2002). The 50% PAST threshold for identifying AGPs did not pick up many of the PRPs previously identified in Arabidopsis (Fowler et al., 1999). The reason for this is that the PAST percentage of PRP1 through PRP4 ranges from 32% to 45%. Although the PRPs are rich in Pro residues, they contain few Ala residues (2%–5%). Specific searching criteria need to be developed for each different family based on the amino acid bias of known family members (Johnson et al., 2002).
Identifying and Evaluating Genomic Resources
DNA insertion mutants are essential in our quest for understanding the function of AGPs. Finding tagged mutants is now considerably easier than 5 years ago due to the widespread sequencing of the genomic DNA flanking DNA insertions (Liu et al., 1995; Balzergue et al., 2001). The FST database at Genoplante is particularly user friendly because it shows all insertions within a 20-kb window of the input sequence and it displays the exon-intron borders of all the annotated proteins in this region. A search of the Genoplante database also identifies insertions in similar genes. Genoplante lines were generated in the Wassilewskija ecotype of Arabidopsis, the same genetic background as the Feldman lines (McKinney et al., 1995). The interpretation of crosses between mutants from these two different lines will not be complicated by the segregation of ecotype-specific differences. Another excellent source of tagged lines is the 100,000 TMRI lines. As shown in Table V, most of the mutants we have identified come from these new lines. The TMRI lines are in the Columbia ecotype.
The first DNA insertion mutants we identified for AGP protein backbone genes came from PCR screening of DNA pools from the Feldman lines (McKinney et al., 1995). With the exception of AGP17, preliminary characterization of the AGP mutants has not identified any obvious phenotypes (K.L. Johnson, C.J. Schultz, and A. Bacic, unpublished data). In the line with a tag in the promoter of AGP17 (ABRC stock center no. CS12955), a phenotype was uncovered by Professor Gelvin's group (Nam et al., 1999) when searching for rat mutants. This mutant has a tag 1,097 bp upstream of the start codon of AGP17 (Gaspar et al., 2001).
Finding phenotypes for the other AGP mutants is a major challenge. Two different approaches have been used by other researchers to uncover phenotypes. One is to apply a wide range of environmental conditions and/or stresses (Meissner et al., 1999; Boyes et al., 2001). An alternate strategy is to make double or even triple mutants (Krysan et al., 1999; Halpin et al., 2001). Both of these approaches are time consuming and the number of possible experiments/combination of crosses means that it is not practical to apply this approach to all of the genes in a large gene family. In many cases, mutants may not yet be available; therefore, the decision needs to be made whether to make antisense/RNA interference constructs (Wesley et al., 2001). To help focus our research on AGPs, we have developed a flow chart to help us learn as much as possible about each family member and thereby allow us to plan targeted experimental approaches (Fig. 1). This flow chart is applicable to all multigene families.
Large tissue-specific libraries from the Kazusa Research Institute (Chiba, Japan) provide an electronic RNA gel-blot analysis (Asamizu et al., 2000). The number of ESTs for each AGP from four of these tissue-specific libraries is shown in Table II. Summaries of the number of ESTs from each library are maintained by TIGR as an additional feature on the AtGI (i.e. contigs) Web page. In addition to the tissue-specific libraries, there are now several stress-enriched libraries such as the salt-induced transcripts from 10- to 14-d seedlings (TIGR no. 6523). Although only 148 ESTs have been sequenced, ESTs representing two of the AGP genes were found in this salt-stressed library. These two genes, AGP12 and FLA2, were also significant in several of the stress-related AFGC microarray experiments. For example, they were both up-regulated in the presence of antimycin A, a compound that blocks electron transport in the mitochondria and leads to the induction of the alternate oxidase pathway (Yu et al., 2001).
AFGC Microarray Experiments Are a Valuable Resource to the Arabidopsis Community
The AFGC microarray results are particularly attractive to researchers of genes with unknown function. Until now, it has been very difficult to analyze all of the members of a multigene family in all of the AFGC experiments. By reformatting the data, it is possible to view the ratio data for all AFGC experiments and as many genes as desired in a single spreadsheet. One of the other difficulties in interpreting the AFGC data is that it is very time consuming to determine what experiments were actually performed. This is necessary to identify the appropriate control (reciprocal) experiments. By grouping reciprocal experiments and providing more details about each experiment, we have made the AFGC data more accessible.
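The grouping of test and reciprocal experiments can be illustrated with a short Python sketch. It is not the reformatting program itself; it only applies the cut-offs stated for Table IV (greater than 2.00 or less than 0.50 for significant, greater than 1.5 or less than 0.66 for borderline). The function names, the data layout, and the experiment labels in the usage example are assumptions made for illustration.

```python
def classify_pair(test_ratio, reciprocal_ratio,
                  up=2.0, down=0.5, border_up=1.5, border_down=0.66):
    """Classify one gene in one test/reciprocal experiment pair.

    Ratios are channel 2 to channel 1 values; None means no data.
    Returns "significant", "borderline", "not significant", or "no data".
    """
    if test_ratio is None or reciprocal_ratio is None:
        return "no data"
    # The two ratios are expected to move in opposite directions, so both
    # orientations (test up/reciprocal down, and the reverse) are checked.
    orientations = ((test_ratio, reciprocal_ratio),
                    (reciprocal_ratio, test_ratio))
    for high, low in orientations:
        if high > up and low < down:
            return "significant"
    for high, low in orientations:
        one_significant = high > up or low < down
        both_at_least_borderline = high > border_up and low < border_down
        if one_significant and both_at_least_borderline:
            return "borderline"
    return "not significant"


def summarize(ratios, pairs):
    """Build {gene: {(test_id, reciprocal_id): class}} for a set of genes.

    ratios: {gene: {experiment_id: ratio}} (hypothetical layout)
    pairs:  list of (test_id, reciprocal_id) tuples
    """
    return {
        gene: {pair: classify_pair(by_exp.get(pair[0]), by_exp.get(pair[1]))
               for pair in pairs}
        for gene, by_exp in ratios.items()
    }
```

Keeping "no data" as a separate outcome reflects the point that a nonsignificant ratio and a missing measurement mean different things.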
We will collaborate with the AFGC to develop a Web-based version of this software for inclusion at the Stanford Microarray Database. This will make it even easier for researchers to: (a) analyze their favorite genes, (b) check the consistency of results where there are multiple ESTs for a single gene, and (c) identify experiments where there is no data. This last point is important because a nonsignificant ratio has a very different meaning than no data.
Specific AGPs Respond to Biotic and Abiotic Stress
One hypothesis is that all AGPs of the same subclass would have the same function. The microarray results suggest that this is not the case. Rather, specific AGPs from each subclass respond to similar stimuli. Classical AGPs (AGP1 and AGP2), Lys-rich AGP18, AG-peptide AGP12, and FLAs (FLA2 and FLA18) are up-regulated by antimycin A treatment (experiments 5197 and 5198, Table IV). This treatment leads to stress by inhibiting electron transport in the mitochondria and induces the expression of alternate oxidase. In the published analysis of this microarray experiment, cluster analysis was used to identify other experiments where a significant number of genes were regulated in the same manner (Yu et al., 2001). Many of the genes that were affected by antimycin A treatment were also affected by other stresses including Al stress (experiment 7304), cadmium stress (experiment 7427), viral infection (experiment 7342), and the induction of cell death by hydrogen peroxide (experiment 9371). However, the regulation of AGP genes was only affected by one of these stresses, Al stress. The other experiments (e.g. cadmium and viral infection) are not included in Table IV because they were not significant. Only two AGP genes (AGP1 and AGP2) responded significantly to Al stress (Table IV). This highlights the ability of the plant to elicit different responses to specific external stimuli even though there is evidence for significant cross talk in different stress responses (Knight and Knight, 2001).
To validate the results of the Al microarray experimentally, RNA gel-blot analysis was used. Our results show that at 3 h after treatment with Al, there is minimal difference between the treated and untreated samples (Fig. 2). However, at 8 and 24 h, the expression of AGP2 is much higher in the plants grown in the presence of Al. Unfortunately, there was no control for the zero time point in this experiment where the media was changed to a low-pH media for both the Al-treated and untreated samples. The simplest scenario is that AGP2 expression is relatively abundant before changing the media and that minimal changes in expression levels occur in the first 3 h after the media change. This is supported by the fact that five of the 14 ESTs for AGP2 were obtained from a library made from liquid-cultured seedlings (AtGI library no. 5338). The dramatic differences seen at 8 and 24 h after the addition of the stress suggest that AGP2 is not involved in the recognition of stress, but rather, it is an integral part of the plant response to Al stress.
Precisely how AGP2 helps the plant respond to Al stress is an important question for the future. It is interesting that only 255 of 8,000 nonredundant genes on the AFGC array were specifically up-regulated by Al. AGP2 showed the 42nd highest ratio between treated and untreated samples on the array. As with all microarray experiments, it is important to validate the result before proceeding with further analysis (Wu et al., 2001b). This is particularly true for the Al stress experiment because it is labeled as showing strong spatial bias on the microarray. These findings support the decision by AFGC to release experiments showing strong spatial bias to the Arabidopsis community despite the possible bias in certain sectors of the array.
Spatial bias is where the expression ratios are influenced by the physical position of the genes on the array (Finkelstein et al., 2002). Spatial bias on the AFGC arrays is inconsistent and array dependent. It is detected by ANOVA and is where more than 10% of the variance in log2 ratio measurements can be explained by the spatial position of the spot on the array (Finkelstein et al., 2002).
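The 10% criterion can be made concrete with a one-way decomposition of variance. The sketch below is an assumption-laden illustration, not the published AFGC procedure: it treats each spot's array sector as the only factor and reports the fraction of the total sum of squares of log2 ratios that lies between sectors. The function names and the notion of a "sector" label are hypothetical.

```python
import math
from collections import defaultdict


def variance_explained_by_sector(spots):
    """Fraction of log2-ratio variance attributable to array sector.

    spots: iterable of (sector_id, ratio) pairs with ratio > 0.
    Returns SS_between / SS_total from a one-way layout, or 0.0 when there
    are no spots or no variation at all.
    """
    groups = defaultdict(list)
    for sector, ratio in spots:
        groups[sector].append(math.log2(ratio))
    values = [v for members in groups.values() for v in members]
    if not values:
        return 0.0
    grand_mean = sum(values) / len(values)
    ss_total = sum((v - grand_mean) ** 2 for v in values)
    if ss_total == 0:
        return 0.0
    ss_between = sum(
        len(members) * (sum(members) / len(members) - grand_mean) ** 2
        for members in groups.values()
    )
    return ss_between / ss_total


def has_strong_spatial_bias(spots, cutoff=0.10):
    """True when more than `cutoff` of the variance is explained by sector."""
    return variance_explained_by_sector(spots) > cutoff
```

With this definition, an array on which every sector has the same mean log2 ratio scores 0, and an array on which the sector alone determines the ratio scores 1.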
AGP2 is also up-regulated by biotic stress. When wild-type plants are infected with the bacterium X. campestris, AGP2 expression is higher than in mock-treated plants. However, in etr1 plants, the levels of AGP2 expression were not affected by bacterial infection. etr1 mutants lack one of the ethylene receptors (Hall et al., 1999). This result suggests that the expression of AGP2 is, at least in part, controlled by ethylene.
A Different Subset of AGPs Respond to Developmental Changes
AGPs are known to be abundant components of the transmitting tract of styles (Sommer-Knudsen et al., 1997; Wu et al., 2001a). As such, it is not surprising that many of the AGPs are more abundant in flowers than in leaves (Table IV). Four different AGP genes, AGP6, AGP11, AGP14, and FLA1, are among the top 60 genes showing a high flower to leaf expression ratio. AGP5, AGP9, and the Lys-rich AGP AGP18 are more abundant in meristems than in leaves.
At least one AGP from each subclass is differentially expressed in roots compared with undifferentiated or redifferentiating tissue (Table IV). These microarray experiments were performed in tissue culture by comparing roots with root explants that had been placed on callus-inducing media and subsequently on shoot-inducing media. These findings support the theory that certain AGPs are markers of cell identity (Dolan et al., 1995; Casero et al., 1998) and suggest that a series of tissue culture experiments using root explants may be useful for revealing a difference between wild-type and AGP mutant plants.
Our microarray reformatting program is complementary to a new program released by AFGC called Expression Viewer. Expression Viewer is designed to identify groups of genes that have similar expression patterns over several different experiments in the same category (i.e. abiotic stress or development). This option is accessed by selecting the second (right) icon that appears on selected ESTs (i.e. those on the array) after searching the clone list (see “Materials and Methods”). This program is particularly useful for identifying unrelated genes that are regulated in the same manner as the gene of interest.
The Expression Viewer is not as “sensitive” as our program at picking up genes with similar expression patterns over only a few experiments. For example, if you choose the EST for AGP18 (205N2T7), the results in the hormone dataset of the Expression Viewer only show two ESTs (both corresponding to a non-AGP gene, At1g22530). The hormone dataset of experiments includes the 30-min antimycin treatment where AGP18, FLA2, and FLA18 are all significant in both the test and the reciprocal experiments (Table IV). In the development dataset of experiments that includes the flowers to leaves comparison, two of the FLAs, FLA1 and FLA8, are identified as being regulated in a similar manner to AGP18, as are seven other non-AGP genes (e.g. an endo-1,4-beta glucanase [122H24T7/At4g02290]). However, many other AGP genes were identified as significant in our analysis (Table IV). Our approach has the limitation that it only looks at a user-defined subset of genes. By combining these two approaches, and performing whole-gene family analysis on the different classes of genes identified by Expression Viewer, it should be possible to identify other gene families that are important for AGP function.
CONCLUSION
Evaluating the available genomic resources, as outlined in Figure 1, should help all researchers determine which family members to concentrate on for specific categories of experiments. In some cases, it will be desirable to concentrate on genes with similar expression/response profiles, or it may be preferable to choose ones with different profiles. By checking just a few Web sites, researchers can determine whether an insertion mutant already exists for the gene of interest, and failing this, they can choose to make knockouts using RNA interference (Wesley et al., 2001) constructs for crucial genes.
When experiments are performed that require the sampling of plant tissue, the EST expression summary and AFGC microarray data will help select the most appropriate tissue for analysis to maximize the levels of RNA for all genes. This information will also help in the choice of genes for analysis. Obviously, all of the genes with significant microarray results in test and reciprocal experiments should be used. In many cases, only a few genes need to be tested. The inclusion of genes that are borderline (i.e. one significant value and the other value either less than 0.66 or greater than 1.5 as appropriate) is easily done and may prove informative. If the results are not significant, this gene can be used as a control.
In cases where mutants do not have a phenotype, the information accumulated will help researchers make more informed choices about the treatments to perform to uncover phenotypes and which mutants to cross. For example, AGP1 and AGP2 are regulated in a similar fashion; therefore, they are more likely to be functionally redundant than, say, AGP1 and AGP6. A cross between agp1 and agp2 mutants may therefore be more informative than a cross between agp1 and agp6 mutants.
The new data on AGPs presented in this manuscript support the suggestion that different AGPs have different functions. We are initially focusing our research effort on those candidate AGP genes, based on EST and microarray data, which are specifically implicated in plant development and in biotic and abiotic stress responses and for which we have access to putative loss-of-function mutants.
MATERIALS AND METHODS
Plant Material
Wild-type Arabidopsis (Columbia-0 strain CS1092, ABRC) plants were used. For the Al stress experiment, seeds were sterilized and grown in 50 mL of media in 250-mL conical flasks, approximately 40 seeds per flask. Flasks were rotated slowly at room temperature under continuous fluorescent light. Plants were germinated and grown for 11 d in an unbuffered media containing 250 μm NH4SO4, 250 μm Ca(NO3)2.4H2O, 200 μm KH2PO4, 1 mm CaCl2.2H2O, 1 mm MgSO4.7H2O, 1 mm K2SO4, 1 μm MnSO4.4H2O, 5 μm H3BO3, 0.05 μm CuSO4.5H2O, 0.2 μm ZnSO4.7H2O, 0.02 μm Na2MoO4.2H2O, 0.001 μm CoCl2.6H2O, and 1% (w/v) Suc (measured pH was 5.4). After 11 d, the media was poured off and 50 mL of fresh low-pH media, either with or without 50 μm AlCl3.6H2O, was added. The low-pH media was the same as the germination media, except that KH2PO4 was omitted and replaced by 3 mm homopiperazine-1,4-bis(2-ethanesulfonic acid), to give a pH of 4.5. Three separate flasks contained low-pH media with 50 μm AlCl3 and three control flasks contained the low-pH media without Al. Two flasks were harvested for each time point (3, 8, and 24 h), one with, and one without, AlCl3.
Finding AGPs and AG-Peptides Using Biased Amino Acid Composition and Length
A Perl script (protein bias) was written to calculate the PAST for all of the proteins in the Arabidopsis annotated database. The program can be downloaded from our Web site (http://planta.waite.adelaide.edu.au/people/cs/index.htm). The annotated database (ATH1.pep; August 10, 2001) was downloaded from the TIGR Web site (ftp://ftp.tigr.org/pub/data/a_thaliana/ath1/SEQUENCES/). For information on the Perl language, consult Wall et al. (2000). Perl is available as freeware for UNIX, Windows, and Macintosh operating systems. For Windows 2000, we used ActivePerl 5.6.1 build 630 (http://www.activeperl.com/). The protein bias program generated two different lists of proteins. The first list (called “long”) contained all the proteins above a certain PAST threshold (either 55% or 50%) and a separate list (“short”) was generated containing all the proteins between 55 and 75 amino acid residues in length and >35% PAST. A reduced PAST level was necessary to identify AG-peptides because the N- and C-terminal signals account for more than 50% of the deduced proteins. For each protein in the list, the following information was provided: (a) the AGI locus and sequence name; (b) the entire protein sequence; (c) the length of the protein; (d) the number and percentage of each of Pro, Ser, Thr, and Ala; and (e) the total PAST for each protein. An additional output file was generated for both the long and the short lists so that the first 50 amino acid residues of each sequence could be pasted into an e-mail and sent to SignalP (signalp@cbs.dtu.dk) to check for the presence of N-terminal signal sequences without modifying the file. This included specifying eukaryote on the first line, and ending the file with a period. Only the first 50 amino acid residues were included for each protein because this is the length that was recommended (Nielsen et al., 1997).
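The logic of the two lists can be sketched as follows. This is a Python illustration written from the description above, not the Perl program distributed by the authors; the function names, the dictionary layout, and the handling of a trailing "*" stop symbol are assumptions.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif line:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def bias_lists(records, long_threshold=50.0,
               short_threshold=35.0, short_range=(55, 75)):
    """Split proteins into the "long" and "short" lists described in the text.

    records: iterable of (header, sequence) pairs.
    A protein may appear in both lists if it satisfies both criteria.
    """
    long_list, short_list = [], []
    for header, raw in records:
        seq = raw.upper().rstrip("*")  # assumption: drop a trailing stop symbol
        length = len(seq)
        counts = {aa: seq.count(aa) for aa in "PAST"}
        past = 100.0 * sum(counts.values()) / length if length else 0.0
        entry = {
            "name": header,
            "sequence": seq,
            "length": length,
            "counts": counts,
            "past": past,
            "first50": seq[:50],  # N-terminal region for signal prediction
        }
        if past > long_threshold:
            long_list.append(entry)
        if short_range[0] <= length <= short_range[1] and past > short_threshold:
            short_list.append(entry)
    return long_list, short_list
```

The `first50` field corresponds to the extra output prepared for signal sequence prediction; formatting it for submission is left out of the sketch.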
Proteins were classified as AGPs if they did not contain repeats associated with extensins or PRPs (e.g. Ser-Pro4 or Pro-Pro-Xaa-Yaa-Lys), but contained predominantly Ala-Pro, Ser-Pro, or Thr-Pro throughout the protein with no more than 11 amino acid residues between consecutive Pro residues. The exception here is the Lys-rich AGPs that are a subclass of GPI-anchored AGPs. These AGPs have a Lys-rich domain of approximately 16 amino acid residues that is flanked on both sides by AGP glycomodules. AGPs were defined as chimeric if they contained a region with Ala-Pro, Ser-Pro, and/or Thr-Pro motifs and other regions with 20 or more amino acid residues between (A/S/T)P motifs (excluding the N- and C-terminal signals). These proteins included an ENOD20-like protein (At4g27520), a nonspecific lipid transfer protein (At1g36150), and two blue copper-binding proteins (At3g60280 and At5g53870). In most cases, the AGPs and the AGP chimeric protein backbones were predicted to be GPI anchored by PSORT (Nakai and Horton, 1999).
To identify AG-peptides, the “short” output from the protein bias program (again searching the 25,617 annotated proteins in the Arabidopsis database) was analyzed. Proteins were classified as AG-peptides if the encoded protein backbone was between 55 and 75 residues in length and also contained an N-terminal signal sequence and at least two consecutive Ala-Pro or Ser-Pro motifs in the mature protein backbone. AGP23 contained only two consecutive Ala-Pro or Ser-Pro motifs and all of the other AG-peptides (AGP12-AGP16 and AGP20-AGP24) contained three consecutive motifs. All of the AG-peptides, except AGP16 and AGP20, are predicted to be GPI anchored by PSORT (Nakai and Horton, 1999).
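The AG-peptide criteria lend themselves to a simple check. The sketch below is an illustration in Python, not the method used in the study; it assumes that the signal sequence prediction has already been made (for example with SignalP) and is supplied as a Boolean, and the function names are hypothetical.

```python
import re

# One or more back-to-back Ala-Pro or Ser-Pro dipeptides
MOTIF_RUN = re.compile(r"(?:[AS]P)+")


def longest_motif_run(mature_seq):
    """Number of consecutive Ala-Pro/Ser-Pro motifs in the longest uninterrupted run."""
    return max((len(m.group()) // 2 for m in MOTIF_RUN.finditer(mature_seq)), default=0)


def is_candidate_ag_peptide(full_length, mature_seq, has_signal_sequence):
    """Apply the three stated criteria: length, signal sequence, >= 2 consecutive motifs."""
    return (
        55 <= full_length <= 75
        and has_signal_sequence
        and longest_motif_run(mature_seq) >= 2
    )
```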
Proteins were classified as extensins if they contained repeats of Ser-Pro3 and/or Ser-Pro4 and these repeats were mostly separated by Tyr, Lys, and Val residues. One of the extensins (At4g08370) contained Ser-Pro5 repeats and an S(V/A)PR(I/V)(P/T)FIY spacer. The single “hybrid” P/HRGP (At1g62763) contained many Ser-Pro and Ser-Pro2 motifs, two SSPPPSLSLPSSPPPPPP motifs in the N-terminal domain of the mature protein (116 residues), and a C-terminal domain (171 residues) with similarity to citrus pectin methyl esterase (At1g62763). The N-terminal domain is not rich in Tyr, Lys, and Val; therefore, this protein is not considered an extensin. There are no ESTs for this “hypothetical” protein.
The proteins classified as others included several PRPs/hybrid PRPs (At2g27380, At3g22120, At4g22470, and At5g14920), a possible En/Spm transposon (At2g28440), and proteins containing no known or consistent motifs (At5g11990, At2g22510, At1g31250, and At3g22070) as determined by Profilescan (http://hits.isb-sib.ch/cgi-bin/PFSCAN). The two proteins classified as short extensins may not be annotated correctly based on TIGR contigs (data not shown).
To find FLAs, we obtained the full Pfam alignment of 88 fasciclin domains from the Pfam Web site (http://www.sanger.ac.uk/cgi-bin/Pfam/getacc?PF02469). This alignment was saved as a text file and used to build a hidden Markov model for fasciclin domains using the HMMbuild program in HMMER2.2 (Durbin et al., 1998). HMMER2.2 was downloaded from http://hmmer.wustl.edu/ and run on a Unix machine. The hidden Markov model that was generated by HMMbuild was used to search the Arabidopsis protein database from the DeCypher Web site (http://decypher2.stanford.edu/algo-hmm/HMM_ha.html-ssi) by pasting the model into the appropriate box. The Arabidopsis database available at the DeCypher Web site contained 25,571 proteins and was dated May 14, 2001. A total of 21 proteins containing fasciclin domains were identified.
A list of the CloneIDs for many of the ESTs that we sequenced is in Schultz et al. (2000). Other ESTs that we have sequenced are as follows: AGP23, ATTS1006; FLA1, 172E1T7; FLA2, 181J22T7; FLA6, 168P19T7; FLA7, 90M23T7; and FLA9, 194D6T7. These additional ESTs were full length and the GenBank accessions for the full-length cDNA clones are included in Table II. These cDNA clones can be obtained from ABRC or from us.
Key Web Sites and Options Needed to Access Information
We have developed a flow chart to streamline the information available to us about the AGP genes (Fig. 1). The links to follow and/or the options to select at each Web site are detailed below using the numbering system in Figure 1.
Link 1 is genomic information. By typing the GenBank accession or GenInfo Identifier number and selecting search, you get a page containing the following information: AGI locus (e.g. At1g35230); bacterial artificial chromosome locus (e.g. T9I1.2); and a graphical display of the annotation of the gene using the three commonly used gene prediction programs, GlimmerA (Mihaela Pertea, http://www.tigr.org/tdb/glimmerm/glmr_form.html), Genscan+ (Chris Burge, http://genes.mit.edu/GENSCAN.html), and GenMarkHMM (Mark Borodovsky, http://opal.biology.gatech.edu/GeneMark). These and other annotation programs have been reviewed recently (Aubourg and Rouzé, 2001). There are also links to nucleotide sequences (pre- and post-processing), protein sequences, and Interpro Scan results (Zdobnov and Apweiler, 2001). If there is a full-length Ceres EST for the gene, there is a comment stating that the annotation is consistent with the EST.
In link 2, SignalP is used to determine if the protein contains an N-terminal signal sequence for targeting of the protein (Nielsen et al., 1997). Only the first 50 to 70 amino acid residues are submitted and “euk” is selected as the organism group. Alternatively, a file containing a list of N-terminal sequences (in FastA format) can be e-mailed to signalp@cbs.dtu.dk.
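Preparing the N-terminal sequences for submission is a simple truncation step. The following Python sketch shows one way to write them in FastA format; it is an illustration only, the function name is hypothetical, and any additional lines required by the e-mail server are not reproduced here.

```python
def n_terminal_fasta(records, length=50):
    """Return FastA text holding the first `length` residues of each protein.

    records: iterable of (identifier, sequence) pairs.
    Sequences shorter than `length` are written unchanged.
    """
    lines = []
    for identifier, sequence in records:
        lines.append(">" + identifier)
        lines.append(sequence[:length])
    return "\n".join(lines) + "\n"
```

For example, `n_terminal_fasta([("At5g10430", protein_sequence)], length=70)` would give a single record holding the first 70 residues of that protein, where `protein_sequence` stands for the full deduced sequence.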
In link 3, PSORT (Nakai and Horton, 1999) is currently the best program for predicting the presence of a C-terminal signal for the addition of a GPI anchor on AGPs in plants (see “Results”). PSORT also predicts the presence of N-terminal secretion signals; however, it uses an old version of SignalP and is not as accurate.
In link 4, Profilescan allows you to search Prosite and Pfam databases for protein motifs. You should select all four databases.
In link 5, to find the “first” EST for each gene, it is best to do a tBLASTn search against the EST database. It is necessary to choose database EST_others (the default is nonredundant [nr]). To increase the speed, limit the search to Arabidopsis ESTs (select from the pull-down menu). To maximize the chance of finding an EST, select the following options: turn filtering off (especially important for repetitive proteins), set the word size to 2, and increase the expectation to 100 or 1,000. Once an EST is found with high similarity (>95%), the GenBank accession number (GB#) can be pasted into the fourth box on the TIGR AtGI Reports page (see 6 below) to find the contig (TC#) for the specific gene.
In link 6, the GenBank accession number of any EST can be used to find the contig. To view the contig, select the link for contig (e.g. TC109594). The contig shows the overlapping ESTs and indicates the source of each EST. To determine the library source of each EST, select the expression profile button to view the summary of hits from each library.
In link 7, the best way to determine if any of the ESTs are on the AFGC array is to enter the AGI locus (e.g. At5g10430). This will provide a list of all ESTs for the gene. The ESTs that are on the array will come up with two colored icons. The left (array) icon gives you the histogram showing the ratio data of all experiments. If nothing appears when you click an icon, a different Web browser should be tried (e.g. Internet Explorer or Netscape 6). To save all of the data for this EST onto a local computer, click on the “download all data” button. By saving this file as a tab-delimited file with an appropriately formatted filename (see below), these data can be reformatted using our custom software to make it easier to interpret the AFGC microarray data. The right (Expression Viewer) icon on the EST clone list is explained in “Discussion.”
First-time users of the AFGC microarray facility may want to click one or more of the “outlying” green bars on the histogram (i.e. those with a ratio of less than 0.5 or greater than 2.0). The experiments with the selected ratio will appear on the right (e.g. Shoot Development Affy Scan 4). Click on the “display data” button to obtain more information about the experiment(s) of interest. Select the “option” icons on this “search results” page to obtain details about each experiment. Click on the “view” icon to see what RNA samples were used in channel 2 (red) and channel 1 (green). Note that the stated ratio should be red (R) to green (G). The rightmost of the option icons on the search results page provides a scatter plot, which gives an indication of how many genes have significant ratios. To see genes with the 50 highest ratios in any experiment, click on the data icon, then highlight the CloneID, Gene model, Accession, and Description options, then click on the submit button (do not change any of the other preselected values the first time). If you want to see the histogram data for one of the genes in the top 50, select the “history” link.
To obtain more information about each experiment (i.e. abstracts and RNA information), go to http://afgc.stanford.edu/afgc_html/site2Cycle1.htm. An alternate way to access all of the experiments is from http://genome-www5.stanford.edu/MicroArray/SMD. From this page select public search to get to the advanced results search page. From here, select: (a) Arabidopsis, (b) All (experiments), and (c) display data.
In link 8, the Affymetrix GeneChip array has approximately 8,200 genes on it. It is an oligonucleotide-based array with 16 probe pairs per gene. For each probe pair, there is an exact match 25 oligomer and a corresponding oligomer with a single nucleotide mismatch. To determine if a gene is on the GeneChip, the AGI locus identifier is used to search the file downloaded from http://www-biology.ucsd.edu/labs/schroeder/downloads.html (Ghassemian et al., 2001). The following AGP genes are on the Affymetrix array and the Affymetrix identifier is shown in parentheses: AGP12 (18171_at), AGP18 (14947_at), AGP16 (16457_s), FLA2 (18265_at), FLA6 (14076_at), FLA7 (16513_at), FLA8 (12787_at), FLA9 (16438_at), and FLA16 (13144_at). It is also possible to obtain a list of promoter sequences for any of the genes that are on the Affymetrix array from the Schroeder lab Web site.
In link 9, FSTs from the Versailles collection of Genoplante are identified by selecting the “requests to FLAGdb/FST” link. Only partial DNA sequence is required. The first sequence should have an E value of 0.0 and represent the exact match. The last column will indicate the number of FST (if any) that are found in a 20-kb window surrounding the gene. This page displays the exon-intron borders of all the annotated proteins around your gene and a “flag” icon is used to indicate the position of the FSTs. A red flag indicates that this is the best match in the genome for the FST. A green flag indicates that there are better matches in the genome.
In link 10, it is necessary to submit sequences one at a time to search the TMRI Arabidopsis T-DNA collection. The searches are not confidential and these lines have a more restrictive material transfer agreement (MTA) than the other lines. The MTA can be downloaded from the site. The BLAST results are e-mailed to the address provided.
In link 11, single or multiple sequences can be submitted to Insertwatch. For those wishing to submit multiple sequences, this site provides a description of the required “FastA” format. If you do not want to be contacted automatically when a new flanking sequence matches your gene, use Insertblast. Only 2000 sequences are currently available, but more sequences are expected to be released.
In link 12, the IMA lines were generated by Professor Venkatesan Sundaresan (Institute of Molecular Agrobiology, Singapore). He is now at the University of California (Davis). Approximately 500 border sequences were obtained from Ds insertion lines (Parinov et al., 1999). To search the table of sequences, the bacterial artificial chromosome locus (e.g. F4L23, see link 1 above for how to obtain this number) or the GenBank GenInfo Identifier number can be used.
Reformatting AFGC Microarray Information
A Perl script, ma_analysis.pl, was written to reformat the AFGC microarray data into a format where the ratio data for all of the AFGC experiments and as many genes as desired can be viewed in a single spreadsheet. Input1 for this program is a tab-delimited spreadsheet (“arraystart.txt”) that contains the category of each experiment as defined by AFGC (column 1), the experiment ID (column 2), and a description of each experiment rewritten to make clearer what is being compared (column 3). “Test” and “reciprocal” experiments (where present) are on consecutive rows, with unrelated experiments separated by a blank row. In all cases, the experimental details are based on the Channel 2 (red) to Channel 1 (green) description found in the “view” option from the “advanced search page” on the AFGC Web site (January, 2002). In some cases, e.g. leaves to flowers (experiment ID 2,370), this information is not consistent with the experiment name “flowers leaves.”
The microarray data for each gene is downloaded from the AFGC Web site (see 7 above) and saved onto the local machine as a tab-delimited file (.txt). The filename given to the data for each gene must be in a precise format because this information provides the column heading information in the final spreadsheet containing all the genes. The format is GeneName_CloneorGenBankID_Date, with an underscore separating the components; for example, AGP4_T41664_28.11.01. The underscore character cannot be used as part of “GeneName,” “CloneorGenBankID,” or “Date”.
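The filename convention can be validated before the files are processed. The Python sketch below is hypothetical (it is not part of ma_analysis.pl) and simply splits a name on the underscore character and rejects any name that does not yield exactly three non-empty components.

```python
def parse_data_filename(filename):
    """Split 'GeneName_CloneorGenBankID_Date[.txt]' into its three components."""
    # Strip only a trailing '.txt'; the date itself may contain periods.
    stem = filename[:-4] if filename.lower().endswith(".txt") else filename
    parts = stem.split("_")
    if len(parts) != 3 or not all(parts):
        raise ValueError("expected GeneName_CloneorGenBankID_Date, got: " + filename)
    gene_name, clone_or_genbank_id, date = parts
    return {"gene": gene_name, "clone_id": clone_or_genbank_id, "date": date}
```

With the example from the text, `parse_data_filename("AGP4_T41664_28.11.01.txt")` returns the gene name `AGP4`, the identifier `T41664`, and the date `28.11.01`.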
The filenames of the files containing the microarray data are placed in another file, one filename per line and in the order that the columns should appear in the final spreadsheet. This list file is saved as a text-only file (e.g. datafile.txt). The program and arraystart.txt file can be downloaded from our Web site (http://planta.waite.adelaide.edu.au/people/cs/index.htm). To run the program on either a Unix- or Windows-based computer, go to the command line and type perl ma_analysis.pl arraystart.txt datafile.txt >arrayout.txt. This command places the output, in tab-delimited format, into the file arrayout.txt, which is suitable for import into various computer applications, including Excel (Microsoft, Redmond, WA).
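The layout of the combined spreadsheet can be illustrated with a short sketch. The Python function below is not a port of ma_analysis.pl. It assumes that each downloaded file has already been reduced to a mapping from experiment ID to ratio, because the layout of the AFGC download files is not described here. The function name and the column headings for the first three columns are likewise hypothetical.

```python
def build_table(experiments, gene_ratios):
    """Return tab-delimited text with one row per experiment and one column per gene.

    experiments: list of (category, experiment_id, description) tuples, in the order
                 of arraystart.txt; None stands for a blank separator row.
    gene_ratios: list of (column_heading, {experiment_id: ratio}) pairs, in the
                 order the columns should appear.
    """
    header = ["Category", "Experiment ID", "Description"]
    header += [heading for heading, _ in gene_ratios]
    rows = ["\t".join(header)]
    for experiment in experiments:
        if experiment is None:
            rows.append("")  # blank row between unrelated experiments
            continue
        category, experiment_id, description = experiment
        cells = [category, str(experiment_id), description]
        for _, ratios in gene_ratios:
            ratio = ratios.get(experiment_id)
            # Leave the cell empty when the gene has no data for this experiment
            cells.append("" if ratio is None else "%.2f" % ratio)
        rows.append("\t".join(cells))
    return "\n".join(rows) + "\n"
```

Keeping the experiment order from arraystart.txt is what places “test” and “reciprocal” experiments on adjacent rows in the output.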
To make it easier to identify all of the “significant” ratios, conditional formatting can be used to make all values greater than 2.00 or less than 0.5 bold, as in Table III. The column/row “hide” and “unhide” features can be used to make it easier to view/print the output file. As new experiments are released, they can be manually entered into the arraystart.txt file following the established format and the updated data files downloaded again from the AFGC Web site. We hope that this program can be incorporated into the AFGC Web site to make it even easier for individual researchers to perform this analysis.
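The same cutoffs can be applied programmatically before the file is opened in a spreadsheet. The Python sketch below is an illustration under two assumptions: the function names are hypothetical, and the pair check assumes that a reciprocal experiment swaps the two channels, so that a consistent change gives a ratio above 2.0 in one experiment and below 0.5 in the other.

```python
def is_significant(ratio, upper=2.0, lower=0.5):
    """True if the ratio lies outside the 0.5 to 2.0 range used for bold formatting."""
    return ratio > upper or ratio < lower


def consistent_pair(test_ratio, reciprocal_ratio, upper=2.0, lower=0.5):
    """True if a test/reciprocal pair points in opposite directions beyond the cutoffs.

    Assumes the reciprocal experiment has the channels swapped.
    """
    up_then_down = test_ratio > upper and reciprocal_ratio < lower
    down_then_up = test_ratio < lower and reciprocal_ratio > upper
    return up_then_down or down_then_up
```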
RNA Gel-Blot Analysis
RNA was isolated using a modified Trizol (Invitrogen, Carlsbad, CA) protocol. Total RNA (10 μg) was electrophoresed through 1.2% (w/v) agarose gels containing formaldehyde and transferred to nylon membrane, as previously described (Schultz et al., 1997). Single-stranded digoxigenin-labeled probes were prepared using a two-stage PCR protocol (Schultz et al., 1997). Hybridization and chemiluminescent detection of digoxigenin-labeled probes was as previously described (Schultz et al., 1997).
Identifying Insertion Mutants from the Feldman Lines
Pools of DNA from the Feldman T-DNA lines (McKinney et al., 1995) were ordered from the ABRC (stock no. CD5-7). Each pool of DNA was screened with the four possible combinations of forward and reverse gene-specific primers and left or right border (T-DNA-specific primers). The primers used were AGP3-F1, 5′-TCA GGT TTC TAT CTC TCT CGT C-3′; AGP3-R1, 5′-TAC AAT CAG AAC TTC TTC CCT C-3′; AGP16-F1, 5′-TGG CGT CGA GAA ACT CCG TCA C-3′; AGP16-R1, 5′-CTC CAG AAA TCA TAA TCG AG-3′; AGP17-F1, 5′-TCG CAA TAT TCT CTT GAC GG-3′; AGP17-R1, 5′-GGC TAG AAC AAG TAG AGA CC-3′; left border primer, 5′-GAT GCA CTC GAA ATC AGC CAA TTT TAG AC-3′; and right border primer, 5′-GCT CAT GAT CAG ATT GTC GTT TCC CGC CTT-3′.
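The four primer combinations screened for each gene are the pairings of each gene-specific primer with each T-DNA border primer. As a minimal illustration (the function name is hypothetical), they can be enumerated in Python as follows:

```python
from itertools import product


def screening_pairs(gene_primers, border_primers):
    """Pair every gene-specific primer with every T-DNA border primer."""
    return list(product(gene_primers, border_primers))


# Primer names as given in the text for AGP3
pairs = screening_pairs(["AGP3-F1", "AGP3-R1"], ["left border", "right border"])
```

Here `pairs` holds four tuples, one for each PCR carried out on a given DNA pool.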
Identifying Insertion Mutants from the AFGC Facility at Wisconsin
Primers were designed to identify insertions in AGP17, AGP18, and FLA7 using the AFGC knockout facility at Wisconsin (Sussman et al., 2000). Two positive bands for AGP17 were confirmed in the first-round PCR. One of the inserts was 500 bp downstream from the stop codon, and the other was in the single intron. For AGP18, one insert was 1 kb downstream from the stop codon and the other line showed an insert 3.5 kb downstream from the stop codon. There were no inserts for FLA7. Primer sequences were: At17F, 5′-TAC ATA CAC TCG AGC ATT CTT CAC TCA AC-3′; At17R, 5′-AGC TGT AAT ACA TGA GAC AAA TGG GAG AG-3′; At18F, 5′-GAT CTC TCC TCA TCG CCT ATA TAA AAC TC-3′; At18R, 5′-AGA CTA AGA TCC AAC TAA GTA GCT GAC AC-3′; FLA7-F1, 5′-GGT TTC TTG TAA TGC AGT AGG AGA GTT CA-3′; and FLA7-R1, 5′-TCC CAA GCC ATT TAT TGA CTT ATC TTC AG-3′.
Identifying Mutants Based on Border Sequences
The tagged lines we identified from Genoplante, TMRI, Insertwatch, and IMA are available from the appropriate organizations and the accession numbers of the lines are listed below. Researchers can obtain these lines by contacting the appropriate organization and signing an MTA. Mutants with tags in AGP genes are listed by gene name. The mutant from IMA is: FLA8, SGT6202. Mutants from Insertwatch are: AGP18, CSHL:GT6565-3 (this mutant is not yet available); FLA2, SINS:03_10_04; and FLA6, SINS:02_18_04. The mutant from Genoplante that is in the coding region is: FLA19, 078D04. Mutants from Genoplante that are in the 5′ region of the gene are: AGP4, 135F10; AGP14, 135B12; AGP21, 187G01; FLA3, 142D09; FLA10, 138C08; and FLA18, 135A03. Mutants from Genoplante that are in the 3′ region of the gene are: AGP12, 211G10; AGP21, 069E06; AGP18, 134D03; and FLA16, 080B05. Mutants from TMRI that are in the coding region are: AGP1, Garlic_247_H01; AGP3, Garlic_1247_D10; AGP9, Garlic_1289_D04; FLA6, Garlic_60_B01; FLA12, Garlic_917_D02; FLA14, Garlic_R3b_C02; FLA15, Garlic_720_G09; FLA17, Garlic_40_A08; and FLA18, Garlic_378_G02. Mutants from TMRI that are in the 5′ region of the gene are: AGP1, Garlic_194_E04; AGP2, Garlic_1298_B07; AGP3, Garlic_314_H08; AGP4, Garlic_500_E04; AGP5, Garlic_215_H04; AGP7, Garlic_903_G03; AGP9, Garlic_1057_G05; AGP10, Garlic_203_B12; AGP11, Garlic_909_G07; AGP13, Garlic_141_G02; AGP14, Garlic_777_G12; AGP15, Garlic_524_E11; AGP16, Garlic_859_B03; AGP18, Garlic_379_D10; AGP20, Garlic_877_C11; AGP22, Garlic_1288_D07; AGP23, Garlic_798_G03; FLA3, Garlic_1233_G05; FLA4, Garlic_56_H07; FLA5, Garlic_587_D04; FLA8, 1293_D05 and Garlic_1272_A04; FLA10, Garlic_450_G12; FLA15, Garlic_305_G02; FLA17, Garlic_596_A12; FLA18, Garlic_607_C07; and FLA21, Garlic_123_E07. Mutants from TMRI that are in the 3′ region of the gene are: FLA9, Garlic_877_H09; and FLA10, Garlic_145_E09.
ACKNOWLEDGMENTS
We are grateful to the ABRC (Ohio State University and Nottingham) for providing ESTs and DNA pools of the Feldman lines and to the AFGC for providing and continually improving the access to the microarray data and for providing the screening service for the T-DNA-tagged lines. We are especially grateful to Jeremy Gollub (AFGC) for helping us navigate our way through the microarray data. We thank Prof. Venkatesan Sundaresan for providing us with the insertion line for FLA8.
Footnotes
This work was supported by the Australian Research Council (Large Grant no. A10020017) and by the University of Melbourne (research scholarships to K.L.J. and Y.M.G.).
Article, publication date, and citation information can be found at www.plantphysiol.org/cgi/doi/10.1104/pp.003459.
LITERATURE CITED
- AGI Analysis of the genome sequence of the flowering plant Arabidopsis thaliana. Nature. 2000;408:796–815. doi: 10.1038/35048692. [DOI] [PubMed] [Google Scholar]
- Altschul SF, Madden TL, Schäffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 1997;25:3389–3402. doi: 10.1093/nar/25.17.3389. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Asamizu E, Nakamura Y, Sato S, Tabata S. A large scale analysis of cDNA in Arabidopsis thaliana: generation of 12,028 non-redundant expressed sequence tags from normalized and size-selected cDNA libraries. DNA Res. 2000;7:175–180. doi: 10.1093/dnares/7.3.175. [DOI] [PubMed] [Google Scholar]
- Aubourg S, Rouzé P. Genome annotation. Plant Physiol Biochem. 2001;39:181–193. [Google Scholar]
- Bacic A, Currie G, Gilson P, Mau S-L, Oxley D, Schultz C, Sommer-Knudsen J, Clarke AE. Structural classes of arabinogalactan-proteins. In: Nothnagel EA, Bacic A, Clarke AE, editors. Cell and Developmental Biology of Arabinogalactan-Proteins. Dordrecht, The Netherlands: Kluwer Academic/Plenum Publishers; 2000. pp. 11–23. [Google Scholar]
- Balzergue S, Dubreucq B, Chauvin S, Le-Clainche I, Le Boulaire F, de Rose R, Samson F, Biaudet V, Lecharny A, Cruaud C et al. Improved PCR-walking for large-scale isolation of plant T-DNA borders. Biotechniques. 2001;30:496–504. doi: 10.2144/01303bm06. [DOI] [PubMed] [Google Scholar]
- Borner GHH, Stevens T, Lilley KS, Dupree P. 9th International Cell Wall Meeting, Toulouse. 2001. The role of GPI-anchored cell surface proteins in Arabidopsis; p. 74. [Google Scholar]
- Bouchez D, Höfte H. Functional genomics in plants. Plant Physiol. 1998;118:725–732. doi: 10.1104/pp.118.3.725. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Boyes DC, Zayed AM, Ascenzi R, McCaskill AJ, Hoffman NE, Davis KR, Görlach J. Growth stage-based phenotypic analysis of Arabidopsis: a model for high throughput functional genomics in plants. Plant Cell. 2001;13:1499–1510. doi: 10.1105/TPC.010011. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Casero PJ, Casimiro I, Knox JP. Occurrence of cell surface arabinogalactan-protein and extensin epitopes in relation to pericycle and vascular tissue development in the root apex of four species. Planta. 1998;204:252–259. [Google Scholar]
- Chen C-G, Pu Z-Y, Moritz RL, Simpson RJ, Bacic A, Clarke AE, Mau S-L. Molecular cloning of a gene encoding an arabinogalactan-protein from pear (Pyrus communis) cell suspension culture. Proc Natl Acad Sci USA. 1994;91:10305–10309. doi: 10.1073/pnas.91.22.10305. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Dolan L, Linstead P, Roberts K. An AGP epitope distinguishes a central metaxylem initial from other vascular initials in the Arabidopsis root. Protoplasma. 1995;189:149–155. [Google Scholar]
- Du H, Clarke AE, Bacic A. Arabinogalactan-proteins: a class of extracellular matrix proteoglycans involved in plant growth and development. Trends Cell Biol. 1996;6:411–414. doi: 10.1016/s0962-8924(96)20036-4. [DOI] [PubMed] [Google Scholar]
- Du H, Simpson RJ, Moritz RL, Clarke AE, Bacic A. Isolation of the protein backbone of an arabinogalactan-protein from the styles of Nicotiana alataand characterization of a corresponding cDNA. Plant Cell. 1994;6:1643–1653. doi: 10.1105/tpc.6.11.1643. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Durbin R, Eddy S, Krogh A, Mitchison G. Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids. Cambridge, UK: Cambridge University Press; 1998. [Google Scholar]
- Eisenhaber B, Bork P, Yuan Y, Löffler G, Eisenhaber F. Automated annotation of GPI anchor sites: case study C. elegans. Trends Biochem Sci. 2000;25:340–341. doi: 10.1016/s0968-0004(00)01601-7. [DOI] [PubMed] [Google Scholar]
- Finkelstein D, Ewing R, Gollub J, Sterky F, Cherry JM, Somerville S. Microarray data quality analysis: lessons from the AFGC project. Plant Mol Biol. 2002;48:119–131. doi: 10.1023/a:1013765922672. [DOI] [PubMed] [Google Scholar]
- Fowler TJ, Bernhardt C, Tierney ML. Characterization and expression of four proline-rich cell wall protein genes in Arabidopsis encoding two distinct subsets of multiple domain proteins. Plant Physiol. 1999;121:1081–1091. doi: 10.1104/pp.121.4.1081. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Gaspar Y, Johnson KL, McKenna JA, Bacic A, Schultz CJ. The complex structures of arabinogalactan-proteins and the journey towards understanding function. Plant Mol Biol. 2001;47:161–176. [PubMed] [Google Scholar]
- Ghassemian M, Waner D, Tchieu J, Gribskov M, Schroeder JI. An integrated Arabidopsis annotation database for Affymetrix Genechip data analysis, and tools for regulatory motif searches. Trends Plant Sci. 2001;6:448–449. doi: 10.1016/s1360-1385(01)02092-1. [DOI] [PubMed] [Google Scholar]
- Gilson P, Gaspar YM, Oxley D, Youl JJ, Bacic A. NaAGP4 is an arabinogalactan-protein whose expression is suppressed by wounding and fungal infection in Nicotiana alata. Protoplasma. 2001;215:128–139. doi: 10.1007/BF01280309. [DOI] [PubMed] [Google Scholar]
- Goodrum LJ, Patel A, Leykam JF, Kieliszewski MJ. Gum arabic glycoprotein contains glycomodules of both extensin and arabinogalactan-glycoproteins. Phytochemistry. 2000;54:99–106. doi: 10.1016/s0031-9422(00)00043-1. [DOI] [PubMed] [Google Scholar]
- Hall AE, Chen QG, Findell JL, Schaller GE, Bleeker AB. The relationship between ethylene binding and dominant insensitivity conferred by mutant forms of the ETR1 ethylene receptor. Plant Physiol. 1999;121:291–299. doi: 10.1104/pp.121.1.291. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Halpin C, Barakate A, Askari BM, Abbott JC, Ryan MD. Enabling technologies for manipulating multiple genes on complex pathways. Plant Mol Biol. 2001;47:295–310. [PubMed] [Google Scholar]
- Höfte H, Desprez T, Amselem J, Chiapello H, Caboche M, Moisan A, Jourjon MF, Charpenteau JL, Berthomieu P, Guerrier D et al. An inventory of 1152 expressed sequence tags obtained by partial sequencing of cDNAs from Arabidopsis thaliana. Plant J. 1993;4:1051–1061. doi: 10.1046/j.1365-313x.1993.04061051.x. [DOI] [PubMed] [Google Scholar]
- Huber O, Sumper M. Algal-CAMs: isoforms of a cell adhesion molecule in embryos of the alga Volvox with homology to Drosophilafasciclin I. EMBO J. 1994;13:4212–4222. doi: 10.1002/j.1460-2075.1994.tb06741.x. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Johnson K, Jones B, Schultz CJ, Bacic A. Plant cell wall proteins: structural and selected non-structural. In: Rose J, editor. The Plant Cell Wall. UK: Academic Press, Sheffield; 2002. (in press) [Google Scholar]
- Kawamoto T, Noshiro M, Shen M, Nakamasu K, Hashimoto K, Kawashima-Ohya Y, Gotoh O, Kato Y. Structural and phylogenetic analyses of RGD-CAP/βig-h3, a fasciclin-like adhesion protein expressed in chick chondrocytes. Biochim Biophys Acta. 1998;1395:288–292. doi: 10.1016/s0167-4781(97)00172-3. [DOI] [PubMed] [Google Scholar]
- Kieliszewski MJ, Lamport DTA. Extensin: repetitive motifs, functional sites, post-translational codes, and phylogeny. Plant J. 1994;5:157–172. doi: 10.1046/j.1365-313x.1994.05020157.x. [DOI] [PubMed] [Google Scholar]
- Kieliszewski MJ, Shpak E. Synthetic genes for the elucidation of glycosylation codes for arabinogalactan-proteins and other hydroxyproline-rich glycoproteins. Cell Mol Life Sci. 2001;58:1386–1398. doi: 10.1007/PL00000783. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Knight H, Knight MR. Abiotic stress signalling pathways: specificity and cross-talk. Trends Plant Sci. 2001;6:262–267. doi: 10.1016/s1360-1385(01)01946-x. [DOI] [PubMed] [Google Scholar]
- Knox JP. Developmentally regulated proteoglycans and glycoproteins of the plant cell surface. FASEB J. 1995;9:1004–1012. doi: 10.1096/fasebj.9.11.7544308. [DOI] [PubMed] [Google Scholar]
- Krysan PJ, Young JC, Sussman MR. T-DNA as an insertional mutagen in Arabidopsis. Plant Cell. 1999;11:2283–2290. doi: 10.1105/tpc.11.12.2283. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Li S-X, Showalter AM. Cloning and developmental/stress-regulated expression of a gene encoding a tomato arabinogalactan protein. Plant Mol Biol. 1996;32:641–652. doi: 10.1007/BF00020205. [DOI] [PubMed] [Google Scholar]
- Liu Y-G, Mitsukawa N, Oosumi T, Whittier RF. Efficient isolation and mapping of Arabidopsis thalianaT-DNA insert junctions by thermal asymmetric interlaced PCR. Plant J. 1995;8:457–463. doi: 10.1046/j.1365-313x.1995.08030457.x. [DOI] [PubMed] [Google Scholar]
- Mau S-L, Chen C-G, Pu Z-Y, Moritz RL, Simpson RJ, Bacic A, Clarke AE. Molecular cloning of cDNAs encoding the protein backbones of arabinogalactan-proteins from the filtrate of suspension-cultured cells of Pyrus communis and Nicotiana alata. Plant J. 1995;8:269–281. doi: 10.1046/j.1365-313x.1995.08020269.x. [DOI] [PubMed] [Google Scholar]
- McKinney EC, Ali N, Traut A, Feldmann KA, Belostotsky DA, McDowell JM, Meagher RB. Sequence-based identification of T-DNA insertion mutations in Arabidopsis: actin mutants act2-1 and act4-1. Plant J. 1995;8:613–622. doi: 10.1046/j.1365-313x.1995.8040613.x. [DOI] [PubMed] [Google Scholar]
- Meissner RC, Jin H, Cominelli E, Denekamp M, Fuertes A, Greco R, Kranz HD, Penfield S, Petroni K, Urzainqui A et al. Function search in a large transcription factor gene family in Arabidopsis: assessing the potential of reverse genetics to identify insertional mutations in R2R3 MYBgenes. Plant Cell. 1999;11:1827–1840. doi: 10.1105/tpc.11.10.1827. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Nakai K, Horton P. PSORT: a program for detecting sorting signals in proteins and predicting their subcellular localization. Trends Biochem Sci. 1999;24:34–36. doi: 10.1016/s0968-0004(98)01336-x. [DOI] [PubMed] [Google Scholar]
- Nam J, Mysore KS, Zheng C, Knue MK, Matthysse AG, Gelvin SB. Identification of T-DNA tagged Arabidopsis mutants that are resistant to transformation by Agrobacterium. Mol Gen Genet. 1999;261:429–438. doi: 10.1007/s004380050985. [DOI] [PubMed] [Google Scholar]
- Newman T, de Bruijn FJ, Green P, Keegstra K, Kende H, McIntosh L, Ohlrogge J, Raikhel N, Somerville S, Thomashow M et al. Genes galore: a summary of methods for accessing results from large-scale partial sequencing of anonymous Arabidopsis cDNA clones. Plant Physiol. 1994;106:1241–1255. doi: 10.1104/pp.106.4.1241. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Nielsen H, Engelbrecht J, Brunak S, von Heijne G. Identification of prokaryotic and eukaryotic signal peptides and prediction of their cleavage sites. Protein Eng. 1997;10:1–6. doi: 10.1093/protein/10.1.1. [DOI] [PubMed] [Google Scholar]
- Nothnagel EA. Proteoglycans and related components in plant cells. Int Rev Cytol. 1997;174:195–291. doi: 10.1016/s0074-7696(08)62118-x. [DOI] [PubMed] [Google Scholar]
- Parinov S, Sevugan M, Ye D, Yang W-C, Kumaran M, Sundaresan V. Analysis of flanking sequences from Dissociationinsertion lines: a database for reverse genetics in Arabidopsis. Plant Cell. 1999;11:2263–2270. doi: 10.1105/tpc.11.12.2263. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Pogson BJ, Davies C. Characterization of a cDNA encoding the protein moiety of a putative arabinogalactan protein from Lycopersicon esculentum. Plant Mol Biol. 1995;28:347–352. doi: 10.1007/BF00020254. [DOI] [PubMed] [Google Scholar]
- Qi W, Fong C, Lamport DTA. Gum arabic glycoprotein is a twisted hairy rope: a new model based on O-galactosylhydroxyproline as the polysaccharide attachment site. Plant Physiol. 1991;96:848–855. doi: 10.1104/pp.96.3.848. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Reymond P. DNA microarrays and plant defence. Plant Physiol Biochem. 2001;39:313–321. [Google Scholar]
- Scheres B, van Engelen F, van der Knaap E, van de Wiel C, van Kammen A, Bisseling T. Sequential induction of nodulin gene expression in the developing pea nodule. Plant Cell. 1990;2:687–700. doi: 10.1105/tpc.2.8.687. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Schmidt EDL, Guzzo F, Toonen MAJ, de Vries SC. A leucine-rich repeat containing receptor-like kinase marks somatic plant cells competent to form embryos. Development. 1997;124:2049–2062. doi: 10.1242/dev.124.10.2049. [DOI] [PubMed] [Google Scholar]
- Schultz C, Gilson P, Oxley D, Youl J, Bacic A. GPI-anchors on arabinogalactan-proteins: implications for signalling in plants. Trends Plant Sci. 1998;3:426–431. [Google Scholar]
- Schultz CJ, Hauser K, Lind JL, Atkinson AH, Pu Z-Y, Anderson MA, Clarke AE. Molecular characterisation of a cDNA sequence encoding the backbone of a style-specific 120kDa glycoprotein which has features of both extensins and arabinogalactan-proteins. Plant Mol Biol. 1997;35:833–845. doi: 10.1023/a:1005816520060. [DOI] [PubMed] [Google Scholar]
- Schultz CJ, Johnson KL, Currie G, Bacic A. The classical arabinogalactan protein gene family of Arabidopsis. Plant Cell. 2000;12:1751–1767. doi: 10.1105/tpc.12.9.1751.
- Selleck SB. Proteoglycans and pattern formation: sugar biochemistry meets developmental genetics. Trends Genet. 2000;16:206–212. doi: 10.1016/s0168-9525(00)01997-1.
- Showalter AM. Structure and function of plant cell wall proteins. Plant Cell. 1993;5:9–23. doi: 10.1105/tpc.5.1.9.
- Sommer-Knudsen J, Bacic A, Clarke AE. Hydroxyproline-rich plant glycoproteins. Phytochemistry. 1998;47:483–497.
- Sommer-Knudsen J, Clarke AE, Bacic A. Proline- and hydroxyproline-rich gene products in the sexual tissues of flowers. Sex Plant Reprod. 1997;10:253–260.
- Sussman MR, Amasino RM, Young JC, Krysan PJ, Austin-Phillips S. The Arabidopsis knockout facility at the University of Wisconsin-Madison. Plant Physiol. 2000;124:1465–1467. doi: 10.1104/pp.124.4.1465.
- Svetek J, Yadav MP, Nothnagel EA. Presence of a glycosylphosphatidylinositol lipid anchor on rose arabinogalactan proteins. J Biol Chem. 1999;274:14724–14733. doi: 10.1074/jbc.274.21.14724.
- Wall L, Christiansen T, Orwant J. Programming Perl. Ed 3. Sebastopol, CA: O'Reilly and Associates, Inc.; 2000.
- Wesley SV, Helliwell CA, Smith NA, Wang MB, Rouse DT, Liu Q, Gooding PS, Singh SP, Abbott D, Stoutjesdijk PA et al. Construct design for efficient, effective and high-throughput gene silencing in plants. Plant J. 2001;27:581–590. doi: 10.1046/j.1365-313x.2001.01105.x.
- Winkler RG, Frank MR, Galbraith DW, Feyereisen R, Feldmann KA. Systematic reverse genetics of transfer-DNA-tagged lines of Arabidopsis. Plant Physiol. 1998;118:743–750. doi: 10.1104/pp.118.3.743.
- Wisman E, Ohlrogge J. Arabidopsis microarray service facilities. Plant Physiol. 2000;124:1468–1471. doi: 10.1104/pp.124.4.1468.
- Wu H, de Graaf B, Mariani C, Cheung AY. Hydroxyproline-rich glycoproteins in plant reproductive tissues: structure, functions and regulation. Cell Mol Life Sci. 2001a;58:1418–1429. doi: 10.1007/PL00000785.
- Wu S-H, Ramonell K, Gollub J, Somerville S. Plant gene expression profiling with DNA microarrays. Plant Physiol Biochem. 2001b;39:917–926.
- Yamagami T, Funatsu G. The complete amino acid sequence of a chitinase-a from the seeds of rye (Secale cereal). Biosci Biotechnol Biochem. 1994;58:322–329. doi: 10.1271/bbb.58.322.
- Youl JJ, Bacic A, Oxley D. Arabinogalactan-proteins from Nicotiana alata and Pyrus communis contain glycosylphosphatidylinositol membrane anchors. Proc Natl Acad Sci USA. 1998;95:7921–7926. doi: 10.1073/pnas.95.14.7921.
- Yu J, Nickels R, McIntosh L. A genome approach to mitochondrial-nuclear communication in Arabidopsis. Plant Physiol Biochem. 2001;39:345–353.
- Zdobnov EM, Apweiler R. InterProScan: an integration platform for the signature-recognition methods in InterPro. Bioinformatics. 2001;17:847–848. doi: 10.1093/bioinformatics/17.9.847.