Plant Physiology. 2002 Dec;130(4):1626–1635. doi: 10.1104/pp.012179

Contiguous Genomic DNA Sequence Comprising the 19-kD Zein Gene Family from Maize

Rentao Song 1, Joachim Messing 1,*
PMCID: PMC166678  PMID: 12481046

Abstract

A new approach has been undertaken to analyze the sequences and linear organization of the 19-kD zein genes in maize (Zea mays). A high-coverage, large-insert genomic library of the inbred line B73, based on bacterial artificial chromosomes, was used to isolate a redundant set of clones containing members of the 19-kD zein gene family, which had previously been estimated to consist of 50 members. The redundant set of clones was used to create bins of overlapping clones that represented five distinct genomic regions. Representative clones containing the entire set of 19-kD zein genes were chosen from each region and sequenced. Seven bacterial artificial chromosome clones yielded 1,160 kb of genomic DNA. Three of them formed a contiguous sequence of 478 kb, the longest contiguous sequenced region of the maize genome. Altogether, these DNA sequences provide the linear organization of 25 genes of the 19-kD zein family, one-half the number previously estimated. We suggest that the difference reflects haplotypes with different degrees of gene amplification in the zein multigene family. About one-half the genes present in B73 appear to be expressed. Because some active genes were duplicated only recently, their sequences are so conserved that previous cDNA sequence analysis produced “unigenes” that were actually derived from different gene copies. This analysis also shows that the 22- and 19-kD zein gene families shared a common ancestor. Although both branches underwent similar incremental gene amplification, the 19-kD zein branch exhibited a greater degree of far-distance gene translocations than the 22-kD zein gene family.


There is evidence that plants have rather large gene families coding for closely related gene products (http://www.Arabidopsis.org/info/genefamily/). Although in many cases members of these gene families are distributed throughout the genome, there are many examples where gene copies are clustered (Arabidopsis Genome Initiative, 2000). By understanding the organization of these gene families, we can glean new insights into gene amplification, gene function, gene regulation, and chromosomal architecture.

Traditional methods, like Southern-blot analysis of genomic DNA or clones of genomic libraries, do not provide the positional information necessary to understand the linear organization of gene families within the genome. Furthermore, chromosome-walking methods (Bender et al., 1983) based on genomic libraries are particularly difficult to conduct in larger plant genomes because of the high content of transposable elements and other repeat sequences (SanMiguel et al., 1996). To merge overlapping clones to construct contiguous genomic sequences beyond the limitations of these older methods, we have developed a new approach that consists of the following steps: (a) An expressed sequence tag (EST) database is used to organize a gene family into subfamilies so that subfamily-specific gene probes can be developed; (b) High-coverage bacterial artificial chromosome (BAC)-based genomic libraries are screened with these probes to identify sets of BAC clones for each subfamily; (c) BAC clones are DNA fingerprinted to create bins of overlapping BAC clones; (d) A comparison of DNA restriction fragment patterns of BAC clones and genomic DNA is used to assess the presence of all gene copies within the DNA fingerprinted BAC clones; and (e) BAC clones comprising individual clusters of each subfamily are sequenced. In this study, we applied this approach to a large gene family in maize (Zea mays) that encodes storage proteins in the maize endosperm.
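Step (d) amounts to asking whether every hybridizing band in a genomic Southern blot is matched by a band from the selected BAC clones. The Python sketch below illustrates that bookkeeping under stated assumptions: the band sizes (in kb) and clone names are hypothetical, a relative size tolerance `tol` stands in for gel resolution, and `unexplained_bands` is an illustrative helper, not part of any published tool. It ignores band intensity, so co-migrating fragments from different gene copies would not be told apart.

```python
def unexplained_bands(genomic_bands, bac_bands, tol=0.05):
    """Return genomic Southern bands (kb) not matched by any BAC band.

    genomic_bands: band sizes from the genomic blot.
    bac_bands: {clone_name: [band sizes]} from blots of candidate BAC clones.
    tol: relative size tolerance standing in for gel resolution (assumed value).
    """
    pooled = [b for bands in bac_bands.values() for b in bands]
    return [g for g in genomic_bands
            if not any(abs(g - b) <= tol * g for b in pooled)]


genomic_blot = [9.4, 6.1, 4.4, 3.2, 2.3]                # hypothetical sizes
bac_blots = {"BAC_A": [9.4, 4.4], "BAC_B": [6.1, 3.2]}  # hypothetical clones
print(unexplained_bands(genomic_blot, bac_blots))  # [2.3]
```

An empty result would suggest that the chosen clones account for every band; a non-empty one, as in this toy case, points to a gene copy that the selected BACs do not yet contain.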

Maize, which typically has a large genome ranging in size from 2.3 to 3.3 Gb (http://www.agron.missouri.edu/zeadna.html#Lau1985), belongs to the Gramineae family, which includes many of the cereal crops (Kellogg, 1998). It is one of the major sources of the essential amino acids required for healthy nutrition of livestock and humans. The majority of amino acids in maize kernels are contained within seed storage proteins that predominantly accumulate in the endosperm (Burr and Burr, 1976). These proteins are rich in the amino acids Pro and Gln and are, therefore, called prolamins. The maize prolamins, also known as zeins, constitute a large protein family that can be divided into two classes. One class has higher levels of sulfur-rich amino acids and Pro and is encoded by only one or two gene copies. The other class, called α-zeins, contains higher levels of Leu and Gln and is encoded by a large gene family (Heidecker and Messing, 1986).

The α-zein gene family can be divided into four subfamilies: z1A, z1B, z1C, and z1D. These subdivisions were based on sequence homology and on copy numbers estimated from DNA hybridization data (for review, see Heidecker and Messing, 1986). Three of the subfamilies, z1A, z1B, and z1D, encode proteins with a relative molecular mass of 19 kD in SDS-polyacrylamide gels, whereas the z1C subfamily encodes proteins of 22 kD. cDNA sequence analysis indicated that the sizes of these proteins can vary within subfamilies as a consequence of internal insertions/deletions (Heidecker et al., 1991). Our laboratory recently reported the sequence analysis of the entire z1C subfamily in maize inbred line BSSS53, which contained a total of 23 gene copies, 22 of them tandemly arranged within 168 kb (Song et al., 2001). The 23rd copy was the normal allele of the floury-2 locus (Coleman et al., 1995), which was separated from the remaining 22 copies by about 20 cM. Most of these genes exhibited a size consistent with a relative molecular mass of 22 kD, but one (Zp22/D87) had a deletion that caused it to comigrate with the 19-kD zeins in SDS-polyacrylamide gels. Although DNA hybridization data estimated the copy number of the z1C subfamily to be 15 (Heidecker and Messing, 1986), this discrepancy is due to the different haplotypes present in various inbred lines (R. Song and J. Messing, unpublished data). Because of these haplotype differences, copy number alone cannot be used to distinguish the subfamilies. It has become clear that sequence homology is a more reliable parameter for classifying the α-zein gene family.

The sizes of the other three subfamilies, z1A, z1B, and z1D, were estimated to be 25, 20, and 5 copies, respectively, for a total of roughly 50 members (Heidecker and Messing, 1986). These three subfamilies were also estimated to occur on three of the 10 maize chromosomes. Given the size and the distribution of these gene families, traditional genomic analysis was not sufficient to determine their genomic organization. Furthermore, expression analysis could not be linked to individual genes in the absence of their positional information within a contiguous genomic sequence. To provide a genomic reference of these gene families for future studies, we applied the approach described above to this set of genes in a single genetic background. This approach was greatly aided by the recent construction of a large-insert BAC library of MboI partially digested genomic DNA of maize inbred B73 (Yim et al., 2002). Following the steps outlined above, we show that the three α-zein subfamilies in B73 contain only 25 gene copies and are located in five different genomic regions covering a total length of over 1 Mb. This analysis also shows that the three 19-kD α-zein gene subfamilies underwent a greater degree of far-distance translocation than the 22-kD zein gene subfamily during stages of gene amplification.

RESULTS

Division of Subfamilies by EST Database Analysis

Before genomic analysis of the α-zein gene family was conducted, gene members were sorted by sequence homology. The simplest and most accurate way to obtain the subdivisions is by sequence analysis of a random collection of cDNAs produced from immature endosperm, where the α-zein genes are expressed (Burr et al., 1982). The maize EST database contains non-normalized cDNA sequences from different sources of zein mRNA (http://www.zmdb.iastate.edu/). Tentative unique contigs (TUCs) of all 19-kD zein-specific sequences were collected. A total of 361 cDNA sequences fell into three groups of related sequences that correspond to the previously defined subfamilies of the 19-kD zein gene family: z1A, z1B, and z1D (Table I). The number of ESTs for each subfamily was also in agreement with its size, based on previous hybridization data (Burr et al., 1982): there were a total of 188 ESTs for z1A, 126 ESTs for z1B, and only 47 ESTs for z1D. Interestingly, within each subfamily, some members appeared to be expressed at much higher levels than others. Sequences from each subfamily were then used to derive a consensus sequence for each of the three 19-kD zein gene subfamilies. Based on these consensus sequences, primers were designed to amplify sequences from the BSSS53 cDNA collection to generate specific probes for the isolation of the genomic regions comprising these genes.
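Deriving a subfamily consensus from grouped EST sequences can be illustrated with a simple majority rule over a multiple alignment. The sketch below assumes the sequences are already aligned to equal length with `-` as the gap character; the function name `consensus` and the toy sequences are hypothetical, ties are resolved in favor of the base that appears first in the column, and primer design from the consensus is not shown.

```python
from collections import Counter


def consensus(aligned):
    """Majority-rule consensus of equal-length aligned sequences.

    Columns whose most common character is a gap ('-') are dropped.
    Ties go to the character seen first in the column (Counter ordering).
    """
    if len({len(s) for s in aligned}) != 1:
        raise ValueError("sequences must be aligned to equal length")
    out = []
    for column in zip(*aligned):
        base, _ = Counter(column).most_common(1)[0]
        if base != "-":
            out.append(base)
    return "".join(out)


toy_alignment = ["ATGACTAC", "ATGGCAAC", "ATGGCTAC"]  # hypothetical reads
print(consensus(toy_alignment))  # ATGGCTAC
```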

Table I.

Maize EST contigs for DNA probe design (a)

TUC No. No. of ESTs z1A z1B z1D
TUC04-05-1270.1 2 Yes
TUC04-05-5009.1 3 Yes
TUC07-14-584.2 45 Yes
TUC07-14-6297.2 3 Yes
TUC07-14-6299.1 6 Yes
TUC07-14-6495.2 3 Yes
TUC07-14-6521.1 2 Yes
TUC07-14-6529.2 20 Yes
TUC07-14-6530.3 4 Yes
TUC07-14-6561.1 4 Yes
TUC07-14-6708.1 116 Yes
TUC07-14-6710.1 112 Yes
TUC07-14-6733.1 2 Yes
TUC07-14-6733.2 39 Yes
Total 361 188 126 47
(a) Data downloaded from ZmDB (http://www.zmdb.iastate.edu/) in October 2000.

BAC Clone Isolation and Identification

The three subfamily-specific probes were used to screen high-density filters of a maize BAC library constructed from MboI partially digested genomic DNA of inbred line B73. A total of 83 clones were identified by hybridization under medium stringency (see “Materials and Methods”). These clones were then analyzed in Southern blots at high stringency, which reduced the set of positive BAC clones to 57. All 57 clones were digested with NotI, and pulsed-field gel electrophoresis was used to estimate the size of their genomic DNA inserts, which ranged from 50 to 210 kb (Table II). We obtained 25 clones for the z1A subfamily, 14 clones for the z1B subfamily, and 18 clones for the z1D subfamily. DNA fingerprinting followed by Southern-blot analysis, however, revealed five different fingerprint contigs for the entire 19-kD zein gene family (Fig. 1). z1A and z1B each consisted of two BAC contigs (Table II), indicating that each occupies two noncontiguous genomic locations. The z1D clones also fell into two contigs; however, additional analysis showed that they could be merged into one. Therefore, the z1D subfamily occupies a single genomic location but spans an extended genomic region.

Table II.

Maize BAC clones of the 19-kD zein gene family (a)

Sample No. BAC Clone Family Insert Size (kb)
 1 Z337B24 z1A-2 170
 2 Z350D07 z1A-2 100
 3 Z352H24 z1A-1 160
 4 Z387B21 z1A-1 140
 5 Z408D24 z1A-2 170
 6 Z430H20 z1A-2 90
 7 Z448F14 z1A-1 130
 8 Z450H15 z1A-2 170
 9 Z488A24 z1A-2 125
10 Z490K03 z1A-1 160
11 Z491I04 z1A-1 160
12 Z508D15 z1A-2 150
13 Z514G12 z1A-1 135
14 Z514O17 z1A-1 140
15 Z532A18 z1A-1 160
16 Z539A01 z1A-1 160
17 Z553M01 z1A-1 145
18 Z304F13 z1A-2 130
19 Z307B21 z1A-1 160
20 Z311H01 z1A-1 155
21 Z329J07 z1A-1 170
22 Z290I18 z1A-2 80
23 Z402H02 z1A-2 125
24 Z405C16 z1A-2 170
25 Z433M03 z1A-2 140
26 Z337G24 z1B-2 180
27 Z337H24 z1B-2 190
28 Z350N01 z1B-2 120
29 Z350N13 z1B-2 125
30 Z385A24 z1B-1 50
31 Z393O03 z1B-2 155
32 Z411M23 z1B-2 160
33 Z492M16 z1B-1 200
34 Z512H06 z1B-1 145
35 Z524F15 z1B-1 105
36 Z531H07 z1B-2 165
37 Z550D20 z1B-2 130
38 Z562N20 z1B-1 125
39 Z573N10 z1B-1 100
40 Z358M05 z1D 210
41 Z363O06 z1D 180
42 Z368B24 z1D 160
43 Z390G14 z1D 185
44 Z391N14 z1D 155
45 Z410H16 z1D 125
46 Z425L11 z1D 180
47 Z431P22 z1D 180
48 Z468B05 z1D 120
49 Z475A21 z1D 170
50 Z505O13 z1D 185
51 Z506B20 z1D 180
52 Z513H09 z1D 180
53 Z530F22 z1D 180
54 Z540C08 z1D 165
55 Z546E13 z1D 150
56 Z576A02 z1D 140
57 Z315N03 z1D 180
(a) Underlined clones were chosen for sequencing: Z448F14, Z350D07, Z492M16, Z531H07, Z410H16, Z513H09, and Z576A02.
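The clone counts and size range reported for the library screen can be re-derived from Table II. The sketch below transcribes the pulsed-field insert sizes grouped by fingerprint bin; the dictionary layout and the helper name `clones_per_subfamily` are illustrative choices.

```python
# PFGE insert sizes (kb) transcribed from Table II, grouped by fingerprint bin.
TABLE_II_KB = {
    "z1A-1": [160, 140, 130, 160, 160, 135, 140, 160, 160, 145, 160, 155, 170],
    "z1A-2": [170, 100, 170, 90, 170, 125, 150, 130, 80, 125, 170, 140],
    "z1B-1": [50, 200, 145, 105, 125, 100],
    "z1B-2": [180, 190, 120, 125, 155, 160, 165, 130],
    "z1D": [210, 180, 160, 185, 155, 125, 180, 180, 120, 170,
            185, 180, 180, 180, 165, 150, 140, 180],
}


def clones_per_subfamily(table):
    """Sum clone counts over bins sharing a subfamily prefix (z1A-1 + z1A-2 -> z1A)."""
    counts = {}
    for bin_name, sizes in table.items():
        subfamily = bin_name.split("-")[0]
        counts[subfamily] = counts.get(subfamily, 0) + len(sizes)
    return counts


all_sizes = [kb for sizes in TABLE_II_KB.values() for kb in sizes]
print(clones_per_subfamily(TABLE_II_KB))  # {'z1A': 25, 'z1B': 14, 'z1D': 18}
print(len(all_sizes), min(all_sizes), max(all_sizes))  # 57 50 210
print(len(TABLE_II_KB))  # 5 fingerprint bins
```

These are gel-based estimates; for example, Z448F14 is listed at 130 kb here but at 152 kb in the sequence-based Table III.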

Figure 1.


Fingerprinting and Southern-blot analysis of 19-kD α-zein BAC clones. To facilitate comparison, BAC clones were sorted into three groups, z1A, z1B, and z1D, according to the probe that detected them (three vertical panels in the figure). BAC DNA was fingerprinted by HindIII digestion and separated by 1% (w/v) agarose gel electrophoresis. After electrophoresis, the DNA restriction fragment pattern was photographed, as shown in the upper part of the figure. DNA from the agarose gel was subsequently blotted to nylon membranes and subjected to Southern-blot analysis using z1A-, z1B-, and z1D-specific probes. DNA fragments detected by the specific probes were visualized by autoradiography, as shown in the lower part of the figure. BAC clone designations correspond to those in Table II. M, DNA marker lane containing a 1-kb DNA ladder. In the z1A and z1B panels, BAC clone designations are labeled in two different colors, indicating two different DNA fragment patterns within the group. Size rulers in kb are shown on the right side of the figure.

Sequence Analysis of the Five Genomic Regions Containing 19-kD Zein Genes

The BAC clones that appeared to comprise all the zein clusters were chosen for DNA sequencing (underlined clones in Table II). To confirm their suitability for analysis, candidate clones were subjected to a comparative Southern blot of the BAC inserts with B73 genomic DNA (data not shown). Based on the banding pattern, a total of seven BAC clones was selected that accounted for all gene copies of the 19-kD zein gene family. BAC clones Z448F14 and Z350D07 were selected for the z1A subfamily, and BAC clones Z492M16 and Z531H07 were chosen for the z1B subfamily. BAC clones Z576A02, Z513H09, and Z410H16 formed a contiguous genomic sequence that contained all the members of the z1D subfamily (Table III).

Table III.

BAC sequencing summary

BAC Clone Subfamily Insert Size (kb) 19-kD Zein Gene Copies Gaps (Physical, Sequencing) Accession No.
Z448F14 z1A-1 152 9 2  (1,  1) AF546186
Z350D07 z1A-2 116 3 0 AF546189
Z492M16 z1B-1 203 6 0 AF546188
Z531H07 z1B-2 180 2 1  (0,  1) AF546190
Z410H16 z1D 173 1 3  (1,  2) AF546187
Z513H09 z1D 179 1 3  (1,  2) AF546187
Z576A02 z1D 156 3 4  (1,  3) AF546187
Total 1,159 25 13  (4,  9)
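As a consistency check, the totals row of Table III can be recomputed from the per-clone rows. Combining the gap counts with the approximate gap sizes stated in the following paragraph (about 500 bp per physical gap and about 50 bp per sequencing gap) gives a rough estimate of unsequenced bases. This is a back-of-the-envelope sketch that assumes every gap has exactly its typical size; summing insert sizes also counts the overlaps among the three z1D clones twice (508 kb summed versus the 478-kb contiguous sequence given in the Abstract).

```python
# Rows of Table III: (clone, subfamily, insert_kb, gene_copies, physical_gaps, sequencing_gaps)
TABLE_III = [
    ("Z448F14", "z1A-1", 152, 9, 1, 1),
    ("Z350D07", "z1A-2", 116, 3, 0, 0),
    ("Z492M16", "z1B-1", 203, 6, 0, 0),
    ("Z531H07", "z1B-2", 180, 2, 0, 1),
    ("Z410H16", "z1D", 173, 1, 1, 2),
    ("Z513H09", "z1D", 179, 1, 1, 2),
    ("Z576A02", "z1D", 156, 3, 1, 3),
]
# Typical gap sizes from the text; treating them as constants is an assumption.
PHYSICAL_GAP_BP = 500
SEQUENCING_GAP_BP = 50

total_kb = sum(row[2] for row in TABLE_III)
gene_copies = sum(row[3] for row in TABLE_III)
physical = sum(row[4] for row in TABLE_III)
sequencing = sum(row[5] for row in TABLE_III)
missing_bp = physical * PHYSICAL_GAP_BP + sequencing * SEQUENCING_GAP_BP

print(total_kb, gene_copies, physical + sequencing, (physical, sequencing))  # 1159 25 13 (4, 9)
print(missing_bp, f"{missing_bp / (total_kb * 1000):.2%}")  # 2450 0.21%
```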

DNA sequencing was carried out by the shotgun approach (Messing et al., 1981). BAC DNA was physically sheared and cloned into a pUC sequencing vector as described in “Materials and Methods.” Each shotgun clone was sequenced from both ends to provide a pair of linked sequences (Vieira and Messing, 1982). This linkage was critical for turning the assembly of shotgun reads into contiguous sequence information (contigs). Each BAC was sequenced to 10- to 15-fold coverage before the sequences were assembled into contigs. Alternative sequencing chemistries and primer walking were then used to fill gaps and/or place contigs in the correct order. As a result, a total of 1,160 kb was generated from these seven BAC clones with only a few gaps remaining. Sequence at this stage is regarded as phase II. However, the number of ordered pieces and the sizes of the gaps can vary substantially for phase II sequence. The seven BACs sequenced here have zero to four gaps left, all in regions that do not contain storage protein gene sequences (Table III). Gaps can be caused by DNA sequences that are absent from the shotgun libraries (physical gaps) or by DNA sequences that are difficult to sequence (sequencing gaps). A physical gap is usually around 500 bp, whereas a sequencing gap is around 50 bp. BAC clone sequencing is summarized in Table III.
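The value of sequencing each shotgun clone from both ends is that a read pair whose two ends fall in different contigs implies that those contigs are neighbors. The sketch below shows only that core idea: it counts cross-contig pairs, keeps links supported by at least `min_links` pairs (a hypothetical threshold), and chains contigs into a linear order. It ignores read orientation and insert-size constraints that a real assembler would use, and the contig names are made up.

```python
from collections import Counter, defaultdict


def contig_links(pairs, min_links=2):
    """Count read pairs whose two ends land on different contigs.

    pairs: iterable of (contig_of_read1, contig_of_read2).
    Returns {frozenset({a, b}): count} for links with >= min_links support.
    """
    counts = Counter(frozenset(p) for p in pairs if p[0] != p[1])
    return {link: n for link, n in counts.items() if n >= min_links}


def order_contigs(links):
    """Walk a linear chain of linked contigs (each contig may have <= 2 neighbors).

    Only the chain starting at the alphabetically first chain end is returned.
    """
    neighbors = defaultdict(set)
    for link in links:
        a, b = sorted(link)
        neighbors[a].add(b)
        neighbors[b].add(a)
    if any(len(n) > 2 for n in neighbors.values()):
        raise ValueError("branching links: ordering is ambiguous")
    ends = sorted(c for c, n in neighbors.items() if len(n) == 1)
    if not ends:
        raise ValueError("no chain end found (no links, or links form a cycle)")
    order, previous, current = [], None, ends[0]
    while current is not None:
        order.append(current)
        nxt = [n for n in neighbors[current] if n != previous]
        previous, current = current, (nxt[0] if nxt else None)
    return order


read_pairs = [("c1", "c2"), ("c2", "c1"), ("c2", "c3"), ("c3", "c2"),
              ("c1", "c3"), ("c1", "c1")]  # hypothetical read-pair placements
links = contig_links(read_pairs)
print(order_contigs(links))  # ['c1', 'c2', 'c3']
```

The single c1-c3 pair falls below the support threshold and is discarded, which is why requiring more than one linking pair helps guard against chimeric or misplaced reads.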

Complete Set of 19-kD Zein Genes in the Maize B73 Inbred Line

All sequences generated in this study were subjected to sequence homology searches using known 19-kD zein cDNA sequences. A total of 25 copies of 19-kD zein gene sequences, representing the entire collection of 19-kD zein gene copies in maize inbred line B73, was discovered (Table IV). The two z1A clones, Z448F14 and Z350D07, contained nine and three copies of 19-kD zein gene sequences, respectively. The two z1B clones, Z492M16 and Z531H07, contained six and two copies, respectively. The three overlapping z1D clones, Z576A02, Z513H09, and Z410H16, contained three, one, and one copies, respectively. A diagram with the relative positions of these zein genes on each BAC clone is presented in Figure 2.

Table IV.

The 19-kD zein genes and their expression

19-kD Zein Gene Subfamily Size (bp) (a) Status Expression EST Contig Hit (No. of ESTs) (b) Identity (c)
Z448F14-1 z1A-1 600 Internal del No
Z448F14-2 z1A-1 804 Intact Yes TUC02-02-07-9633.1 (6) 100%
Z448F14-3 z1A-1 705 Intact Yes TUC02-02-07-13083.1 (5) 99.7%
TUC02-02-07-16440.1 (336) 99.0%
TUC02-02-07-16312.1 (2) 100%
Z448F14-4 z1A-1 705 Intact Yes TUC02-02-07-16440.1 (336) 99.5%
TUC02-02-07-13083.1 (5) 98.6%
TUC02-02-07-16312.1 (2) 98.6%
Z448F14-5 z1A-1 705 Intact Yes TUC02-02-07-16440.1 (336) 99.3%
TUC02-02-07-13083.1 (5) 98.3%
TUC02-02-07-16312.1 (2) 98.4%
Z448F14-6 z1A-1 705 Intact Yes TUC02-02-07-16440.1 (336) 98.7%
Z448F14-7 z1A-1 705 Intact Yes TUC02-02-07-16440.1 (336) 99.5%
TUC02-02-07-13083.1 (5) 98.8%
TUC02-02-07-16312.1 (2) 98.7%
Z448F14-8 z1A-1 706 Prestop No
Z448F14-9 z1A-1 332 Truncated (3′) No
Z350D07-1 z1A-2 702 Intact Yes TUC02-02-07-16249.1 (145) 100%
Z350D07-2 z1A-2 702 Intact Yes TUC02-02-07-15304.2 (43) 99.8%
Z350D07-3 z1A-2 702 Prestop No
Z492M16-1 z1B-1 726 Prestop Yes TUC02-02-07-10727.2 (32) 98.2%
Z492M16-2 z1B-1 726 Prestop Yes TUC02-02-07-14129.1 (13) 99.6%
Z492M16-3 z1B-1 725 Prestop No
Z492M16-4 z1B-1 723 Intact Yes TUC02-02-07-4151.1 (144) 98.1%
TUC02-02-07-16414.2 (5) 98.7%
Z492M16-5 z1B-1 722 Prestop Yes TUC02-02-07-14845.1 (2) 100%
Z492M16-6 z1B-1 723 Intact Yes TUC02-04-28-4151.1 (144) 98.1%
TUC02-02-07-16414.2 (5) 98.1%
Z531H07-1 z1B-2 726 Prestop No
Z531H07-2 z1B-2 723 Prestop No
Z410H16-1 z1D 674 Truncated (5′) No
Z513H09-1 z1D 726 Intact Yes TUC02-04-28-4156.1 (6) 98.3%
Z576A02-1 z1D 201 Truncated (5′) No
Z576A02-2 z1D 723 Intact Yes TUC02-04-28-4169.1 (40) 99.0%
Z576A02-3 z1D 692 Truncated (3′) No
(a) Size refers to coding region.

(b) EST data from ZmDB (http://www.zmdb.iastate.edu/) in July 2002.

(c) Sequence match > 500 bp.

Figure 2.


The distribution of 19-kD α-zein genes in different BAC clones. Each BAC clone is shown as a bar with the clone name and size (in parentheses) above it. BAC clones were oriented according to the transcriptional direction of the 19-kD zein genes (all from 5′ to 3′). BAC clones were sorted into the three subfamilies z1A, z1B, and z1D, indicated by the three boxed sections in the figure. A ruler with size in kb is shown at the upper left corner of the figure. Along the bars, red ovals indicate 19-kD zein genes, and the numbers within them indicate their order from 5′ to 3′ in each BAC clone, corresponding to their gene names in Table IV. Blue ovals indicate the positions of other predicted genes.

Among the 19-kD α-zein genes, only about one-half (12 of 25) had intact coding regions, whereas the other one-half (13 of 25) were “damaged” by truncation, internal deletion, or in-frame stop codons (Table IV). The intact copies of 19-kD zein genes fell into two size classes: those of z1B and z1D had coding sizes of 723 and 726 bp, whereas those of z1A had coding sizes of 702 and 705 bp. The one exception to this dichotomy was gene Z448F14-2 in the z1A subfamily, which had an 804-bp coding region, very close to the size of most 22-kD zein gene coding regions (801 bp). As previously mentioned, the reverse has also been described for Zp22/D87 from inbred line BSSS53, a 22-kD zein gene with a coding region of 716 bp, which is close to that of the 19-kD zeins (Song et al., 2001).
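The status categories in Table IV (intact, prestop, truncated) can be approximated from sequence alone. The sketch below is a hypothetical classifier, not the procedure used to build Table IV: it assumes the input is read 5′ to 3′ from the annotated start, uses the three standard stop codons, and cannot detect in-frame internal deletions without a reference sequence for comparison. The toy sequences are made up.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def classify_cds(seq):
    """Crude status call for a coding sequence read 5'->3' from its ATG.

    Returns "intact", "prestop at codon N" (1-based, first premature stop),
    "no start" (e.g. a 5' truncation), or "no terminal stop"
    (e.g. a 3' truncation or frameshift).
    """
    seq = seq.upper()
    codons = [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]
    if not codons or codons[0] != "ATG":
        return "no start"
    for index, codon in enumerate(codons[:-1]):
        if codon in STOP_CODONS:
            return f"prestop at codon {index + 1}"
    if len(seq) % 3 != 0 or codons[-1] not in STOP_CODONS:
        return "no terminal stop"
    return "intact"


print(classify_cds("ATGCAGCAATAA"))  # intact
print(classify_cds("ATGTAGCAATAA"))  # prestop at codon 2
print(classify_cds("ATGCAGCAA"))     # no terminal stop
```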

Expression of the 19-kD Zein Gene Family

Because the analysis conducted in this study provided not only all the 19-kD zein genes but also their linear arrangement within the genome, we can now begin to investigate how each gene is regulated. As a first step, we matched the EST database against each of the genomic 19-kD zein gene sequences. This analysis depended greatly on the degree of polymorphism among these genes and between various haplotypes, because the ESTs in the database were derived from inbred lines other than B73 (http://zmdb.iastate.edu/zmdb/EST/libraries.html). However, in our study of the 22-kD zein genes (z1C subfamily), we noted that orthologous positions in different inbred lines exhibit a higher degree of conservation than nonorthologous positions, except for very recently amplified gene copies (Llaca and Messing, 1998; R. Song and J. Messing, unpublished data). To gain a preliminary overview of which genes were likely to be expressed, we allowed for divergence between inbred lines by requiring at least 98% identity over a minimum length of 500 bp between an EST sequence and a gene of the 19-kD zein family (Table IV).
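The matching criterion can be written down explicitly. The sketch below assumes a pairwise alignment between an EST and a gene has already been computed (for example by a BLAST-type search, not shown) and defines identity over ungapped columns only; that definition is a simplifying assumption, since alignment tools differ in how they count gaps. The function names and toy sequences are hypothetical.

```python
def percent_identity(a, b):
    """Percent identity over aligned columns where neither sequence has a gap.

    a, b: equal-length aligned strings ('-' marks a gap).
    Returns (identity_percent, compared_length).
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    columns = [(x, y) for x, y in zip(a.upper(), b.upper()) if x != "-" and y != "-"]
    if not columns:
        return 0.0, 0
    matches = sum(x == y for x, y in columns)
    return 100.0 * matches / len(columns), len(columns)


def passes_threshold(est, gene, min_identity=98.0, min_length=500):
    """Apply a >= 98% identity over >= 500 bp criterion to one aligned pair."""
    identity, length = percent_identity(est, gene)
    return length >= min_length and identity >= min_identity


gene = "A" * 600  # synthetic stand-in for an aligned gene segment
print(passes_threshold("A" * 590 + "C" * 10, gene))   # True  (98.3% over 600 bp)
print(passes_threshold("A" * 580 + "C" * 20, gene))   # False (96.7% over 600 bp)
print(passes_threshold("A" * 400 + "-" * 200, gene))  # False (only 400 bp compared)
```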

Based on this analysis, all 19-kD zein genes with intact coding regions are expressed, although their expression levels vary greatly. None of the truncated 19-kD zein genes accumulated mRNA at levels detectable within libraries of this size. However, three 19-kD zein genes from the z1B subfamily (Z492M16-1, Z492M16-2, and Z492M16-5) that contain in-frame stop codons appear to accumulate mRNA and, therefore, might produce truncated versions of α-zein proteins. This is in contrast to other genes with in-frame stop codons that do not direct the accumulation of mRNAs (Van Hoof and Green, 1996; Patracek et al., 2000). Although 15 of 25 genes accumulate mRNAs at detectable levels, these levels differ by more than 20- to 30-fold. One difficulty in quantifying the mRNA levels of individual genes with this EST data set was the very recent amplification of some members of the 19-kD zein gene family. EST contig TUC02-02-07-16440.1 from ZmDB, which contained a total of 336 ESTs, apparently contained mixed ESTs from five different 19-kD zein genes of the z1A subfamily (Z448F14-3, Z448F14-4, Z448F14-5, Z448F14-6, and Z448F14-7). Another EST contig from ZmDB, TUC02-02-07-4151.1, which consists of 144 ESTs, contained mixed ESTs from two expressed 19-kD zein genes of the z1B subfamily (Z492M16-4 and Z492M16-6).
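The counts in this section follow directly from Table IV. The sketch below transcribes its status and expression columns (with z1A-1/z1A-2 and z1B-1/z1B-2 collapsed into z1A and z1B) and recomputes them; the per-subfamily copy numbers it prints are the B73 values that the Discussion compares with earlier hybridization estimates.

```python
# (gene, subfamily, status, expressed) transcribed from Table IV.
TABLE_IV = [
    ("Z448F14-1", "z1A", "Internal del", False),
    ("Z448F14-2", "z1A", "Intact", True),
    ("Z448F14-3", "z1A", "Intact", True),
    ("Z448F14-4", "z1A", "Intact", True),
    ("Z448F14-5", "z1A", "Intact", True),
    ("Z448F14-6", "z1A", "Intact", True),
    ("Z448F14-7", "z1A", "Intact", True),
    ("Z448F14-8", "z1A", "Prestop", False),
    ("Z448F14-9", "z1A", "Truncated (3')", False),
    ("Z350D07-1", "z1A", "Intact", True),
    ("Z350D07-2", "z1A", "Intact", True),
    ("Z350D07-3", "z1A", "Prestop", False),
    ("Z492M16-1", "z1B", "Prestop", True),
    ("Z492M16-2", "z1B", "Prestop", True),
    ("Z492M16-3", "z1B", "Prestop", False),
    ("Z492M16-4", "z1B", "Intact", True),
    ("Z492M16-5", "z1B", "Prestop", True),
    ("Z492M16-6", "z1B", "Intact", True),
    ("Z531H07-1", "z1B", "Prestop", False),
    ("Z531H07-2", "z1B", "Prestop", False),
    ("Z410H16-1", "z1D", "Truncated (5')", False),
    ("Z513H09-1", "z1D", "Intact", True),
    ("Z576A02-1", "z1D", "Truncated (5')", False),
    ("Z576A02-2", "z1D", "Intact", True),
    ("Z576A02-3", "z1D", "Truncated (3')", False),
]

intact = [g for g, _, status, _ in TABLE_IV if status == "Intact"]
expressed = [g for g, _, _, expr in TABLE_IV if expr]
prestop_expressed = [g for g, _, status, expr in TABLE_IV if status == "Prestop" and expr]
copies = {}
for _, subfamily, _, _ in TABLE_IV:
    copies[subfamily] = copies.get(subfamily, 0) + 1

print(len(TABLE_IV), len(intact), len(expressed))  # 25 12 15
print(set(intact) <= set(expressed))  # True: every intact copy has EST support
print(prestop_expressed)  # ['Z492M16-1', 'Z492M16-2', 'Z492M16-5']
print(copies)  # {'z1A': 12, 'z1B': 8, 'z1D': 5}
```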

Distance Analysis of α-Zein Genes

Given the different physical linkages of the 19-kD zein genes, one would expect that amplification and movement of the zein genes occurred at different times in evolution. This expectation would be consistent with our previous analysis of the 22-kD zein genes in inbred BSSS53 (Song et al., 2001). Therefore, all the coding sequences of the 19-kD zein genes were used for distance analysis with the Clustal method (Higgins and Sharp, 1989). Two severely truncated 19-kD zein gene copies (Z448F14-9 and Z576A02-1) were excluded to avoid biasing the comparison. Although we already knew that gene copy numbers and sequence polymorphisms differ between inbred lines, we included the z1C coding sequences of inbred BSSS53 as a reference in the distance analysis, because allelic sequence differences are too small to distort a phylogenetic tree of these subfamilies. In the resulting tree (Fig. 3), the z1A, z1B, and z1D subfamilies separated from the z1C subfamily, confirming that the 22- and 19-kD zein genes fall into two groups of genes that originated from a common ancestor.
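Distance-based tree building starts from a matrix of pairwise distances between aligned sequences. The sketch below computes such a matrix with a p-distance and a Jukes-Cantor correction; this substitution model is an assumption for illustration and is not necessarily what the Clustal method used in this study applies, and the alignment and tree-building steps are not shown. The toy sequences and function names are hypothetical.

```python
import math
from itertools import combinations


def p_distance(a, b):
    """Proportion of differing sites between two aligned sequences (gap columns skipped)."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    columns = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not columns:
        raise ValueError("no comparable columns")
    return sum(x != y for x, y in columns) / len(columns)


def jukes_cantor(p):
    """Jukes-Cantor corrected substitutions per site; infinite when p >= 0.75."""
    if p >= 0.75:
        return math.inf
    return -0.75 * math.log(1 - 4 * p / 3)


def distance_matrix(seqs):
    """Symmetric dict {(name1, name2): distance} for all pairs of named sequences."""
    d = {(n, n): 0.0 for n in seqs}
    for x, y in combinations(seqs, 2):
        d[(x, y)] = d[(y, x)] = jukes_cantor(p_distance(seqs[x], seqs[y]))
    return d


toy = {"g1": "ACGTACGTAC", "g2": "ACGTACGTAA", "g3": "ACGAACGTAA"}
dist = distance_matrix(toy)
print(round(dist[("g1", "g2")], 4), round(dist[("g1", "g3")], 4))  # 0.1073 0.2326
```

A tree-building method such as neighbor joining would then take this matrix as input.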

Figure 3.


Phylogenetic analysis of the maize α-zein genes. The coding regions of maize α-zein (19- and 22-kD zein) genes were aligned by the Clustal method to generate a phylogenetic tree. Two 19-kD zein genes with large truncations (Z448F14-9 and Z576A02-1) were excluded from this analysis. Asterisks mark 19-kD zein genes with intact coding regions. The 22-kD zein gene data came from our previous study (Song et al., 2001). The two major clades in the figure correspond to the 19-kD zein genes (top) and the 22-kD zein genes (bottom). The 19-kD zein gene clade is split into three smaller clades, corresponding to the three subfamilies z1A, z1B, and z1D. The 22-kD zein gene clade contains the single subfamily z1C. Gene names are color coded according to their genomic location: Z448F14 (yellow), Z350D07 (orange), Z492M16 (green), Z531H07 (blue), z1D contigs (light blue), z1C gene cluster (pink), and fl2 locus (red). A ruler at the bottom of the figure provides an estimated evolutionary time scale in millions of years ago (mya).

DISCUSSION

Complexity of the 19-kD α-Zein Gene Family

Here, we have described the genomic organization of a large maize gene family comprising all members of the 19-kD zein genes. These genes fall into three subfamilies and are located in five distinct genomic regions. Because the maize genome has not yet been sequenced, the question that must be posed is whether the experimental approach described in this study was capable of uncovering all the members of this gene family. We established three basic criteria for determining whether the isolation of the 19-kD zein genes was comprehensive. First, the probes used in this study were developed from DNA sequence information in an EST database, rather than from hybridization data. DNA sequence information can reveal members of a gene family that have diverged to a degree that would not be detectable by DNA hybridization even under reduced stringency. Sequence divergence was then addressed by selecting as many DNA probes as necessary to detect all members of the gene family by DNA hybridization.

When this project commenced 2 years ago, the ZmDB database contained more than 300 ESTs of 19-kD zein cDNAs. The data set has since more than doubled in size, and it remains consistent with the results from the first data set. Second, the BAC library used for this study has comprehensive coverage of the maize genome. This BAC library, constructed from maize inbred line B73, has an average insert size of 167 kb and a total of 105,579 clones, which provide 7-fold genome coverage based on a genome size of 2.5 Gb (Arumuganathan and Earle, 1991). Third, the initial screening of BAC high-density filters was carried out under medium-stringency hybridization conditions, in addition to the use of three different probes. Under such conditions, even clones from the 22-kD zein gene family, which have diverged further from the 19-kD zein genes, were detected (Heidecker and Messing, 1983; also see Fig. 3). Therefore, based on these criteria, we believe that the isolation of the 19-kD zein genes of maize inbred line B73 was complete.
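The 7-fold figure is the number of clones multiplied by the average insert size and divided by the genome size. A common way to translate fold coverage into the chance that a given locus is present in at least one clone is the Clarke-Carbon formula, P = 1 - (1 - f)^N with f the insert-to-genome size ratio and N the number of clones. Applying it here is our illustration, not a calculation reported in the study, and it assumes random, uniform cloning, which a partial MboI digest only approximates.

```python
def genome_equivalents(n_clones, mean_insert_bp, genome_bp):
    """Fold coverage of a clone library: total cloned DNA / genome size."""
    return n_clones * mean_insert_bp / genome_bp


def prob_locus_represented(n_clones, mean_insert_bp, genome_bp):
    """Clarke-Carbon estimate P = 1 - (1 - f)^N, with f = insert size / genome size.

    Assumes clones are a random, uniform sample of the genome.
    """
    f = mean_insert_bp / genome_bp
    return 1 - (1 - f) ** n_clones


N, INSERT, GENOME = 105_579, 167_000, 2.5e9  # values stated in the text
print(round(genome_equivalents(N, INSERT, GENOME), 2))      # 7.05
print(round(prob_locus_represented(N, INSERT, GENOME), 4))  # 0.9991
```

Under these assumptions, the chance that any single locus is missing from the library is below 0.1%.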

Genomic Organization of the 19-kD α-Zein Gene Family

Segregation studies of polymorphic 19-kD zein proteins or RFLPs of 19-kD zein genes placed these genes on maize chromosomes 1, 4, and 7 (Soave et al., 1981; Soave et al., 1982; Wilson et al., 1989; Woo et al., 2001). In a previous study, we demonstrated that the two other locations on chromosome 4 where α-zein genes have been mapped represent the 22-kD zein gene family, or z1C subfamily (Song et al., 2001). In this study, BAC clones were sorted into three groups according to the three different probes. Within each group, BAC clones were analyzed by DNA fingerprinting and Southern blotting. The resulting fingerprint bins indicated a total of five unlinked genomic locations.

There is additional evidence that these five genomic regions are not contiguous. A systematic effort was undertaken to DNA fingerprint the clones of the entire BAC library that we used to isolate the 19-kD zein genes (Maize Physical Mapping, http://genome.arizona.edu/fpc/maize). Furthermore, two additional libraries, made with HindIII and EcoRI from the same B73 germplasm, are also being fingerprinted. The BAC DNA fingerprints were used to assemble BAC clones into fingerprinted contigs (FPCs) with a program called WebFPC. This analysis also included all the zein clones shown in Table II. To date, more than 232,000 BAC clones from the three different BAC libraries, representing 12-fold coverage of the maize genome, have been analyzed. All five genomic regions comprising the 19-kD zein genes fall into separate FPCs (Table V). Because these FPCs span larger regions than the zein gene clusters, they also indicate that the five regions occupy noncontiguous locations in the genome. In addition, we have sequenced the ends of several BAC clones, sometimes referred to as sequence-tagged connectors (STCs). These STCs were compared with the sequenced BAC clones, confirming their linkage to the correct genomic location. These data confirm the reliability of the WebFPC program, because all zein clones were placed in the correct bins. Therefore, additional coverage of FPCs and their placement on the genetic map, coupled with the development of an STC database, will greatly facilitate the genomic analysis of large, complex gene families.
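Fingerprint-based contig building rests on the idea that overlapping clones share restriction fragments of the same size. The sketch below is a deliberately simplified stand-in: it scores overlap as the fraction of shared bands within a migration tolerance and merges clones by single linkage with a union-find structure. FPC itself scores overlaps with a probability-of-coincidence statistic (the Sulston score), which is not reproduced here; the band values, `tol`, and `min_shared` are hypothetical.

```python
def shared_bands(a, b, tol=7):
    """Greedily pair bands of a with distinct bands of b lying within tol units."""
    remaining = sorted(b)
    shared = 0
    for x in sorted(a):
        for i, y in enumerate(remaining):
            if abs(x - y) <= tol:
                shared += 1
                del remaining[i]
                break
    return shared


def fingerprint_contigs(prints, min_shared=0.5, tol=7):
    """Single-linkage grouping of clones whose shared-band fraction >= min_shared.

    Assumes every fingerprint has at least one band.
    """
    parent = {name: name for name in prints}

    def find(name):
        while parent[name] != name:
            parent[name] = parent[parent[name]]  # path halving
            name = parent[name]
        return name

    names = list(prints)
    for i, x in enumerate(names):
        for y in names[i + 1:]:
            smaller = min(len(prints[x]), len(prints[y]))
            if shared_bands(prints[x], prints[y], tol) / smaller >= min_shared:
                parent[find(x)] = find(y)
    groups = {}
    for name in names:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(g) for g in groups.values())


bands = {  # hypothetical migration values for four clones
    "A": [100, 220, 340, 460, 580],
    "B": [102, 218, 345, 700, 820],
    "C": [698, 822, 940, 1060],
    "D": [1500, 1620, 1750],
}
print(fingerprint_contigs(bands))  # [['A', 'B', 'C'], ['D']]
```

Here A and C share no bands but end up in one contig through B, which is how overlapping clones chain into a region larger than any single clone.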

Table V.

Maize BAC WebFPC summary (a)

Subfamily FPC Contig No. of Clones No. of Markers Sequenced FPC Link Linked Position Accession No.
z1A-1 1,266 108 13 Z448F14 Z448F14 umc1943; ch4/105.8 AF546186
z1A-2 289 85 7 Z350D07 Z430H20 n/a (b) AF546189
z1B-1 2,620 97 20 Z492M16 Z492M16 cesa9; ch7/47.5 AF546188
z1B-2 1,028 22 3 Z531H07 Z531H07 n/a AF546190
z1D 1,624 110 6 Z410H16 Z410H16 n/a AF546187
(a) Data from maize physical mapping (http://genome.arizona.edu/fpc/maize/) in July 2002.

(b) n/a, Not available.

Comparison with EST Databases

Sequence analysis of the five genomic regions comprising the 19-kD zein gene family revealed a total of 25 gene copies. Only about one-half of these exhibited an intact coding region. The remaining gene copies displayed in-frame stop codons, an internal deletion, or truncations at the 5′ or 3′ end. In terms of gene expression, all intact copies appeared to be expressed. If truncated copies were transcribed, their mRNAs must be rapidly turned over, because no transcripts of these genes were detected. However, it was surprising that some of the genes containing in-frame stop codons appeared to be expressed, whereas others did not. It was previously shown that the introduction of in-frame stop codons in plant mRNAs leads to mRNA instability (Van Hoof and Green, 1996; Patracek et al., 2000). Furthermore, a 22-kD zein mRNA from inbred W22 with an in-frame stop codon was shown to accumulate to levels two orders of magnitude lower (0.045% versus 5%) than 22-kD zein mRNAs without an in-frame stop codon (Liu and Rubenstein, 1993). Therefore, it is possible that the genes in B73 with an in-frame stop codon that appear to be expressed do not have in-frame stop codons in the inbreds from which the ESTs were derived. Such single-nucleotide differences between orthologous genes were also detected in W22 and BSSS53 and usually involve a C-to-T conversion of a CAG or CAA Gln codon, which creates a TAG or TAA stop codon (Llaca and Messing, 1998).
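The C-to-T mechanism can be made concrete: changing the first base of either Gln codon turns CAA into TAA and CAG into TAG, both stop codons. Because zeins are rich in Gln, every in-frame Gln codon is a potential site for such a change. The sketch below lists those sites in a coding sequence; the function names and the toy sequence are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
GLN_CODONS = {"CAA", "CAG"}


def c_to_t_first_base(codon):
    """Apply a C->T change at the first codon position."""
    if not codon.startswith("C"):
        raise ValueError("codon does not start with C")
    return "T" + codon[1:]


def gln_sites_one_change_from_stop(cds):
    """0-based codon indices of in-frame Gln codons (CAA/CAG) in a coding sequence."""
    cds = cds.upper()
    return [i // 3 for i in range(0, len(cds) - 2, 3) if cds[i:i + 3] in GLN_CODONS]


for codon in sorted(GLN_CODONS):
    print(codon, "->", c_to_t_first_base(codon))  # CAA -> TAA, then CAG -> TAG
print(all(c_to_t_first_base(c) in STOP_CODONS for c in GLN_CODONS))  # True
print(gln_sites_one_change_from_stop("ATGCAACAGCTTTAA"))  # [1, 2]
```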

A large, privately held EST database of 6,732 endosperm-specific cDNAs from inbred B73 also served as the basis for an expression analysis of zein genes (Woo et al., 2001). Although the genomic data presented here were also derived from the B73 inbred, the Woo et al. (2001) study predicted fewer expressed genes than our study indicates. For instance, we found that two highly expressed 19-kD zein genes in the Woo et al. (2001) study (az19B1 and az19B3) are actually mixtures of different, highly homologous expressed genes. This result clearly demonstrates that the assembly of ESTs into “unigene” sets must be considered tentative, and that it will be important to retain single cDNA reads in the databases for the completion of linear genomic analysis of genes.

Recently Amplified Genes Are Tightly Clustered

A complication in gene expression studies is that the gene copies are highly homologous and can be distinguished mainly by their chromosomal positions. These copies are tandemly arranged within a short physical distance. The phylogenetic analysis also showed that these highly homologous α-zein gene copies must have been amplified recently relative to the other members of the gene family. The more extensive amplification of the z1A and z1B subfamilies involved gene translocations followed by additional amplification, which explains the five genomic locations of the 19-kD zein genes. By contrast, the translocation of a single 22-kD zein gene (Fl2) did not give rise to additional amplification (Song et al., 2001). Furthermore, the z1D subfamily formed without any far-distance gene translocation, and it is also the subfamily with the lowest degree of gene amplification. Notably, previous copy number estimates for the 19-kD α-zein subfamilies, obtained by hybridization with inbred lines other than B73, deviate substantially for the z1A subfamily (25 versus 12 copies in B73) and for the z1B subfamily (20 versus eight copies in B73; Heidecker and Messing, 1986). However, the estimate for the z1D subfamily is the same (five versus five in B73), suggesting that the longer spacer regions between copies of the z1D subfamily may have prevented the generation of additional haplotypes by gene amplification. Comparisons with other haplotypes should therefore provide insight into how these genes were amplified.
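
The copy numbers quoted above can be tallied in a few lines of Python to show that the entire gap between the earlier estimate of about 50 genes and the 25 copies found in B73 lies in the z1A and z1B subfamilies. The numbers come from the text; the variable names are illustrative only.

```python
# (earlier hybridization estimate, copies found in the B73 sequence), from the text
COPY_NUMBERS = {"z1A": (25, 12), "z1B": (20, 8), "z1D": (5, 5)}

previous_total = sum(prev for prev, _ in COPY_NUMBERS.values())
b73_total = sum(b73 for _, b73 in COPY_NUMBERS.values())

for subfamily, (prev, b73) in COPY_NUMBERS.items():
    # fold difference between the earlier estimate and the B73 count
    print(f"{subfamily}: {prev} vs {b73} copies ({prev / b73:.2f}-fold)")
print(f"total: {previous_total} vs {b73_total} copies")
```

This prints fold differences of 2.08 for z1A, 2.50 for z1B, and 1.00 for z1D, with totals of 50 versus 25 copies.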

The 22- and 19-kD α-zein genes also appear to differ in how they were amplified. Both branches have undergone several rounds of gene amplification, most of them within the last 5 million years, long after the allotetraploidization of maize (Gaut and Doebley, 1997; Song et al., 2001). This period coincides with the extensive expansion of the maize genome by retrotransposition (SanMiguel et al., 1998). The 19-kD zein branch split into two major clades, one representing the z1D subfamily and the other the z1A and z1B subfamilies, probably reflecting three original genomic locations. A major difference between the two branches is that the expansion of the 19-kD branch involved more far-distance movements within the maize genome than that of the 22-kD branch: copies of the 22-kD branch are mostly contained within a 168-kb genomic region, whereas copies of the 19-kD branch are distributed among five different genomic locations.

MATERIALS AND METHODS

Design of 19-kD Zein Probes

The 19-kD zein-related EST sequences were downloaded from ZmDB (maize [Zea mays] genome database, http://www.zmdb.iastate.edu/) in October 2000. These sequences were assembled on a Macintosh G3 computer (Apple Computer, Cupertino, CA) using the Seq-Man program of the Lasergene software package (DNAStar, Inc., Madison, WI). Known sequences of z1A, z1B, and z1D were included in the comparison. The sequences were aligned by homology and assembled into three contigs (see Table I). Based on the previously described representative member of each known subfamily (Heidecker and Messing, 1986), the contigs were assigned to the z1A, z1B, and z1D subfamilies, respectively. Primers allowing PCR amplification of each of the z1A, z1B, and z1D subfamilies were designed from consensus sequences derived from the Seq-Man assembly: z1A, 5′ primer 5′-AGTGCTGCTACGGCGACCATT and 3′ primer 5′-CGGAAGCCACAAACATCAGACAA; z1B, 5′ primer 5′-CGGCACGAGGCAACATAGAAAGT and 3′ primer 5′-TAAAAGAGGGCACCACCAATGATG; z1D, 5′ primer 5′-ATACAATCCTACAGGCTACAAGAG and 3′ primer 5′-GTGGGCTGCTGCAATAAGGTG.
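
As a quick sanity check on primers like these, their length, G+C content, and a rough melting temperature can be computed. The sketch below uses the Wallace rule, Tm ≈ 2 °C × (A+T) + 4 °C × (G+C), which is only a crude approximation for oligonucleotides of this length; the text does not say how the primers were evaluated, so this is an illustration, not the authors' procedure. The helper names are hypothetical.

```python
# Primer sequences as listed in the text, written 5' to 3'.
PRIMERS = {
    ("z1A", "5'"): "AGTGCTGCTACGGCGACCATT",
    ("z1A", "3'"): "CGGAAGCCACAAACATCAGACAA",
    ("z1B", "5'"): "CGGCACGAGGCAACATAGAAAGT",
    ("z1B", "3'"): "TAAAAGAGGGCACCACCAATGATG",
    ("z1D", "5'"): "ATACAATCCTACAGGCTACAAGAG",
    ("z1D", "3'"): "GTGGGCTGCTGCAATAAGGTG",
}


def gc_count(seq):
    """Number of G and C bases in the sequence."""
    return sum(seq.count(base) for base in "GC")


def wallace_tm(seq):
    """Rough Tm in degrees C: 2 per A/T plus 4 per G/C (Wallace rule)."""
    gc = gc_count(seq)
    return 2 * (len(seq) - gc) + 4 * gc


for (subfamily, end), seq in PRIMERS.items():
    # every primer should contain only unambiguous bases
    assert set(seq) <= set("ACGT"), f"unexpected base in {subfamily} {end} primer"
    gc_pct = 100 * gc_count(seq) / len(seq)
    print(f"{subfamily} {end}: {len(seq)} nt, {gc_pct:.0f}% GC, Tm ~{wallace_tm(seq)} C")
```

For primers of 21 to 24 nt, a nearest-neighbor model would give more realistic values; the Wallace rule is used here only because it needs no thermodynamic tables.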

These primers were used to amplify the corresponding fragments from cDNA clones isolated from endosperm of inbred line BSSS53 harvested 18 d after pollination (R. Song and J. Messing, unpublished data). PCR fragments were purified from an agarose gel and labeled using standard procedures.
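
Conceptually, a primer pair amplifies the stretch of template between the 5′ primer site and the binding site of the 3′ primer, which appears on the top strand as the reverse complement of the 3′ primer sequence. The following Python sketch of this in silico PCR assumes exact primer matches on a single strand and uses a hypothetical toy template, since no cDNA sequence is given here; the function names are illustrative.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq):
    """Reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def predicted_amplicon(template, fwd, rev):
    """Return the predicted PCR product, or None if either primer site is missing.

    Exact matches only; the reverse primer site is searched as its reverse
    complement downstream of the forward primer on the same strand."""
    template = template.upper()
    start = template.find(fwd.upper())
    if start == -1:
        return None
    rev_site = revcomp(rev)
    end = template.find(rev_site, start + len(fwd))
    if end == -1:
        return None
    return template[start:end + len(rev_site)]


if __name__ == "__main__":
    fwd = "AGTGCTGCTACGGCGACCATT"    # z1A 5' primer from the text
    rev = "CGGAAGCCACAAACATCAGACAA"  # z1A 3' primer from the text
    # Hypothetical toy template: primer sites separated by a 40-nt spacer
    toy = "GGGG" + fwd + "ACGT" * 10 + revcomp(rev) + "CCCC"
    product = predicted_amplicon(toy, fwd, rev)
    print(len(product))  # 84 = 21 + 40 + 23
```

Real cDNA templates from a multigene family can carry primer sites with a few mismatches, which this exact-match sketch would miss.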

BAC Clone Isolation and Characterization

The maize BAC library CHORI201 was screened for 19-kD zein gene sequences. The library was constructed from genomic DNA of inbred line B73 partially digested with MboI (Yim et al., 2002). High-density BAC library filters were made using a Total Array System (BioRobotics, Inc., Comberton, Cambridge, UK). Hybridization was carried out in 5× SSC with 7.5% (w/v) SDS at 65°C overnight. Membranes were washed twice under medium stringency, for 20 min each, with 1× SSC and 0.1% (w/v) SDS at 65°C. The membranes were wrapped and exposed to x-ray film. Tentatively positive BAC clones were streaked on Luria-Bertani agar plates, and DNA minipreparations were carried out from 3-mL overnight cultures. BAC DNA was digested with BamHI or HindIII and separated by agarose gel electrophoresis. The DNA was blotted to membranes and hybridized with the probes described above in 5× SSC with 7.5% (w/v) SDS at 65°C overnight. High-stringency washes were performed twice for 20 min with 0.1× SSC and 0.1% (w/v) SDS at 65°C. Clones passing this stringency selection were sorted into groups according to the hybridization probes they detected.
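
Clones detected by the same probe can then be collected into bins of overlapping clones. One plausible way to do this, sketched below as a hypothetical illustration rather than the procedure actually used, is to link any two clones that share at least a minimum number of hybridizing BamHI or HindIII fragment sizes and to merge linked clones transitively with a union-find structure. Clone names, band sizes, and the `min_shared` cutoff are invented for the example; real band sizes would need a sizing tolerance rather than exact matching.

```python
from collections import defaultdict


def bin_clones(bands, min_shared=2):
    """Group clones into bins of putatively overlapping clones.

    `bands` maps clone name -> set of hybridizing fragment sizes (bp). Two
    clones are linked if they share at least `min_shared` sizes; links are
    merged transitively with union-find."""
    parent = {clone: clone for clone in bands}

    def find(clone):
        while parent[clone] != clone:
            parent[clone] = parent[parent[clone]]  # path halving
            clone = parent[clone]
        return clone

    clones = sorted(bands)
    for i, a in enumerate(clones):
        for b in clones[i + 1:]:
            if len(bands[a] & bands[b]) >= min_shared:
                parent[find(a)] = find(b)

    bins = defaultdict(list)
    for clone in clones:
        bins[find(clone)].append(clone)
    return sorted(bins.values())


if __name__ == "__main__":
    # Hypothetical hybridizing band sizes (bp) for clones detected by one probe
    example = {
        "cloneA": {2100, 3400, 5000},
        "cloneB": {3400, 5000, 7200},
        "cloneC": {7200, 9900, 1200},
        "cloneD": {4400, 6600},
    }
    print(bin_clones(example))  # [['cloneA', 'cloneB'], ['cloneC'], ['cloneD']]
```

Requiring more than one shared band reduces spurious links between clones from different genomic regions that happen to carry a fragment of the same size.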

The insert size of each positive BAC clone was estimated by digestion at the NotI sites in the vector that flank the cloned DNA, followed by pulsed-field gel electrophoresis using a CHEF apparatus (Bio-Rad Laboratories, Inc., Hercules, CA). Some BAC clones were end-sequenced using Big Dye Terminator chemistry on an ABI3700 capillary sequencer (Applied Biosystems, Inc., Foster City, CA).
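
The insert-size estimate amounts to simple arithmetic on the NotI fragments resolved by pulsed-field electrophoresis: the band corresponding to the vector backbone is set aside, and the remaining fragments, which come from the insert (more than one if the insert carries internal NotI sites), are summed. In the Python sketch below, the vector size is a hypothetical parameter because the vector is not specified here, and the vector band is assumed to be the fragment closest to that size.

```python
def insert_size_kb(fragments_kb, vector_kb):
    """Estimate BAC insert size (kb) from a complete NotI digest.

    Assumes one fragment is the vector backbone (taken as the band closest to
    vector_kb) and that all other fragments derive from the insert, which is
    split into several pieces if it contains internal NotI sites."""
    fragments = sorted(fragments_kb)
    vector_band = min(fragments, key=lambda size: abs(size - vector_kb))
    fragments.remove(vector_band)
    return sum(fragments)


if __name__ == "__main__":
    # Hypothetical PFGE band sizes (kb); 7.5 kb is an assumed vector size
    print(insert_size_kb([120.0, 7.5, 45.0], vector_kb=7.5))  # 165.0
```

If an insert fragment happens to co-migrate with the vector band, this simple rule undercounts the insert, so such cases would need inspection.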

BAC Sequencing

Large-scale BAC DNA preparations were made using a Large-Construct Kit (Qiagen, Inc., Valencia, CA). BAC DNAs were physically sheared with the HydroShear instrument (Genomic Instrumentation Services, Inc., San Carlos, CA) at different speed codes. Two speed codes (11 and 14) were used routinely, giving average DNA fragment sizes of 3 and 6 kb, respectively. The ends of the sheared DNA were repaired with T4 DNA polymerase, and the products were separated on agarose gels. DNA fractions of 2 to 4 kb (from speed code 11) or 5 to 8 kb (from speed code 14) were recovered from the gels using a Gel Extraction Kit (Qiagen, Inc.). The DNA was ligated into a dephosphorylated cloning vector such as pUC18/SmaI/BAP (Amersham Pharmacia Biotech, Inc., Piscataway, NJ) at a vector-to-insert molar ratio of 3:1 or 1:1. Ligated DNA was electroporated into Escherichia coli strain DH10B, and transformants were selected on Luria-Bertani agar plates with appropriate selective agents. Ten to 20 clones were checked for cloning quality before large-scale shotgun sequencing.
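
A vector-to-insert molar ratio translates into DNA masses through fragment length, because for double-stranded DNA the number of moles is proportional to mass divided by length. A small Python helper for this calculation is sketched below; the function name and the amounts in the example are illustrative assumptions, and only the vector (pUC18, which is 2,686 bp long) and the 3:1 and 1:1 ratios come from the text.

```python
def insert_ng(vector_ng, vector_bp, insert_bp, vector_to_insert=1.0):
    """Insert mass (ng) that gives the requested vector:insert molar ratio.

    Moles of dsDNA are proportional to mass / length, so the ~650 g/mol per bp
    factor cancels out of the ratio."""
    vector_amount = vector_ng / vector_bp               # relative moles of vector
    insert_amount = vector_amount / vector_to_insert    # relative moles of insert
    return insert_amount * insert_bp


if __name__ == "__main__":
    PUC18_BP = 2686  # length of pUC18
    for ratio in (3.0, 1.0):
        # hypothetical 50 ng of vector and 3-kb sheared fragments
        ng = insert_ng(50, PUC18_BP, 3000, vector_to_insert=ratio)
        print(f"vector:insert {ratio:g}:1 -> {ng:.1f} ng of 3-kb insert")
```

Because the ratio is expressed per molecule, larger sheared fragments (the 5- to 8-kb fraction) require proportionally more insert mass for the same ratio.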

DNA templates for large-scale shotgun sequencing were prepared using the QIAprep 96 Turbo Miniprep Kit (Qiagen, Inc.). Both ends of each clone were sequenced using Big Dye Terminator chemistry (Applied Biosystems, Inc.), and the sequencing products were analyzed on ABI3700 capillary sequencers (Applied Biosystems, Inc.). Base calling and quality assessment were performed with the PHRED program (Ewing and Green, 1998). Sequences were assembled with the PHRAP program, and the assembled shotgun reads were viewed and edited with CONSED (Gordon et al., 1998). Primer walking and full shotgun sequencing were used to close gaps or orient contigs. The dGTP Big Dye Terminator kit (Applied Biosystems, Inc.) was used to resolve regions that were difficult to sequence. All BAC clones have been advanced to phase II sequence and deposited in the HTGS division of GenBank.
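
PHRED reports each base call with a quality value Q defined from the estimated error probability P as Q = −10 log10 P (Ewing and Green, 1998), so Q20 corresponds to an error probability of 1 in 100 and Q30 to 1 in 1,000. The Python helpers below show this conversion and the expected number of errors in a read; the function names and example qualities are hypothetical and are not part of PHRED itself.

```python
import math


def phred_to_error(q):
    """Error probability for a PHRED quality value: P = 10 ** (-Q / 10)."""
    return 10 ** (-q / 10)


def error_to_phred(p):
    """PHRED quality value for an error probability: Q = -10 * log10(P)."""
    return -10 * math.log10(p)


def expected_errors(qualities):
    """Expected number of miscalled bases in a read, summed over base qualities."""
    return sum(phred_to_error(q) for q in qualities)


if __name__ == "__main__":
    read_quals = [40, 35, 30, 20, 20, 10]  # hypothetical per-base qualities
    print(phred_to_error(20))  # 0.01
    print(round(expected_errors(read_quals), 4))
```

Summing error probabilities is a convenient way to compare reads, since a few low-quality bases can dominate the expected error count.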

Sequence Analysis

Draft (phase II) sequences generated by high-throughput DNA sequencing were analyzed with the gene prediction program FGENESH (Softberry, Inc., Mount Kisco, NY). The predicted protein sequences were then used in BLASTP searches against the GenBank databases (Altschul et al., 1990). Only hits with significant homology to sequences from other species (E value below 10⁻⁵) were considered. Coding sequences of known 19-kD zein genes were compared with the BAC sequences using BLAST 2 Sequences (Tatusova and Madden, 1999). The 19-kD zein sequences identified in this way were further analyzed by sequence distance analysis with the MegAlign program of Lasergene (DNAStar, Inc.). Gene sequences with large truncations were excluded from the MegAlign analysis.
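
The E-value cutoff can be applied mechanically to BLAST results. The Python sketch below assumes the hits were saved in tabular format with the 12 standard columns (qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore); the text does not say which output format was used, the restriction to hits from other species is not modeled, and the query and subject names in the example are hypothetical.

```python
import csv
import io

FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def significant_hits(tabular_text, max_evalue=1e-5):
    """Return (query, subject, evalue) for hits with an E value below max_evalue."""
    hits = []
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank and comment lines
        record = dict(zip(FIELDS, row))
        evalue = float(record["evalue"])
        if evalue < max_evalue:
            hits.append((record["qseqid"], record["sseqid"], evalue))
    return hits


if __name__ == "__main__":
    # Hypothetical BLASTP output for two predicted proteins
    example = (
        "pred_001\tsubj_A\t45.2\t310\t160\t4\t1\t305\t12\t320\t2e-40\t150.0\n"
        "pred_002\tsubj_B\t28.0\t90\t60\t2\t10\t98\t5\t92\t0.03\t30.1\n"
    )
    print(significant_hits(example))  # [('pred_001', 'subj_A', 2e-40)]
```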

The coding regions of the different 19-kD zein genes were also used in BLASTN searches against the ZmDB maize EST database (http://www.zmdb.iastate.edu/cgi-bin/ZmDBblast/ZMDB; Altschul et al., 1990). Because the EST database contains ESTs from different maize inbred lines, a threshold of 98% sequence identity over a minimum length of 500 bp was set for the comparison of genomic and cDNA sequences.
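
When recently duplicated gene copies are nearly identical, an EST can pass the 98%/500-bp threshold for more than one genomic copy, which is the same problem that merged distinct genes into single “unigenes.” The hypothetical Python sketch below takes hits as (EST, gene, percent identity, alignment length) tuples, keeps those that pass the threshold, and separates ESTs that can be assigned to one gene copy from ambiguous ones; all identifiers are invented.

```python
from collections import defaultdict


def assign_ests(hits, min_identity=98.0, min_length=500):
    """Split ESTs into uniquely assignable and ambiguous ones.

    hits: iterable of (est_id, gene_id, percent_identity, alignment_length).
    Only hits with >= min_identity % identity over >= min_length bp count."""
    matches = defaultdict(set)
    for est, gene, identity, length in hits:
        if identity >= min_identity and length >= min_length:
            matches[est].add(gene)
    unique = {est: next(iter(genes)) for est, genes in matches.items() if len(genes) == 1}
    ambiguous = {est: sorted(genes) for est, genes in matches.items() if len(genes) > 1}
    return unique, ambiguous


if __name__ == "__main__":
    # Hypothetical BLASTN results; gene IDs stand for distinct genomic copies
    example_hits = [
        ("est1", "geneA1", 99.5, 620),
        ("est1", "geneA2", 98.7, 610),  # near-identical copy: also passes
        ("est2", "geneB1", 99.0, 540),
        ("est3", "geneD1", 97.0, 700),  # below 98% identity: discarded
    ]
    unique, ambiguous = assign_ests(example_hits)
    print("unique:", unique)        # {'est2': 'geneB1'}
    print("ambiguous:", ambiguous)  # {'est1': ['geneA1', 'geneA2']}
```

Reporting ambiguous ESTs separately, instead of silently assigning them to the best hit, keeps expression evidence for near-identical copies from being attributed to a single gene.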

WebFPC Analysis

A BAC clone address was used to identify the corresponding maize FPC contig in the WebFPC database (http://genome.arizona.edu/fpc/maize/). Within an identified FPC contig, other BAC clones could then be searched. Some FPC contigs have been anchored with genetic markers, so their chromosomal location and genetic map position can be determined.

ACKNOWLEDGMENTS

We thank Drs. Gregorio Segal and Barbara Miesak More for helpful comments on the manuscript, and Steve Kavchok and Steve Young for technical assistance.

Footnotes

1

This work was supported by the Department of Energy (grant no. DE–FG05–95ER20194 to J.M.).

Article, publication date, and citation information can be found at www.plantphysiol.org/cgi/doi/10.1104/pp.012179.

LITERATURE CITED

  1. Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ. Basic local alignment search tool. J Mol Biol. 1990;215:403–410. doi: 10.1016/S0022-2836(05)80360-2.
  2. Arabidopsis Genome Initiative. Analysis of the genome sequence of the flowering plant Arabidopsis thaliana. Nature. 2000;408:796–815. doi: 10.1038/35048692.
  3. Arumuganathan K, Earle ED. Nuclear DNA content of some important plant species. Plant Mol Biol Rep. 1991;9:208–219.
  4. Bender W, Spierer P, Hogness DS. Chromosomal walking and jumping to isolate DNA from the Ace and rosy loci and the bithorax complex in Drosophila melanogaster. J Mol Biol. 1983;168:17–33. doi: 10.1016/s0022-2836(83)80320-9.
  5. Burr B, Burr FA. Zein synthesis in maize endosperm by polyribosomes attached to protein bodies. Proc Natl Acad Sci USA. 1976;73:515–519. doi: 10.1073/pnas.73.2.515.
  6. Burr B, Burr FA, St. John TP, Thomas M, Davis RW. Zein storage gene family of maize. J Mol Biol. 1982;154:33–49. doi: 10.1016/0022-2836(82)90415-6.
  7. Coleman CE, Lopes MA, Gillikin JW, Boston RS, Larkins BA. A defective signal peptide in the maize high-lysine mutant floury 2. Proc Natl Acad Sci USA. 1995;92:6828–6831. doi: 10.1073/pnas.92.15.6828.
  8. Ewing B, Green P. Base-calling of automated sequencer traces using PHRED: II. Error probabilities. Genome Res. 1998;8:186–194.
  9. Gaut BS, Doebley JF. DNA sequence evidence for the segmental allotetraploid origin of maize. Proc Natl Acad Sci USA. 1997;94:6809–6814. doi: 10.1073/pnas.94.13.6809.
  10. Gordon D, Abajian C, Green P. Consed: a graphical tool for sequence finishing. Genome Res. 1998;8:195–202. doi: 10.1101/gr.8.3.195.
  11. Heidecker G, Messing J. Sequence analysis of zein cDNAs obtained by an efficient mRNA cloning method. Nucleic Acids Res. 1983;11:4891–4906. doi: 10.1093/nar/11.14.4891.
  12. Heidecker G, Chaudhuri S, Messing J. Highly clustered zein gene sequences reveal evolutionary history of the multigene family. Genomics. 1991;10:719–732. doi: 10.1016/0888-7543(91)90456-o.
  13. Heidecker G, Messing J. Structural analysis of plant genes. Annu Rev Plant Physiol. 1986;37:439–466.
  14. Higgins DG, Sharp PM. Fast and sensitive multiple sequence alignments on a microcomputer. CABIOS. 1989;5:151–153. doi: 10.1093/bioinformatics/5.2.151.
  15. Kellogg EA. Relationships of cereal crops and other grasses. Proc Natl Acad Sci USA. 1998;95:2005–2010. doi: 10.1073/pnas.95.5.2005.
  16. Liu CN, Rubenstein I. Transcriptional characterization of an alpha-zein gene cluster in maize. Plant Mol Biol. 1993;22:323–336. doi: 10.1007/BF00014939.
  17. Llaca V, Messing J. Amplicons of maize zein genes are conserved within genic but expanded and constricted in intergenic regions. Plant J. 1998;15:211–220. doi: 10.1046/j.1365-313x.1998.00200.x.
  18. Messing J, Crea R, Seeburg PH. A system for shotgun DNA sequencing. Nucleic Acids Res. 1981;9:309–321. doi: 10.1093/nar/9.2.309.
  19. Petracek ME, Nguyen T, Thompson WF, Dickey LF. Premature termination codons destabilize ferredoxin-1 mRNA when ferredoxin-1 is translated. Plant J. 2000;21:563–569. doi: 10.1046/j.1365-313x.2000.00705.x.
  20. SanMiguel P, Gaut BS, Tikhonov A, Nakajima Y, Bennetzen JL. The paleontology of intergene retrotransposons of maize. Nat Genet. 1998;20:43–45. doi: 10.1038/1695.
  21. SanMiguel P, Tikhonov A, Jin YK, Motchoulskaia N, Zakharov D, Melake-Berhan A, Springer PS, Edwards KJ, Lee M, Avramova Z et al. Nested retrotransposons in the intergenic regions of the maize genome. Science. 1996;274:765–768. doi: 10.1126/science.274.5288.765.
  22. Soave C, Reggiani R, Di Fonzo N, Salamini F. Clustering of genes for 20 kd zein subunits in the short arm of maize chromosome 7. Genetics. 1981;97:363–377. doi: 10.1093/genetics/97.2.363.
  23. Soave C, Reggiani R, Di Fonzo N, Salamini F. Genes for zein subunits on maize chromosome 4. Biochem Genet. 1982;20:1027–1038. doi: 10.1007/BF00498930.
  24. Song R, Llaca V, Linton E, Messing J. Sequence, regulation and evolution of the maize 22-kD α zein gene family. Genome Res. 2001;11:1817–1825. doi: 10.1101/gr.197301.
  25. Tatusova TA, Madden TL. Blast 2 sequences: a new tool for comparing protein and nucleotide sequences. FEMS Microbiol Lett. 1999;174:247–250. doi: 10.1111/j.1574-6968.1999.tb13575.x.
  26. Van Hoof A, Green PJ. Premature nonsense codons decrease the stability of phytohemagglutinin mRNA in a position-dependent manner. Plant J. 1996;10:415–424. doi: 10.1046/j.1365-313x.1996.10030415.x.
  27. Vieira J, Messing J. The pUC plasmids, an M13mp7-derived system for insertion mutagenesis and sequencing with synthetic universal primers. Gene. 1982;19:259–268. doi: 10.1016/0378-1119(82)90015-4.
  28. Wilson CM, Sprague GF, Nelsen TC. Linkage among zein genes determined by isoelectric focusing. Theor Appl Genet. 1989;77:217–226. doi: 10.1007/BF00266190.
  29. Woo YM, Hu DW, Larkins BA, Jung R. Genomics analysis of genes expressed in maize endosperm identifies novel seed proteins and clarifies patterns of zein gene expression. Plant Cell. 2001;13:2297–2317. doi: 10.1105/tpc.010240.
  30. Yim YS, Davis G, Duru N, Musket T, Linton EW, Messing J, McMullen MD, Soderlund C, Polacco M, Gardiner J, Coe EH, Jr. Characterization of three maize BAC libraries toward anchoring of the physical map to the genetic map using high density BAC filter hybridization. Plant Physiol. 2002;130:1686–1696. doi: 10.1104/pp.013474.

