Abstract
Establishing the architecture of the gene regulatory networks (GRNs) responsible for controlling the transcription of all genes in an organism is a natural development that follows elucidation of the genome sequence. Reconstruction of the GRN requires the availability of a series of molecular tools and resources that so far have been limited to a few model organisms. One such resource consists of collections of transcription factor (TF) open reading frames (ORFs) cloned into vectors that facilitate easy expression in plants or microorganisms. In this study, we describe the development of a publicly available maize TF ORF collection (TFome) of 2034 clones corresponding to 2017 unique gene models in recombination-ready vectors that make possible the facile mobilization of the TF sequences into a number of different expression vectors. The collection also includes several hundred co-regulators (CoREGs), which we classified into well-defined families, and for which we propose here a standard nomenclature, as we have previously done for TFs. We describe the strategies employed to overcome the limitations associated with cloning ORFs from a genome that remains incompletely annotated, with a partial full-length cDNA set available, and with many TF/CoREG genes lacking experimental support. In many instances this required the combination of genome-wide expression data with gene synthesis approaches. The strategies developed will be valuable for developing similar resources for other agriculturally important plants. Information on all the clones generated is available through the GRASSIUS knowledgebase (http://grassius.org/).
Keywords: Zea mays, maize, transcription factor, recombination-ready, yeast one-hybrid, GRASSIUS, gene regulatory network, grasses, yeast two-hybrid
Introduction
Control of transcription is a process of fundamental biological importance to all organisms. Transcription factors (TFs) are proteins that recognize specific cis-regulatory elements and activate or repress the transcription of specific sets of target genes by interacting with other TFs, co-regulators (CoREGs), chromatin modifiers and components of the basal transcription machinery. In plants, roughly 5–7% of all protein-coding genes correspond to TFs (Riano-Pachon et al., 2007; Yilmaz et al., 2009). Understanding the function of TFs often requires establishing the DNA sequences that they recognize, identifying direct target genes using techniques such as yeast one-hybrid or chromatin immunoprecipitation coupled with high-throughput sequencing (ChIP-Seq), identifying partners using yeast two-hybrid or co-immunoprecipitation approaches or using TFs in transient or stable transformation experiments to determine the effect of TF activity on gene expression. Such approaches can be performed in the absence of mutants of the TF gene, which may be difficult to obtain. Most of these approaches require the availability of a clone containing the entire open reading frame (ORF) for the TF. We refer to the collection of all (or a significant set of) TF ORFs for an organism as the TFome.
Several TFomes have been developed for Arabidopsis (Paz-Ares, 2002; Gong et al., 2004; Castrillo et al., 2011; Ou et al., 2011), which are available through the Arabidopsis Biological Resource Center (ABRC). These collections are in the recombination-ready vectors of the Gateway® system, allowing the rapid transfer of the ORFs to an increasing number of Gateway®-compatible vectors, such as those suitable for expression in yeast, bacteria or plants (Karimi et al., 2002; Curtis and Grossniklaus, 2003; Deplancke et al., 2004; Earley et al., 2006). These collections have been utilized in yeast one-hybrid studies to identify a regulator that participates in a transcriptional feedback loop of the Arabidopsis circadian clock (Pruneda-Paz et al., 2009), in yeast two-hybrid studies to identify interactors with components of the Arabidopsis mediator complex (Ou et al., 2011) and in protoplast transactivation studies to identify negative regulators of genes involved in the abiotic stress response (Wehner et al., 2011). They have also been used for the development of protein microarrays (Gong et al., 2008) and Arabidopsis TF ORF over-expression lines (Weiste et al., 2007; Coego et al., 2014). As testimony to the utility of Arabidopsis TFome collections and individual clones for the research community, clones for the PKU-Yale collection (Gong et al., 2004) were ordered more than 5000 times and the entire 1152-clone collection was requested nine times from the ABRC.
The advent of partial or complete sequences for the genomes and transcriptomes for a large number of plants makes it conceivable that, in the near future, TFomes for other plants will become available, significantly accelerating basic research as well as biotechnological applications. However, so far, similar resources are not available for plants of agricultural importance.
Here, we describe the development of a publicly available TFome in recombination-ready vectors comprising 2017 unique maize TFs and CoREGs. The collection was generated from a combination of full-length cDNAs (FL-cDNA) obtained from the Arizona Genomics Institute (AGI) (Soderlund et al., 2009), from mRNA by reverse-transcription followed by PCR (RT-PCR), from genomic DNA for genes with no introns and by de novo gene synthesis using GeneArt® technology. The difficulties caused by the high GC content of maize genes (Schnable et al., 2009) required codon optimization for gene synthesis. Three codon bias models (maize, yeast, maize–yeast) were evaluated for a subset of maize genes for expression in yeast and maize cells. We found that the maize model performed well in both biological systems. The generated sequences, synthesis information and request links for the comprehensive maize TFome collection are available through GRASSIUS (http://grassius.org/tfomecollection.html). Collectively, this newly available maize resource will contribute to a better understanding of GRNs.
Results
From maize genome sequence to recombination-ready transcription factor clones
To develop the maize TFome, we first targeted maize genes that had been identified and curated as TFs in the grass regulatory information server, GRASSIUS. A TF, according to current definitions, is a protein that contains a characteristic structural motif, a DNA-binding domain, which is involved in recognizing a specific DNA sequence (Yilmaz et al., 2009).
In addition, we also targeted for cloning proteins that affect transcription through association with TFs or chromatin (called here CoREGs). Co-regulators can control gene expression by a number of different mechanisms including chromatin remodeling, histone modifications or by associating with TFs through protein–protein interactions and affecting TF activity. From the maize genome we identified 24 CoREG families (Table 1) using previously described sets of rules (Perez-Rodriguez et al., 2010) as well as a careful analysis of the literature. Since there are currently no general rules for naming CoREGs, we named them by extending the nomenclature guidelines put forth by researchers for grass TFs (Gray et al., 2009). Information on maize CoREGs (independent of whether they are represented in the TFome or not) and the proposed nomenclature rules are part of the Grass CoRegDB database in GRASSIUS (http://grassius.org/grasscoregdb.html). Similarly, because we call the collection of TF ORFs the TFome, we refer to the collection of all CoREG ORFs as the CoREGome.
Table 1.
Families, nomenclature guidelines and domain rules for the collection of maize co-regulator open reading frames (CoREGome)
| Family (number of members) | Maize nomenclature | Must contain the PFAM domain | Must not contain the PFAM domain |
|---|---|---|---|
| AUX/IAA family (44) | ZmIAA | AUX_IAA | Auxin_resp |
| BSD family (10) | ZmBSD | BSD | N/A |
| Co-activator p15 (3) | ZmKELP | PC4 | N/A |
| DDT (6) | ZmDDT | DDT | Homeobox |
| FHA (18) | ZmFHA | FHA | N/A |
| GCN5 related histone N-acetyltransferases (45) | ZmHAG | Acetyltransf_1 or Acetyltransf_3 | PHD |
| High Mobility Group (14) | ZmHMG | HMG_box | ARID |
| Interact with SPT6 (2) | ZmIWS | TFIIS_C and TFIIS_M | N/A |
| LIM (14) | ZmLIM | LIM | N/A |
| LUG (3) | ZmLUG | LUFS | N/A |
| Multiprotein bridging factor (3) | ZmMBF | MBF1 | N/A |
| Mediator subunit 6 (1) | ZmMED6 | Med6 | N/A |
| Mediator subunit 7 (2) | ZmMED7 | Med7 | N/A |
| Mediator subunit 26 (22) | ZmMED26 | Med26 | N/A |
| Mediator subunit 31 (1) | ZmMED31 | Med31 | N/A |
| Retinoblastoma related (5) | ZmRB | RB_A and RB_B | N/A |
| Rcd1-like (10) | ZmRcd1L | Rcd1 | N/A |
| SNF2 (39) | ZmSNF2 | SNF2_N | PHD or AP2 |
| SWI/SNF-SWI3 (4) | ZmSWI3 | SWIRM | Myb_DNA-binding |
| SWI/SNF-BAF60 (24) | ZmBAF60 | SWIB | N/A |
| TAZ zinc-finger (7) | ZmTAZ | zf-TAZ | N/A |
| TRAF (46) | ZmTRAF | BTB | zf-TAZ, BACK, MATH or NPH3 |
| Ultrapetala (2) | ZmULT | ULT | N/A |
| WD40 (1) | ZmWD40 | WD40 | N/A |
N/A, not applicable.
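The domain rules in Table 1 amount to a simple membership test on the set of PFAM domains found in a protein. The following is a minimal sketch of that test, assuming domain annotations are already available as plain strings; the names `COREG_RULES` and `classify_coreg` are hypothetical, and only five of the 24 families are included for brevity.

```python
# Illustrative subset of Table 1 (not the full rule set).
# family -> (required domain groups, excluded domains).
# Each required group is a set of alternatives; every group must be matched.
COREG_RULES = {
    "AUX/IAA": ([{"AUX_IAA"}], {"Auxin_resp"}),
    "DDT": ([{"DDT"}], {"Homeobox"}),
    "HMG": ([{"HMG_box"}], {"ARID"}),
    "RB": ([{"RB_A"}, {"RB_B"}], set()),       # "RB_A and RB_B"
    "SNF2": ([{"SNF2_N"}], {"PHD", "AP2"}),    # must contain neither PHD nor AP2
}


def classify_coreg(domains):
    """Return the sorted CoREG families whose rules the PFAM domain set satisfies."""
    domains = set(domains)
    hits = []
    for family, (required_groups, excluded) in COREG_RULES.items():
        has_required = all(group & domains for group in required_groups)
        has_excluded = bool(excluded & domains)
        if has_required and not has_excluded:
            hits.append(family)
    return sorted(hits)
```

Under these assumptions, a protein annotated with both `AUX_IAA` and `Auxin_resp` is not assigned to the AUX/IAA family, because the second domain is excluded by the rule.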
Phylogenetic analysis of maize CoREG families
Amino acid sequence conservation provides a powerful tool for confirming the gene models used in the generation of a TFome collection. Phylogenetic trees are useful for establishing evolutionary relationships and helping to determine which genes might correspond to paralogs/orthologs of others. Comprehensive evolutionary analyses (with the corresponding phylogenetic trees) have been published for many families of TFs (Bharathan et al., 1999; Theissen et al., 2000; Dias et al., 2003; Zhang and Wang, 2005; Zhou et al., 2012). However, little is known about the evolutionary relationships within maize CoREG families. Thus, we aligned the sequences of the CoREGs and used them for the generation of phylogenetic trees. Such phylogenetic analysis adds to the value of the CoREGome and will assist others in defining the number and function of CoREGs present in maize and other grass species.
As an example of how we used comparative sequence and phylogenetic analyses to inform us about the development of the CoREGome, we describe here two cases corresponding to the largest families of maize CoREGs. The HAG family of CoREGs comprises a diverse set of proteins characterized by histone acetyltransferase activity that employs a common acetyl donor, acetyl-coenzyme A (acetyl-CoA). They share the acetyl-CoA-binding site but vary considerably in other domains, reflecting their remarkable substrate specificity. The HAGs have demonstrated roles in both long- and short-term epigenetic regulation of chromatin modification and transcriptional switching mechanisms (Bharathan et al., 1999; Benhamed et al., 2006; Servet et al., 2010). In rice, a recent analysis revealed at least four main families of histone acetyltransferases (CBP, TAFII250, GNAT/HAG and MYST) present in the genome (Liu et al., 2012). The HAG family can be further subdivided into the GCN5, HAT1 and ELP3 subfamilies, typified by OsHAG702, OsHAG704 and OsHAG703, respectively (Liu et al., 2012). In maize, an initial bioinformatics analysis assigned 43 possible members to the HAG family. Of these, seven exhibited uncertain models and were not cloned (GRMZM2G050137, GRMZM2G114184, GRMZM2G135849, GRMZM2G136389, GRMZM2G302778, GRMZM2G458082, GRMZM5G813007). The remaining members were cloned and their sequences used for phylogenetic analyses. Because of the sequence diversity, full-length sequence alignments are meaningless, so a region surrounding the acetyl-CoA-binding site was used as the operational taxonomic unit (Dyda et al., 2000). The alignments revealed at least three subfamilies that could be defined with the aid of OsHAG702, OsHAG703 and OsHAG704 (Figure S1 in Supporting Information). OsHAG702 is 89.4% similar to ZmHAG23, and an alignment of the conserved domain (cd04301) was used to define other GCN5 subfamily members (Figure S1a). 
Similarly, OsHAG703 was found to be 97% similar to ZmHAG11, and a total of nine genes were assigned to the ELP3 subfamily (Figure S1b). The remaining 10 HAG proteins exhibited an acetyl-CoA-binding site (pfam00583) more similar to that of OsHAG704, and were therefore assigned as members of the HAT1 subfamily (Figure S1c). Further analyses will be required to define what other motifs are shared by members of these subfamilies and, as is the case for TFs, further subdivisions may be warranted in some cases.
In contrast to the analysis of the HAG family, which resulted in significant new information regarding evolutionary relationships between family members, the analysis of the AUX/IAA family of CoREGs largely confirmed previous studies (Wang et al., 2010; Ludwig et al., 2013), with a few notable exceptions (Figure S2). A small number of genes (ZmIAA27, −28, and −33) were revised using RNA-Seq data to correct errors in existing gene models. We also identified five more family members that were not included in previous analyses (Ludwig et al., 2013) and are part of the phylogenetic reconstruction. These models include ZmIAA43, which is similar to ZmIAA34. ZmIAA37 and ZmIAA38 differ in the amino-terminal regions, and appear to represent new family members. ZmIAA44 is very short and a gap in the genomic sequence upstream of the model suggests an incomplete model. In this phylogenetic analysis (Figure S2), ZmIAA42 appears to have several shared motifs, but its sequence has drifted significantly from other family members (thus its position as an outlier) leading to the possibility that this gene is a pseudogene. Phylogenetic analyses based on validated sequences, such as those described here, will continue to provide a solid basis on which to confirm or deny existing gene models and discover new members worthy of investigation.
The cloning process
When the annotation of the maize B73 reference genome (release 5b.60) indicated the presence of multiple potential transcripts encoding different ORFs for a single TF or CoREG, the longest transcript supported either by the presence of a FL-cDNA, an expressed sequence tag (EST) or data derived from RNA-Seq experiments (Figure 1a) was targeted. The rationale for selecting the longest ORF was that it codes for the maximum protein interaction space for identifying the protein–DNA and protein–protein interactions that comprise GRNs (Deplancke et al., 2006; Brady et al., 2011). It is also technically easier to delete an alternatively spliced exon if needed than it is to insert one by site-directed mutagenesis. Three main types of DNA templates were used for amplifying and cloning of TF or CoREG ORFs:
From FL-cDNAs. To identify the TFs or CoREGs for which an FL-cDNA was available at the Arizona Genomics Institute, we performed tBLASTn searches of the curated sequences in GRASSIUS against the entire maize FL-cDNA collection, and ordered those FL-cDNA clones that had 100% identity, or that showed fewer than three non-synonymous differences from the B73 reference genome and still encoded putative TFs (Figure 1a). Alternatively spliced isoforms not represented by a transcript model were also targeted, as long as they maintained the reading frame of the original transcript.
From genomic DNA (gDNA). Gene models in which the predicted ORF was not interrupted by introns were amplified directly from gDNA.
From complementary DNA (cDNA). For gene models in which the predicted ORF was interrupted by one or more introns, we thoroughly validated splicing patterns manually using RNA-Seq data from 17 different maize tissues, visualized in Integrative Genomics Viewer (IGV) (Thorvaldsdottir et al., 2013) (see Table S1). The majority of the gene models that were targeted to be amplified from cDNA (62%, corresponding to 533 models) had at least one annotated transcript that had its structure validated by available maize RNA-Seq data. We targeted the ORF of the longest transcript supported in these cases. The RNA-Seq data supported alternative structures for 242 of the gene models targeted for amplification from cDNA, and an additional 87 models had low or no reads. Of those 87 gene models, 68 were predicted either entirely or partially ab initio. Hence, while some may represent true, lowly expressed genes, others may correspond to pseudogenes. For a subset of the clones for which RNA-Seq did not support the public gene models, new gene models that agreed with the experimental data were generated. These new gene models corresponding to successfully cloned ORFs have been provided to the maize genome annotation team, contributing to genome improvement.
Figure 1.

Flowchart used for the generation of the maize TFome and CoREGome. (a) Flowchart describing the strategy for template identification for PCR amplification of transcription factor (TF) open reading frames (ORFs). FL, full length; AGI, Arizona Genomics Institute. (b) Distribution of template sources for the ORFs in the maize TF ORF collection (TFome).
When generating new gene models, the 5′-most ATG in the first predicted exon was chosen as the translation start codon. The first in-frame stop codon in the last exon with RNA-Seq support was chosen as the stop codon. For genes that exhibited more than one splice variant, the longest one with the most support for its structure based on RNA-Seq was chosen for cloning.
The corresponding ORFs were amplified by reverse-transcriptase PCR (RT-PCR) from RNA isolated from various maize tissues (Table S2). The tissue used for each ORF was selected using the tissue-specific maize RNA-Seq expression data available at qTeller (http://qteller.com/qteller3/). This approach, which is also referred to as directed RT-PCR or ‘rescue PCR’, was previously used to generate human ORFs in the Mammalian Gene Collection project (Temple et al., 2009). While a variety of maize tissues were used to ensure gene amplification, young seedlings proved to be a reliable source of transcripts for about 37% of the cloned genes. In general, RT-PCR required more optimization than approaches based on plasmid and genomic DNA. Nevertheless, we were successful in cloning by RT-PCR some genes with fragments per kilobase of exon per million fragments mapped (FPKM; Trapnell et al., 2010) values of less than 10. We did not include tissues obtained from plants subjected to biotic or abiotic stress conditions in our protocol, but such samples may be valuable for the isolation of additional TFs or CoREGs.
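For readers less familiar with the FPKM unit used as the expression threshold here, the following minimal sketch shows how the value is computed from raw counts. The function name and the example numbers are hypothetical and are not taken from the datasets used in this study.

```python
def fpkm(fragments_on_gene, exon_length_bp, total_mapped_fragments):
    """Fragments per kilobase of exon per million fragments mapped.

    FPKM = fragments * 1e9 / (exon length in bp * total mapped fragments)
    """
    if exon_length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("exon length and library size must be positive")
    return fragments_on_gene * 1e9 / (exon_length_bp * total_mapped_fragments)


# Hypothetical example: 300 fragments on a 1500-bp transcript
# in a library of 20 million mapped fragments.
example_value = fpkm(300, 1500, 20_000_000)
```

With these illustrative inputs the transcript sits exactly at 10 FPKM, the value below which RT-PCR amplification was generally more difficult.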
Primers were synthesized to amplify the corresponding ORFs without the respective stop codons, and an additional 5′-CACC-3′ nucleotide tail was added to the forward primer for directional cloning into the pENTR® vectors. The PCR products were generated using a high-fidelity polymerase and cloned into the pENTR®/D-TOPO® or pENTR®/SD/D-TOPO® vectors. It is important to note that GRASSIUS contains information for each clone regarding the primers that were used for cloning and other conditions essential to reproduce what is described here.
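The primer layout described above can be sketched as follows. This is an illustration only, not the design procedure used for the collection: the fixed primer length of 20 nt and the function names are assumptions, and no melting-temperature or secondary-structure checks are performed. The actual primers for each clone are listed in GRASSIUS.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def design_topo_primers(orf, primer_len=20):
    """Return (forward, reverse) primers for directional TOPO cloning.

    The forward primer carries the 5'-CACC tail; the reverse primer anneals
    immediately upstream of the stop codon so the amplicon lacks the stop.
    primer_len is an assumed, fixed annealing length.
    """
    orf = orf.upper()
    if len(orf) % 3 != 0 or not orf.startswith("ATG"):
        raise ValueError("ORF must start with ATG and be a multiple of 3 nt")
    # Drop the terminal stop codon if the supplied ORF includes one.
    coding = orf[:-3] if orf[-3:] in STOP_CODONS else orf
    if len(coding) < primer_len:
        raise ValueError("ORF is shorter than the requested primer length")
    forward = "CACC" + coding[:primer_len]
    reverse = reverse_complement(coding[-primer_len:])
    return forward, reverse
```

For example, `design_topo_primers("ATGGCTAGCAAGGGCGAGGAGCTGTTCACCTGA", primer_len=12)` yields a forward primer beginning with `CACCATG` and a reverse primer that does not include the `TGA` stop codon.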
The entire collection of maize gene regulatory proteins can be subdivided into a set of 58 TF families with known DNA-binding motifs that we named the Maize TFome and a set of 24 CoREG families referred to as the Maize CoREGome (summarized in Figure 2; further information about each protein family may be found in GRASSIUS). A total of 3022 gene models were identified in this study, of which 89% correspond to TFs and 11% to CoREGs. Thus far, 67% of the TFome and 57% of the CoREGome collections have been cloned into Gateway® entry vectors (Figure 2). Transcription factors under the Orphan classification conform to some but not all of the rules required for inclusion in a TF family. It is expected that as similar efforts progress in other plants it will become possible to assign these Orphan TFs to new families.
Figure 2.

Summary of the gene content of the maize transcription factor (TF) open reading frame (ORF) collection (TFome). (a) Maize TFome families. (b) Maize co-regulator open reading frame collection (CoREGome) families. The black bars indicate the number of gene models currently assigned to each protein family and the gray bars indicate the current state of progress towards cloning all of these genes into Gateway® entry vectors. Further descriptions of each TF family may be found at the GRASSIUS website (http://www.grassius.org).
Codon-optimized total synthesis
We opted for gene synthesis in cases where traditional cloning was deemed unfeasible. These comprised all validated ORFs targeted for amplification via RT-PCR with RNA-Seq expression values of <10 FPKM, and ORFs that were recalcitrant to cloning from FL-cDNA, genomic DNA (gDNA) or by RT-PCR. Gene synthesis was performed by GeneArt®, using a codon optimization process that is required for any complex or GC-rich sequence and that also serves to increase expression.
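The choice among the template sources and gene synthesis described in this section can be summarized as a small decision function. The sketch below is one possible reading of the text, not a transcription of Figure 1a: the order in which the criteria are checked, the `GeneModel` fields and the "manual review" outcome for models without RNA-Seq support are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class GeneModel:
    gene_id: str
    has_flcdna: bool            # acceptable FL-cDNA clone available
    intron_count: int           # introns interrupting the predicted ORF
    rnaseq_validated: bool      # transcript structure supported by RNA-Seq
    max_fpkm: float             # highest expression across surveyed tissues
    cloning_failed: bool = False  # recalcitrant to conventional cloning


def choose_route(model, fpkm_cutoff=10.0):
    """Return the cloning route for a gene model (assumed precedence order)."""
    if model.cloning_failed:
        return "gene synthesis"
    if model.has_flcdna:
        return "FL-cDNA"
    if model.intron_count == 0:
        return "gDNA"
    if not model.rnaseq_validated:
        return "manual review"
    if model.max_fpkm < fpkm_cutoff:
        return "gene synthesis"
    return "RT-PCR"
```

Keeping the FPKM cutoff as a parameter makes it easy to see how many models would move between RT-PCR and synthesis if a different threshold were adopted for another species.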
To determine which codon optimization would provide adequate expression in yeast as well as in maize, two of the systems in which we anticipated this collection to be maximally used, we tested three TF genes with three codon usage optimizations: maize (M), yeast (Y) and an intermediate between both (YM) (Figure S3). Thus, for each of these three TFs (numbers 34, 75 and 93; Figure S3), the three synthetic constructs were cloned into the pDONR®221 vector with the linker regions of the pENTR®/SD/D-TOPO® vector synthesized between the att recombination sites, making the vectors functionally identical (Figure 3). The clones were subsequently recombined into the p1511 vector (see Experimental Procedures) harboring the constitutive CaMV 35S promoter (p35S) and an in-frame green fluorescent protein (GFP) sequence, and into the yeast vector pBD-GAL4-GW-C1 derived from pBD-GAL4® by insertion of the Gateway® cassette, making it recombination-ready (Machemer et al., 2011). Expression of these proteins in yeast was investigated by transforming the respective plasmids into the strain pJ69.4a (James et al., 1996), commonly used in yeast two-hybrid experiments. After selection in media lacking tryptophan, expression was evaluated by Western blotting using a commercial antibody that recognizes the GAL4 DNA-binding domain. The N-terminal region of the maize transcription factor R, which we previously used in yeast two-hybrid experiments, was used as a positive control (Figure 4a) (Hernandez et al., 2004, 2007; Kong et al., 2012). All clones with maize codon optimization showed the lowest, but still significant, expression in yeast, with no appreciable difference between the Y and YM optimizations (Figure 4a); even protein expression of maize-optimized clone 93, which was the lowest, reached levels proven to be adequate for most applications (#93M in Figure 4a).
Figure 3.

Schematic diagram of the strategy and constructs utilized for protein fusion expression experiments in pJ69.4a yeast cells or maize protoplasts. pENTR®/SD/D-TOPO® vectors harboring synthesized open reading frames with stop codons were used as entry constructs for recombination into the p1511 and pBD-GAL4 vectors using LR Clonase, to generate GFP and GAL4-BD protein fusions, respectively.
Figure 4.

Impact of codon usage on transcription factor (TF) expression in yeast and maize. (a) Yeast cells were transformed with different constructs on SD-Trp to evaluate growth depending on the different codon usage. Western blot of yeast protein extracts was done from the strains expressing GAL4-BD fused TFs. The blot was probed with the GAL4-BD monoclonal antibody (SC510). Ponceau Red staining was used as a loading control. #34, GRMZM2G162434; #75, GRMZM2G001875; #93, GRMZM2G051793; M, maize optimized open reading frame (ORF); Y, yeast optimized ORF; YM, a hybrid optimized construct for both maize and yeast; R1–252, the N-terminal region of maize R. (b) Fluorescent micrographs showing the expression and localization of N-terminal GFP fusions of different constructs optimized for expression in yeast (Y), maize (M) and hybrid (YM) for each of the following transcription factors: GRMZM2G162434, GRMZM2G001875, GRMZM2G051793. (c) Bar graph showing the relative GFP intensity level for each construct tested.
Expression in maize protoplasts of the nine clones (in biological triplicates) was followed by microscopic observation (Figure 4b) and quantification of green fluorescence (Figure 4c). Notably, no statistically significant differences in the level of fluorescence were observed among the clones for the three different genes studied and all the codon optimizations tested (Figure 4c). Based on these results, and considering that the most likely application of this collection is to investigate the biological function of TFs in plants, the 621 ORFs synthesized de novo were generated using maize codon optimization.
Sequence variants present in maize TFome and CoREGome clones
Sequence discrepancies between the cloned ORF sequences and the annotated gene models were allowed into the collection as long as the reading frame was maintained and no nonsense mutations were present. The majority of the clones (92%, 1878 clones) contained no differences, beyond synonymous substitutions, from the B73 reference genome (release 5b.60); of these, 45 clones had only synonymous substitutions. These deviations from the annotated gene models could correspond to PCR mistakes, variations between the B73 line used and the reference genome, or errors in the maize genome sequence used (release 5b.60). Out of the 156 clones with nucleotide sequence deviations that resulted in amino acid changes or insertions or deletions from the annotated gene models, 45% (71 clones) were reflected in the sequence of the template FL-cDNA, publicly available EST sequences, the maize HapMap2 (Chia et al., 2012) or the available RNA-Seq data (Table S3). All sequence variants were recorded relative to the translation start of the canonical targeted transcript, and their presence or absence in annotated functional domains is indicated in the note column of Data S1.
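The acceptance criteria above (reading frame maintained, no nonsense mutations) and the distinction between synonymous and amino acid-changing variants can be expressed as a short check. This is a minimal sketch using the standard genetic code; the function names and category labels are hypothetical, and it compares whole translated sequences rather than reporting individual variant positions as recorded in Data S1.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
_BASES = "TCAG"
_AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(_BASES, repeat=3), _AMINO)}


def translate(orf):
    """Translate an upper-case ORF, ignoring any incomplete trailing codon."""
    usable = len(orf) - len(orf) % 3
    return "".join(CODON_TABLE[orf[i:i + 3]] for i in range(0, usable, 3))


def classify_clone(reference_orf, clone_orf):
    """Compare a cloned ORF with its annotated gene model."""
    ref, clone = reference_orf.upper(), clone_orf.upper()
    if ref == clone:
        return "identical"
    if (len(clone) - len(ref)) % 3 != 0:
        return "rejected: frameshift"
    # Clones were made without stop codons, so ignore a terminal stop.
    ref_protein = translate(ref).rstrip("*")
    clone_protein = translate(clone).rstrip("*")
    if "*" in clone_protein:
        return "rejected: nonsense mutation"
    if ref_protein == clone_protein:
        return "synonymous only"
    return "amino acid change or in-frame indel"
```

A clone that differs from its model only by, for example, a GCT to GCC change (both alanine) would be reported as "synonymous only" under this scheme.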
Clone availability, information and distribution
Information for each TFome clone, including clone sequence, template, GenBank ID and family, is provided in Data S1. Additional information, including primers used, established nomenclature and gene models is available at GRASSIUS (http://grassius.org/tfomecollection.html). While the first 121 clones were deposited and are available from Addgene (https://www.addgene.org/), due to the large size of this collection further deposits using this mechanism were not possible. Hence, the entire entry clone collection was donated to the ABRC where clones can be ordered individually or as an entire collection for the use of the entire community. Links to order individual clones, and the entire set, have been provided in Data S1.
Discussion
The generation of this initial 2034-clone maize TFome/CoREGome collection provides an unparalleled resource for functional genomics studies in maize and other grasses. The availability of this collection in recombination-ready vectors and the ease with which the clones can be transferred to yeast or maize expression vectors (Figure 3) are indicative of their potential application for a variety of purposes.
Recommendations for TFome development in other crop species
The generation of TFomes/CoREGomes in other crop species would be valuable in enabling cross-species validation and the application of gene regulatory information to crop improvement. From our study it became apparent that significant challenges arise when gene models are inaccurate. In particular, alternative splice models may not be biologically relevant and some gene models may represent pseudogenes (Pei et al., 2012). Also, some gene models are collapsed due to recent genome duplication events, and thus can represent more than one gene. It is recommended that all gene models are first validated by FL-cDNAs or RNA-Seq data derived from a variety of tissues and environmental growth conditions. For genes whose expression levels remain low (FPKM < 10), phylogenetics and comparative genomics as well as synteny can be beneficial in validating models and aiding primer design. According to our previous studies in maize seeds (Morohashi et al., 2012), the average TF expression corresponds to about 50 FPKM, with the majority of the TFs showing FPKM values from 1 to 100. The generation of the maize TFome and CoREGomes and the companion GRASSIUS database will serve as a useful model resource for others embarking on such projects.
Contribution of the TFome and CoREGome collections to improving the annotation of the maize genome
The utility of the TFome/CoREGome extends beyond the usefulness of the physical clones for experimentation as they also provide validation of gene models predicted to exist in the maize genome. The models in the maize genome (release 5b.60) were predicted using both cDNA/EST/protein/orthologous mRNA sequence information and ab initio prediction (by FGENESH). Although EST information existed for many gene models, we were able to identify FL-cDNAs from the AGI collection for only 45% of the TF/CoREG genes cloned here. This underscores the general observation that transcripts for many TF genes are rare. Therefore, sequence information gleaned from the PCR rescue of at least 12% of the new TFome or CoREGome collections will assist in annotation of the corresponding gene models. This percentage should increase, as many of the remaining TFs/CoREGs are anticipated to be cloned via PCR rescue. Of the 244 genes obtained by PCR rescue so far, about 5% exhibited variants from the annotated transcript. Of the remaining 618 gene models targeted for amplification by RT-PCR, 230 have transcript structures supported by our RNA-Seq analyses yet not represented in the models of B73 RefGen_v2.
Some of the observed variation involved alternative splice sites that encode unannotated TF/CoREG variants. For example, RT-PCR uncovered a splice variant of the GRMZM2G475305 gene that was not in the annotated gene model. The splice variant involved the removal of an 85-bp intron (Figure 5). The alternative splice variant that we cloned was supported by 12% of the reads in one or several RNA-Seq datasets used (Table S1). Interestingly, the alternative splice variant produces a truncated ZmNLP6 TF that retains a DNA-binding motif, but due to an earlier stop codon, a C-terminal PB1 protein–protein interaction motif is no longer present. Several responses in plants are orchestrated by NLP family members, including the nitrate response (Jeong et al., 2011; Waki et al., 2011; Marchive et al., 2013). The removal of a PB1 motif by alternative splicing may alter the stabilization or localization of the ZmNLP6 TF in vivo. Thus, our PCR rescue may not only provide better gene annotation but possibly also the raw material for further gene regulatory insights. It is estimated that 19% of the genes in maize undergo alternative splicing, but the functional meaning of such events has barely been investigated (Barbazuk et al., 2008). Ultimately the accuracy of gene annotation will increase as larger RNA-Seq and CAGE datasets (Takahashi et al., 2012) become available and are incorporated during model refinement. For this project, the aim was to clone at least one splice variant of every TF, but as the genome annotation improves it will become more feasible to pursue the cloning of all splice variants.
Figure 5.

Example of an alternative splice variant of the GRMZM2G475305 transcript that encodes ZmNLP6. In the gene diagram the white bars indicate exons and the lines represent introns. The gray and black shaded regions represent the RWP-RK DNA-binding motif (pfam02042) and a PB1 protein–protein interaction motif (pfam00564), respectively. The removal of an 85-bp intron in the alternative splice variant is predicted to result in an earlier stop codon and a deletion of the PB1 domain in the translated protein. aa, amino acids.
A need for clone repositories
The size of this collection makes it essential that a dedicated center/institution is responsible for the propagation, indexing and distribution of clones, since this can take a significant toll on any investigator without the proper infrastructure. Addgene was a natural repository, but we did not realize that the volume of the collection, coupled with the expectation that individual clones would be ordered only occasionally, made it difficult for Addgene to accept it. Fortunately, the ABRC generously agreed to propagate and distribute individual clones and the entire collection, but this is clearly not within the mission of the center. Although the advent of more cost-effective gene synthesis reduces the hurdle of obtaining full-length clones for individual genes, it remains a significant barrier for high-throughput programs that study protein–protein interactions, protein structure and protein function. By making collections such as this TFome available to many researchers there is a community benefit of lower cost and more uniform quality while enabling more rapid progress to be made in research (Soderlund et al., 2009). We anticipate that the maize TFome will be the first of many collections to be developed for crop species, and that a centralized distribution center modeled on that of the Integrated Molecular Analysis of Genomes and their Expression (IMAGE, http://www.imageconsortium.org/) Consortium would be valuable.
Experimental Procedures
Phylogenetic analysis of co-regulators from maize
Each of the cloned HAG maize CoREGs was scanned for the presence of an acetyl-CoA-binding site using the Conserved Domain Database (CDD). This region (approximately 75–90 amino acids) was then used as the operational taxonomic unit for phylogenetic analysis. Related proteins were aligned using ClustalW (Larkin et al., 2007) with default parameters, as implemented in the MegAlign® program within the DNASTAR Lasergene® package (http://www.dnastar.com). In the case of the AUX/IAA CoREGs, FL amino acid sequences were aligned using MUSCLE (Edgar, 2004) with default parameters (VTML 200 substitution matrix with a gap opening penalty of −2.9 and a gap extension penalty of 0). The maximum number of iterations was eight, with UPGMB as the cluster method and ClustalW for sequence weighting. The initial distance measure was Kmer 6-6, and Kimura % identity was used for later iterations. The tree from each multiple sequence alignment was exported in Newick format and displayed using FigTree v1.4.0 (http://beast.bio.ed.ac.uk/software/figtree/).
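The identity-based distance mentioned above can be illustrated on a single pair of aligned sequences. The sketch below computes fractional identity over ungapped columns and applies Kimura's approximate protein distance correction, d = −ln(1 − p − 0.2p²), where p is the fraction of differing sites. This illustrates the general formula and is not a re-implementation of MUSCLE, whose internal handling (for example of very divergent pairs) may differ. The function names and the toy sequences are hypothetical.

```python
import math


def fractional_identity(seq_a, seq_b, gap="-"):
    """Fraction of identical residues over columns where neither sequence has a gap."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != gap and b != gap]
    if not pairs:
        raise ValueError("no ungapped columns to compare")
    matches = sum(1 for a, b in pairs if a == b)
    return matches / len(pairs)


def kimura_distance(identity):
    """Kimura's approximate correction: d = -ln(1 - p - 0.2 * p**2), with p = 1 - identity."""
    p = 1.0 - identity
    argument = 1.0 - p - 0.2 * p * p
    if argument <= 0.0:
        raise ValueError("sequences too divergent for this approximation")
    return -math.log(argument)


# Toy aligned fragments (illustrative only, not maize AUX/IAA sequences).
aligned_a = "MKTAYIAKQR"
aligned_b = "MKTAHIAK-R"

identity = fractional_identity(aligned_a, aligned_b)
print(f"identity = {identity:.3f}, Kimura distance = {kimura_distance(identity):.3f}")
```

The correction grows faster than the raw difference p, reflecting multiple substitutions at the same site, and it is undefined for very divergent pairs, which is why the sketch raises an error there.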
Construction of the TF library
Constructs were cloned into either the pENTR®/D-TOPO® or pENTR®/SD/D-TOPO® vector. Targeted TF or CoREG ORFs were typically amplified using high-fidelity Phusion® polymerase (New England Biolabs, https://www.neb.com/) with the manufacturer's GC buffer supplemented with 10% glycerol. For many genes further optimization was required through the addition of up to 10% DMSO, up to 4 mM MgCl2 or adjustment of the annealing temperature. Primers were designed to produce a fragment without a stop codon, and with a 5′-CACC tail added to the forward primer as required for directional cloning into the pENTR® vectors. Kanamycin-resistant colonies were then screened for inserts of the correct size and orientation by colony PCR or by restriction digestion of candidate plasmids. The sequence of cloned inserts was confirmed by single-read paired-end Sanger dideoxy sequencing. For clones longer than 1200 bp, internal primers were designed to complete the sequence confirmation. A workflow of the cloning process is provided in Figure S4.
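The primer design rules described above (a 5′-CACC tail on the forward primer, a reverse primer that omits the stop codon, and internal sequencing primers for inserts longer than 1200 bp) can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the procedure actually used: the fixed annealing length, the function names and the toy ORF are hypothetical, and real primer design would also consider melting temperature and secondary structure.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")
STOP_CODONS = {"TAA", "TAG", "TGA"}


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def design_topo_primers(orf, anneal_len=20):
    """Return (forward, reverse) primers for directional TOPO cloning of an ORF.

    The forward primer carries the 5'-CACC tail; the reverse primer anneals
    to the end of the ORF with the stop codon removed.
    """
    orf = orf.upper()
    if not orf.startswith("ATG") or len(orf) % 3 != 0:
        raise ValueError("expected an ORF starting with ATG and a length divisible by 3")
    if orf[-3:] in STOP_CODONS:
        orf = orf[:-3]  # drop the stop codon so that C-terminal fusions remain possible
    if len(orf) < anneal_len:
        raise ValueError("ORF shorter than the requested annealing length")
    forward = "CACC" + orf[:anneal_len]
    reverse = reverse_complement(orf[-anneal_len:])
    return forward, reverse


def needs_internal_primers(insert_len_bp, threshold_bp=1200):
    """Inserts longer than the threshold need internal primers for full sequence coverage."""
    return insert_len_bp > threshold_bp


# Toy ORF for illustration only (not a maize TF sequence).
toy_orf = "ATGGCTAGCAAAGGTGAACTGTTCACCGGTTAA"
forward_primer, reverse_primer = design_topo_primers(toy_orf, anneal_len=12)
print(forward_primer, reverse_primer)
```

Omitting the stop codon in the entry clone keeps the ORF compatible with destination vectors that add C-terminal tags.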
Plant materials
The B73 inbred of maize (propagated from original stock that was obtained from Dr Guri Johal, Purdue University, West Lafayette, IN, USA) was grown in a field nursery in 2012 (USDA-OARDC Northwest Research Station, Custer, OH, USA). DNA was isolated for PCR amplification of single-exon genes, and RNA was isolated for RT-PCR.
Pilot to determine codon optimization strategy for synthesized ORFs
pENTR®/SD/D-TOPO® clones harboring synthesized coding sequences (CDSs) corresponding to GRMZM2G162434, GRMZM2G001875 and GRMZM2G051793, and optimized according to maize, yeast or hybrid codon preference, were delivered by Invitrogen (http://www.invitrogen.com/). LR Clonase® was used to recombine these CDSs into a GAL4-BD-GWC1 vector to fuse them with the yeast GAL4-binding domain (BD), and alternatively to generate N-terminal fusions with green fluorescent protein (GFP) in plasmid p1511 under the control of the 35S promoter to test for expression in maize protoplasts.
For Western blot analysis, PJ69.4a yeast cells were transformed with each GAL4-BD-GWC1 construct using the lithium acetate/single-stranded (LiAc/SS) carrier DNA/polyethylene glycol method (Gietz and Schiestl, 2007). Cells were grown in 5 ml of synthetic medium without tryptophan (SD – Trp) at 30°C to OD600 = 0.6. They were subsequently centrifuged at 13 500 g for 1 min at room temperature (25°C), re-suspended in 100 μl H2O, mixed with 100 μl NaOH (0.2 M) and incubated for 5 min at room temperature. Cells were centrifuged at 13 500 g for 1 min at room temperature and the pellet was re-suspended in 50 μl of SDS loading buffer [0.06 M 2-amino-2-(hydroxymethyl)-1,3-propanediol (TRIS)-HCl, pH 6.8, 10% (v/v) glycerol, 2% (w/v) SDS, 5% (v/v) β-mercaptoethanol, 0.0025% (w/v) bromophenol blue] and heated to 95°C for 5 min. The suspension was then centrifuged at 13 500 g, and 15 μl of extract was loaded and analysed by 10% SDS-PAGE. After transfer, the membrane was blocked for 1 h in 5% BSA in TRIS-buffered saline-Tween 20 (TBS-T; 0.05% Tween 20) and incubated with a 1:1000 dilution of primary antibody (DBD-AB, SC-510, Santa Cruz, http://www.scbt.com/) in 1% BSA–TBS-Tween (0.05%) for 12 h at 4°C. It was then washed three times for 10 min each using 1 × TBS-Tween 20 (0.1%), incubated with a 1:4000 dilution of secondary antibody (ECL™ Anti-mouse IgG Horseradish, NA931V, GE Healthcare UK, http://www3.gehealthcare.co.uk/) in 1% BSA–TBS-Tween (0.05%) for 1.5 h at room temperature, and washed again three times for 10 min each using 1 × TBS-Tween 20 (0.1%). Exposure was for 15 sec for the proteins of the optimized clones for genes 34 (GRMZM2G162434) and 75 (GRMZM2G001875), and for 30 min for the proteins of the optimized clones for gene 93 (GRMZM2G051793). The N-terminal region of R (R1–252) (ZmbHLH1, GRMZM5G822829) fused with GAL4 BD was used as the positive control (Hernandez et al., 2004).
To prepare maize protoplasts for transformation, second or third leaves from 13-day-old B73 × Mo17 F1 hybrid etiolated seedlings were chopped and digested in 3% cellulase RS, 0.6% macerozyme R10 (Yakult Honsha Co., http://www.yakult.co.jp/english/), 0.6 M mannitol, 10 mM KCl, 10 mM 2-(N-morpholino)ethanesulfonic acid (MES) (pH 5.7), 5 mM CaCl2 and 0.1% w/v BSA for 20 min under vacuum followed by gentle shaking (30 g) for 150 min at 25°C in the dark. Protoplasts were released at 80 g and filtered through a 35-μm nylon mesh followed by centrifugation at 150 g for 1 min. Protoplasts were washed in ES buffer (0.6 M mannitol, 5 mM MES, pH 5.7, 10 mM KCl) and counted with a hemocytometer. Electroporation was carried out on about 10⁵ protoplasts with 40 μg of total DNA per transformation for the transient luciferase assays and 20 μg for the transient complementation assays, using a 100 V cm−1 pulse, 10 msec pulse length for two pulses with a BTX Electro-Square-Porator T820 (http://www.btxonline.com/). After electroporation, protoplasts were incubated for 18–22 h in the dark at room temperature before further analyses. Transformation efficiency was estimated from GFP expression observed under a Nikon Eclipse E600 fluorescence microscope (Nikon Co., http://www.nikon.com/) equipped with an argon lamp (Chiu Technical, http://www.chiutech.com/) at 20× magnification and a gain of 1. Fluorescence was quantified by measuring intensity over the entire protoplast area using ImageJ software (http://rsbweb.nih.gov/ij/). No significant difference was observed among the maize-optimized, yeast-optimized and hybrid-optimized CDSs for all three TFs (one-way ANOVA, P > 0.05).
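The comparison of the three codon-optimization strategies relies on a one-way ANOVA. As a rough illustration of what that test computes, the sketch below derives the F statistic and its degrees of freedom from grouped intensity measurements using only the Python standard library. The numbers are placeholders chosen for easy arithmetic, not data from this study, and the function name is hypothetical. Obtaining the P value additionally requires the F distribution, which a statistics package would normally supply.

```python
def one_way_anova_f(groups):
    """Return (F statistic, between-group df, within-group df) for a one-way ANOVA."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    if k < 2 or n_total <= k:
        raise ValueError("need at least two groups and more observations than groups")

    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [sum(g) / len(g) for g in groups]

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Variation of individual observations around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    if ss_within == 0:
        raise ValueError("no within-group variation; F is undefined")

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Placeholder intensities (arbitrary units, NOT measurements from the study).
maize_optimized = [1.0, 2.0, 3.0]
yeast_optimized = [2.0, 3.0, 4.0]
hybrid_optimized = [3.0, 4.0, 5.0]

f_stat, df_b, df_w = one_way_anova_f([maize_optimized, yeast_optimized, hybrid_optimized])
print(f"F({df_b}, {df_w}) = {f_stat:.2f}")
```

A small F relative to the critical value for the given degrees of freedom corresponds to P > 0.05, that is, no detectable difference among the groups.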
DNA and RNA isolation
Genomic DNA was isolated from B73 plant tissue by a cetyltrimethylammonium bromide (CTAB)-based method (Hulbert and Bennetzen, 1991). Total RNA was isolated from non-starchy B73 tissues using RNAzol® (Sigma, http://www.sigmaaldrich.com/) according to the manufacturer's suggested protocol. For the isolation of RNA from starchy tissues such as developing seeds, the protocol of Li and Trick (2005) was employed, except that buffer II had the following composition: 4.2 M guanidine thiocyanate, 1 M sodium acetate (pH 4.0) stock solution, 0.5% lauryl sarcosine and 25 mM sodium citrate (pH 7.0). The RNA integrity was verified by agarose gel electrophoresis and visualized by ethidium bromide staining.
Synthesis of cDNA and RT-PCR
Prior to cDNA synthesis, total RNA samples were treated with Turbo DNA-free® (Ambion/Life Technologies Corp., http://www.lifetechnologies.com/) according to the manufacturer's suggested protocol to remove any contaminating DNA. Complementary DNA was synthesized using Maxima H Minus® Reverse Transcriptase (Thermo Fisher Scientific Inc., http://www.thermofisher.com/) according to the manufacturer's protocol. The integrity of cDNA was determined by amplifying a 1001-bp portion of a ZmGAPDH (GRMZM2G046804) transcript using the primers ZmGAPDH_F (5′-ATGCAGGCAAGATTAAGATCGGAATCAAC-3′) and ZmGAPDH_R (5′-CATGTGGCGGATCAGGTCGAC-3′). The absence of a larger 2817-bp PCR product confirmed the removal of genomic DNA. The TFs were amplified by RT-PCR from tissues in which they were expected to be most highly expressed, as determined from the RNA-Seq dataset stored at the qTeller database (http://www.qteller.com/). The RT-PCR was conducted using conditions as described above for construction of the TF/CoREG libraries.
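Two decisions in this procedure lend themselves to a short sketch: choosing the tissue in which a gene is most highly expressed, and interpreting the ZmGAPDH control product by size (1001 bp from cDNA, 2817 bp from genomic DNA). The Python below is a minimal illustration. The expression values, the tissue names, the size tolerance and the function names are all assumptions made for the example, and nothing here reflects the actual qTeller interface.

```python
def best_tissue(expression_by_tissue):
    """Return the tissue with the highest expression value for one gene."""
    if not expression_by_tissue:
        raise ValueError("no expression values supplied")
    return max(expression_by_tissue, key=expression_by_tissue.get)


def classify_gapdh_product(size_bp, cdna_bp=1001, gdna_bp=2817, tolerance_bp=50):
    """Classify a ZmGAPDH control amplicon by its approximate size on a gel.

    The tolerance is an arbitrary allowance for gel sizing error (an assumption).
    """
    if abs(size_bp - cdna_bp) <= tolerance_bp:
        return "cDNA"
    if abs(size_bp - gdna_bp) <= tolerance_bp:
        return "genomic DNA"
    return "unexpected"


# Hypothetical expression values for a single gene (not taken from qTeller).
toy_expression = {"root": 2.5, "leaf": 10.0, "developing seed": 7.1}
print(best_tissue(toy_expression))
print(classify_gapdh_product(1000))
```

A sample showing only the cDNA-sized band would be taken forward for RT-PCR, whereas a genomic-sized band would indicate incomplete DNase treatment.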
Acknowledgments
We very much appreciate the willingness of the Arabidopsis Biological Resource Center (ABRC), and in particular Dr Jelena Brkljacic and Diana Shin, to accept the TFome collection for storage, propagation and distribution. We thank Drs Han Zhang, Yvonne Ludwig and Kris Nemeth for providing clones and plasmids. We thank Diego Mauricio Riaño-Pachón for his assistance and feedback with curation of the co-regulator families, and for providing us with information on the LUFS PFAM domain. Special thanks go to the more than 300 University of Toledo undergraduate students who participated in the FIRE (Fostering the Integration of Research with Educational laboratory classes) program for their invaluable work, as well as to Azam Abdollahzadeh, Andrew Reed, Erik Mukundi, Evans Kataka, Narmer Fernando Galeano Vanegas, Flavia Santos, Hai-Dong Yu, Jeffrey Campbell, Jennifer Carstens, Katja Machemer-Noonan, Kelly Scarberry, Kengo Morohashi, Kristen Belesky, Maria Tobias, Noor Zayed, Thais Andrade and Tomoe Kusayanagi and SiGuE (Success in Graduate Education, http://www.sigue-caps.org/) fellows Miriam Mills and Gilbert Kayanja, for their outstanding contributions in team-cloning. Michael dos Santos Brito thanks FAPESP (São Paulo Research Foundation) for postdoctoral fellowship BEPE 2012/20486-2. Support for this project was provided by NSF IOS-1125620 to JG, AID and EG.
Conflict of Interest
The authors declare no conflict of interest.
Supporting Information
Additional Supporting Information may be found in the online version of this article.
Figure S1. Phylogenetic analysis of HAG co-regulators from maize.
Figure S2. Phylogenetic analysis of AUX/IAA co-regulators from maize.
Figure S3. Nucleotide sequence alignments showing the codon usage of the three synthetic codon optimized constructs for numbers 34 (GRMZM2G162434), 75 (GRMZM2G001875) and 93 (GRMZM2G051793).
Figure S4. Workflow describing the cloning process used to generate the maize transcription factor open reading frame (TFome) collection.
Table S1. The RNA-Seq data used for the validation of gene models.
Table S2. Percentage of clones successfully amplified from a variety of maize tissues by RT-PCR (n=244).
Table S3. Support for sequence discrepancies between maize transcription factor open reading frame (TFome) or co-regulator open reading frame (CoREGome) collection clones and B73 RefGen_v2 models.
Data S1. Plate addresses, accessions, gene models, order links and other pertinent information for the clones that constitute the first release of the maize transcription factor open reading frame collection (TFome).
References
- Barbazuk WB, Fu Y, McGinnis KM. Genome-wide analyses of alternative splicing in plants: opportunities and challenges. Genome Res. 2008;18:1381–1392. doi: 10.1101/gr.053678.106.
- Benhamed M, Bertrand C, Servet C, Zhou D-X. Arabidopsis GCN5, HD1, and TAF1/HAF2 interact to regulate histone acetylation required for light-responsive gene expression. Plant Cell. 2006;18:2893–2903. doi: 10.1105/tpc.106.043489.
- Bharathan G, Janssen BJ, Kellogg EA, Sinha N. Phylogenetic relationships and evolution of the KNOTTED class of plant homeodomain proteins. Mol. Biol. Evol. 1999;16:553–563. doi: 10.1093/oxfordjournals.molbev.a026136.
- Brady SM, Zhang L, Megraw M, et al. A stele-enriched gene regulatory network in the Arabidopsis root. Mol. Syst. Biol. 2011;7:459. doi: 10.1038/msb.2010.114.
- Castrillo G, Turck F, Leveugle M, Lecharny A, Carbonero P, Coupland G, Paz-Ares J, Onate-Sanchez L. Speeding cis-trans regulation discovery by phylogenomic analyses coupled with screenings of an arrayed library of Arabidopsis transcription factors. PLoS ONE. 2011;6:e21524. doi: 10.1371/journal.pone.0021524.
- Chia JM, Song C, Bradbury PJ, et al. Maize HapMap2 identifies extant variation from a genome in flux. Nat. Genet. 2012;44:803–807. doi: 10.1038/ng.2313.
- Coego A, Brizuela E, Castillejo P, et al. The TRANSPLANTA collection of Arabidopsis lines: a resource for functional analysis of transcription factors based on their conditional overexpression. Plant J. 2014;77:944–953. doi: 10.1111/tpj.12443.
- Curtis MD, Grossniklaus U. A gateway cloning vector set for high-throughput functional analysis of genes in planta. Plant Physiol. 2003;133:462–469. doi: 10.1104/pp.103.027979.
- Deplancke B, Dupuy D, Vidal M, Walhout AJ. A gateway-compatible yeast one-hybrid system. Genome Res. 2004;14:2093–2101. doi: 10.1101/gr.2445504.
- Deplancke B, Mukhopadhyay A, Ao W, et al. A gene-centered C. elegans protein-DNA interaction network. Cell. 2006;125:1193–1205. doi: 10.1016/j.cell.2006.04.038.
- Dias AP, Braun EL, McMullen MD, Grotewold E. Recently duplicated maize R2R3 Myb genes provide evidence for distinct mechanisms of evolutionary divergence after duplication. Plant Physiol. 2003;131:610–620. doi: 10.1104/pp.012047.
- Dyda F, Klein DC, Hickman AB. GCN5-related N-acetyltransferases: a structural overview. Annu. Rev. Biophys. Biomol. Struct. 2000;29:81–103. doi: 10.1146/annurev.biophys.29.1.81.
- Earley KW, Haag JR, Pontes O, Opper K, Juehne T, Song K, Pikaard CS. Gateway-compatible vectors for plant functional genomics and proteomics. Plant J. 2006;45:616–629. doi: 10.1111/j.1365-313X.2005.02617.x.
- Edgar RC. MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res. 2004;32:1792–1797. doi: 10.1093/nar/gkh340.
- Gietz RD, Schiestl RH. Quick and easy yeast transformation using the LiAc/SS carrier DNA/PEG method. Nat. Protoc. 2007;2:35–37. doi: 10.1038/nprot.2007.14.
- Gong W, Shen YP, Ma LG, et al. Genome-wide ORFeome cloning and analysis of Arabidopsis transcription factor genes. Plant Physiol. 2004;135:773–782. doi: 10.1104/pp.104.042176.
- Gong W, He K, Covington M, Dinesh-Kumar SP, Snyder M, Harmer SL, Zhu YX, Deng XW. The development of protein microarrays and their applications in DNA-protein and protein-protein interaction analyses of Arabidopsis transcription factors. Mol. Plant. 2008;1:27–41. doi: 10.1093/mp/ssm009.
- Gray J, Bevan M, Brutnell T, et al. A recommendation for naming transcription factor proteins in the grasses. Plant Physiol. 2009;149:4–6. doi: 10.1104/pp.108.128504.
- Hernandez J, Heine G, Irani NG, Feller A, Kim M-G, Matulnik T, Chandler VL, Grotewold E. Different mechanisms participate in the R-dependent activity of the R2R3 MYB transcription factor C1. J. Biol. Chem. 2004;279:48205–48213. doi: 10.1074/jbc.M407845200.
- Hernandez JM, Feller A, Morohashi K, Frame K, Grotewold E. The basic helix loop helix domain of maize R links transcriptional regulation and histone modifications by recruitment of an EMSY-related factor. Proc. Natl Acad. Sci. USA. 2007;104:17222–17227. doi: 10.1073/pnas.0705629104.
- Hulbert SH, Bennetzen JL. Recombination at the Rp1 locus of maize. Mol. Gen. Genet. 1991;226:377–382. doi: 10.1007/BF00260649.
- James P, Halladay J, Craig EA. Genomic libraries and a host strain designed for highly efficient two-hybrid selection in yeast. Genetics. 1996;144:1425–1436. doi: 10.1093/genetics/144.4.1425.
- Jeong S, Palmer TM, Lukowitz W. The RWP-RK factor GROUNDED promotes embryonic polarity by facilitating YODA MAP kinase signaling. Curr. Biol. 2011;21:1268–1276. doi: 10.1016/j.cub.2011.06.049.
- Karimi M, Inze D, Depicker A. GATEWAY vectors for Agrobacterium-mediated plant transformation. Trends Plant Sci. 2002;7:193–195. doi: 10.1016/s1360-1385(02)02251-3.
- Kong Q, Pattanaik S, Feller A, Werkman JR, Chai C, Wang Y, Grotewold E, Yuan L. Regulatory switch enforced by basic helix-loop-helix and ACT-domain mediated dimerizations of the maize transcription factor R. Proc. Natl Acad. Sci. USA. 2012;109:E2091–E2097. doi: 10.1073/pnas.1205513109.
- Larkin MA, Blackshields G, Brown NP, et al. Clustal W and clustal X version 2.0. Bioinformatics. 2007;23:2947–2948. doi: 10.1093/bioinformatics/btm404.
- Li Z, Trick HN. Rapid method for high-quality RNA isolation from seed endosperm containing high levels of starch. Biotechniques. 2005;38:872, 874, 876. doi: 10.2144/05386BM05.
- Liu X, Luo M, Zhang W, Zhao JH, Zhang JX, Wu KQ, Tian LN, Duan J. Histone acetyltransferases in rice (Oryza sativa L.): phylogenetic analysis, subcellular localization and expression. BMC Plant Biol. 2012;12:145. doi: 10.1186/1471-2229-12-145.
- Ludwig Y, Zhang Y, Hochholdinger F. The maize (Zea mays L.) AUXIN/INDOLE-3-ACETIC ACID gene family: phylogeny, synteny, and unique root-type and tissue-specific expression patterns during development. PLoS ONE. 2013;8:e78859. doi: 10.1371/journal.pone.0078859.
- Machemer K, Shaiman O, Salts Y, Shabtai S, Sobolev I, Belausov E, Grotewold E, Barg R. Interplay of MYB factors in differential cell expansion, and consequences for tomato fruit development. Plant J. 2011;68:337–350. doi: 10.1111/j.1365-313X.2011.04690.x.
- Marchive C, Roudier F, Castaings L, Brehaut V, Blondet E, Colot V, Meyer C, Krapp A. Nuclear retention of the transcription factor NLP7 orchestrates the early response to nitrate in plants. Nat. Commun. 2013;4:1713. doi: 10.1038/ncomms2650.
- Morohashi K, Casas MI, Ferreyra LF, et al. A genome-wide regulatory framework identifies maize pericarp color1 controlled genes. Plant Cell. 2012;24:2745–2764. doi: 10.1105/tpc.112.098004.
- Ou B, Yin KQ, Liu SN, et al. A high-throughput screening system for Arabidopsis transcription factors and its application to Med25-dependent transcriptional regulation. Mol. Plant. 2011;4:546–555. doi: 10.1093/mp/ssr002.
- Paz-Ares J. REGIA, an EU project on functional genomics of transcription factors from Arabidopsis thaliana. Comp. Funct. Genomics. 2002;3:102–108. doi: 10.1002/cfg.146.
- Pei B, Sisu C, Frankish A, et al. The GENCODE pseudogene resource. Genome Biol. 2012;13:R51. doi: 10.1186/gb-2012-13-9-r51.
- Perez-Rodriguez P, Riano-Pachon DM, Correa LGG, Rensing SA, Kersten B, Mueller-Roeber B. PlnTFDB: updated content and new features of the plant transcription factor database. Nucleic Acids Res. 2010;38:D822–D827. doi: 10.1093/nar/gkp805.
- Pruneda-Paz JL, Breton G, Para A, Kay SA. A functional genomics approach reveals CHE as a component of the Arabidopsis circadian clock. Science. 2009;323:1481–1485. doi: 10.1126/science.1167206.
- Riano-Pachon DM, Ruzicic S, Dreyer I, Mueller-Roeber B. PlnTFDB: an integrative plant transcription factor database. BMC Bioinformatics. 2007;8:42. doi: 10.1186/1471-2105-8-42.
- Schnable PS, Ware D, Fulton RS, et al. The B73 maize genome: complexity, diversity, and dynamics. Science. 2009;326:1112–1115. doi: 10.1126/science.1178534.
- Servet C, Conde e Silva N, Zhou D-X. Histone acetyltransferase AtGCN5/HAG1 is a versatile regulator of developmental and inducible gene expression in Arabidopsis. Mol. Plant. 2010;3:670–677. doi: 10.1093/mp/ssq018.
- Soderlund C, Descour A, Kudrna D, et al. Sequencing, mapping, and analysis of 27,455 maize full-length cDNAs. PLoS Genet. 2009;5:e1000740. doi: 10.1371/journal.pgen.1000740.
- Takahashi H, Kato S, Murata M, Carninci P. CAGE (cap analysis of gene expression): a protocol for the detection of promoter and transcriptional networks. In: Deplancke B, Gheldof N, editors. Gene Regulatory Networks: Methods and Protocols, Vol. 786. New Jersey: Humana Press Inc; 2012. pp. 181–200.
- Temple G, Gerhard DS, Rasooly R, et al. The completion of the mammalian gene collection (MGC). Genome Res. 2009;19:2324–2333. doi: 10.1101/gr.095976.109.
- Theissen G, Becker A, Di Rosa A, Kanno A, Kim JT, Munster T, Winter KU, Saedler H. A short history of MADS-box genes in plants. Plant Mol. Biol. 2000;42:115–149.
- Thorvaldsdottir H, Robinson JT, Mesirov JP. Integrative genomics viewer (IGV): high-performance genomics data visualization and exploration. Brief Bioinform. 2013;14:178–192. doi: 10.1093/bib/bbs017.
- Trapnell C, Williams BA, Pertea G, Mortazavi A, Kwan G, van Baren MJ, Salzberg SL, Wold BJ, Pachter L. Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation. Nat. Biotechnol. 2010;28:511–515. doi: 10.1038/nbt.1621.
- Waki T, Hiki T, Watanabe R, Hashimoto T, Nakajima K. The Arabidopsis RWP-RK protein RKD4 triggers gene expression and pattern formation in early embryogenesis. Curr. Biol. 2011;21:1277–1281. doi: 10.1016/j.cub.2011.07.001.
- Wang Y, Deng D, Bian Y, Lv Y, Xie Q. Genome-wide analysis of primary auxin-responsive Aux/IAA gene family in maize (Zea mays L.). Mol. Biol. Rep. 2010;37:3991–4001. doi: 10.1007/s11033-010-0058-6.
- Wehner N, Hartmann L, Ehlert A, Boettner S, Onate-Sanchez L, Droege-Laser W. High-throughput protoplast transactivation (PTA) system for the analysis of Arabidopsis transcription factor function. Plant J. 2011;68:560–569. doi: 10.1111/j.1365-313X.2011.04704.x.
- Weiste C, Iven T, Fischer U, Onate-Sanchez L, Droege-Laser W. In planta ORFeome analysis by large-scale over-expression of GATEWAY (R)-compatible cDNA clones: screening of ERF transcription factors involved in abiotic stress defense. Plant J. 2007;52:382–390. doi: 10.1111/j.1365-313X.2007.03229.x.
- Yilmaz A, Nishiyama MY, Garcia-Fuentes B, Souza GM, Janies D, Gray J, Grotewold E. GRASSIUS: a platform for comparative regulatory genomics across the grasses. Plant Physiol. 2009;149:171–180. doi: 10.1104/pp.108.128579.
- Zhang Y, Wang L. The WRKY transcription factor superfamily: its origin in eukaryotes and expansion in plants. BMC Evol. Biol. 2005;5:1. doi: 10.1186/1471-2148-5-1.
- Zhou M-L, Tang Y-X, Wu Y-M. Genome-wide analysis of AP2/ERF transcription factor family in Zea Mays. Curr. Bioinform. 2012;7:324–332.