Abstract
The E2F family of transcription factors provides essential activities for coordinating the control of cellular proliferation and cell fate. Both E2F1 and E2F3 proteins have been shown to be particularly important for cell proliferation, whereas the E2F1 protein has the capacity to promote apoptosis. To explore the basis for this specificity of function, we used DNA microarray analysis to probe for the distinctions in the two E2F activities. Gene expression profiles that distinguish either E2F1- or E2F3-expressing cells from quiescent cells are enriched in genes encoding cell cycle and DNA replication activities, consistent with many past studies. The E2F1 profile is also enriched in genes known to function in apoptosis. We also identified patterns of gene expression that specifically differentiate the activity of E2F1 and E2F3; this profile is enriched in genes known to function in mitosis. The specificity of E2F function has been attributed to protein interactions mediated by the marked box domain, and we now show that chimeric E2F proteins generate expression signatures that reflect the origin of the marked box, thus linking the biochemical mechanism for specificity of function with specificity of gene activation.
Keywords: DNA microarray, transcriptional control
Numerous studies have demonstrated the role of E2F proteins in the control of genes whose products are essential for DNA replication, differentiation, and cell cycle progression (1, 2). In particular, these studies have detailed the importance of E2F proteins in controlling gene expression at the G1/S transition, involving the activation of genes important for S phase events including dihydrofolate reductase, thymidine kinase, and DNA polymerase. In addition to this role for E2Fs in controlling S phase, more recent work has also demonstrated a role for E2F activity during G2/M transition (3-6).
The E2F family is comprised of nine distinct gene products encoded by seven distinct genomic loci (7-14). The size and complexity of the E2F family of proteins reflect a complexity in function with individual E2Fs performing both distinct and overlapping roles in proliferation, apoptosis, and development (15-17). E2F1, E2F2, and E2F3a make up one subset, with each of these E2Fs functioning as a strong transcriptional activator that can induce quiescent cells to enter S phase (18-20). As cells enter mid-to-late G1, many E2F-responsive promoters are bound by E2F1, E2F2, and E2F3a, coincident with histone acetylation and gene activation (21, 22). E2F4, E2F5, and the alternative version of E2F3, termed E2F3b (9, 11), constitute the second subset of E2F family members. They are not regulated by cell growth but instead can be found at nearly equivalent levels in both quiescent and proliferating cells (17). In contrast to the activating E2Fs, E2F4, E2F5, and E2F3b are mainly involved in the repression of growth-promoting E2F-responsive genes through the recruitment of complexes to E2F-responsive promoter elements that contain histone deacetylase (1, 2) or other corepressors (23).
The complexity of transcription control for the large number of protein-coding genes in a eukaryotic cell presents a major challenge in achieving specificity of transcription control with a limited number of transcription factors. A solution to this problem has been proposed based on a combinatorial mechanism of transcription control, whereby a finite number of transcription factors yield a substantial level of complexity by working in combination (24, 25). Various studies have now provided evidence for such combinatorial specificity, involving upstream binding transcription factors as well as components of the basal transcription machinery (26). Our previous work has focused on interactions involving the E2F family of transcription factors as an example of combinatorial gene control, leading to the identification of TFE3, YY1, and Myb as transcription partners for several E2F proteins (6, 22, 27, 28). Based on these observations, we have proposed that these examples of combinatorial interactions involving E2F proteins provide a basis for the specificity of transcription control in the Rb/E2F pathway. Importantly, these studies also identified a domain within the E2F family of proteins, the so-called marked box domain, that mediated the interactions between E2F proteins and the various transcription factor partners. By implication, these findings suggest a role for the marked box domain as a specificity determinant, directing a particular E2F protein to the proper promoter via protein interaction. To address this point on a more global basis, we made use of genome-wide measures of gene expression to identify patterns of gene expression that reflect the specificity of function of the E2F1 and E2F3 proteins. 
In particular, we demonstrate that chimeric E2F proteins that contain either the E2F1 or E2F3 marked box domain exhibit a gene expression signature that reflects the origin of the marked box, thus linking the biochemical mechanism for specificity of function with the specificity of gene activation.
Materials and Methods
Cells and Viruses. Primary mouse embryo fibroblasts (MEFs) were isolated from 13.5-day embryos as described (29). MEFs were passaged in DMEM containing 15% heat-inactivated FBS. Passage three MEFs were rendered quiescent by allowing growth to confluence in DMEM/15% FBS and then by splitting 1:5 into DMEM/0.25% FBS for 48 h. Construction of adenoviruses (Ads) expressing hemagglutinin (HA)-tagged E2F1, E2F3, and the chimeras HA111331 and HA333113 has been described (30). A control virus contained a GFP gene insert under the control of the cytomegalovirus (CMV) promoter. This control virus is referred to as CMV. Viral titers were determined simultaneously by indirect immunofluorescence against the viral Mr 72,000 protein. MEFs plated in 60-mm dishes were infected with a multiplicity of infection (moi) of 150 for each virus, but moi were adjusted to render similar levels of protein expression as determined by HA Western blot of total protein extracts. For the infections to prepare RNA, MEFs were plated in 150-mm dishes in DMEM/0.25% FBS for 48 h, as described above. After 48 h, the media were removed and replaced with media containing virus as described (31). After the infection, the cells were returned to DMEM/0.25% FBS for 16 h.
RNA Preparation. Total RNA was extracted from the infected cells by using TRIzol, as described in the manufacturer's instructions (Invitrogen).
DNA Microarray Analysis. All of the experiments used Affymetrix MOE430A arrays. The targets for the Affymetrix arrays were prepared according to manufacturer's instructions starting with 10 μg of total RNA. Double-stranded cDNA was synthesized by using a T7-linked oligo(dT) primer followed by second-strand synthesis. Biotin-labeled complementary RNA, produced by in vitro transcription, was synthesized and subsequently fragmented. The fragmented cRNA was hybridized to the MOE430A (Affymetrix GeneChip) arrays at 45°C for 16 h and then washed and stained with streptavidin-phycoerythrin (SAPE, Molecular Probes). Signal amplification was performed by using a biotinylated antistreptavidin antibody (Vector Laboratories). The arrays were scanned by an Affymetrix GeneChip Scanner, and hybridization patterns were detected as light emitted from the fluorescent reporter groups that have been incorporated into the target and hybridized to oligonucleotide probes. The signal intensity measurements computed in the Affymetrix microarray analysis suite 5.0 serve as a relative indicator of the level of expression. Scaling factors were also computed for each array based on an arbitrary target intensity of 500. Files containing the computed signal intensity value for each probe cell on the arrays (CEL files), files containing both experimental and sample information (control information files), and files providing the signal intensity values for each probe set, as derived by the Affymetrix analysis suite Ver. 5.0 software (pivot files), are available upon request to J.R.N. These experiments comply with the Minimum Information About a Microarray Experiment (MIAME) (32).
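The scaling step can be illustrated with a minimal sketch. It assumes, as is commonly described for the Affymetrix software, that each array is multiplied by a factor that brings a trimmed mean of its signals to the target intensity. The 2% trim, the function names, and the signal values are assumptions for illustration and are not details given in the text.

```python
def trimmed_mean(values, trim_percent=2):
    """Mean after dropping the lowest and highest trim_percent of values."""
    ordered = sorted(values)
    k = len(ordered) * trim_percent // 100  # number dropped from each end
    kept = ordered[k:len(ordered) - k]
    return sum(kept) / len(kept)


def scaling_factor(signals, target=500.0, trim_percent=2):
    """Factor that rescales an array so its trimmed mean equals target."""
    return target / trimmed_mean(signals, trim_percent)


# Hypothetical probe-set signals for one array (not data from this study).
signals = [100.0] * 96 + [0.0, 0.0, 1e6, 1e6]
factor = scaling_factor(signals)
scaled = [s * factor for s in signals]
```

Trimming keeps a handful of saturated or empty probe sets from dominating the factor, so arrays of different overall brightness become comparable.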
Statistical Analysis. Microarray data were first normalized by using the GC-RMA method (33). We used methods as described for analysis of the expression data (34). Briefly, the analysis uses binary regression models combined with singular value decompositions and stochastic regularization by using Bayesian analysis. A probability model estimates a classification probability for each of the two possible states (control (CMV) vs. HAE2F1, control (CMV) vs. HAE2F3, or HAE2F1 vs. HAE2F3) for each sample. This probability is structured as a probit regression model in which the expression levels of genes are scored by regression parameters in a regression vector b. The analysis estimates this regression vector and the resulting classification probabilities for both training and validation samples. The estimated regression itself is important not only for defining the predictive classification but also for scoring genes according to their contribution to the classification.
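The probit link at the heart of this model can be sketched briefly. The snippet shows only how a regression vector converts expression values into a classification probability and how the same weights rank genes. It omits the singular value decomposition and the Bayesian estimation described in ref. 34, and the gene names, weights, and expression values are invented for illustration.

```python
import math


def standard_normal_cdf(z):
    """Phi(z), the cumulative distribution function of N(0, 1)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def classification_probability(expression, weights, intercept=0.0):
    """Probit model: P(state = 1 | x) = Phi(intercept + x . b)."""
    score = intercept + sum(w * x for w, x in zip(weights, expression))
    return standard_normal_cdf(score)


def rank_genes(gene_names, weights):
    """Order genes by the magnitude of their regression weight."""
    return sorted(zip(gene_names, weights),
                  key=lambda gw: abs(gw[1]), reverse=True)


# Hypothetical regression vector b and centered expression values.
genes = ["geneA", "geneB", "geneC"]
b = [0.9, -1.2, 0.1]
sample = [1.5, -0.8, 0.2]

p = classification_probability(sample, b)
ranking = rank_genes(genes, b)
```

A score of zero maps to a probability of 0.5, and genes with the largest absolute weights contribute most to moving a sample away from that point.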
Results
Gene Expression Profiles That Reflect E2F1 and E2F3 Activity. Previous work has demonstrated that ectopic expression of various E2Fs in quiescent rodent fibroblasts, including E2F1 and E2F3a, leads to an induction of S phase (35-37). Of particular interest is the apparent distinction of function of the two E2F proteins. Although both can induce quiescent cells to enter S phase, only E2F3 appears to be required in cycling cells (31). In addition, expression of E2F1, but not E2F3, induces apoptosis when expressed in the absence of survival signals (18, 38, 39). Given the likelihood that these distinctions reflect differences in the control of gene expression, we have developed gene expression signatures to explore the distinct function of the two E2F proteins.
To both synchronize cells and reduce levels of endogenous E2Fs, wild-type primary MEFs were brought to a quiescent state after 48 h in starvation media. The cells were then infected with control Ad (Ad-CMV) or Ad expressing either HA-tagged E2F1 (AdHAE2F1) or HA-tagged E2F3 (AdHAE2F3). We allowed the infections to proceed for only 16 h to minimize levels of ectopically expressed E2F1 and E2F3 and to focus on those genes that may be the primary targets of E2F activity. Virus infections were titrated to achieve similar levels of E2F1 and E2F3 proteins by using a target multiplicity of infection of 150 focus-forming units per cell and assayed by Western blot analysis by using an antibody against the HA tag (data not shown). Total RNA was extracted from eight independent E2F1 and E2F3 infections and six replicates of the Ad-CMV infection. Cyclin E was induced in both E2F1- and E2F3-infected MEFs but not Ad-CMV infections as measured by Northern blot (data not shown). The same RNA was then used to generate target for application to MOE430A Affymetrix microarrays. Targets generated from E2F1 and E2F3 infections were hybridized to arrays and compared with arrays hybridized to target generated from the Ad-CMV-infected control MEFs.
Using the GC-robust multiarray average normalized values for each probe set on the array over multiple experiments, we identified genes whose expression most highly correlated with the activity of either E2F1 or E2F3. We then used this group of genes in a binary regression analysis to elucidate patterns of gene expression or principal components that represent the underlying structure present in the data. Gene expression profiles were identified that can distinguish between a quiescent cell infected with a control virus and a cell infected with a virus expressing either E2F1 or E2F3. Illustrated in Fig. 1A are genes that differentiate the control-infected MEFs from the E2F1-expressing MEFs (Fig. 1A Left). Fig. 1A Center depicts the separation of the control samples from the E2F1 samples based on the first principal component (Factor 1). A list of the 100 genes that comprise this discriminator and the estimated regression parameters are found in Table 2, which is published as supporting information on the PNAS web site. Each row in Fig. 1A represents a gene, ordered from top to bottom as a function of estimated regression coefficients. High expression is depicted as red and low expression as blue. Likewise in Fig. 1B, control MEFs can be distinguished from E2F3-expressing MEFs by a second group of 100 genes. Fig. 1B Center demonstrates the capacity of the first principal component to separate the samples. In this example, it is also evident that the second principal component (Factor 2) also provides discrimination. Again, the genes that form the discriminator for E2F3 and the estimated regression parameters are found in Table 3, which is published as supporting information on the PNAS web site.
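How a single dominant factor can separate two groups of samples is shown in the following minimal sketch. Power iteration on the sample-by-sample Gram matrix stands in here for the singular value decomposition used in the analysis, and the expression values are hypothetical.

```python
import math


def first_factor(matrix, iterations=200):
    """Per-sample scores on the dominant factor of a genes-by-samples matrix."""
    n_samples = len(matrix[0])
    # Center each gene (row) across samples.
    centered = []
    for row in matrix:
        mean = sum(row) / n_samples
        centered.append([v - mean for v in row])
    # Sample-by-sample Gram matrix G = X^T X of the centered data.
    gram = [[sum(r[i] * r[j] for r in centered) for j in range(n_samples)]
            for i in range(n_samples)]
    # Power iteration converges to the leading eigenvector of G, which is
    # the first right singular vector of X (one score per sample).
    vec = [1.0 + 0.1 * i for i in range(n_samples)]  # arbitrary start
    for _ in range(iterations):
        nxt = [sum(g * v for g, v in zip(gram_row, vec)) for gram_row in gram]
        norm = math.sqrt(sum(v * v for v in nxt))
        if norm == 0.0:
            return [0.0] * n_samples
        vec = [v / norm for v in nxt]
    return vec


# Hypothetical expression values: three genes, samples 0-2 vs. samples 3-5.
expression = [
    [1.0, 1.2, 0.9, 5.1, 4.8, 5.0],
    [2.1, 1.9, 2.0, 6.0, 6.2, 5.9],
    [5.0, 5.2, 4.9, 1.1, 0.9, 1.0],
]
scores = first_factor(expression)
```

The overall sign of a factor is arbitrary; what matters is that the two groups of samples fall on opposite sides of zero.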
Fig. 1.
Classification of samples based on E2F1 or E2F3 expression. (A) An E2F1 signature. (Left) Expression patterns of MEFs infected with Ad-CMV (control) or Ad-HAE2F1. Columns represent independent infections, and rows are ordered vertically by the estimated regression weights. (Center) Two-factor analysis of Ad-CMV vs. Ad-HAE2F1. Individual samples are depicted in the scatter plot on two dominant factors underlying 100 genes selected in discrimination of the training cases. Each independent infection is indicated by a number and is color-coded, indicating control infections in blue and the Ad-HAE2F1 infections in red. (Right) Crossvalidation analysis defines an E2F gene expression signature. One-at-a-time crossvalidation predictions of classification probabilities for the training cases from the factor regression analysis. The values on the horizontal axis are estimates of the overall factor score in the regression analysis. The corresponding values on the vertical axis are estimated classification probabilities with the corresponding 95% probability intervals marked as dashed lines to indicate uncertainty about these estimated values. Control samples are red squares, Ad-HAE2F1 are blue triangles, and Ad-HAE2F3 are green triangles. (B) An E2F3 signature. Details are the same as in A, except for the use of Ad-HAE2F3.
The true test of whether a pattern reflects the phenotype of interest, rather than having been discovered by chance alone, is the ability to accurately predict the status of an unknown sample. To verify that the patterns do indeed represent genes reflecting the E2F activities, we used leave-one-out crossvalidation to assess the ability of the pattern to predict the status of the relevant samples. One sample is removed, the remainder are used for generating the patterns for prediction, and the patterns are then used to predict whether the removed sample is an E2F1- or an E2F3-expressing sample. As shown in Fig. 1A Right, the E2F1 pattern did indeed accurately predict the E2F1-expressing cells, distinguishing them from control cells. The values on the horizontal axis are estimates of the signature score from the regression, and the values on the vertical axis are estimated classification probabilities with the corresponding 95% probability intervals marked as dashed lines to indicate the uncertainty about these estimated values. All of the E2F1-expressing cells have a high probability of having the E2F1 signature, whereas the control cells have a low probability. Likewise, the E2F3 profile also accurately predicted the E2F3-expressing cells, again distinguishing them from control cells.
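The leave-one-out procedure can be sketched as follows. A nearest-centroid classifier is used purely as a stand-in for the factor regression model, and the sample values are hypothetical; the point of the sketch is that the model is rebuilt from scratch each time, without the held-out sample.

```python
def fit_centroids(samples, labels):
    """Stand-in 'training': mean expression vector for each class."""
    centroids = {}
    for label in set(labels):
        members = [s for s, lab in zip(samples, labels) if lab == label]
        centroids[label] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


def predict_nearest(centroids, sample):
    """Assign the class whose centroid is closest in squared distance."""
    def sq_dist(center):
        return sum((a - b) ** 2 for a, b in zip(sample, center))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))


def leave_one_out(samples, labels, fit=fit_centroids, predict=predict_nearest):
    """Predict each sample from a model trained on all the others."""
    predictions = []
    for i in range(len(samples)):
        train_x = samples[:i] + samples[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        model = fit(train_x, train_y)  # refit without sample i
        predictions.append(predict(model, samples[i]))
    return predictions


# Hypothetical two-gene expression values for six infections.
samples = [[0.1, 0.2], [0.0, 0.3], [0.2, 0.1],
           [2.0, 2.1], [2.2, 1.9], [1.9, 2.0]]
labels = ["control"] * 3 + ["E2F1"] * 3
predictions = leave_one_out(samples, labels)
accuracy = sum(p == t for p, t in zip(predictions, labels)) / len(labels)
```

Any step that looks at class labels, including gene selection, belongs inside `fit` so that the held-out sample never influences the pattern used to predict it.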
Consistent with previous descriptions of functions for E2F1 and E2F3, the genes identified in the signatures distinguishing either E2F1 or E2F3 from control cells include ones encoding activities necessary for cell cycle progression and involve many of the previously identified E2F target genes. In addition, focusing on the role of E2F1 as an activator of apoptosis, a comparison of the 100-gene predictor lists by using FatiGO (40) at Biological Process (41) level 4 reveals that 4.8% of the annotated genes in the E2F1 gene list (Table 2) are involved in the regulation of programmed cell death, or apoptosis. In contrast, 1.8% of the annotated genes in the E2F3 predictor (Table 3) are involved in these events, and Hells is the single overlapping gene between the two predictors.
Table 1. Genes that discriminate E2F3- from E2F1-expressing cells.
Probe ID | Weight | Gene | Description | Function |
---|---|---|---|---|
1452210_at | 0.933322 | DNA2l | DNA2 DNA replication helicase 2-like | DNA replication |
1419655_at | 0.852784 | Tle3 | Transducin-like enhancer of split 3, homolog of Drosophila E(spl) | Wnt signaling |
1416961_at | 0.810252 | Bub1 | Budding uninhibited by benzimidazoles 1 homolog | Mitosis |
1451592_at | 0.802042 | P42pop | Myb protein P42pop | Mitosis |
1426682_at | 0.731436 | Unknown | ||
1448627_s_at | 0.532497 | Pbk | PDZ-binding kinase | Mitosis |
1452597_at | 0.464174 | Unknown | ||
1417407_at | 0.456457 | Fbxl14 | F-box and leucine-rich repeat protein 14 | Proteolysis |
1448191_at | 0.4314 | Plk | Polo-like kinase | Mitosis |
1451756_at | 0.176747 | Flt1 | FMS-like tyrosine kinase 1 | Angiogenesis |
1423774_a_at | 0.103688 | Prc1 | Protein regulator of cytokinesis | Mitosis |
1452895_at | 0.00549 | Unknown | ||
1425534_at | –0.095696 | Stau2 | Staufen homolog 2 | |
1416299_at | –0.210794 | Shcbp1 | Shc SH2-domain-binding protein 1 | Signal transduction |
1453037_at | –0.366663 | est | ||
1416849_at | –0.375043 | est | ||
1417938_at | –0.414884 | Rad51ap | RAD51-associated protein | DNA recombination |
1417719_at | –0.426277 | Sap30 | sin3-associated polypeptide | Mitosis |
1424766_at | –0.458014 | Unknown | ||
1416802_a_at | –0.473975 | Cdca5 | Cell division cycle associated 5 | Mitosis |
1436707_x_at | –0.488103 | Brrn1 | Barren homolog | Mitosis |
1438817_at | –0.510491 | DNA2l | DNA2 DNA replication helicase 2-like | DNA replication |
1417420_at | –0.514015 | Ccnd1 | Cyclin D1 | Cell cycle |
1416856_at | –0.543831 | Unknown | ||
1453683_a_at | –0.551945 | Unknown | ||
1434427_a_at | –0.57153 | Rnf157 | Ring finger protein 157 | |
1448698_at | –0.663125 | Ccnd1 | Cyclin D1 | Cell cycle |
1448205_at | –0.838015 | CcnB1 | Cyclin B1 | Mitosis |
1428353_at | –0.842293 | est | ||
1426778_at | –0.846422 | Dag1 | Dystroglycan | Morphogenesis |
1439017_x_at | –0.859631 | Adipor1 | Adiponectin receptor 1 | Fatty acid metabolism |
1418947_at | –0.930753 | Nek3 | NIMA (never in mitosis gene a)-related expressed kinase 3 | Mitosis |
1423484_at | –0.999995 | Bicc1 | Bicaudal C homolog 1 | |
1451223_a_at | –1.020922 | Unknown | ||
1448743_at | –1.034165 | Ssx2ip | Synovial sarcoma, X breakpoint 2 interacting protein | Cell adhesion |
1416558_at | –1.09871 | Melk | Maternal embryonic leucine zipper kinase | Protein amino acid phosphorylation |
1452659_at | –1.155192 | Dek | DEK oncogene | Cell growth |
1452919_a_at | –1.202562 | Unknown | ||
1423775_s_at | –1.219187 | Prc1 | Protein regulator of cytokinesis | Mitosis |
1418261_at | –1.221675 | Syk | Spleen-associated kinase | GPCR signaling |
1418744_s_at | –1.223125 | Tesc | Tescalcin | Sodium ion homeostasis |
1448643_at | –1.242971 | Ssna1 | Sjögren's syndrome nuclear autoantigen 1 | Mitosis |
1452331_s_at | –1.262885 | Unknown | ||
1423237_at | –1.29748 | Galnt1 | UDP-N-acetyl-α-D-galactosamine:polypeptide N-acetylgalactosaminyltra | DNA metabolism |
1416258_at | –1.339809 | Tk1 | Thymidine kinase 1 | DNA replication |
1418517_at | –1.342051 | Irx3 | Iroquois-related homeobox 3 | Protein modification |
1455511_at | –1.378708 | Sephs1 | Selenophosphate synthetase 1 | Mitosis |
1427064_a_at | –1.394654 | Scrib | Scribbled homolog | Morphogenesis |
1423431_a_at | –1.464129 | Mybbp1a | MYB-binding protein 1a | Electron transport |
1450943_at | –1.469186 | est | ||
Focusing on the Distinction Between E2F1 and E2F3. The results described in Fig. 1 suggest that the signatures discriminate not only an E2F-expressing cell from a control cell but also, to some degree, one E2F from the other. For instance, the E2F3 samples were predicted at lower probability and with greater uncertainty on the E2F1 signature. Similarly, the E2F1 samples were less well predicted on the E2F3 signature. Nevertheless, the discrimination between E2F1 and E2F3 on this basis was not completely clear, and it was also true that there was considerable overlap in the genes in the two profiles. We have thus sought to develop a more discrete discriminator that could effectively distinguish E2F1 from E2F3. To do so, we trained the analysis to specifically distinguish E2F3 vs. E2F1 cells rather than either of these against the control cells, making use of a profile of 100 genes (Fig. 2A). From those 100 genes, the first principal component (Factor 1) can clearly segregate the E2F1-expressing MEFs (samples 1-8) from the E2F3-expressing MEFs (samples 9-16) (Fig. 2B).
Fig. 2.
Distinguishing E2F1 and E2F3 samples. (A) Expression patterns of MEFs infected with Ad-HAE2F1 or Ad-HAE2F3. Columns represent independent infections, and rows are ordered vertically by the estimated regression weights of the 100 predictive genes. (B) Two-factor analysis of Ad-HAE2F1 vs. Ad-HAE2F3. Individual samples depicted in the scatter plot on two dominant factors underlying 100 genes selected in discrimination of the training cases. Ad-HAE2F1 samples are colored in red and numbered by independent infection. Ad-HAE2F3 infections are colored in blue and numbered independently. (C) Crossvalidation analysis defines an E2F3 gene expression signature. One-at-a-time crossvalidation analysis predicts classification probabilities for the training cases from the factor regression analysis. Values on the horizontal axis are estimates of the overall factor score from the regression analysis, and the corresponding values on the vertical axis are the estimated classification probability of E2F3 activity, with 95% probability intervals marked as dashed lines to indicate uncertainty about the estimated values. Control infections are labeled as red squares, Ad-HAE2F1 as blue triangles, and Ad-HAE2F3 as green triangles.
A revealing test of the trained model is illustrated in Fig. 2C, in which the model trained to discriminate E2F3 from E2F1 is used to predict the status of both the control samples and the samples that express the E2Fs. The samples representing E2F3-expressing cells show a high probability of E2F3 activity, whereas the E2F1 and control samples score with a low probability of E2F3 activity.
The top genes that were selected for the ability to discriminate E2F3 from E2F1 are listed in Table 1 (the full list is provided in Table 4, which is published as supporting information on the PNAS web site). Only 6% of these genes overlap with the discriminators derived from comparison of the E2F samples vs. control samples. It thus appears that the prediction of E2F1 and E2F3 activity depends on a group of genes that are largely distinct from those that distinguish E2F activity from control cells. An examination of these genes reveals a substantial enrichment for genes encoding mitotic activities: nearly 20% of the genes selected to discriminate E2F3 from E2F1 encode mitotic activities, which is also reflected in the enrichment of Gene Ontology terms, indicating mitotic functions (Fig. 4, which is published as supporting information on the PNAS web site). The link between E2F3 and control of mitotic genes is of interest given previous data that have differentiated roles for E2F1 and E2F3 as a function of initial cell cycle entry vs. continuing growth in the presence of growth factors; whereas both E2F1 and E2F3 appear to be important for the initial S phase entry after serum stimulation, only E2F3 is required in cycling cells (31). The expression profile trained to differentiate the two E2F activities would appear to emphasize this distinction, highlighting the control of G2/M transcription as the dominant characteristic that distinguishes the two E2Fs.
Gene Expression Patterns That Reflect the Molecular Basis for E2F Specificity. In considering the specificity of transcription function exhibited by E2F1 and E2F3, we focused on prior work that pointed to a mechanism involving protein-protein interactions as the basis for the specificity. These studies identified the marked box domain as a determinant of specificity of transcriptional activation by the E2F proteins by promoting the interaction with other transcription factors to allow recognition and binding to specific target promoters (6, 22, 27, 28). In the case of E2F1, further experiments have shown that the marked box domain confers specificity of apoptosis induction, coincident with the induction of apoptotic activities such as p53 and p73 (30).
Given the identification of gene expression profiles that distinguish E2F1 and E2F3, we made use of a series of E2F chimeric proteins to determine whether these profiles reflect the function of the marked box domain. As illustrated in Fig. 3A, HA-tagged chimeric E2Fs were generated from the human E2F1 and E2F3 cDNAs and introduced into Ad as described (30). We made use of two previously described chimeras. Ad-333113 expresses a protein that contains the E2F1 marked box in the backbone of E2F3, and Ad-111331 expresses a protein that contains the E2F3 marked box in the backbone of E2F1. MEFs were infected in four independent experiments with either Ad-111331 or Ad-333113 for 16 h, and total RNA was harvested and used to generate probes for hybridization to Affymetrix MOE430A arrays. We then examined the expression of the 100 genes selected to discriminate E2F1 from E2F3 in the cells expressing the chimeric proteins. As shown in Fig. 3B, it was evident that the profiles on these genes in the chimera-expressing cells reflected the origin of the marked box domain. As shown in Fig. 3C, the training model developed to distinguish E2F1- from E2F3-expressing cells, as described in Fig. 2, accurately identified the cells expressing the chimeric protein containing the E2F1 marked box domain.
Fig. 3.
Classification of E2F chimeras. (A) Schematic of protein domain structure of E2F1 and E2F3. Domain structures of Ad-HAE2F1, Ad-HAE2F3, Ad-HA111331, and Ad-HA333113. (B) Expression patterns of MEFs infected with Ad-HAE2F1, Ad-HAE2F3, Ad-HA333113, and Ad-HA111331. Columns represent independent infections, and rows are ordered vertically by estimated regression weights of the 100 predictive genes for E2F3 expression. (C) Prediction of E2F chimeras based on E2F3 gene expression signature. One-at-a-time crossvalidation predictions of classification probabilities of chimeric protein expression based on the trained pattern derived from the E2F3- vs. E2F1-expressing fibroblasts. Ad-HA111331 is designated as a green triangle and labeled E2F3 by origin of the marked box, and Ad-HA333113 is designated as a blue triangle and labeled E2F1 by origin of the marked box.
Taken together, these results confirm that the specificity of E2F function can be defined by using patterns of gene expression as captured by DNA microarray analysis. Importantly, the specificity of the gene expression profiles that distinguish E2F1 from E2F3 is driven by the marked box domain, which has previously been shown to drive the mechanistic basis for specific E2F function through protein-protein interactions, thus establishing a link between biochemical function and gene expression phenotype.
Discussion
Central to the generation of biological phenotypes, including those directed by the action of E2F proteins, are the mechanisms responsible for defining the specificity of function of transcriptional regulatory proteins. Key in this process is the recognition of specific DNA sequences within transcriptional regulatory regions of target genes. A fundamental challenge is the mechanism by which the recognition of a short sequence of DNA (six to eight base pairs) can properly distinguish the true functional sites from random occurrences of the sequence. Clearly, the properties of a single transcription factor such as E2F are not sufficient to recognize the appropriate sites, discriminating those from the vast array of irrelevant sequences. This general issue then leads to the specific question of specificity within families of related proteins such as the E2F family. Because there is little variation in the sequence of the various E2F promoter elements, and analysis of the E2F4-DNA crystal structure suggests all E2F members recognize similar DNA sequences (40), it is apparent that the specificity of E2F-promoter interaction is more complex than a simple E2F-DNA recognition.
The complexity of the E2F family, which comprises nine distinct gene products, reflects a complexity in function in which individual family members have been shown to perform distinct functions. The E2F1-3 proteins are recognized as transcriptional activating proteins, whereas the E2F4-7 proteins appear to function primarily as transcriptional repressors. In addition, various experiments point to overlapping but also distinct roles for the E2F1 and E2F3 proteins in the activation of genes important for apoptosis, cell cycle entry, and cell cycle progression (18, 31, 36, 37, 42).
We have proposed that this specificity is mediated through specific protein-protein interactions, whereby an E2F protein must physically interact with another transcription factor in a promoter-specific fashion to generate a functional outcome. In this context, the specificity of the E2F then becomes not just the 8-bp recognition sequence but the combined promoter sequence containing the E2F site and the partner protein-binding site, together constituting what we term a regulatory module. This would then provide the necessary complexity of sequence recognition to distinguish functional sites from nonfunctional binding sites. Moreover, if the protein interactions were specific for individual E2F proteins, then this mechanism would provide a basis for distinct specificities in the activation of transcription.
Our previous work has provided evidence for cooperative protein-protein interactions as a basis for such E2F specificities. Specifically, we identified the E-box-binding factor TFE3 as an E2F3-specific partner (22, 27) and the YY1 transcription factor as a partner for E2F2 and E2F3 (28). Importantly, the specificity of transcription cooperativity involving these transcription factors was shown to reflect specific interactions in which the marked box domain of the E2F proteins mediated an interaction with the appropriate partner transcription factor. The results presented here now demonstrate that a gene expression signature reflecting the distinction between E2F1 and E2F3 also distinguishes the action of chimeric proteins that differ only by the marked box domain. The results thus link the biochemical mechanism proposed for E2F specificity with the specificity seen in gene expression signatures.
This work also highlights the power of gene expression profiling to focus on distinctions that reflect similar and potentially overlapping biological phenotypes. The ability to distinguish an E2F1- from an E2F3-expressing cell was facilitated by the ability to find patterns in the massive gene expression data that reflect subtle differences in the action of the two proteins. An examination of the genes whose expression provides this discrimination reveals many of the known E2F targets as well as additional genes not previously identified as being E2F-regulated. Importantly, a comparison of the genes identified in the E2F1 vs. E2F3 profiles does reveal differences consistent with known biology. One-third of the genes identified as distinguishing E2F1 or E2F3 from control cells are involved in the control of cellular proliferation, although there was also some enrichment for apoptotic genes in the E2F1 signature (4.8% vs. 1.8% in the E2F3 signature) as annotated by using FatiGO. However, it was the signature developed to specifically distinguish E2F1 from E2F3 that revealed the most dramatic distinction, highlighted by a substantial number of mitotic genes (see Tables 1 and 4). Previous work has highlighted the role of E2Fs in the control of both DNA replication genes at G1/S and also genes encoding mitotic functions at G2/M (3-5, 43). Moreover, other work has pointed to a specific role for E2F3 in the regulation of transcription in cycling cells; whereas both E2F1 and E2F3 are important for initial cell cycle entry, only E2F3 is required once cells begin to cycle (31). The gene expression profiles that distinguish the two E2F proteins clearly emphasize this distinction, thus further highlighting the difference in function for the two E2Fs.
Although we might have anticipated that the distinction between E2F1 and E2F3 could also have included apoptotic genes, clearly the dominant difference was the mitotic genes, presumably reflecting this as the primary distinction of function of the two E2Fs in the control of gene expression in growing cells.
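The category comparison discussed above (the share of genes in each signature that carry a given functional annotation) can be illustrated with a minimal sketch. The function name `annotated_fraction`, the gene identifiers, and the annotation table are hypothetical and chosen only for illustration; they are not the signatures or annotations from this study, and the sketch does not reproduce the annotation tool used here.

```python
from typing import Dict, Iterable, Set


def annotated_fraction(signature: Iterable[str],
                       annotations: Dict[str, Set[str]],
                       term: str) -> float:
    """Return the fraction of distinct signature genes annotated with `term`."""
    genes = set(signature)
    if not genes:
        return 0.0
    # Genes missing from the annotation table count as unannotated.
    hits = sum(1 for g in genes if term in annotations.get(g, set()))
    return hits / len(genes)


# Hypothetical toy data, for illustration only (not data from this study).
toy_annotations = {
    "geneA": {"apoptosis"},
    "geneB": {"cell cycle"},
    "geneC": {"cell cycle", "mitosis"},
    "geneD": {"apoptosis", "cell cycle"},
    "geneE": set(),
}
sig_one = ["geneA", "geneB", "geneC", "geneD"]
sig_two = ["geneB", "geneC", "geneD", "geneE"]

frac_one = annotated_fraction(sig_one, toy_annotations, "apoptosis")
frac_two = annotated_fraction(sig_two, toy_annotations, "apoptosis")
print(f"signature one: {frac_one:.1%}; signature two: {frac_two:.1%}")
```

A comparison of this kind reports only proportions; judging whether a difference between two signatures is meaningful would additionally require the underlying gene counts and an appropriate statistical test.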
Supplementary Material
Acknowledgments
We thank Kaye Culler for assistance with the preparation of the manuscript. This work was supported by National Institutes of Health Grants 1R01-CA104663-02 and 1R01-CA106520-01A1 (to J.R.N.).
Author contributions: E.P.B., M.W., and J.R.N. designed research; E.P.B. and T.H. performed research; E.P.B., H.D., M.W., and J.R.N. analyzed data; and E.P.B., M.W., and J.R.N. wrote the paper.
Conflict of interest statement: No conflicts declared.
This paper was submitted directly (Track II) to the PNAS office.
Abbreviations: MEF, mouse embryo fibroblast; HA, hemagglutinin; CMV, cytomegalovirus; Ad, adenovirus.
References
- 1. Nevins, J. R. (1998) Cell Growth Differ. 9, 585-593.
- 2. Dyson, N. (1998) Genes Dev. 12, 2245-2262.
- 3. Ishida, S., Huang, E., Zuzan, H., Spang, R., Leone, G., West, M. & Nevins, J. R. (2001) Mol. Cell. Biol. 21, 4684-4699.
- 4. Polager, S., Kalma, Y., Berkovich, E. & Ginsberg, D. (2002) Oncogene 21, 437-446.
- 5. Ren, B., Cam, H., Takahashi, Y., Volkert, T., Terragni, J., Young, R. A. & Dynlacht, B. D. (2002) Genes Dev. 16, 245-256.
- 6. Zhu, W., Giangrande, P. & Nevins, J. R. (2004) EMBO J. 23, 4615-4626.
- 7. Cam, H. & Dynlacht, B. D. (2003) Cancer Cell 3, 311-316.
- 8. Di Stefano, L., Jensen, M. R. & Helin, K. (2003) EMBO J. 22, 6289-6298.
- 9. Adams, M. R., Sears, R., Nuckolls, F., Leone, G. & Nevins, J. R. (2000) Mol. Cell. Biol. 20, 3633-3639.
- 10. He, Y., Armanious, M. K., Thomas, M. J. & Cress, W. D. (2000) Oncogene 19, 3422-3433.
- 11. Leone, G., Nuckolls, F., Ishida, S., Adams, M., Sears, R., Jakoi, L., Miron, A. & Nevins, J. R. (2000) Mol. Cell. Biol. 20, 3626-3632.
- 12. Trimarchi, J. M. & Lees, J. A. (2002) Nat. Rev. Mol. Cell. Biol. 3, 11-20.
- 13. Dahme, T., Wood, J., Livingston, D. M. & Gaubatz, S. (2002) Eur. J. Biochem. 269, 5030-5036.
- 14. de Bruin, A., Maiti, B., Jakoi, L., Timmers, C., Buerki, R. & Leone, G. (2003) J. Biol. Chem. 278, 42041-42049.
- 15. Muller, H., Bracken, A. P., Vernell, R., Moroni, M. C., Christians, F., Grassilli, E., Prosperini, E., Vigo, E., Oliner, J. D. & Helin, K. (2001) Genes Dev. 15, 267-285.
- 16. Trimarchi, J. M., Fairchild, B., Wen, J. & Lees, J. A. (2001) Proc. Natl. Acad. Sci. USA 98, 1519-1524.
- 17. DeGregori, J. (2002) Biochim. Biophys. Acta 1602, 131-150.
- 18. DeGregori, J., Leone, G., Miron, A., Jakoi, L. & Nevins, J. R. (1997) Proc. Natl. Acad. Sci. USA 94, 7245-7250.
- 19. Lukas, J., Petersen, B. O., Holm, K., Bartek, J. & Helin, K. (1996) Mol. Cell. Biol. 16, 1047-1057.
- 20. Verona, R., Moberg, K., Estes, S., Starz, M., Vernon, J. P. & Lees, J. A. (1997) Mol. Cell. Biol. 17, 7268-7282.
- 21. Takahashi, Y., Rayman, J. B. & Dynlacht, B. D. (2000) Genes Dev. 14, 804-816.
- 22. Giangrande, P., Zhu, W., Rempel, R. E., Laakso, N. & Nevins, J. R. (2004) EMBO J. 23, 1336-1347.
- 23. Meloni, A. R., Smith, E. J. & Nevins, J. R. (1999) Proc. Natl. Acad. Sci. USA 96, 9574-9579.
- 24. Yamamoto, K. R., Darimont, B. D., Wagner, R. L. & Iniguez-Lluhi, J. A. (1998) Cold Spring Harbor Symp. Quant. Biol. 63, 587-598.
- 25. Pilpel, Y., Sudarsanam, P. & Church, G. M. (2001) Nat. Genet. 29, 153-159.
- 26. Smale, S. T. (2001) Genes Dev. 15, 2515-2519.
- 27. Giangrande, P. H., Hallstrom, T. C., Tunyaplin, C., Calame, K. & Nevins, J. R. (2003) Mol. Cell. Biol. 23, 3707-3720.
- 28. Schlisio, S., Halperin, T., Vidal, M. & Nevins, J. R. (2002) EMBO J. 21, 5775-5786.
- 29. Robertson, E. J. (1987) in Teratocarcinomas and Embryonic Stem Cells: A Practical Approach, ed. Robertson, E. J. (IRL, Oxford), pp. 104-108.
- 30. Hallstrom, T. C. & Nevins, J. R. (2003) Proc. Natl. Acad. Sci. USA 100, 10848-10853.
- 31. Leone, G., DeGregori, J., Yan, Z., Jakoi, L., Ishida, S., Williams, R. S. & Nevins, J. R. (1998) Genes Dev. 12, 2120-2130.
- 32. Brazma, A., Hingamp, P., Quackenbush, J., Sherlock, G., Spellman, P., Stoeckert, C., Aach, J., Ansorge, W., Ball, C. A., Causton, H. C., et al. (2001) Nat. Genet. 29, 365-371.
- 33. Bolstad, B. M., Irizarry, R. A., Astrand, M. & Speed, T. P. (2003) Bioinformatics 19, 185-193.
- 34. West, M., Blanchette, C., Dressman, H., Huang, E., Ishida, S., Spang, R., Zuzan, H., Olson, J. A., Jr., Marks, J. R. & Nevins, J. R. (2001) Proc. Natl. Acad. Sci. USA 98, 11462-11467.
- 35. Johnson, D. G., Schwarz, J. K., Cress, W. D. & Nevins, J. R. (1993) Nature 365, 349-352.
- 36. Qin, X.-Q., Livingston, D. M., Kaelin, W. G. & Adams, P. D. (1994) Proc. Natl. Acad. Sci. USA 91, 10918-10922.
- 37. Shan, B. & Lee, W.-H. (1994) Mol. Cell. Biol. 14, 8166-8173.
- 38. Kowalik, T. F., DeGregori, J., Schwarz, J. K. & Nevins, J. R. (1995) J. Virol. 69, 2491-2500.
- 39. Wu, X. & Levine, A. J. (1994) Proc. Natl. Acad. Sci. USA 91, 3602-3606.
- 40. Al-Shahrour, F., Diaz-Uriarte, R. & Dopazo, J. (2004) Bioinformatics 20, 578-580.
- 41. The Gene Ontology Consortium (2000) Nat. Genet. 25, 25-29.
- 42. Zheng, N., Fraenkel, E., Pabo, C. O. & Pavletich, N. P. (1999) Genes Dev. 13, 666-674.
- 43. Lin, W.-C., Lin, F.-T. & Nevins, J. R. (2001) Genes Dev. 15, 1833-1845.
- 44. Neufeld, T. P., de la Cruz, A. F. A., Johnston, L. A. & Edgar, B. A. (1998) Cell 93, 1183-1193.