Abstract
The Caenorhabditis elegans hermaphrodite is a complex multicellular animal model that is composed of 959 somatic cells. The C. elegans genome contains ∼20,000 protein-coding genes, 940 of which encode regulatory transcription factors (TFs). In addition, the worm genome encodes more than one hundred microRNAs and many other regulatory RNA and protein molecules. Most C. elegans genes are subject to regulatory control, most likely by multiple regulators, and combined, this dictates the activation or repression of the gene and corresponding protein in the relevant cells and under the appropriate conditions. A major goal in C. elegans biology is to determine the spatiotemporal expression pattern of each gene throughout development and in response to different signals, and to determine how this expression pattern is accomplished. Gene regulatory networks describe physical and/or functional interactions between genes and their regulators that result in specific spatiotemporal gene expression. Such regulators can act at transcriptional or post-transcriptional levels. Here, I will discuss the methods that can be used to delineate gene regulatory networks in C. elegans. I will mostly focus on gene-centered yeast one-hybrid (Y1H) assays that are used to map interactions between non-coding genic regions, such as promoters, and regulatory TFs. The approaches discussed here are not only relevant to C. elegans biology, but can be applied to other model organisms as well.
Introduction
Complex multicellular model organisms such as C. elegans need to faithfully develop from a fertilized oocyte into a complete and fully functioning animal that is composed of different cell and tissue types. After development is completed, metazoan organisms also need mechanisms for homeostasis and to adequately respond to physiological and environmental cues, i.e. to find mating partners, to detect food and to avoid pathogens. For correct functionality, cells and tissues need to compute an appropriate biological output based on the input they receive. Such an output can for instance be to differentiate, to move, or to enter the dauer stage. Biological outputs result from interactions between the different biomolecules cells and tissues contain, including the genome, the proteins and RNA molecules encoded by the genome, and small molecules such as metabolites.
Developmental and post-developmental processes are controlled, at least in part, by the specific spatiotemporal expression of each of the ∼20,000 protein-coding genes in the C. elegans genome. Each gene/protein is likely controlled by multiple regulators and at multiple levels (Figures 1 and 2). First, genes are transcribed into mRNAs, and this is controlled by the action of regulatory transcription factors (TFs) that can repress or activate gene expression by directly interacting with the genome. Second, mRNA stability and translation are controlled by small RNAs such as microRNAs, and by RNA binding proteins that frequently interact with the 3′UTR of their target mRNAs. Third, after translation, proteins can be stabilized or destabilized due to post-translational modifications by, for example, kinases or acetylases. Finally, sub-cellular mRNA and protein localization can be subject to control mechanisms as well.
The delineation of the complex networks that comprehensively describe the physical and regulatory interactions at each of these levels and between all biomolecules is a daunting task. Here, I will focus specifically on C. elegans gene regulatory networks that control gene expression at the transcriptional and post-transcriptional levels. I will briefly discuss the methods that can be used to identify the players in gene regulatory networks, as well as approaches to identify interactions between them, with a primary focus on gene-centered yeast one-hybrid (Y1H) assays that are used to identify interactions between non-coding regulatory DNA regions and TFs.
Gene regulatory networks
Gene regulatory networks are composed of two main components: the nodes and the edges. The network nodes are the players involved, i.e. the genes and their regulators. The edges are the physical and/or regulatory relationships between the nodes (Figure 2B). Gene regulatory networks are different from better-known protein-protein interaction networks, because gene regulatory networks are both bipartite and directional. They are bipartite because there are two types of nodes: genes and regulators, although of course some genes are themselves regulators of other genes or proteins. Gene regulatory networks are directional because regulators control genes and usually not the other way around. In order to map and characterize gene regulatory networks, one needs to first identify the nodes. For the genes this means to identify the non-coding genomic DNA sequences that participate in the control of gene expression, and for the regulators this means to identify which protein-coding genes encode TFs, RNA binding proteins, and other regulators, as well as to determine the complete collection of regulatory RNA molecules. Here, I will mostly focus on TFs and microRNAs, and the types of genic regions they interact with.
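The bipartite and directional nature of such a network can be illustrated with a minimal sketch. The node names below (TF_A, miRNA_X, gene_1 and so on) are hypothetical placeholders, not interactions reported here; the sketch only shows that each edge points from a regulator to a gene and that the same edge list can be read from either side.

```python
from collections import defaultdict

# Hypothetical edges (regulator -> target gene); all names are placeholders.
EDGES = [
    ("TF_A", "gene_1"),
    ("TF_A", "gene_2"),
    ("TF_B", "gene_2"),
    ("miRNA_X", "gene_2"),
]


def build_network(edges):
    """Return two lookup tables for a directed, bipartite network."""
    targets_of = defaultdict(set)     # regulator -> genes it controls
    regulators_of = defaultdict(set)  # gene -> regulators that control it
    for regulator, gene in edges:
        targets_of[regulator].add(gene)
        regulators_of[gene].add(regulator)
    return dict(targets_of), dict(regulators_of)


targets_of, regulators_of = build_network(EDGES)
print(sorted(targets_of["TF_A"]))        # genes downstream of one regulator
print(sorted(regulators_of["gene_2"]))   # regulators upstream of one gene
```

Keeping both lookup tables mirrors the two ways the network is queried later in this chapter: starting from a regulator (TF-centered) or starting from a gene (gene-centered).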
Identifying gene regulatory network nodes
Regulatory regions
Different parts of a gene can contribute to its regulation. The more complex an organism, the more complex its gene regulation is. In C. elegans there are two main regulatory regions: gene promoters in the genome and 3′UTRs in mRNAs.
Promoters
A gene promoter is the genomic DNA sequence immediately upstream of the transcription start site. Generally, promoters are composed of a basal element where the general transcriptional machinery binds (e.g. RNA polymerase II and general TFs), and the proximal gene promoter that serves as a landing site for regulatory TFs. Since the majority of C. elegans genes are subject to trans-splicing, precise transcription start sites have not been determined for most worm genes. However, 5′UTRs are short compared to more complex organisms such as humans, and for practical purposes, promoters can therefore be defined as the region immediately upstream of the translational start site. It is difficult to determine the 5′ start point of gene promoters. However, since most intergenic regions are shorter than 2 kb, most studies have limited their analyses to this length [e.g. (Deplancke et al., 2004; Dupuy et al., 2004; Hunt-Newbury et al., 2007)]. Importantly, it has been shown that this region, when fused to a reporter gene such as that encoding the green fluorescent protein (GFP) often drives gene expression in a manner that recapitulates the expression of the endogenous gene (Dupuy et al., 2004; Grove et al., 2009; Hunt-Newbury et al., 2007; Martinez et al., 2008b; Reece-Hoyes et al., 2007).
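The working definition above (the region upstream of the translational start site, limited to 2 kb or to the neighboring gene, whichever comes first) can be written as a small coordinate calculation. This is only a sketch under stated assumptions: the function name, its parameters and the 1-based inclusive coordinate convention are illustrative choices, not part of any published pipeline, and chromosome ends are handled only at position 1.

```python
def promoter_interval(atg, strand, neighbor_end=None, max_len=2000):
    """Return (start, end) of a promoter in 1-based inclusive coordinates.

    atg          -- genomic position of the first base of the start codon
    strand       -- "+" or "-"
    neighbor_end -- nearest base of the upstream neighboring gene, if known
    max_len      -- cap on promoter length (2 kb, following the text)
    """
    if strand == "+":
        # Upstream means lower coordinates on the plus strand.
        start = atg - max_len
        if neighbor_end is not None:
            start = max(start, neighbor_end + 1)
        return max(start, 1), atg - 1
    if strand == "-":
        # Upstream means higher coordinates on the minus strand.
        end = atg + max_len
        if neighbor_end is not None:
            end = min(end, neighbor_end - 1)
        return atg + 1, end
    raise ValueError("strand must be '+' or '-'")


print(promoter_interval(5000, "+"))                     # full 2 kb
print(promoter_interval(5000, "+", neighbor_end=4200))  # cut at the neighbor
```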
To facilitate the system-level analysis of gene expression, a clone resource comprising ∼6000 C. elegans promoters, referred to as the Promoterome, has been generated (Dupuy et al., 2004). This resource is based on the Gateway cloning system and consists of promoter Entry clones that can be easily transferred to various Destination vectors by a simple recombination reaction (Hartley et al., 2000; Walhout et al., 2000b). Destination vectors that can be used to analyze gene regulatory networks include a GFP vector for the creation of transgenic animals to study promoter activity in vivo, and Y1H vectors for the identification of TFs that can interact with the promoter (see below). So far, systematic efforts have determined the in vivo activity of ∼350 TF-encoding gene promoters (Grove et al., 2009; Reece-Hoyes et al., 2007), ∼1800 additional gene promoters (Hunt-Newbury et al., 2007) and 73 microRNA gene promoters (Martinez et al., 2008b). Many of the corresponding transgenic lines are available to the community through the Caenorhabditis Genetics Center (CGC).
3′ UTRs
The 3′UTR is the untranslated region in the mRNA, immediately downstream of the stop codon. This region is subject to post-transcriptional control by microRNAs and RNA binding proteins. Recently, a comprehensive collection of 3′UTRs has been delineated for most C. elegans genes (Mangone et al.). Cloning these 3′UTRs into Gateway-compatible vectors will provide a resource for experimental gene regulatory network mapping that is similar to the ORFeome (see below) and Promoterome resources (Lall et al., 2006; Mangone et al., 2008).
Other genic regulatory regions
It is not clear to what extent other regulatory regions function in gene regulatory networks in C. elegans. So far, transcriptional studies have mostly focused on promoters. However, it is clear that other regions, such as introns and sequences downstream of the gene can also play a role. Similarly, microRNAs could target regions outside the 3′UTR within their mRNA targets. Systematic studies are required to elucidate the relative role different genic regions play in complex gene regulatory networks.
Regulators
Transcription factors
TFs provide the first level of gene control. They bind directly to DNA through their sequence-specific DNA binding domain and can be grouped into families based on the type of DNA binding domain they possess (Reece-Hoyes et al., 2005). Well-known DNA binding domains include the homeodomain, the basic helix-loop-helix (bHLH) domain, C2H2 zinc fingers, the ETS domain, the bZIP domain and C4-type zinc fingers found in nuclear hormone receptors (NHRs). TFs can be predicted in a genome of interest by searching the complete collection of proteins for the presence of a known DNA binding domain. This is usually done by computational methods, for instance using the Interpro (Mulder et al., 2003) or SMART (Letunic et al., 2004) databases. However, we have found that visual inspection of predicted DNA binding domains using knowledge of their sequence and structure is highly useful as well. Indeed, by doing so we increased the predicted set of C. elegans TFs from ∼600 (Ruvkun and Hobert, 1998) to 940, or ∼5% of all protein-coding genes (Reece-Hoyes et al., 2005; Vermeirssen et al., 2007b). Most C. elegans TF-encoding genes encode a single splice variant; however, in some cases multiple variants are present, and some of these may encode proteins with different DNA binding domains (Reece-Hoyes et al., 2005). Interestingly, different TF variants can have different biological functions. For instance, different variants of the forkhead protein DAF-16 were recently found to be expressed in distinct patterns and to confer different functions related to metabolism and aging (Kwon et al., 2010). Several proteins have been identified that can bind C. elegans gene promoters but that do not possess a known DNA binding domain (Deplancke et al., 2006a; Vermeirssen et al., 2007a). Thus, the total collection of C. elegans TFs may be slightly larger, but is likely not to exceed 1000 (unpublished data).
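The computational part of this prediction amounts to a filter over domain annotations. The sketch below assumes that each protein already carries a list of annotated domains (for instance exported from a domain database); the protein names, the annotation strings and the function name are hypothetical, and the manual inspection step described above is not modeled.

```python
# DNA binding domain families named in the text; the exact label strings are
# an assumption and would need to match the annotation source actually used.
DNA_BINDING_DOMAINS = {
    "homeodomain", "bHLH", "C2H2 zinc finger",
    "ETS", "bZIP", "C4 zinc finger",
}


def predict_tfs(protein_domains):
    """Return proteins that carry at least one known DNA binding domain."""
    return sorted(
        protein
        for protein, domains in protein_domains.items()
        if DNA_BINDING_DOMAINS & set(domains)
    )


# Hypothetical annotations, for illustration only.
EXAMPLE = {
    "protein_1": ["homeodomain"],
    "protein_2": ["kinase domain"],
    "protein_3": ["bZIP", "coiled coil"],
}
print(predict_tfs(EXAMPLE))

# Fraction of protein-coding genes predicted to encode TFs (numbers from the
# text: 940 TFs, roughly 20,000 protein-coding genes).
tf_fraction = 940 / 20000
print(f"{tf_fraction:.1%}")
```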
More than 12,000 C. elegans full-length open reading frames (ORFs) have been cloned into a Gateway-compatible resource called the ORFeome (Lamesch et al., 2004; Reboul et al., 2003). We obtained the TF-encoding ORFs from this resource and supplemented that with TF-encoding ORFs that we cloned ab initio (Deplancke et al., 2004; Vermeirssen et al., 2007b). The resulting clone collection currently contains ∼90% of all full-length TFs and can be directly used in assays for the delineation of gene regulatory networks, such as Y1H assays (unpublished data, see below).
MicroRNAs
MicroRNAs regulate gene expression post-transcriptionally by sequence-specific but imperfect basepairing with the 3′UTR of their target mRNAs. In total, it has been estimated that the C. elegans genome encodes more than 110 microRNAs (Lehrbach and Miska, 2008). Some of these have been identified genetically (e.g. lin-4, let-7), some have been predicted computationally (Lim et al., 2003), and others were more recently found by deep sequencing small RNA populations purified from worms (Friedlander et al., 2008; Kato et al., 2009). As with TFs, microRNAs can also be grouped into families, based on their seed sequence, the part with which they basepair with their target genes. It is not yet clear whether all C. elegans microRNAs have been identified. Indeed, it may be that additional microRNAs will be uncovered when the animal is exposed to particular conditions, in males or dauers, or when sequencing techniques further improve to detect microRNAs of very low abundance.
Other regulators
In addition to TFs and microRNAs, other RNA and protein molecules contribute to differential gene regulation. These include RNA binding proteins, transcriptional cofactors and signaling molecules such as kinases and phosphatases, as well as endogenous siRNAs and, perhaps, long non-coding RNAs. Systematic computational and experimental analyses will shed light onto the number of molecules in each class of regulators.
Delineating gene regulatory network edges
TF-target gene interactions
Interactions between TFs and their target genes can be identified using two conceptually different and highly complementary strategies. The first are TF-centered; they start with a TF of interest and identify the genes with which this factor interacts. The second are gene-centered; they start with a gene of interest and identify the TFs with which it interacts (Figure 3).
Transcription factor-centered methods: ChIP
The most widely used TF-centered method is chromatin immunoprecipitation (ChIP). In ChIP assays, an anti-TF antibody is used to precipitate TFs in vivo. Briefly, worm extracts are first treated with formaldehyde to crosslink proteins to proteins and proteins to DNA. After precipitation of the TF, associated DNA molecules can be identified 1) by PCR using primer sets of interest (Deplancke et al., 2006a); 2) by cloning and sequencing (Oh et al., 2006); 3) using microarrays that tile the entire C. elegans genome (Whittle et al., 2009); or more recently 4) by ultra high-throughput sequencing (e.g. 454 or Solexa). Controls include a non-relevant antibody and, if possible, mutant animals that do not express the TF of interest.
ChIP is a very powerful method to identify TF-target gene interactions that occur in vivo. However, it is mostly limited to TFs that are highly and/or broadly expressed throughout the lifetime of the animal, and to TFs for which ChIP-grade antibodies are available. It is, however, also feasible to use ChIP in transgenic animals that overexpress an epitope-tagged TF. While ChIP is usually the method of choice when one is interested in one or a few TFs, it is less suitable when one is interested in a single gene (or a set of genes) and wants to identify the TFs that contribute to its (their) regulation. This is because all 940 TFs would have to be tested and under all relevant developmental and physiological conditions. Detailed discussion and protocols for ChIP in worms are provided elsewhere (Mukhopadhyay et al., 2008).
Gene-centered methods: Y1H assays
Y1H assays provide a genetic method for the gene-centered identification of TF-target gene interactions. The Y1H system is conceptually similar to yeast two-hybrid (Y2H) assays that have been used extensively to map C. elegans protein-protein interaction networks (Li et al., 2004; Walhout et al., 2002; Walhout et al., 2000a). Here, I will discuss the principles of the Y1H system. Detailed Y1H protocols are available elsewhere (Deplancke et al., 2006b).
The Y1H system uses a reporter gene readout in yeast to detect interactions between a “DNA bait” and a “protein prey” (e.g. TF) (Figure 4). The first step in Y1H assays involves the selection of the DNA bait. In most cases, this will be a gene promoter or a small cis-regulatory element. Next, the DNA bait is cloned upstream of two reporter genes, HIS3 and LacZ (Figure 4). Traditionally, this was done by restriction enzyme/ligation-based methods (Li and Herskowitz, 1993). However, this is difficult to standardize and thus not amenable to the high-throughput settings that are required for regulatory network studies. To enable high-throughput cloning of DNA baits, we have combined the Y1H system with Gateway cloning, a recombination-based method that is compatible with the Promoterome resource (Deplancke et al., 2004). With this method, multiple DNA baits can be transferred to the Y1H reporter Destination vectors simultaneously (e.g. in 96-well plates).
After cloning, the two DNA bait∷reporter constructs are linearized and integrated into the genome of a suitable yeast strain. DNA bait∷HIS3 constructs are integrated into a mutant HIS3 locus and plated on media lacking histidine. There is enough background His3 expression conferred by the basal yeast promoter present in the DNA bait∷reporter constructs to enable growth on media lacking histidine. When the same construct is used in a protein-DNA interaction assay, however, the media are supplemented with 3-aminotriazole (3AT), a competitive inhibitor of the His3 enzyme. That way, growth of the yeast depends on an increase in expression of His3, conferred by an interacting AD-TF hybrid protein (see below and Figure 4). DNA bait∷LacZ constructs contain a wild type URA3 gene and are integrated into a mutant URA3 locus, thereby rescuing the Ura3 deficiency when plated on media lacking uracil. The DNA bait∷reporter constructs do not carry a yeast origin of replication and, therefore, the formation of colonies is strictly dependent on their integration into the yeast genome. Integrations are generally done sequentially, by first integrating either the DNA bait∷HIS3 or the DNA bait∷LacZ construct and then the other. It is also possible to integrate both constructs simultaneously, but the efficiency is much lower and usually only a handful of colonies are obtained (unpublished data).
After integrant colonies have been picked, they need to be tested for background reporter gene expression (auto-activation). Levels of auto-activation can differ between integrants from the same DNA bait∷reporter construct, most likely because of differences in copy number (Deplancke et al., 2004). The degree of auto-activation of DNA bait∷HIS3 strains is determined by plating the colonies on media lacking histidine and containing increasing concentrations of 3AT (5, 10, 20, 40, 60 and 80 mM). Preferably, colonies are selected that do not grow on low concentrations (5-40 mM) of 3AT. The degree of auto-activation of DNA bait∷LacZ strains is determined by a colorimetric βGal assay in which white indicates no induction and darker shades of blue indicate increasing induction of βGal. Colonies with little or no blue should be selected where possible. In our hands, 10-20% of all DNA baits exhibit high levels of auto-activation. These baits are difficult to use in Y1H assays, although interacting TFs can sometimes still be detected.
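The selection of a DNA bait∷HIS3 integrant from such a 3AT titration can be summarized as follows. This is a minimal sketch, assuming growth has been scored as present or absent at each of the six 3AT concentrations; the integrant names, the helper functions and the rule of simply preferring the lowest suppressing concentration are illustrative assumptions rather than a published scoring procedure.

```python
# 3AT concentrations (mM) used in the titration described in the text.
CONCENTRATIONS_MM = (5, 10, 20, 40, 60, 80)


def grows_up_to(max_conc):
    """Helper for the example: growth at every concentration <= max_conc."""
    return {conc: conc <= max_conc for conc in CONCENTRATIONS_MM}


def min_suppressing_3at(growth):
    """Lowest 3AT concentration without growth, or None if growth never stops."""
    for conc in CONCENTRATIONS_MM:
        if not growth[conc]:
            return conc
    return None


def pick_integrant(integrants):
    """Prefer the integrant whose background is suppressed at the lowest 3AT."""
    scored = {name: min_suppressing_3at(g) for name, g in integrants.items()}
    usable = {name: conc for name, conc in scored.items() if conc is not None}
    if not usable:
        return None  # every integrant is highly auto-active
    return min(usable, key=usable.get)


# Hypothetical plate scores for three integrants of the same bait.
INTEGRANTS = {
    "integrant_1": grows_up_to(20),  # background suppressed at 40 mM
    "integrant_2": grows_up_to(0),   # background suppressed at 5 mM
    "integrant_3": grows_up_to(80),  # grows at all concentrations tested
}
print(pick_integrant(INTEGRANTS))
```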
After obtaining double integrant DNA bait strains that exhibit the lowest possible levels of auto-activation, the actual Y1H experiment can be performed to detect interacting TFs. In Y1H assays, TFs are fused to the transcription activation domain of the yeast Gal4 protein. This ensures that both activators and repressors of transcription can be detected. In other words only strict physical protein-DNA interactions are examined in Y1H assays. AD-TF clones can be obtained from different sources and can be introduced into the DNA bait strain in different ways (Figure 4). In our Y1H system, AD-TF clones carry wild type yeast TRP1 and, therefore, colonies containing the plasmid are selected on media lacking tryptophan.
The most commonly used method is to transform an AD-cDNA library into haploid DNA bait strains (Arda et al., 2010; Deplancke et al., 2004; Deplancke et al., 2006a; Deplancke et al., 2006b; Martinez et al., 2008a; Vermeirssen et al., 2007a). Another source for such haploid transformations was created by cherry-picking relevant clones from the ORFeome, transferring them to the AD Y1H Destination vector by Gateway cloning, and combining them into a single AD-TF mini-library (Deplancke et al., 2004). This library consists of ∼650 full-length TFs. Screening such a mini-library enables the detection of TFs that are underrepresented in non-normalized cDNA libraries. Since TFs are usually of low abundance, this can be very useful. In library screens, interacting TFs are identified by yeast colony PCR and sequencing. We have also developed mini-pools of individual AD-TF clones that can be introduced into DNA bait strains by transformation (Vermeirssen et al., 2007b). These pools are designed using a “Smart pool” strategy based on a Steiner Triple System, a design from combinatorial mathematics. We have generated these pools as well as the scripts to deconvolute the resulting interactions. This method is useful for higher-throughput, cost-effective Y1H experiments because it does not rely on extensive prey sequencing. Single AD-TF clones can of course also be transformed individually when particular pre-defined interactions are to be examined (Reece-Hoyes et al., 2009).
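The idea behind pooling with a Steiner Triple System can be illustrated with the smallest non-trivial example, the seven-point system. This sketch is not the published pool design or its deconvolution scripts: the clone names are hypothetical, and the assignment of one clone per triple, with pools as points, is an assumption made for illustration. Because any two triples share exactly one pool, a clone placed in three pools can be recognized by its combination of positive pools.

```python
from itertools import combinations

# Steiner triple system on 7 points: every pair of pools occurs together in
# exactly one triple, so two different clones share exactly one pool.
TRIPLES = [
    (0, 1, 3), (1, 2, 4), (2, 3, 5), (3, 4, 6),
    (0, 4, 5), (1, 5, 6), (0, 2, 6),
]

# Hypothetical clone names; each clone is placed in the three pools of a triple.
CLONES = [f"AD-TF_{i}" for i in range(1, 8)]
POOLS_OF = dict(zip(CLONES, TRIPLES))


def positive_pools(interacting_clones):
    """Pools that would score positive if exactly these clones interact."""
    return {pool for clone in interacting_clones for pool in POOLS_OF[clone]}


def deconvolute(pools):
    """Call a clone positive when all three of its pools are positive."""
    return sorted(c for c, triple in POOLS_OF.items() if set(triple) <= pools)


print(deconvolute(positive_pools(["AD-TF_3"])))
print(deconvolute(positive_pools(["AD-TF_1", "AD-TF_2"])))
```

In this toy design seven clones are screened in seven pools, so nothing is saved; the benefit appears in larger systems, where the number of triples grows much faster than the number of pools. Real data also contain pools that fail or score weakly, which a practical deconvolution would need to tolerate.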
In addition to transformation into haploid DNA bait strains, AD-TF clones can also be introduced by mating. For this, we have transformed the AD-TF clones (∼755 in the first iteration) into yeast of mating type α, which is compatible for mating with the DNA bait strains that have the “a” mating type (Vermeirssen et al., 2007b). DNA bait strains are mated with the AD-TF clone array and positives are examined in diploids. Each of these different methods for introducing AD-TFs into DNA bait strains has advantages and disadvantages (Vermeirssen et al., 2007b). Generally, transformation detects more interacting TFs than mating. However, mating is fast, less labor-intensive and much less costly. Further, interactions detected by mating are highly robust and reproducible. When comparing library screens to more directed experiments with smart pools or individual clones, it is clear that many more protein-DNA interactions are found by the latter methods. However, with directed experiments only cloned TFs can by definition be found, which in our current collection is about 850 (∼90%) (Vermeirssen et al., 2007b)(unpublished data). Proteins that do not have a recognizable DNA binding domain can only be retrieved in unbiased cDNA library screens (Deplancke et al., 2006a; Vermeirssen et al., 2007a). However, we do include these in TF resources after confirming their capability of interacting with C. elegans promoters and obtaining a suitable clone.
MicroRNA-mRNA interactions
Putative interactions between the 3′UTRs of mRNAs and microRNAs are mostly identified genetically or computationally predicted using one or more algorithms that are publicly available. These include PicTar (Lall et al., 2006), MiRanda (Griffiths-Jones et al., 2006), TargetScan (Lewis et al., 2005), RNA hybrid (Rehmsmeier et al., 2004) and mirWIP (Hammell et al., 2008). These algorithms are challenging to use because they are often too greedy (high rate of false positive predictions), or too stringent (high rate of false negative predictions). In order to alleviate this, at least to some extent, we have previously used predictions that were found by at least two of four algorithms used (Martinez et al., 2008a). Future experimental approaches will shed light onto physical and functional microRNA-mRNA interactions that occur in vivo (Lall et al., 2006; Zisoulis et al., 2010).
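The filter of keeping predictions supported by at least two algorithms can be expressed as a simple vote count. The sketch below assumes that each algorithm's output has already been reduced to a set of (microRNA, target gene) pairs; the pairs shown are hypothetical placeholders and the function name is illustrative.

```python
from collections import Counter


def consensus_targets(predictions, min_support=2):
    """Keep (microRNA, gene) pairs predicted by at least min_support algorithms.

    predictions -- dict mapping algorithm name to an iterable of pairs
    """
    support = Counter(
        pair
        for pairs in predictions.values()
        for pair in set(pairs)  # count each algorithm at most once per pair
    )
    return {pair for pair, votes in support.items() if votes >= min_support}


# Hypothetical predictions from four algorithms, for illustration only.
PREDICTIONS = {
    "algorithm_1": [("miR_x", "gene_1"), ("miR_x", "gene_2")],
    "algorithm_2": [("miR_x", "gene_1"), ("miR_y", "gene_3")],
    "algorithm_3": [("miR_x", "gene_2"), ("miR_x", "gene_1")],
    "algorithm_4": [("miR_z", "gene_4")],
}
print(sorted(consensus_targets(PREDICTIONS)))
```

Raising `min_support` trades false positives for false negatives, which is the same tension between greedy and stringent algorithms described above.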
Other regulatory interactions
In addition to protein-DNA and microRNA-mRNA interactions, other relationships are involved in gene control. An important class involves sequence-specific RNA binding proteins that interact with the 3′UTR of mRNAs. It is not yet clear how many sequence-specific RNA binding proteins are encoded by the C. elegans genome, and only few have been studied genetically or biochemically. For instance, detailed binding sites have been determined in vitro for MEX-3, MEX-5 and a handful of other RNA binding proteins (Farley et al., 2008; Pagano et al., 2009; Pagano et al., 2007). However, the functionality of most RNA binding proteins and their mRNA targets remains largely unexplored.
Gene regulatory network visualization and analysis
The identification of physical and functional relationships between genes and their regulators is only the first step in the characterization of gene regulatory networks. Lists of interactions are usually difficult to navigate. Network models, however, provide a visually attractive method for gene regulatory network analysis. We usually use the publicly available Cytoscape tool (Shannon et al., 2003) for network visualization and analysis (Arda et al., 2010; Deplancke et al., 2006a; Grove et al., 2009; Martinez et al., 2008a; Vermeirssen et al., 2007a). Subsequently, we use a variety of tools for network analysis. Most notably, we use topological overlap coefficient analysis to compare gene expression patterns and to identify TF or gene network modules. These methods are discussed elsewhere (Arda et al., 2010; Arda and Walhout, 2009; Ravasz et al., 2002; Vermeirssen et al., 2007a).
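To give a sense of what a topological overlap coefficient measures, the sketch below scores two TFs by the target genes they share, normalized by the smaller of the two target sets. This is a simplified variant chosen for illustration and is an assumption: the coefficient used in the cited studies may be defined differently (for example, it may account for direct links between the two nodes). The TF and gene names are hypothetical.

```python
def topological_overlap(targets_a, targets_b):
    """Shared targets divided by the size of the smaller target set."""
    a, b = set(targets_a), set(targets_b)
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))


# Hypothetical TF -> target gene sets.
TARGETS = {
    "TF_A": {"gene_1", "gene_2", "gene_3"},
    "TF_B": {"gene_2", "gene_3"},
    "TF_C": {"gene_4"},
}
print(topological_overlap(TARGETS["TF_A"], TARGETS["TF_B"]))
print(topological_overlap(TARGETS["TF_A"], TARGETS["TF_C"]))
```

Pairs with a high coefficient share most of their neighbors, and clustering on these scores is one way to group TFs or genes into modules.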
Gene regulatory network validation
As with any method, the identification of physical and functional interactions between genes and their regulators is subject to issues related to both assay sensitivity and assay specificity. Sensitivity refers to the proportion of real interactions that can be identified by the assay; interactions that cannot be detected are referred to as false negatives. Specificity refers to the proportion of interactions detected that are real, i.e. that do occur in vivo and/or that have a biological consequence. Interactions that are detected but that are not biologically meaningful are referred to as false positives.
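Using the definitions given in this section, both quantities can be computed from a set of detected interactions and a reference set of real interactions. The sketch assumes that such a reference set exists, which in practice is rarely complete; the interaction labels and counts are hypothetical. Note that "specificity" is implemented here exactly as defined above (the fraction of detected interactions that are real), which in other fields is usually called precision.

```python
def assay_quality(detected, real):
    """Return (sensitivity, specificity) as defined in the text.

    sensitivity -- fraction of real interactions that were detected
    specificity -- fraction of detected interactions that are real
    """
    detected, real = set(detected), set(real)
    true_positives = detected & real
    sensitivity = len(true_positives) / len(real)
    specificity = len(true_positives) / len(detected)
    return sensitivity, specificity


# Hypothetical example: 20 real interactions, 10 detected, 7 of them real.
REAL = {f"real_{i}" for i in range(20)}
DETECTED = {f"real_{i}" for i in range(7)} | {f"spurious_{i}" for i in range(3)}

sensitivity, specificity = assay_quality(DETECTED, REAL)
print(sensitivity, specificity)
```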
False negatives
Previously, we estimated the coverage of our Y1H screens to be ∼35% (Deplancke et al., 2006a). This number is based on a very small number of available published interactions, but is very similar to the coverage obtained with Y2H (Braun et al., 2009). There are several reasons that not all possible TF-promoter interactions can be detected by Y1H assays: 1) Several TFs bind DNA as obligatory dimers. Although homodimers can be detected (e.g. we detected binding of HLH-1, a homodimeric bHLH TF), our assay currently is not configured to detect heterodimers. In the future, we hope to develop approaches that enable the detection of heterodimeric TF-DNA interactions in directed Y1H assays. 2) We will not find TFs that depend on specific post-translational modification or co-factor interactions with C. elegans proteins for DNA binding. 3) We can obviously only find TFs that are available in the TF resource used. However, it is highly encouraging to note that we have already detected interactions for about 25% of all predicted C. elegans TFs with only ∼1% of all gene promoters.
False positives
As with Y2H assays, there are two types of false positives with Y1H assays: technical false positives that cannot be reproduced in the same assay, and biological false positives that represent genuine Y1H/Y2H interactions that nonetheless do not occur in vivo. To keep the rate of technical false positives low, several issues need to be taken into consideration. First, it is best to only consider interactions that score positively for both Y1H reporters, i.e. that induce growth on media lacking histidine and containing 3AT and that are bluer than an “AD only” control. Second, it is important to make sure that the TF retrieved is in frame (only relevant to cDNA library screens). Third, all Y1H interactions need to be retested in fresh DNA bait cells (i.e. from a frozen stock that has not been used in the screen itself), either by gap-repair (Walhout and Vidal, 2001) or by directly transforming an AD-TF clone. This is necessary because baits can mutate in yeast and give rise to a colony with an apparent interaction phenotype that is not reproducible (Walhout and Vidal, 1999). Fourth, it is absolutely critical to integrate DNA bait∷reporter constructs into the yeast genome. We have tried to perform the assay with replicating plasmids, but the background expression was highly variable, probably due to different plasmid copy numbers. Finally, it is important to note that interactions obtained with highly auto-active DNA baits are more difficult to assess and may be less specific. We have developed an interaction scoring scheme to assess the results obtained from Y1H library screens (Vermeirssen et al., 2007a).
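The first three criteria can be applied as a simple filter over screen results. This sketch is not the published interaction scoring scheme; it assumes each candidate interaction has been recorded with four yes/no observations, and the record type, field names and example entries are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Y1HResult:
    bait: str
    prey: str
    his3_positive: bool    # growth on media lacking histidine plus 3AT
    lacz_positive: bool    # bluer than the "AD only" control
    in_frame: bool         # prey is in frame with AD (cDNA library screens)
    retest_positive: bool  # reproduced in fresh DNA bait cells


def passes_technical_filters(result):
    """Keep only interactions that satisfy all four observations."""
    return (
        result.his3_positive
        and result.lacz_positive
        and result.in_frame
        and result.retest_positive
    )


# Hypothetical screen results, for illustration only.
RESULTS = [
    Y1HResult("promoter_1", "TF_A", True, True, True, True),
    Y1HResult("promoter_1", "TF_B", True, False, True, True),
    Y1HResult("promoter_2", "TF_C", True, True, True, False),
]
kept = [(r.bait, r.prey) for r in RESULTS if passes_technical_filters(r)]
print(kept)
```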
Biological false positives are more challenging to assess. First, the genome itself is the same in every cell and thus, when a TF is expressed in any given cell, one may expect the interaction to occur. However, the nucleosome occupancy likely varies in different cell types and this may prevent interactions from occurring in vivo. The integration of the DNA baits into the yeast genome ensures that they are incorporated into chromatin and, thus, Y1H assays are not based on interactions with naked DNA. However, it could be that the integration of the DNA baits in yeast only partially recapitulates the chromatin state in C. elegans. In Y1H assays, we can find multiple members of a TF family binding to a particular DNA bait. This could be because these members have very similar DNA binding specificities, and it may not reflect in vivo functionality. However, we and others have found that multiple members of a TF family can bind the same DNA targets in vivo and can function redundantly (Hollenhorst et al., 2007; Ow et al., 2008). For instance, multiple TFs with a FLYWCH DNA binding domain were found to interact with microRNA promoters in Y1H assays and to redundantly repress microRNA expression in the early C. elegans embryo (Ow et al., 2008). It is also important to note that not all TF-DNA interactions lead to a regulatory consequence. For instance, ChIP has identified numerous interactions that do not have an apparent biological function (Li et al., 2008). This should be taken into account when physical interactions are being assessed by regulatory assays such as target gene expression in TF mutants or by TF knockdown with RNAi. Finally, different validation assays each have their own rate of false negatives, i.e. they cannot detect every single genuine interaction. For instance, assays that are performed with mixed populations of animals can easily miss interactions that occur only in a few cells or only during a short developmental time. Indeed, in our study of the B0507.1 promoter, we found a reduction in expression upon loss of the TF CES-1 only in the spermatheca, rectal gland and pharyngeal-intestinal valve and, since these are not large tissues, this would be extremely difficult to detect in mixed-population, whole-animal assays such as qPCR (Reece-Hoyes et al., 2009).
Future challenges
The comprehensive mapping of gene regulatory networks in C. elegans has only just started. Future studies are needed to complete transcriptional networks by high-throughput Y1H assays, and by other complementary assays such as ChIP. In addition, it will be highly useful to systematically generate promoter∷GFP constructs and corresponding transgenic C. elegans lines for all worm genes. Such lines can then be used to examine promoter activity under different experimental or physiological conditions and to validate transcriptional networks, for instance using TF mutants or TF knockdown. Further, the continued experimental analysis of microRNAs and other small RNAs will be of extremely high value. Experimental methods also need to be developed and applied to assess other regulatory networks, such as those involving RNA binding proteins, signaling molecules and metabolites. Finally, it will be exciting to go beyond static network models that represent a compilation of the interactions that can occur in the animal and to incorporate the dynamics and levels of gene and regulator expression and activation throughout the lifetime of the nematode.
Acknowledgments
I thank members of my lab for their hard work and especially John Reece-Hoyes and Lesley MacNeil for critical reading of the manuscript. Work in my lab is supported by the National Institutes of Health (DK068429 and GM082971) and by the Ellison Medical Research Foundation.
References
- Arda HE, Taubert S, Conine C, Tsuda B, Van Gilst MR, Sequerra R, Doucette-Stamm L, Yamamoto KR, Walhout AJM. Functional modularity of nuclear hormone receptors in a C. elegans gene regulatory network. Molecular Systems Biology. 2010;6:367. doi: 10.1038/msb.2010.23.
- Arda HE, Walhout AJM. Gene-centered regulatory networks. Briefings in functional genomics and proteomics. 2009 doi: 10.1093/elp049.
- Braun P, Tasan M, Dreze M, Barrios-Rodiles M, Lemmens I, Yu H, Sahalie JM, Murray RR, Roncari L, de Smet AS, Venkatesan K, Rual JF, Vandenhaute J, Cusick ME, Pawson T, Hill DE, Tavernier J, Wrana JL, Roth FP, Vidal M. An experimentally derived confidence score for binary protein-protein interactions. Nat Methods. 2009;6:91–7. doi: 10.1038/nmeth.1281.
- Deplancke B, Dupuy D, Vidal M, Walhout AJM. A Gateway-compatible yeast one-hybrid system. Genome Res. 2004;14:2093–2101. doi: 10.1101/gr.2445504.
- Deplancke B, Mukhopadhyay A, Ao W, Elewa AM, Grove CA, Martinez NJ, Sequerra R, Doucette-Stamm L, Reece-Hoyes JS, Hope IA, Tissenbaum HA, Mango SE, Walhout AJM. A gene-centered C. elegans protein-DNA interaction network. Cell. 2006a;125:1193–1205. doi: 10.1016/j.cell.2006.04.038.
- Deplancke B, Vermeirssen V, Arda HE, Martinez NJ, Walhout AJM. Gateway-compatible yeast one-hybrid screens. CSH Protocols. 2006b doi: 10.1101/pdb.prot4590.
- Dupuy D, Li Q, Deplancke B, Boxem M, Hao T, Lamesch P, Sequerra R, Bosak S, Doucette-Stamm L, Hope IA, Hill D, Walhout AJM, Vidal M. A first version of the Caenorhabditis elegans promoterome. Genome Res. 2004;14:2169–2175. doi: 10.1101/gr.2497604.
- Farley BM, Pagano JM, Ryder SP. RNA target specificity of the embryonic cell fate determinant POS-1. RNA. 2008;14:2685–97. doi: 10.1261/rna.1256708.
- Friedlander MR, Chen W, Adamidi C, Maaskola J, Einspanier R, Knespel S, Rajewsky N. Discovering microRNAs from deep sequencing data using miRDeep. Nat Biotechnol. 2008;26:407–415. doi: 10.1038/nbt1394.
- Griffiths-Jones S, Grocock RJ, van Dongen S, Bateman A, Enright AJ. miRBase: microRNA sequences, targets and gene nomenclature. Nucleic Acids Res. 2006;34:D140–4. doi: 10.1093/nar/gkj112.
- Grove CA, deMasi F, Barrasa MI, Newburger D, Alkema MJ, Bulyk ML, Walhout AJ. A multiparameter network reveals extensive divergence between C. elegans bHLH transcription factors. Cell. 2009;138:314–327. doi: 10.1016/j.cell.2009.04.058.
- Hammell M, Long D, Zhang L, Lee A, Carmack CS, Han M, Ding Y, Ambros V. mirWIP: microRNA target prediction based on microRNA-containing ribonucleoprotein-enriched transcripts. Nat Methods. 2008;5:813–9. doi: 10.1038/nmeth.1247.
- Hartley JL, Temple GF, Brasch MA. DNA cloning using in vitro site-specific recombination. Genome Res. 2000;10:1788–1795. doi: 10.1101/gr.143000.
- Hollenhorst PC, Shah AA, Hopkins C, Graves BJ. Genome-wide analyses reveal properties of redundant and specific promoter occupancy within the ETS gene family. Genes Dev. 2007;21:1882–1894. doi: 10.1101/gad.1561707.
- Hunt-Newbury R, Viveiros R, Johnsen R, Mah A, Anastas D, Fang L, Halfnight E, Lee D, Lin J, Lorch A, McKay S, Okada HM, Pan J, Schulz AK, Tu D, Wong K, Zhao Z, Alexeyenko A, Burglin T, Sonnhammer E, Schnabel R, Jones SJ, Marra MA, Baillie DL, Moerman DG. High-throughput in vivo analysis of gene expression in Caenorhabditis elegans. PLoS Biol. 2007;5:e237. doi: 10.1371/journal.pbio.0050237.
- Kato M, de Lencastre A, Pincus Z, Slack FJ. Dynamic expression of small non-coding RNAs, including novel microRNAs and piRNAs/21U-RNAs, during Caenorhabditis elegans development. Genome Biol. 2009;10:R54. doi: 10.1186/gb-2009-10-5-r54.
- Kwon ES, Narasimhan SD, Yen K, Tissenbaum HA. A new DAF-16 isoform regulates longevity. Nature. 2010 doi: 10.1038/nature09184. in press.
- Lall S, Grun D, Krek A, Chen K, Wang YL, Dewey CN, Sood P, Colombo T, Bray N, MacMenamin P, Kao HL, Gunsalus KC, Pachter L, Piano F, Rajewsky N. A Genome-wide map of conserved microRNA targets in C. elegans. Curr Biol. 2006;16:460–471. doi: 10.1016/j.cub.2006.01.050.
- Lamesch P, Milstein S, Hao T, Rosenberg J, Li N, Sequerra R, Bosak S, Doucette-Stamm L, Vandenhaute J, Hill DE, Vidal M. C. elegans ORFeome version 3.1: increasing the coverage of ORFeome resources with improved gene predictions. Genome Res. 2004;14:2064–2069. doi: 10.1101/gr.2496804.
- Lehrbach NJ, Miska EA. Functional genomic, computational and proteomic analysis of C. elegans microRNAs. Briefings in functional genomics and proteomics. 2008;7:228–235. doi: 10.1093/bfgp/eln024.
- Letunic I, Copley RR, Schmidt S, Ciccarelli FD, Doerks T, Schultz J, Ponting CP, Bork P. SMART 4.0: towards genomic data integration. Nucleic Acids Res. 2004;32:D142–4. doi: 10.1093/nar/gkh088.
- Lewis BP, Burge CB, Bartel DP. Conserved seed pairing, often flanked by adenosines, indicates that thousands of human genes are microRNA targets. Cell. 2005;120:15–20. doi: 10.1016/j.cell.2004.12.035.
- Li JJ, Herskowitz I. Isolation of the ORC6, a component of the yeast origin recognition complex by a one-hybrid system. Science. 1993;262:1870–1874. doi: 10.1126/science.8266075.
- Li S, Armstrong CM, Bertin N, Ge H, Milstein S, Boxem M, Vidalain PO, Han JDJ, Chesneau A, Hao T, Goldberg DS, Li N, Martinez M, Rual JF, Lamesch P, Xu L, Tewari M, Wong SL, Zhang LV, Berriz GF, Jacotot L, Vaglio P, Reboul J, Hirozane-Kishikawa T, Li Q, Gabel HW, Elewa A, Baumgartner B, Rose DJ, Yu H, Bosak S, Sequerra R, Fraser A, Mango SE, Saxton WM, Strome S, van den Heuvel S, Piano F, Vandenhaute J, Sardet C, Gerstein M, Doucette-Stamm L, Gunsalus K, Harper JW, Cusick ME, Roth FP, Hill D, Vidal M. A map of the interactome network of the metazoan C. elegans. Science. 2004;303:540–543. doi: 10.1126/science.1091403.
- Li XY, MacArthur S, Bourgon R, Nix D, Pollard DA, Iyer VN, Hechmer A, Simirenko L, Stapleton M, Luengo Hendriks CL, Chu HC, Ogawa N, Inwood W, Sementchenko V, Beaton A, Weiszmann R, Celniker SE, Knowles DW, Gingeras TR, Speed TP, Eisen MB, Biggin MD. Transcription factors bind thousands of active and inactive regions in the Drosophila blastoderm. PLoS Biol. 2008;6:e27. doi: 10.1371/journal.pbio.0060027.
- Lim LP, Lau NC, Weinstein EG, Abdelhakim A, Yekta S, Rhoades MW, Burge CB, Bartel DP. The microRNAs of Caenorhabditis elegans. Genes Dev. 2003;17:991–1008. doi: 10.1101/gad.1074403.
- Mangone M, Macmenamin P, Zegar C, Piano F, Gunsalus KC. UTRome.org: a platform for 3′UTR biology in C. elegans. Nucleic Acids Res. 2008;36:D57–62. doi: 10.1093/nar/gkm946.
- Mangone M, Manoharan AP, Thierry-Mieg D, Thierry-Mieg J, Han T, Mackowiak S, Mis E, Zegar C, Gutwein MR, Khivansara V, Attie O, Chen K, Salehi-Ashtiani K, Vidal M, Harkins TT, Bouffard P, Suzuki Y, Sugano S, Kohara Y, Rajewsky N, Piano F, Gunsalus KC, Kim JK. The Landscape of C. elegans 3′UTRs. Science. doi: 10.1126/science.1191244. in press.
- Martinez NJ, Ow MC, Barrasa MI, Hammell M, Sequerra R, Doucette-Stamm L, Roth FP, Ambros V, Walhout AJM. A C. elegans genome-scale microRNA network contains composite feedback motifs with high flux capacity. Genes Dev. 2008a;22:2535–2549. doi: 10.1101/gad.1678608.
- Martinez NJ, Ow MC, Reece-Hoyes J, Ambros V, Walhout AJ. Genome-scale spatiotemporal analysis of Caenorhabditis elegans microRNA promoter activity. Genome Res. 2008b;18:2005–2015. doi: 10.1101/gr.083055.108.
- Mukhopadhyay A, Deplancke B, Walhout AJ, Tissenbaum HA. Chromatin immunoprecipitation (ChIP) coupled to detection by quantitative real-time PCR to study transcription factor binding to DNA in Caenorhabditis elegans. Nat Protoc. 2008;3:698–709. doi: 10.1038/nprot.2008.38.
- Mulder NJ, Apweiler R, Attwood TK, Bairoch A, Barrell D, Bateman A, Binns D, Biswas M, Bradley P, Bork P, Bucher P, Copley RR, Courcelle E, Das U, Durbin R, Falquet L, Fleischmann W, Griffiths-Jones S, Haft D, Harte N, Hulo N, Kahn D, Kanapin A, Krestyaninova M, Lopez R, Letunic I, Lonsdale D, Silventoinen V, Orchard SE, Pagni M, Peyruc D, Ponting CP, Selengut JD, Servant F, Sigrist CJ, Vaughan R, Zdobnov EM. The InterPro Database, 2003 brings increased coverage and new features. Nucleic Acids Res. 2003;31:315–8. doi: 10.1093/nar/gkg046.
- Oh SW, Mukhopadhyay A, Dixit BL, Raha T, Green MR, Tissenbaum HA. Identification of direct targets of DAF-16 controlling longevity, metabolism and diapause by chromatin immunoprecipitation. Nat Genet. 2006;38:251–257. doi: 10.1038/ng1723.
- Ow MC, Martinez NJ, Olsen P, Silverman S, Barrasa MI, Conradt B, Walhout AJM, Ambros VR. The FLYWCH transcription factors FLH-1, FLH-2 and FLH-3 repress embryonic expression of microRNA genes in C. elegans. Genes Dev. 2008;22:2520–2534. doi: 10.1101/gad.1678808.
- Pagano JM, Farley BM, Essien KI, Ryder SP. RNA recognition by the embryonic cell fate determinant and germline totipotency factor MEX-3. Proc Natl Acad Sci U S A. 2009;106:20252–7. doi: 10.1073/pnas.0907916106.
- Pagano JM, Farley BM, McCoig LM, Ryder SP. Molecular basis of RNA recognition by the embryonic polarity determinant MEX-5. J Biol Chem. 2007;282:8883–94. doi: 10.1074/jbc.M700079200.
- Ravasz E, Somera AL, Mongru DA, Oltvai ZN, Barabasi AL. Hierarchical organization of modularity in metabolic networks. Science. 2002;297:1551–5. doi: 10.1126/science.1073374.
- Reboul J, Vaglio P, Rual JF, Lamesch P, Martinez M, Armstrong CM, Li S, Jacotot L, Bertin N, Janky R, Moore T, Hudson JR, Jr, Hartley JL, Brasch MA, Vandenhaute J, Boulton S, Endress GA, Jenna S, Chevet E, Papasotiropoulos V, Tolias PP, Ptacek J, Snyder M, Huang R, Chance MR, Lee H, Doucette-Stamm L, Hill DE, Vidal M. C. elegans ORFeome version 1.1: experimental verification of the genome annotation and resource for proteome-scale protein expression. Nat Genet. 2003;34:35–41. doi: 10.1038/ng1140.
- Reece-Hoyes JS, Deplancke B, Barrasa MI, Hatzold J, Smit RB, Arda HE, Pope PA, Gaudet J, Conradt B, Walhout AJ. The C. elegans Snail homolog CES-1 can activate gene expression in vivo and share targets with bHLH transcription factors. Nucleic Acids Res. 2009;37:3689–3698. doi: 10.1093/nar/gkp232.
- Reece-Hoyes JS, Deplancke B, Shingles J, Grove CA, Hope IA, Walhout AJM. A compendium of C. elegans regulatory transcription factors: a resource for mapping transcription regulatory networks. Genome Biol. 2005;6:R110. doi: 10.1186/gb-2005-6-13-r110.
- Reece-Hoyes JS, Shingles J, Dupuy D, Grove CA, Walhout AJ, Vidal M, Hope IA. Insight into transcription factor gene duplication from Caenorhabditis elegans Promoterome-driven expression patterns. BMC Genomics. 2007;8:27. doi: 10.1186/1471-2164-8-27.
- Rehmsmeier M, Steffen P, Hochsmann M, Giegerich R. Fast and effective prediction of microRNA/target duplexes. RNA. 2004;10:1507–17. doi: 10.1261/rna.5248604.
- Ruvkun G, Hobert O. The taxonomy of developmental control in Caenorhabditis elegans. Science. 1998;282:2033–2041. doi: 10.1126/science.282.5396.2033.
- Shannon P, Markiel A, Ozier O, Baliga NS, Wang JT, Ramage D, Amin N, Schwikowski B, Ideker T. Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res. 2003;13:2498–504. doi: 10.1101/gr.1239303.
- Vermeirssen V, Barrasa MI, Hidalgo C, Babon JAB, Sequerra R, Doucette-Stamm L, Barabasi AL, Walhout AJM. Transcription factor modularity in a gene-centered C. elegans core neuronal protein-DNA interaction network. Genome Res. 2007a;17:1061–1071. doi: 10.1101/gr.6148107.
- Vermeirssen V, Deplancke B, Barrasa MI, Reece-Hoyes JS, Arda HE, Grove CA, Martinez NJ, Sequerra R, Doucette-Stamm L, Brent M, Walhout AJM. Matrix and Steiner-triple-system smart pooling assays for high-performance transcription regulatory network mapping. Nat Methods. 2007b;4:659–664. doi: 10.1038/nmeth1063.
- Walhout AJM, Reboul J, Shtanko O, Bertin N, Vaglio P, Ge H, Lee H, Doucette-Stamm L, Gunsalus KC, Schetter AJ, Morton DG, Kemphues KJ, Reinke V, Kim SK, Piano F, Vidal M. Integrating Interactome, Phenome, and Transcriptome Mapping Data for the C. elegans Germline. Current Biology. 2002;12:1952–1958. doi: 10.1016/s0960-9822(02)01279-4.
- Walhout AJM, Sordella R, Lu X, Hartley JL, Temple GF, Brasch MA, Thierry-Mieg N, Vidal M. Protein interaction mapping in C. elegans using proteins involved in vulval development. Science. 2000a;287:116–122. doi: 10.1126/science.287.5450.116.
- Walhout AJM, Temple GF, Brasch MA, Hartley JL, Lorson MA, van den Heuvel S, Vidal M. GATEWAY recombinational cloning: application to the cloning of large numbers of open reading frames or ORFeomes. Methods in enzymology: “Chimeric genes and proteins”. 2000b;328:575–592. doi: 10.1016/s0076-6879(00)28419-x.
- Walhout AJM, Vidal M. A genetic strategy to eliminate self-activator baits prior to high-throughput yeast two-hybrid screens. Genome Res. 1999;9:1128–1134. doi: 10.1101/gr.9.11.1128.
- Walhout AJM, Vidal M. High-throughput yeast two-hybrid assays for large-scale protein interaction mapping. Methods. 2001;24:297–306. doi: 10.1006/meth.2001.1190.
- Whittle CM, Lazakovitch E, Gronostajski RM, Lieb JD. DNA-binding specificity and in vivo targets of Caenorhabditis elegans nuclear factor I. Proc Natl Acad Sci U S A. 2009;106:12049–12054. doi: 10.1073/pnas.0812894106.
- Zisoulis DG, Lovci MT, Wilbert ML, Hutt KR, Liang TY, Pasquinelli AE, Yeo GW. Comprehensive discovery of endogenous Argonaute binding sites in Caenorhabditis elegans. Nat Struct Mol Biol. 2010;17:173–9. doi: 10.1038/nsmb.1745.