Abstract
The developing vertebrate nervous system contains a remarkable array of neural cells organized into complex, evolutionarily conserved structures. The labeling of living cells in these structures is key for the understanding of brain development and function, yet the generation of stable lines expressing reporter genes in specific spatio-temporal patterns remains a limiting step. In this study we present a fast and reliable pipeline to efficiently generate a set of stable lines expressing a reporter gene in multiple neuronal structures in the developing nervous system in medaka. The pipeline combines both the accurate computational genome-wide prediction of neuronal specific cis-regulatory modules (CRMs) and a newly developed experimental setup to rapidly obtain transgenic lines in a cost-effective and highly reproducible manner. 95% of the CRMs tested in our experimental setup show enhancer activity in various and numerous neuronal structures belonging to all major brain subdivisions. This pipeline represents a significant step towards the dissection of embryonic neuronal development in vertebrates.
Introduction
Recent years have witnessed a flood of new discoveries in neuroscience, largely resulting from the ability to monitor living cells in the context of the developing nervous system using reporter gene expression [1]. Exciting developments in protein engineering have pushed current limits, allowing the activity of specific pathways within living cells to be monitored and manipulated [2]–[5]. Nonetheless, these techniques rely heavily on the ability to drive gene expression to specific developmental stages, brain structures and cell types in a stable and reproducible way. While great efforts have been made to efficiently obtain such stable lines, this step remains a serious bottleneck.
In vertebrates, the most widely used strategy to express reporters in anatomical structures relies on the use of regulatory elements, often promoters of genes known to be expressed in the desired structures (promoter bashing). This trial and error process is slow and tedious. Thus, to maximize the chances of getting the right regulatory sequences, entire loci around selected genes have been used by employing BAC technology [6]. However, this methodology is time-consuming and the level of reporter expression may not be high enough for proper monitoring. Other attempts to generate reporter gene expression in various structures are based on the random insertion of a reporter cassette into the genome [7]–[11]. The transgene is expressed only upon activation by nearby regulatory element(s) (enhancer trap). In mouse [12] and zebrafish [13],[14], enhancer assays have been developed essentially to test genomic elements for enhancer activity.
Despite advantages of one approach over the other, all these methodologies have the significant drawback of lacking specificity. Testing semi-random elements in vertebrates either by promoter bashing or enhancer traps results in high screening efforts, while BAC technology, which addresses the specificity issue by using the entire locus instead, is experimentally tedious and cannot be scaled up easily.
In parallel, progress has been made towards the computational identification of regulatory regions in sequenced genomes. Previous work has shown that, without experimental priors, functional constraints acting on non-coding sequences are among the most predictive sources of information for locating regulatory elements [15],[16]. Thus cross-species comparison has been extensively used to better distinguish functional non-coding DNA regions from neutrally evolving DNA [17]. The discovery of new regulatory regions using inter-species conservation was greatly stimulated by the recent availability of various vertebrate genomes, from mammals to fish [18]–[21], as well as the development of more specific and sensitive alignment programs [22]–[25]. Furthermore, it has been shown that the tendency of transcription factor binding sites (TFBS) to cluster together can be used to predict putative CRMs [26]. This led to the development of new methods to locate clusters of binding sites in conserved regions [27]. An algorithm that combines both inter-species binding site conservation and clustering has recently been applied to the human genome [28], resulting in the identification of 118,000 predicted human regulatory elements [29].
Here, we report the development of a new pipeline aimed at specifically labeling, in a stable manner, various neuronal structures in developing Oryzias latipes (medaka) embryos. This pipeline represents two major breakthroughs compared to previous methodologies: a selective step to predict neuronal specific regulatory regions, combined with a new reliable enhancer assay in medaka to efficiently obtain stable lines expressing the reporter gene in neuronal structures (Figure 1).
The selective step applies a modified version of the computational pipeline previously described [28] to select a large number of short (∼100–1000 bp) regions predicted to be CRMs in fish. As we predict vertebrate conservation to be an important criterion for selecting CRMs active in neuronal structures, we filtered for those regions conserved up to human and tested them in our new enhancer assay in the medaka fish. As expected, a vast majority of the regions resulted in a strong, reproducible expression of the reporter gene in various neuronal structures. All the major subdivisions of the medaka CNS were covered by at least one expression pattern. In most cases, the reporter gene expression persists beyond hatching, and in all cases analyzed, at least two independent stable lines were generated. We also show that the enhancer activity is reminiscent of the endogenous target gene expression, which facilitates the additional selection of regions to target specific anatomical areas. Both the computational prediction of CRMs and the experimental results have been integrated into databases for easy access and queries.
Taken together, our pipeline is an important tool for labeling neuronal structures and deciphering the regulatory grammar controlling the development of the neuronal system in vertebrates. Furthermore, our results indicate that pan-vertebrate conserved non-coding elements, compared to less deeply conserved elements, show activity preferentially in neuronal structures.
Results
Identification of a set of neuronal regulatory elements
One of the key steps to establish a robust pipeline for the labeling of developmental structures is the accurate prediction of autonomous regulatory elements in the genome. Thus, to define genomic regions most likely involved in gene regulation, we use a variant of the PreMod algorithm [28] applied to the medaka genome (see Methods). The algorithm first identifies individual TFBS based on a set of 402 high quality position-weight matrices (PWMs) from manually curated databases of known TFBS (Transfac [30], Jaspar [31]) and results from ChIP data [32] (Figure 1 and Methods). Next, it assesses conservation of the predicted TFBS by comparing the medaka sequence to the orthologous sequences in Tetraodon nigroviridis (tetraodon), Takifugu rubripes (pufferfish) and Gasterosteus aculeatus (stickleback). Finally, clusters of conserved homotypic or oligotypic binding sites are identified and predicted as CRMs (Figure 1).
The algorithm resulted in the identification of 23,011 predicted CRMs (average length 244 bp; median length 136 bp) which contain on average 62 putative TFBSs. These regions, despite being broadly distributed over the genome, are found significantly more often in intergenic regions (72.4%, p-value <0.01, Figure S1A) and preferentially within 100 kilobases (kb) distance to the nearest transcription start site (TSS) (93.11%, p-value <0.01, Figure S1B).
It has previously been shown that vertebrate conserved non-coding elements are functional enhancers [12]. These elements are also known to be preferentially located around developmental genes and are consequently hypothesized to be active during development [16]. Thus, we selected those predicted CRMs for which a statistically significant alignment in a conserved syntenic block with human was found (see Methods for details). Of the resulting 491 vertebrate conserved CRMs, 69.36% lie in intergenic regions (p-value <0.01, Figure S1A) and 97.98% are located less than 100 kb away from the nearest TSS (p-value <0.01, Figure S1B). These trends are accentuated compared to the ones observed for the entire set of predicted CRMs.
Both sets of predicted CRMs (all CRMs and vertebrate conserved CRMs) are stored in the PreMod database [29] (http://premod.mcb.mcgill.ca) and listed in Supplementary Tables S1 and S2. PreMod provides the location, score, and binding site content of each predicted CRM. It also reports which transcription factor matrices were used to build the CRM (tag matrices). Predicted CRMs and surrounding genes are displayed in their genomic context. Where in-situ expression of medaka genes or CRM activity information is available, PreMod links to the corresponding experimental data stored in the 4DXpress database [33] (http://4dx.embl.de/4DXembl/reg/all/searchbyspecies/line.do?speciesID=4).
Next, we took advantage of the large compendium of Danio rerio (zebrafish) in-situ annotations from ZFIN [34] to shed light on the putative function of the predicted CRMs. We first mapped the in-situ annotation of the zebrafish genes onto their orthologs in medaka (Methods and Figure 2A). For each predicted CRM in the medaka genome, we located the closest of the two flanking genes and assigned its projected ZFIN annotation to the CRM. We then tested whether vertebrate conserved CRMs show a statistically significant increase in annotations for certain developmental tissues compared to the rest of the predicted CRMs. Interestingly, we found that vertebrate conserved CRMs are associated with an elevated ratio of genes expressed in various brain regions compared to all predicted CRMs (Figure 2B; Tables S3 and S4). More specifically, 74% of vertebrate conserved CRMs are associated with genes annotated as being expressed in the central nervous system (brain: p-value = 5e−4, spinal cord: p-value = 2e−3). On the other hand, enrichment is not observed in non-neuronal tissues (pronephros: p-value = 0.22, somite: p-value = 0.45, cardiovascular system: p-value = 0.67).
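The assignment of a projected annotation to each CRM can be illustrated with a minimal Python sketch. It assumes that "closest" is measured from the CRM midpoint to a single reference coordinate per gene (for instance the TSS) and that genes on one chromosome are supplied sorted by position; the function names, gene names and coordinates are hypothetical and are not taken from the actual pipeline.

```python
from bisect import bisect_left

def closest_gene(crm_mid, genes):
    """Return the name of the flanking gene nearest to a CRM midpoint.

    genes: non-empty list of (position, gene_name) tuples for one
    chromosome, sorted by position (hypothetical input layout).
    """
    positions = [pos for pos, _ in genes]
    i = bisect_left(positions, crm_mid)
    # The flanking genes sit at indices i-1 (left) and i (right), when present.
    candidates = genes[max(i - 1, 0):i + 1]
    return min(candidates, key=lambda g: abs(g[0] - crm_mid))[1]

def annotate_crms(crms, genes, annotations):
    """Transfer the tissue annotation of the closest gene to each CRM.

    crms: dict mapping CRM id -> midpoint coordinate.
    annotations: dict mapping gene name -> set of tissue terms.
    """
    return {
        crm_id: annotations.get(closest_gene(mid, genes), set())
        for crm_id, mid in crms.items()
    }

# Hypothetical example: two genes flanking two CRMs.
genes = [(1000, "geneA"), (50000, "geneB")]
annotations = {"geneA": {"brain"}, "geneB": {"somite"}}
crms = {"crm1": 5000, "crm2": 40000}
print(annotate_crms(crms, genes, annotations))
```

In this toy example crm1 inherits the annotation of geneA and crm2 that of geneB, regardless of the absolute distance involved.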
This finding, empirically observed in mouse enhancer analysis [12] and confirmed in this study, has important implications for the understanding of neuronal system evolution in vertebrates. Vertebrate conservation can be used as a criterion to prioritize which regulatory elements to use for the labeling of neuronal structures.
Development of a new enhancer assay in medaka
We developed a new enhancer assay to rapidly test genomic regions for enhancer activity and to derive stable transgenic lines. Aiming to set up a pipeline for large-scale analysis, we particularly focused on generating a quick and reliable readout, which required live monitoring of the expression pattern directly in injected embryos. The ability to record GFP expression in a live embryo throughout its development is a clear advantage of the fish system compared to the mouse embryo. Thus, we expect an increased sensitivity in the detection of expression patterns and better characterization of these expression patterns over time.
We use meganuclease-mediated transgenesis [35] as the method of choice to obtain highly efficient integration of the transgene into the genome and high rates of germline transmission. Predicted CRMs are cloned into a pBlueScript-based transgenesis vector containing two recognition sites for the meganuclease ISce-I [36] flanking a core promoter, a reporter gene and a SV40-polyadenylation signal. Injected embryos were visually monitored daily for a week to follow the spatio-temporal pattern of GFP expression during embryonic developmental stages (Figure 1).
We also developed a robust and efficient experimental setup to distinguish between the absence of enhancer activity and the failure of the injection experiment. For this, we use the hsp70 core promoter that conveniently triggers a strong and specific lens expression from stage 28 onward [37]. The heat-inducible zebrafish hsp70 gene is expressed during normal lens development under non-stress conditions. This feature remains when CRMs are cloned upstream of the core promoter, resulting in embryos with composite expression in the lens and other domain(s) (if any) specific for the CRM. As the correlation between lens expression and expression in other domains is very high when testing positive CRMs, the monitoring of lens expression itself is a very good indicator for the injection success rate.
We therefore monitor the number of lens-positive embryos (injection success rate) and the number of embryos showing reproducible GFP expression in other domains (Table S5). The percentage of successfully injected embryos showing reproducible expression outside the lens is calculated and should be above 50% in order to call a genomic region positive for enhancer activity. To be significant, a consistent pattern should be seen in at least 10 individual fish. This typically requires injecting fewer than a hundred embryos, which is easily achievable in a single injection experiment. About 1 in every 50 successfully injected embryos shows non-consistent expression, most likely resulting from the activity of local enhancers (enhancer trap). Following our defined criteria, the enhancer trap expression pattern does not pass the quality control and is therefore discarded. This quality control measurement is a significant improvement over previously described enhancer assays, in which the distinction between injection failure and lack of enhancer activity cannot be made.
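The two criteria above can be written down as a short Python sketch. The function and parameter names are hypothetical; only the thresholds (more than 50% of lens-positive embryos, and at least 10 fish with a consistent pattern) come from the text, and the embryo counts in the example are invented for illustration.

```python
def injection_success_rate(injected, lens_positive):
    """Fraction of injected embryos showing lens expression."""
    return lens_positive / injected

def is_positive_enhancer(lens_positive, consistent_pattern,
                         min_fraction=0.5, min_fish=10):
    """Apply the two quality-control criteria to one injection experiment.

    lens_positive: embryos with lens GFP (successfully injected).
    consistent_pattern: lens-positive embryos that also show the same
    reproducible GFP pattern outside the lens.
    """
    if lens_positive == 0:
        # No successfully injected embryo: the experiment is uninformative.
        return False
    fraction = consistent_pattern / lens_positive
    return fraction > min_fraction and consistent_pattern >= min_fish

# Hypothetical experiment: 100 embryos injected, 46 lens-positive,
# 30 of which share one pattern outside the lens.
print(injection_success_rate(100, 46))   # 0.46
print(is_positive_enhancer(46, 30))      # True
# A single lens-positive embryo with an odd pattern (putative enhancer trap).
print(is_positive_enhancer(46, 1))       # False
```

Keeping the lens-positive count as the denominator is what separates a failed injection (few lens-positive embryos) from a genuinely inactive region (many lens-positive embryos, no shared pattern).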
In a typical experiment we obtain an injection success rate around 46%, and, in the case of a functional enhancer, on average 66% of successfully injected embryos show a consistent expression pattern (Table S5). These highly reproducible patterns are a good indication that the expression patterns we observe are solely the result of the tested enhancer activity.
A vast majority of the computationally predicted regions shows enhancer activity
The top 10 computationally predicted vertebrate CRMs located in eight genomic loci were experimentally tested for enhancer activity and the injected fish were raised to generate stable transgenic lines (Table S6a). To evaluate the global success rate of the pipeline, an additional 10 predicted CRMs evenly distributed among the 200 top scoring candidates were tested for enhancer activity (Table S6b).
To ensure the inclusion of all the necessary regulatory features, we fused close-by predicted CRMs (see Methods) and extended the predicted regions to include 200 bp of flanking sequence on each side. The resulting regions range from around 500 bp to 2 kb, and their distance to the TSS of the nearest gene varies from 2095 bp to 63755 bp (20 kb on average).
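Fusing neighbouring predictions and adding flanking sequence amounts to a simple interval operation, sketched below in Python. The 200 bp flank is taken from the text; the maximum gap below which two CRMs are fused is not stated here, so `max_gap` is a hypothetical parameter, and the coordinates are invented.

```python
def fuse_and_extend(regions, max_gap, flank=200):
    """Fuse close-by regions on one chromosome, then extend each side.

    regions: list of (start, end) tuples.
    max_gap: largest distance in bp between two regions that are still
             fused (hypothetical value, not given in the text).
    flank:   bp added on each side after fusing.
    """
    fused = []
    for start, end in sorted(regions):
        if fused and start - fused[-1][1] <= max_gap:
            # Close enough to the previous region: grow it.
            fused[-1][1] = max(fused[-1][1], end)
        else:
            fused.append([start, end])
    # Extend, clamping at the start of the chromosome.
    return [(max(0, s - flank), e + flank) for s, e in fused]

# Two neighbouring predictions and one isolated prediction.
print(fuse_and_extend([(1000, 1200), (1300, 1500), (5000, 5100)], max_gap=200))
# [(800, 1700), (4800, 5300)]
```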
Out of the 20 tested regions, 19 triggered a reproducible expression pattern in transient transgenic fish (Figure 3, Figures S2, S3, S4, S5, S6, S7, S8, S9). Extrapolated to the full dataset of the 200 top scoring regions, we estimate that 95% of the computationally predicted CRMs have enhancer activity during embryonic development. The fraction of validated enhancers is higher than in another large-scale study done in mouse, which found that 40% of ultra-conserved elements show enhancer activity [12]. This result is discussed further below and may be caused by both the prediction method involving vertebrate conserved regions and the monitoring of reporter gene expression throughout the whole of embryonic development.
Stable transgenic lines were generated for all the top nine candidate regions with validated enhancer activity. The same spatio-temporal structures were labeled in transient injected fish compared to stable lines showing that the accurate description of enhancer activity can be done directly in the injected fish. Thus, the required experimental time can be cut down from eight weeks (generation time of medaka) to less than a week (time for embryogenesis in medaka).
Stable expression of the reporter gene in neuronal structures
Further confirming the computational predictions, all the positive elements drive reporter gene expression in various neuronal structures. Some patterns are limited to very specific areas of the brain or the peripheral nervous system, sometimes with just a few cells being labeled. This specific expression remains in stable lines, suggesting that the reporter gene expression is activated in only one or a few cell types. For example, MEDMOD021885 highlights a cluster of a few dozen neurons located bilaterally in the diencephalon (Figure 3d). Other CRMs gave broader expression patterns, covering entire domain(s) of the brain.
For a general analysis of the neuronal system, a complete coverage of brain structures would be desired. We found that all major subdivisions of the vertebrate CNS include labeled cells in our assay. Reporter gene expression is found in telencephalic domains (line MEDMOD021953), the diencephalon (lines MEDMOD021953, MEDMOD021885, MEDMOD046007), the mesencephalon (lines MEDMOD074008, MEDMOD021953), the rhombencephalon (lines MEDMOD021953 and MEDMOD070042, among others), and the spinal cord (line MEDMOD070042). Other neuron-containing structures, such as the nasal epithelium, were also labeled (lines MEDMOD021953 and MEDMOD074008) (Figure 3; Figure S2).
The expression patterns of the lines have been annotated using a controlled vocabulary from the medaka anatomical ontology [38] and incorporated into 4DXpress. From the 32 defined neuronal structures in the ontology, 20 (62%) were labeled in at least one of the stable lines generated. These stable lines expressing a reporter gene in specific cell types are an important starting point for further functional analysis of defined brain structures. In the long run, they offer a valuable resource for the accurate characterization of neuronal cell types and the anatomical description of embryonic neural structures in vertebrates.
Next, we investigated whether the reporter gene expression monitored in our stable lines reflects the expression pattern of the genes surrounding the CRMs in their native genomic location. For this we performed whole-mount in-situ hybridization of the genes flanking the CRMs and compared the resulting expression patterns with the activity of the enhancers (Figure 1). For each of the nine predicted CRMs showing enhancer activity, we found that at least one of the flanking genes is expressed during development (Figure 3). Furthermore, at least one spatio-temporal domain of expression is common with the reporter gene expression under the control of the corresponding enhancer. These results strongly suggest that our enhancer assay outputs represent an accurate description of the activity of the enhancers in their native endogenous state.
The algorithm defines a list of transcription factors predicted to bind to the predicted CRMs. To evaluate how pertinent this information is, we selected three experimentally confirmed CRMs whose activity is restricted to a very defined neuronal structure (forebrain, diencephalon or rhombomeres). Using the ZFIN database, we compared the expression pattern of the factors predicted to bind these CRMs to the observed enhancer activity and searched the literature for those transcription factors being expressed in overlapping domains (Table S7).
For the CRM active in the rhombomeres (MEDMOD086628) we found, among others, the following transcription factors: MafB (Val), known to be required for hindbrain segmentation and rhombomere formation [39]; Elf1, which belongs to the ephrin family involved in rhombomere boundary specification in zebrafish [40]; and Evi1, which has been shown to be expressed in rhombomeres. Interestingly, Evi1 is a target gene of the MafB repressed transcription factor gene, hoxb1a [41],[42]. These three transcription factors (MafB, Elf1 and Evi1) have all been predicted to bind the MEDMOD086628 CRM, but the expression domains of Elf1 and Evi1 are not limited to the rhombomeres. Only MafB is preferentially expressed in the rhombomeres, suggesting that MafB restricts the CRM activity to this structure.
For the CRM active in the diencephalon (MEDMOD045693), four transcription factors are predicted to bind this CRM (Pou3f2, Hnf6, dl and Fos) and show overlapping and specific expression patterns. Pou3f2, for example, is required for oxytocin neuronal development in the hypothalamus [43]. All these factors are expressed in additional domains suggesting that the coordinated action of these factors in the telencephalic domain is required for the CRM activity. The same holds true for the forebrain CRM (MEDMOD062537).
Taken together, these results show that the factors predicted to bind these CRMs can be used as starting points to prioritize further experiments.
Discussion
We describe a new hybrid methodology aimed at identifying neuronal regulatory elements in fish. With 95% success rate after experimental validation and a 100% success rate in transgenesis, this pipeline is, to date, the most efficient procedure to obtain stable transgenic lines expressing reporter genes in various neuronal structures. Furthermore, the orthologs of three of the 20 CRMs analyzed in our study have previously been tested in mouse [12]. For one of the sequences assayed (homologous to MEDMOD021953), expression of the reporter gene localized to the hindbrain of mouse at stage E11.5. In comparison, MEDMOD021953 also shows expression in the medaka hindbrain but is not restricted to this structure. No expression was observed for the other mouse sequence assayed by Pennacchio et al. [12] (homologous to MEDMOD086628) while it drives reporter gene expression in the rhombomeres in our study. These results indicate the high sensitivity of the enhancer assay in medaka.
We have also shown that the patterns of reporter gene expression in our lines are reminiscent of the expression of genes neighboring the tested CRMs. Using gene expression information such as in-situ data, it will therefore be possible to further target the pipeline to select regions most likely active in specific neuronal structures. This task is facilitated by the fact that the computational predictions stored in PreMod are linked to expression data stored in 4DXpress. Furthermore PreMod provides CRMs in their genomic context as well as a score for each predicted regulatory region. As a result, prior to in-vivo testing, CRMs can be targeted based on their genomic context and score.
Finally, we have shown that the predicted CRMs conserved across vertebrates are enriched around genes known to be expressed in neuronal tissues. Such enrichment cannot be detected for non-neuronal tissues (with the notable exception of pectoral fin and pectoral fin bud), suggesting that this trend is essentially neuronal specific. This analysis, supported by the experimental results, indicates that pan-vertebrate conserved CRMs have preferred activity in neuronal structures. Our results are in accordance with a recent finding reporting that a large population of heart enhancers is poorly conserved [44] and suggest that the evolutionary conservation of enhancers can vary depending on tissue type. Conservation may reflect the ‘ancestrality’ of neuronal structures but could also reflect the tendency of alignment algorithms to perform better when co-linearity is preserved. Future analysis of such conservation will shed light on evolutionary events that lead to morphological innovation via the emergence of new regulatory interactions.
Our pipeline, designed to create neuronal tissue specific markers, is of great interest for analyzing enhancer activity, identifying genetic markers and finally as a cost effective enhancer screening tool.
Methods
CRM prediction
We collected a comprehensive set of 402 non-redundant PWMs based on Transfac (version 9.2) [30], Jaspar core vertebrate matrices [31] and a curated set of matrices built from Chip data with Trawler [32]. Transfac matrices were filtered based on the following rules:
All non-vertebrate Transfac matrices were removed, except for 8 Drosophila matrices for factors known to be involved in vertebrate development;
Matrices linked to more than two different TFs (from the same species) were discarded;
Among different matrices for the same TF, only that with the highest quality value was kept or, if not available, the predicted sites that are the most conserved through vertebrate evolution were used (M. Blanchette, unpublished).
For each TF, binding sites were predicted in the complete non-coding and non-repetitive regions of the euteleostei genomes (based on Ensembl database version 41 [45]): medaka (Oryzias latipes, assembly HdrR, Oct 2005 [46]), tetraodon (Tetraodon nigroviridis, assembly Tetraodon 7, Apr 2003 [47]), stickleback (Gasterosteus aculeatus, assembly Broad S1, Feb 2006, Broad Institute) and takifugu (Takifugu rubripes, assembly 1.0, Aug 2002 [21]). We followed the procedure described in [28], with the following modifications:
The local GC-content background model used in [28] was replaced by a uniform background model;
Interspecies binding site conservation was measured using a more flexible approach that allows for (but penalizes) sites that are slightly misaligned, up to 20 bp. In addition, conservation was weighted as follows: $\mathrm{hitScore}_{\mathrm{aln}}(m, p) = \mathrm{hitScore}_{\mathrm{medaka}} + \max(\mathrm{hitScore}_{\mathrm{Tetraodon}}, \mathrm{hitScore}_{\mathrm{Stickleback}}, \mathrm{hitScore}_{\mathrm{Fugu}})$. The aligned hit score thus depends on both the score of the binding site in medaka and its conservation in at least one other teleost. Note that a binding site can have a high score without being conserved if the medaka scoring hit is strong enough. CRMs are predicted genome-wide and are not targeted to specific regions (for example, regions with known developmental genes).
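The weighting of conservation described above can be illustrated with a few lines of Python. This is only a sketch: the numeric scores are invented, the function name is hypothetical, and it assumes that a site with no detectable counterpart in another species contributes a score of 0 for that species. The penalty applied to misaligned sites is not specified here and is therefore not modelled.

```python
def hit_score_aln(medaka, tetraodon, stickleback, fugu):
    """Aligned hit score: the medaka score plus the best score among
    the three other teleosts (per-species scores are assumed to be 0
    when no site is found, which is an assumption of this sketch)."""
    return medaka + max(tetraodon, stickleback, fugu)

# A strong medaka site with no conservation in other teleosts ...
print(hit_score_aln(8.0, 0.0, 0.0, 0.0))   # 8.0
# ... versus a weaker medaka site conserved in tetraodon.
print(hit_score_aln(5.0, 4.0, 2.0, 0.0))   # 9.0
```

Because only the maximum over the other species is added, conservation in a single additional teleost is enough to raise the score, while a strong medaka hit can still score highly on its own.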
A subset of 491 CRM predictions was selected using a criterion combining high CRM score and conservation with human (vertebrate conserved CRMs). Specifically, predicted CRMs with a BLASTZ [23] score over 2600 between medaka and human and with a percentage identity over 60% were ordered in descending order of CRM score. BLASTZ homology searches in human were restricted to the orthologous neighborhood of each CRM, defined as follows: each medaka CRM was first associated with the closest medaka gene having a human ortholog H, and the human genes flanking H on the left and the right were identified. From the list of vertebrate conserved CRMs, we selected two datasets for experimental validation: (1) the top 10 scoring CRMs and (2) 10 CRMs distributed at regular intervals in the top 200 scoring CRMs (CRMs at positions 20, 40, 60, 81, 100, 120, 140, 159, 180, 200).
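The filtering, ranking and sampling steps can be sketched in Python as follows. The thresholds and the ten rank positions are those given above; the record layout (dictionary keys `crm_score`, `blastz_score`, `identity`) and the function names are hypothetical, and ranks are assumed to be 1-based.

```python
def rank_vertebrate_conserved(crms, min_blastz=2600, min_identity=60.0):
    """Keep CRMs passing both human-conservation thresholds and sort
    them by decreasing CRM score."""
    kept = [c for c in crms
            if c["blastz_score"] > min_blastz and c["identity"] > min_identity]
    return sorted(kept, key=lambda c: c["crm_score"], reverse=True)

def pick_for_validation(ranked, top_n=10,
                        positions=(20, 40, 60, 81, 100, 120, 140, 159, 180, 200)):
    """Return the top-scoring CRMs and the CRMs at the given 1-based ranks."""
    top = ranked[:top_n]
    spread = [ranked[p - 1] for p in positions if p <= len(ranked)]
    return top, spread

# Hypothetical input: 250 conserved CRMs plus one that fails the BLASTZ cut.
crms = [{"id": f"crm{i}", "crm_score": float(i),
         "blastz_score": 3000, "identity": 70.0} for i in range(1, 251)]
crms.append({"id": "weak", "crm_score": 999.0,
             "blastz_score": 2000, "identity": 70.0})

ranked = rank_vertebrate_conserved(crms)
top, spread = pick_for_validation(ranked)
print([c["id"] for c in top][:3], [c["id"] for c in spread][:3])
```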
Gene expression analysis
Each predicted CRM is associated with the closest gene independently of the genomic distance between them. We took advantage of the large collection of genes with zebrafish in-situ annotations available from the ZFIN in-situ database [34]. Next, we transferred zebrafish in-situ annotations to the medaka orthologs using the BioMart utility [45],[48]. If more than one ortholog was found for a given zebrafish gene, the orthologous gene with the highest identity was used. For each tissue (and its subparts) and stage, we retrieved all expressed genes. The expression annotation of each gene was subsequently transferred to the associated CRMs (Table S3). Only tissues associated with at least 20 vertebrate conserved CRMs were retained for further analysis. We then calculated the significance of the overrepresentation of CRMs showing annotation for specific tissues, comparing the vertebrate conserved dataset to a background set (composed of the whole set of predicted CRMs, except the vertebrate conserved ones). The significance of this overrepresentation was calculated with a one-sided Fisher test. All tissue and stage annotations follow the OBO ontology.
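For a single tissue, the overrepresentation test can be sketched in Python using only the standard library. The sketch assumes that the "Fisher test" refers to Fisher's exact test on a 2×2 table (vertebrate conserved versus background CRMs, annotated versus not annotated for the tissue) and computes the one-sided p-value as the upper tail of the hypergeometric distribution. The function name is hypothetical and the counts in the example are invented; they are not results from this study.

```python
from math import comb

def fisher_one_sided(a, b, c, d):
    """One-sided (enrichment) p-value for the 2x2 table

                        annotated   not annotated
        conserved           a            b
        background          c            d

    i.e. the probability of seeing at least `a` annotated CRMs in the
    conserved set if annotations were distributed at random."""
    row1 = a + b            # size of the conserved set
    col1 = a + c            # all CRMs annotated for the tissue
    n = a + b + c + d       # all CRMs
    upper = min(row1, col1)
    tail = sum(comb(col1, k) * comb(n - col1, row1 - k)
               for k in range(a, upper + 1))
    return tail / comb(n, row1)

# Invented counts: 30 of 50 conserved CRMs versus 200 of 950 background
# CRMs are annotated for the tissue of interest.
print(fisher_one_sided(30, 20, 200, 750))
```

Excluding the vertebrate conserved CRMs from the background, as described above, keeps the two rows of the table disjoint, which the exact test requires.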
CRM genomic location analysis
For each CRM, the distance to the nearest annotated TSS (as defined in Ensembl version 61) was retrieved and categorized into distances of less than 1 kb, 1 to 10 kb, 10 to 100 kb or more than 100 kb. We also assessed whether the CRMs are localized in annotated genes or in intergenic regions (<100 or >100 kb away from the nearest gene as defined in Ensembl version 61). One hundred randomizations were produced, each consisting of the same number of random locations in the medaka genome (with the same size distribution) as the number of CRMs in the real dataset. The same location analysis was then performed on these random datasets and the significance was calculated from these randomizations.
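The location analysis can be sketched in Python as three small steps: binning TSS distances, drawing size-matched random locations, and deriving an empirical p-value from the randomizations. Several details are assumptions of this sketch rather than statements from the text: the treatment of the bin boundaries, the sampling of chromosomes in proportion to their length, and the add-one convention in the p-value (with 100 randomizations it gives a smallest attainable value of 1/101). All names and chromosome lengths are hypothetical.

```python
import random

def distance_bin(distance_bp):
    """Assign a TSS distance to one of the four categories
    (boundary handling is an assumption)."""
    if distance_bp < 1_000:
        return "<1 kb"
    if distance_bp < 10_000:
        return "1-10 kb"
    if distance_bp <= 100_000:
        return "10-100 kb"
    return ">100 kb"

def random_locations(sizes, chrom_lengths, rng):
    """Draw one random location per CRM size, preserving the size
    distribution; chromosomes are sampled proportionally to length.
    Each size is assumed not to exceed the chosen chromosome length."""
    chroms = list(chrom_lengths)
    weights = [chrom_lengths[c] for c in chroms]
    locations = []
    for size in sizes:
        chrom = rng.choices(chroms, weights=weights)[0]
        start = rng.randrange(0, chrom_lengths[chrom] - size + 1)
        locations.append((chrom, start, start + size))
    return locations

def empirical_p_value(observed, random_values):
    """Share of randomizations reaching the observed statistic,
    with an add-one correction (an assumed convention)."""
    at_least = sum(1 for v in random_values if v >= observed)
    return (at_least + 1) / (len(random_values) + 1)

rng = random.Random(0)
print(random_locations([136, 244], {"chr1": 30_000_000, "chr2": 20_000_000}, rng))
print(distance_bin(2095), distance_bin(63755))
```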
Molecular cloning
The identified CRMs were PCR amplified (using LA-Taq polymerase, Takara Bio Inc.) from genomic medaka DNA and flanking HindIII restriction sites introduced (for primer sequences see Table S8). After restriction digest the fragments were cloned into a pBlueScript-based transgenesis vector containing two recognition sites for the meganuclease ISce-I [35] flanking a multiple cloning site followed by the core promoter hsp70::GFP [37] and an SV40 polyadenylation signal (clone available upon request). All constructs were verified by sequencing.
Medaka injection and screening
Injections were done as described [49]. DNA was purified using the Maxiprep Kit (Qiagen) and injected at a concentration of 15 ng/µl.
A Leica fluorescent microscope (Leica MZFLIII) was used to examine GFP expression in live embryos. Injected embryos were analyzed at different stages to determine the spatio-temporal pattern of GFP expression. As the hsp70 core promoter is activated by temperature changes, the embryos were kept and examined at constant room temperature. Developmental stages were determined by morphological features as described by Iwamatsu [50].
Whole mount in-situ hybridization
For analysis of scamp1, fign(1 of 2), atg4c, gon3_oryla and kcnh7 expression patterns, fragments were PCR amplified from medaka cDNA (using Taq-Polymerase, primer sequences in Table S8) and subcloned using the TOPO TA Cloning Kit (Invitrogen). After verification by sequencing, Digoxigenin incorporated antisense-RNA probes were generated by in-vitro transcription with Sp6 or T7 RNA Polymerase (NEB).
Probe preparation and whole mount in-situ hybridization were performed as described previously [51]. For the remaining genes analyzed, we could find at least one clone matching part of the transcript sequence in our in-house library (in pCMV-Sport6.1). In these cases, probes were generated by in-vitro transcription with Sp6 or T7 RNA Polymerase directly from these clones.
Medaka annotation
The medaka nervous system ontology is derived from the medaka fish anatomy and development OBO ontology (medaka_ontology.obo). The descendent terms of nervous system at various stages were extracted. A total of 32 different terms were found and used for the controlled vocabulary annotation. Reporter gene expression was found in 20 (62%) of these anatomical terms.
Supporting Information
Acknowledgments
We wish to thank Pablo Cingolani for taking care of the PReMod web site and database. We would like to thank the fish room team for fish husbandry.
Footnotes
Competing Interests: The authors have read the journal's policy and have the following conflicts: Emmanuel Mongin was partially funded by genome Quebec during the period of research outlined in this paper. The commercial funder was not involved in any aspects of the research and applied no restrictions. No patents have been placed on the research outlined in this paper. The involvement of the funder does not alter the authors' adherence to all the PLoS ONE policies on sharing data and materials.
Funding: EM is supported by funding from genome Quebec/Canada. LE and JW are supported by the European Community's Seventh Framework Programme (FP7/2007-2013) CISSTEM and DFG-SFB488 TP17-TP8. This work was also supported in part by the EMBO Short-term Fellowships program (EM). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
References
- 1. Tsien RY. The green fluorescent protein. Annu Rev Biochem. 1998;67:509–544. doi: 10.1146/annurev.biochem.67.1.509.
- 2. Higashijima S, Masino MA, Mandel G, Fetcho JR. Imaging neuronal activity during zebrafish behavior with a genetically encoded calcium indicator. J Neurophysiol. 2003;90:3986–3997. doi: 10.1152/jn.00576.2003.
- 3. Nagai T, Sawano A, Park ES, Miyawaki A. Circularly permuted green fluorescent proteins engineered to sense Ca2+. Proc Natl Acad Sci USA. 2001;98:3197–3202. doi: 10.1073/pnas.051636098.
- 4. Pertz O, Hodgson L, Klemke RL, Hahn KM. Spatiotemporal dynamics of RhoA activity in migrating cells. Nature. 2006;440:1069–1072. doi: 10.1038/nature04665.
- 5. Srivastava J, Barber DL, Jacobson MP. Intracellular pH sensors: design principles and functional significance. Physiology (Bethesda). 2007;22:30–39. doi: 10.1152/physiol.00035.2006.
- 6. Heintz N. BAC to the future: the use of BAC transgenic mice for neuroscience research. Nat Rev Neurosci. 2001;2:861–870. doi: 10.1038/35104049.
- 7. Parinov S, Kondrichin I, Korzh V, Emelyanov A. Tol2 transposon-mediated enhancer trap to identify developmentally regulated zebrafish genes in vivo. Dev Dyn. 2004;231:449–459. doi: 10.1002/dvdy.20157.
- 8. Ellingsen S, Laplante MA, Konig M, Kikuta H, Furmanek T, et al. Large-scale enhancer detection in the zebrafish genome. Development. 2005;132:3799–3811. doi: 10.1242/dev.01951.
- 9. Korzh V. Transposons as tools for enhancer trap screens in vertebrates. Genome Biol. 2007;8(Suppl 1):S8. doi: 10.1186/gb-2007-8-s1-s8.
- 10. Scott EK, Mason L, Arrenberg AB, Ziv L, Gosse NJ, et al. Targeting neural circuitry in zebrafish using GAL4 enhancer trapping. Nat Methods. 2007;4:323–326. doi: 10.1038/nmeth1033.
- 11. Asakawa K, Suster ML, Mizusawa K, Nagayoshi S, Kotani T, et al. Genetic dissection of neural circuits by Tol2 transposon-mediated Gal4 gene and enhancer trapping in zebrafish. Proc Natl Acad Sci USA. 2008;105:1255–1260. doi: 10.1073/pnas.0704963105.
- 12. Pennacchio LA, Ahituv N, Moses AM, Prabhakar S, Nobrega MA, et al. In vivo enhancer analysis of human conserved non-coding sequences. Nature. 2006;444:499–502. doi: 10.1038/nature05295.
- 13. Fisher S, Grice EA, Vinton RM, Bessling SL, Urasaki A, et al. Evaluating the biological relevance of putative enhancers using Tol2 transposon-mediated transgenesis in zebrafish. Nat Protoc. 2006;1:1297–1305. doi: 10.1038/nprot.2006.230.
- 14. Woolfe A, Goodson M, Goode DK, Snell P, McEwen GK, et al. Highly conserved non-coding sequences are associated with vertebrate development. PLoS Biol. 2005;3:e7. doi: 10.1371/journal.pbio.0030007.
- 15. Dermitzakis ET, Reymond A, Lyle R, Scamuffa N, Ucla C, et al. Numerous potentially functional but non-genic conserved sequences on human chromosome 21. Nature. 2002;420:578–582. doi: 10.1038/nature01251.
- 16. Bejerano G, Pheasant M, Makunin I, Stephen S, Kent WJ, et al. Ultraconserved elements in the human genome. Science. 2004;304:1321–1325. doi: 10.1126/science.1098119.
- 17. Loots GG, Locksley RM, Blankespoor CM, Wang ZE, Miller W, et al. Identification of a coordinate regulator of interleukins 4, 13, and 5 by cross-species sequence comparisons. Science. 2000;288:136–140. doi: 10.1126/science.288.5463.136.
- 18. Lander ES, Linton LM, Birren B, Nusbaum C, Zody MC, et al. Initial sequencing and analysis of the human genome. Nature. 2001;409:860–921. doi: 10.1038/35057062.
- 19. Waterston RH, Lindblad-Toh K, Birney E, Rogers J, Abril JF, et al. Initial sequencing and comparative analysis of the mouse genome. Nature. 2002;420:520–562. doi: 10.1038/nature01262.
- 20. International Chicken Genome Sequencing Consortium. Sequence and comparative analysis of the chicken genome provide unique perspectives on vertebrate evolution. Nature. 2004;432:695–716. doi: 10.1038/nature03154.
- 21. Aparicio S, Chapman J, Stupka E, Putnam N, Chia J-M, et al. Whole-genome shotgun assembly and analysis of the genome of Fugu rubripes. Science. 2002;297:1301–1310. doi: 10.1126/science.1072104.
- 22. Brudno M, Do CB, Cooper GM, Kim MF, Davydov E, et al. LAGAN and Multi-LAGAN: efficient tools for large-scale multiple alignment of genomic DNA. Genome Res. 2003;13:721–731. doi: 10.1101/gr.926603.
- 23. Schwartz S, Kent WJ, Smit A, Zhang Z, Baertsch R, et al. Human-mouse alignments with BLASTZ. Genome Res. 2003;13:103–107. doi: 10.1101/gr.809403.
- 24. Blanchette M, Kent WJ, Riemer C, Elnitski L, Smit AFA, et al. Aligning multiple genomic sequences with the threaded blockset aligner. Genome Res. 2004;14:708–715. doi: 10.1101/gr.1933104.
- 25. Paten B, Herrero J, Beal K, Birney E. Sequence progressive alignment, a framework for practical large-scale probabilistic consistency alignment. Bioinformatics. 2009;25:295–301. doi: 10.1093/bioinformatics/btn630.
- 26. Howard ML, Davidson EH. cis-Regulatory control circuits in development. Dev Biol. 2004;271:109–118. doi: 10.1016/j.ydbio.2004.03.031.
- 27. Philippakis AA, He FS, Bulyk ML. Modulefinder: a tool for computational discovery of cis regulatory modules. Pac Symp Biocomput. 2005:519–530.
- 28. Blanchette M, Bataille AR, Chen X, Poitras C, Laganiere J, et al. Genome-wide computational prediction of transcriptional regulatory modules reveals new insights into human gene expression. Genome Res. 2006;16:656–668. doi: 10.1101/gr.4866006.
- 29. Ferretti V, Poitras C, Bergeron D, Coulombe B, Robert F, et al. PReMod: a database of genome-wide mammalian cis-regulatory module predictions. Nucleic Acids Res. 2007;35:122–126. doi: 10.1093/nar/gkl879.
- 30. Matys V, Kel-Margoulis OV, Fricke E, Liebich I, Land S, et al. TRANSFAC and its module TRANSCompel: transcriptional gene regulation in eukaryotes. Nucleic Acids Res. 2006;34:108–110. doi: 10.1093/nar/gkj143.
- 31. Vlieghe D, Sandelin A, De Bleser PJ, Vleminckx K, Wasserman WW, et al. A new generation of JASPAR, the open-access repository for transcription factor binding site profiles. Nucleic Acids Res. 2006;34:95–97. doi: 10.1093/nar/gkj115.
- 32. Ettwiller L, Paten B, Ramialison M, Birney E, Wittbrodt J. Trawler: de novo regulatory motif discovery pipeline for chromatin immunoprecipitation. Nat Methods. 2007;4:563–565. doi: 10.1038/nmeth1061.
- 33. Haudry Y, Berube H, Letunic I, Weeber P-D, Gagneur J, et al. 4DXpress: a database for cross-species expression pattern comparisons. Nucleic Acids Res. 2008;36:847–853. doi: 10.1093/nar/gkm797.
- 34. Sprague J, Bayraktaroglu L, Clements D, Conlin T, Fashena D, et al. The Zebrafish Information Network: the zebrafish model organism database. Nucleic Acids Res. 2006;34:581–585. doi: 10.1093/nar/gkj086.
- 35. Grabher C, Wittbrodt J. Meganuclease and transposon mediated transgenesis in medaka. Genome Biol. 2007;8(Suppl 1):S10. doi: 10.1186/gb-2007-8-s1-s10.
- 36. Monteilhet C, Perrin A, Thierry A, Colleaux L, Dujon B. Purification and characterization of the in vitro activity of I-Sce I, a novel and highly specific endonuclease encoded by a group I intron. Nucleic Acids Res. 1990;18:1407–1413. doi: 10.1093/nar/18.6.1407.
- 37. Blechinger SR, Evans TG, Tang PT, Kuwada JY, Warren JT, et al. The heat-inducible zebrafish hsp70 gene is expressed during normal lens development under non-stress conditions. Mech Dev. 2002;112:213–215. doi: 10.1016/s0925-4773(01)00652-9.
- 38. Smith B, Ashburner M, Rosse C, Bard J, Bug W, et al. The OBO Foundry: coordinated evolution of ontologies to support biomedical data integration. Nat Biotechnol. 2007;25:1251–1255. doi: 10.1038/nbt1346.
- 39. Moens CB, Yan YL, Appel B, Force AG, Kimmel CB. valentino: a zebrafish gene required for normal hindbrain segmentation. Development. 1996;122:3981–3990. doi: 10.1242/dev.122.12.3981.
- 40. Cooke J, Moens C, Roth L, Durbin L, Shiomi K, et al. Eph signalling functions downstream of Val to regulate cell sorting and boundary formation in the caudal hindbrain. Development. 2001;128:571–580. doi: 10.1242/dev.128.4.571.
- 41. Rohrschneider MR, Elsen GE, Prince VE. Zebrafish Hoxb1a regulates multiple downstream genes including prickle1b. Dev Biol. 2007;309:358–372. doi: 10.1016/j.ydbio.2007.06.012.
- 42. Hernandez RE, Rikhof HA, Bachmann R, Moens CB. vhnf1 integrates global RA patterning and local FGF signals to direct posterior hindbrain development in zebrafish. Development. 2004;131:4511–4520. doi: 10.1242/dev.01297.
- 43. Nakai S, Kawano H, Yudate T, Nishi M, Kuno J, et al. The POU domain transcription factor Brn-2 is required for the determination of specific neuronal lineages in the hypothalamus of the mouse. Genes Dev. 1995;9:3109–3121. doi: 10.1101/gad.9.24.3109.
- 44. Blow MJ, McCulley DJ, Li Z, Zhang T, Akiyama JA, et al. ChIP-Seq identification of weakly conserved heart enhancers. Nat Genet. 2010;42:806–810. doi: 10.1038/ng.650.
- 45. Flicek P, Aken BL, Beal K, Ballester B, Caccamo M, et al. Ensembl 2008. Nucleic Acids Res. 2008;36:707–714. doi: 10.1093/nar/gkm988.
- 46. Kasahara M, Naruse K, Sasaki S, Nakatani Y, Qu W, et al. The medaka draft genome and insights into vertebrate genome evolution. Nature. 2007;447:714–719. doi: 10.1038/nature05846.
- 47. Jaillon O, Aury J-M, Brunet F, Petit J-L, Stange-Thomann N, et al. Genome duplication in the teleost fish Tetraodon nigroviridis reveals the early vertebrate proto-karyotype. Nature. 2004;431:946–957. doi: 10.1038/nature03025.
- 48. Kasprzyk A, Keefe D, Smedley D, London D, Spooner W, et al. EnsMart: a generic system for fast and flexible access to biological data. Genome Res. 2004;14:160–169. doi: 10.1101/gr.1645104.
- 49. Rembold M, Lahiri K, Foulkes NS, Wittbrodt J. Transgenesis in fish: efficient selection of transgenic fish by co-injection with a fluorescent reporter construct. Nat Protoc. 2006;1:1133–1139. doi: 10.1038/nprot.2006.165.
- 50. Iwamatsu T. Stages of normal development in the medaka Oryzias latipes. Mech Dev. 2004;121:605–618. doi: 10.1016/j.mod.2004.03.012.
- 51. Loosli F, Winkler S, Burgtorf C, Wurmbach E, Ansorge W, et al. Medaka eyeless is the key factor linking retinal determination and eye growth. Development. 2001;128:4035–4044. doi: 10.1242/dev.128.20.4035.