Whole genome duplication in teleost fish reveals that a few changes in non-regulatory genomic sequences are a source for generating new enhancers.
Abstract
Evolutionary innovation relies partially on changes in gene regulation. While a growing body of evidence demonstrates that such innovation is generated by functional changes or translocation of regulatory elements via mobile genetic elements, the de novo generation of enhancers from non-regulatory/non-mobile sequences has, to our knowledge, not previously been demonstrated. Here we show evidence for the de novo genesis of enhancers in vertebrates. For this, we took advantage of the massive gene loss following the last whole genome duplication in teleosts to systematically identify regions that have lost their coding capacity but retain sequence conservation with mammals. We found that these regions show enhancer activity while the orthologous coding regions have no regulatory activity. These results demonstrate that these enhancers have been de novo generated in fish. By revealing that minor changes in non-regulatory sequences are sufficient to generate new enhancers, our study highlights an important playground for creating new regulatory variability and evolutionary innovation.
Author Summary
The genome of each living organism contains thousands of genes, and the precise control of the timing and location of expression of these genes is key for normal development and homeostasis of each individual. Despite the oftentimes high genetic similarity between organisms, the source of phenotypic differences, for example between human and mouse, is thought to originate mainly from changes in how and when genes are expressed. This is partially determined by enhancers, which contribute to the control of gene expression. For decades, duplication of existing genomic enhancers, mobile elements, and changes in the sequence of existing enhancers were believed to be the major ways of increasing the number and modifying the activity of enhancers. In this study, we show that enhancers don't have to be derived from pre-existing ones but can also appear de novo in regions of the genome that were previously not regulating gene expression. We analyzed teleost fish genomes and found three regions for which a limited number of changes in the DNA sequence was sufficient to generate new enhancers. We predict that such a process is frequent in vertebrate genomes, making de novo generation of enhancers an important mechanism for creating variation in gene expression.
Introduction
The question of the evolutionary origin and modification of enhancer elements is central for understanding the dynamics of gene expression [1]–[3]. A growing body of evidence points out that new enhancers evolve from existing ones via duplication. According to the classic model of evolution by duplication as put forward by Ohno [4], the duplicated copies are used as starting material for variation in the binding site composition, which modifies the respective enhancer's activity [5]–[10]. Mobile genetic elements have also been shown to have regulatory activity [11],[12] or bear transcription factor binding sites (TFBSs) [13], and thus, their translocation can be associated with changes in gene expression.
While the modification/translocation of those pre-existing elements has been shown to play an important functional role, they may only contribute to a fraction of the regulatory innovation. Indeed, recent findings using large-scale comparative analysis of regulatory features have shown that single binding sites can vary extensively between closely related species [14] or even between individuals of the same species [15]. Further supporting the flexibility of regulatory elements, tissue-specific enhancers such as heart enhancers have been shown to be poorly conserved [16] and examples of lineage/species-specific enhancers have been described [17],[18]. Recently it has been reported that the genomic positions of tissue-specific enhancers of the yellow gene differ between Drosophila species [19].
Taken together, these results suggest that complete autonomous enhancer elements containing all the necessary binding sites in the correct arrangement can be lineage specific. Nevertheless, it is currently unclear whether these apparent lineage-specific enhancers appear de novo or are derived from pre-existing enhancers whose sequences have diverged too much to be identifiable. In order to show the de novo nature of these lineage-specific enhancers, a strategy to identify the orthologous regions and test them for enhancer activity is needed.
In this report we identify de novo enhancers by searching for special cases that we refer to as “Recycled Regions” (RRs). An RR is a region with enhancer function in one lineage that remains identifiable in another lineage due to sequence constraints imposed by a different kind of function. These scenarios are likely to be very rare in stable genomes. Thus, we took advantage of the most recent Whole Genome Duplication (WGD) in teleosts [20] followed by a massive loss of the duplicated coding genes. It is estimated that 75% of the duplicated genes lost one copy [20]. Initially, while one of the duplicated copies remained a coding gene, the other copy lost its coding function and accumulated nucleotide changes. In rare cases, the sequence from the non-coding copy became constrained if a regulatory function arose de novo. Those regulatory sequences are alignable to their coding orthologs if the selection for the new function took place soon enough. Hence we used the ancestral coding function as an evolutionary trap to identify orthologous sequences of the enhancer across lineages (mammalian, cartilaginous fish, and teleost) and assessed whether these enhancers are generated de novo in the teleost lineage (Figure 1A).
Results
Identification of the Recycled Regions
We developed an algorithm to systematically search teleost fish genomes for RRs that satisfy the following criteria (Figure 1B): (1) they are located in the locus corresponding to the lost copy of a duplicated gene; (2) despite showing no evidence of coding function, they are conserved with part of the human coding ortholog; and (3) because experimental validation is performed during embryogenesis, they are flanked by at least one gene annotated to be involved in development (Figure S1 and Materials and Methods, Computational Pipeline). The algorithm was first run on the stickleback (Gasterosteus aculeatus) genome because of the high quality of the gene annotation and assembly, and later the results were transferred to the medaka (Oryzias latipes) genome. Our analysis identified four RRs (Figure 1C, Table S1, and Table S2) as putative de novo regulatory regions satisfying the above criteria. Those RRs are conserved across teleosts including zebrafish (Danio rerio), suggesting that they appeared after the WGD but before the Cypriniformes-Euteleostei split.
The Recycled Regions Show Enhancer Activity
We investigated the enhancer activity of the four medaka RRs (Figure 1C and Table S1) using an in vivo reporter assay in medaka that we previously developed [21]. We cloned the four RRs extended with a maximum of 200 bp flanking sequences upstream of an hsp70 minimal promoter and a reporter gene (gfp). The basal expression of the hsp70 minimal promoter in the lens [22] was used as injection control. We found that all four regions tested drive reporter gene expression in specific structures in the medaka embryo (Figure 2A–D). The assay is highly reproducible, resulting in a consistent expression pattern in a large fraction of embryos (Table S3). The onset of reporter gene expression depends on the nature of the RR and varies from developmental stage 20 (fam44bRR) to stage 32 (dock9RR) and is in all cases maintained in juvenile (Figure S2) and adult fish (unpublished data). Moreover, the specific expression pattern observed in injected embryos (Table S3) is retained in stable lines. In line with our hypothesis, these results show enhancer activity for all four RR reporter constructs. We further addressed the contribution of the four RRs to the observed enhancer activity by deleting the orthologous regions corresponding to the exon, leaving only the flanking regions from the reporter constructs (Figure S3A–D). In two cases, the deletion constructs completely abolished reporter gene expression (Figure S3E–F). For ccdc46RR, the deletion altered and massively reduced the reporter gene expression to a few cells in the hindbrain (Figure S3G). Only for fam44bRR did the deletion construct not abolish the original enhancer activity of the full construct (Figure S3H) and therefore fam44bRR was excluded from further analysis. These results demonstrate that three out of four RRs are necessary for enhancer activity.
Recycled Regions Recapitulate Part of the Flanking Gene Expression Patterns
We next investigated whether the enhancer activity of the remaining three RRs recapitulates aspects of the expression pattern of flanking genes. For this, we analysed the in situ expression pattern of those genes. We found that in all cases RR-driven reporter gene expression temporally and spatially resembles the expression of at least one of the respective flanking genes (Figure S4). To further confirm this, we performed double fluorescent whole mount in situ hybridisation on stable transgenic lines by combining probes for the reporter and the flanking genes. In all cases, we identified at least one flanking gene that recapitulates key aspects of the expression pattern of the RR-driven reporter gene (Figure 3). In particular, both ttc29RR-driven GFP (Figure 3B) and the flanking gene pou4f2 (Figure 3A) are expressed in the optic tectum and retina (Figure 3C). dock9RR shows very specific enhancer activity in the cerebellum (Figure 3E,H) as do the neighbouring genes zic5 and zic2 (Figure 3D,G), which exhibit an expression pattern that includes the cerebellum (Figure 3F,I). Finally, ccdc46RR shows activity in the forebrain (Figure 3K), recapitulating part of the expression pattern of its flanking gene axin2 (1 of 2) (Figure 3J,L). All putative target genes have been reported to play important roles in developmental processes: Zic2 and 5 are zinc finger proteins of the cerebellum, and mutations in the zic2 gene have been reported to cause holoprosencephaly [23]. Axin2, an Axin-related protein, has been shown to play an important role in the regulation of β-catenin stability in the Wnt signalling pathway [24], and Pou4f2, better known as Brn3b, is a member of the POU-domain family of transcription factors and is a key regulator for axon outgrowth and pathfinding in projection neurons [25]. Our results demonstrate that the RRs exhibit enhancer activity that recapitulates multiple aspects of the expression of neighbouring genes. 
Our results further suggest that the identified RRs contribute to the transcriptional regulation of genes that are key players in embryonic development.
Two possible evolutionary scenarios may account for our results obtained so far: (1) the ancestral function was both regulatory and coding or (2) the ancestral vertebrate sequence was coding but the teleosts have lost that function in one of the duplicated copies and acquired regulatory function instead (which supports the de novo enhancer hypothesis). For the former scenario, dual functions on the same region have been hypothesised [26] and shown for several cases [27]–[32] while the latter scenario has not been shown so far. To shed light on the ancestral state of the RRs, we investigated the RRs in lineages that diverged prior to the last WGD in teleosts.
Orthologous Regions in Non-Teleost Lineages Show No Enhancer Activity
In species that diverged prior to the teleost-tetrapod split (e.g., elephant shark (Callorhinchus milii) or ciona (Ciona savignyi)), the sequences corresponding to the three RRs showed an open reading frame (ORF) spanning the coding exon that is in frame with the human ORF (Figure S5). For both TTC29 and CCDC46 we also found EST evidence in the ciona lineage (Table S2). These results show that the ancestral sequences of the RRs were very likely to have been coding at the split of the teleost-tetrapod lineages.
We next investigated the evolutionary dynamics of these regions by analysing the similarity between the human coding exon and the orthologous regions in various lineages at both the amino-acid (AA) and nucleotide level. We found that the percentage identity at the nucleotide level is higher for the fish RRs, while the similarity at the AA level is higher for all other lineages, including the fish coding paralog (Figure S6). Consistent with the alignment similarities, the ratio of non-synonymous compared to synonymous base pair changes (Ka/Ks) [33] is increased for the RRs compared to the coding homologs (see Materials and Methods and Figure S6). In accordance with the results obtained so far, these data further support the hypothesis that (1) the RRs were ancestrally coding and (2) the fish RRs are under a selection acting at the nucleotide rather than at the AA level. These data suggest that the RRs were ancestrally not regulatory since the Ka/Ks ratio between human and shark or ciona would favour a selection acting at the AA level only.
To test the nature (regulatory or non-regulatory) of the ancestral state at the tetrapod-teleost split, we further explored the enhancer activity of the exons homologous to the RRs in two independent lineages (mouse and elephant shark) as well as the coding paralog in fish (Figure 4).
In none of the cases tested was an enhancer activity detectable (Figure 4 and Table S3). As the exons orthologous to the RRs were tested in the medaka embryo, the absence of activity could be due to trans-regulatory changes [34]. To rule out this hypothesis, the mouse exons orthologous to the RRs were tested directly in mouse. Again, in none of the cases tested was an enhancer activity detectable (Figure 4 and Materials and Methods), confirming that the mouse exons orthologous to the RRs have no enhancer activity (at the time point assayed).
The results obtained so far provide convincing evidence that the enhancer function in teleosts was de novo acquired in this lineage. As most of the de novo genesis of enhancers is expected to occur in “neutrally” evolving sequences, these cases of de novo enhancers deriving from cooption may constitute a very small subset of all possible de novo enhancers.
We roughly estimate the number of de novo enhancers under positive selection since the tetrapod-teleost split (450 mya [35]) at several thousand (see Text S1 and Figure S7 for a more detailed analysis of this estimate). Considering that those de novo elements under purifying selection may constitute only a tiny fraction of all possible regulatory elements generated, the rate of genesis of new enhancers (regardless of their evolutionary fate) may be very high in vertebrate genomes. This estimate is only tentative and based on a number of assumptions (see Text S1); a more accurate prediction of the de novo enhancers across various phylogenetic branches of vertebrates will require further studies. Nonetheless, these results highlight the importance of the genesis of enhancers and provide one possible explanation amongst others of the widespread observation that a large fraction of TFBSs appears non-conserved [36]. However, those TFBSs forming de novo enhancers may represent only a fraction of all the apparent lineage-specific binding sites found by genome-wide chromatin immunoprecipitation experiments.
To predict which TFBSs may be involved in the generation of the de novo enhancers, we compared the putative TFBS content of the RRs and the exons at the sequence level (Materials and Methods). We found five to seven binding sites in the medaka RRs that are specific to teleosts and are present neither in other vertebrate species nor in the predicted ancestral reconstruction (Figure S8). Interestingly, dock9RR in medaka (with enhancer activity in the cerebellum) has a new binding site for Pax2, a transcription factor known to be involved in cerebellum development [37].
Function of the De Novo Enhancers in Gene Regulation
These de novo enhancers may either confer additional domains of expression to their target genes or rather act as redundant enhancers. To tackle the functional consequences of the de novo enhancers, we took advantage of a conserved block flanking the ccdc46RR homologous exon previously shown to be bound by p300 in mouse forebrain (Figure S9, orange bar, upper panel) [38]. We tested the mouse extended region containing both the p300 pulldown region and the extended exonic sequence (Figure S9, light green bar, upper panel) and detected enhancer activity in the medaka forebrain (Figure S9A). This activity was not altered when deleting the exonic sequence (Figure S9, blue bar, upper panel and Figure S9B), demonstrating that the exon itself is not required for enhancer function (see also Figure 4). Similarly, the shark and medaka sequences (Figure S9, orange bar, lower panel) orthologous to the mouse p300-bound enhancer also show forebrain activity (Figure S9C–D). These results demonstrate that the p300-bound enhancer element is an ancestral feature and suggest that the nearby ccdc46RR de novo enhancer in fish has complementary function to reinforce the forebrain expression rather than creating a new expression domain. Similarly dock9RR is active in the medaka cerebellum, while the mouse zic2 and 5 genes are also expressed in this structure [39].
While those de novo enhancers may still quantitatively modify the transcript level within the cell or activate transcription in related cell types within the same domains, these results favour the hypothesis of redundant enhancers. This hypothesis is supported by the recent finding that redundant enhancers confer phenotypic robustness [40],[41] and thus are likely to be selected for.
Similar to TFBS turnover by the de novo emergence of new binding sites [42], complete enhancers may also be turned over, leading to the disappearance of the ancestral element.
Discussion
It has long been thought that new functions emerge primarily by duplication and/or modification of existing functional elements [43]. On the gene level, this view has begun to change with the recent publication of several studies reporting the de novo origin of genes in yeast [44], drosophila [45], and human [46]. In this study we show that not only genes but also enhancers can be de novo generated.
De novo genesis of enhancers raises the question of how evolution can produce such complex functional elements. Indeed, enhancers were generally believed to have a stringent regulatory code, and thus the odds for generating a de novo enhancer were believed to be low. Recent studies have already started challenging that view by pointing either to the flexibility of this code [18],[47] or the rapid turnover of binding sites [14],[15],[42]. It is possible that the appearance of new binding sites can not only modify pre-existing enhancers but also lead to the creation of completely new autonomous enhancers.
This work further shows the relative “facility” of conferring regulatory activities to non-regulatory sequences. Consequently, the birth of regulatory elements is a highly dynamic property of vertebrate genomes and should also be considered as an evolutionary toolkit for innovation. The results of this study have significant implications, notably in the gene regulation and medical genetic fields by pointing out that genomic variation could lead to the generation of enhancers in regions with no apparent regulatory function. As such variation may also lead to altered gene expression, more attention should be devoted to variation in so-called “neutral” DNA.
Materials and Methods
Computational Pipeline
Summary of the computational pipeline
In order to find RRs we undertook a conservative analysis of the stickleback non-coding genomic sequences mapping to the human exome. For this, a total of 282,599 human annotated exons were mapped to the stickleback genome using BLASTZ, a sensitive alignment tool suited for non-coding genomic sequences. In order to retain only the stickleback non-coding regions, hits matching even partially an annotated exon in stickleback were removed. To identify putative RRs we took advantage of the last WGD in teleosts followed by the massive loss of the duplicated genes. Only hits in the syntenic loci between human and stickleback were further processed. As a result of the WGD, two such syntenic loci per human locus can be found in fish (one locus contains the functional protein, while the other locus has lost the gene). Thus we restricted the search to hits that contain stop codon(s) disrupting the ORF and that are found in the locus of the lost gene. Such hits are good candidates for having acquired a de novo enhancer function controlling nearby genes. As experimental validation is performed during embryogenesis, we further selected those hits flanked by at least one gene annotated to be involved in development (Figure 1B and Figure S1).
We identified four BLASTZ hits on the stickleback genome as putative RR candidates and transferred the hits to the medaka genome (Figure 1C and Table S1) for experimental validation.
Human exons
The repeat-masked DNA sequences of a total of 282,599 human annotated exons (length >19 bp for BLASTZ) were retrieved from EnsEMBL v. 49 [48].
Alignment with stickleback (Gasterosteus aculeatus)
DNA sequences corresponding to the human (Homo sapiens) exons were matched to the repeat-masked stickleback genome (EnsEMBL v. 49) using BLASTZ (default parameters, score above 2,900) [49]. A total of 145,095 human exons (51%) have at least one BLASTZ hit on the stickleback genome. This number corresponds to 24,214 human genes. The average BLASTZ score is 5,220. The average number of hits on the stickleback genome is 7.3 hits per conserved exon. For each exon, hits on the stickleback genome within 1 kb from each other are considered to be part of the same regulatory unit and were therefore fused. To deplete the dataset from un-annotated genes or exons, only exons from human genes with at least one annotated ortholog in stickleback were further considered. Any hits within 2 Mb of the stickleback ortholog(s) locus were removed. Alignments matching even partially an annotated exon or EST in stickleback or any other sequenced teleosts (EnsEMBL gene annotation) were also removed.
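The fusion of nearby hits can be illustrated with a minimal sketch. It assumes that the hits of one exon are available as (chromosome, start, end) tuples and that "within 1 kb" means a gap of at most 1,000 bp; the function name `fuse_hits` and the data layout are hypothetical and are not taken from the original pipeline.

```python
from collections import defaultdict


def fuse_hits(hits, max_gap=1000):
    """Fuse hits on the same chromosome that lie within max_gap bp of each other.

    hits: iterable of (chrom, start, end) tuples with start <= end.
    Returns a list of fused (chrom, start, end) tuples sorted by chromosome and start.
    The 1 kb default mirrors the text; treating it as inclusive is an assumption.
    """
    by_chrom = defaultdict(list)
    for chrom, start, end in hits:
        by_chrom[chrom].append((start, end))

    fused = []
    for chrom in sorted(by_chrom):
        intervals = sorted(by_chrom[chrom])
        cur_start, cur_end = intervals[0]
        for start, end in intervals[1:]:
            if start - cur_end <= max_gap:
                # Close enough: extend the current fused unit.
                cur_end = max(cur_end, end)
            else:
                fused.append((chrom, cur_start, cur_end))
                cur_start, cur_end = start, end
        fused.append((chrom, cur_start, cur_end))
    return fused
```

Sorting the intervals first means a single pass per chromosome is enough, since any hit that can be fused with the current unit must directly follow it in start order.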
Synteny assessment
All the neighbouring developmental genes (see section below) within a 300 kb window upstream and downstream of the human exon were retrieved. Next, the positions of the corresponding orthologs in stickleback were compared with all the positions of the BLASTZ hits. If one hit is less than 100 kb away from the identified orthologs and no more than five genes are located in between, the hit is retained. To remove false positives due to un-annotated genes, if more than one hit per gene is found within a window of 300 kb, all the hits are discarded.
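The two synteny rules can be sketched as follows. This is an illustration under stated assumptions: each hit, ortholog, and gene is reduced to a single coordinate on one stickleback chromosome, and "all the hits are discarded" is read as discarding all hits belonging to the human gene in question. The function names `passes_synteny` and `drop_clustered_hits` are hypothetical.

```python
def passes_synteny(hit_pos, ortholog_pos, gene_positions,
                   max_dist=100_000, max_genes_between=5):
    """Return True if a hit lies close enough to the ortholog of a flanking gene.

    All positions are single coordinates on the same chromosome (an assumption).
    gene_positions lists the annotated genes on that chromosome.
    """
    # Rule 1: the hit must be less than 100 kb away from the ortholog.
    if abs(hit_pos - ortholog_pos) >= max_dist:
        return False
    # Rule 2: no more than five genes may lie strictly between hit and ortholog.
    lo, hi = sorted((hit_pos, ortholog_pos))
    genes_between = sum(1 for g in gene_positions if lo < g < hi)
    return genes_between <= max_genes_between


def drop_clustered_hits(hit_positions, window=300_000):
    """Discard all hits of one human gene if any two of them fall within `window` bp."""
    ordered = sorted(hit_positions)
    for left, right in zip(ordered, ordered[1:]):
        if right - left <= window:
            return []
    return ordered
```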
GO filtering
We define developmental genes as genes with the following GO annotation: GO:0045165 (cell fate commitment), GO:0032502 (developmental process), GO:0030528 (transcription regulator activity), and GO:0003700 (transcription factor activity) as well as the descendant annotations as defined by the Open Biomedical Ontologies (version 1.2) [50].
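Including the descendant annotations amounts to a traversal of the ontology graph starting from the four terms listed above. The sketch below assumes the ontology has already been parsed into a dictionary mapping each term to its direct children; that dictionary, the function names, and the `TOY:` identifiers in the usage example are hypothetical.

```python
from collections import deque

# The four root terms named in the text.
ROOT_TERMS = {"GO:0045165", "GO:0032502", "GO:0030528", "GO:0003700"}


def descendants(roots, children):
    """Return the roots plus every term reachable from them via child links.

    children: dict mapping a GO term to an iterable of its direct child terms.
    """
    seen = set(roots)
    queue = deque(roots)
    while queue:
        term = queue.popleft()
        for child in children.get(term, ()):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


def is_developmental(gene_terms, developmental_terms):
    """True if a gene carries at least one developmental GO annotation."""
    return not developmental_terms.isdisjoint(gene_terms)


# Usage with a toy ontology (placeholder identifiers, not real GO terms).
toy_children = {"GO:0032502": ["TOY:1"], "TOY:1": ["TOY:2"], "TOY:3": ["TOY:4"]}
dev_terms = descendants(ROOT_TERMS, toy_children)
```

Tracking visited terms in `seen` keeps the traversal correct even though a GO term can have several parents.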
Assessment of reading frames
The nucleotide regions on the stickleback genome corresponding to the BLASTZ hits were aligned to the corresponding human exon using BLASTx. If the resulting alignment spans the entire stickleback region within one frame and no stop codon is found, the region is discarded.
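The stop-codon part of this filter can be illustrated with a simplified stand-in. The sketch does not run BLASTx; it assumes the frame aligned to the human exon (0, 1, or 2 on the forward strand) is already known and only checks whether that frame is interrupted. Both function names are hypothetical.

```python
# Standard stop codons (DNA alphabet).
STOP_CODONS = {"TAA", "TAG", "TGA"}


def stop_positions(seq, frame):
    """Return 0-based nucleotide positions of stop codons in one forward frame."""
    seq = seq.upper()
    return [i for i in range(frame, len(seq) - 2, 3) if seq[i:i + 3] in STOP_CODONS]


def keep_as_candidate(seq, frame):
    """Keep a region only if the aligned frame is disrupted by at least one stop codon.

    A region with an uninterrupted frame could still be coding and is discarded.
    """
    return len(stop_positions(seq, frame)) > 0
```

For example, `stop_positions("ATGTAAGGG", 0)` reports the `TAA` at position 3, so that region would be kept as a candidate, whereas `"ATGGCCGGG"` in frame 0 would be discarded.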
Bioinformatic Analysis of the Candidate RR
Assessment of reading frames
Using the human exon coordinates (Table S1), we retrieved the 46-way multiz hg19 alignments for mouse (Mus musculus), chicken (Gallus gallus), and xenopus (Xenopus tropicalis). Missing sequences (medaka (Oryzias latipes), ciona (Ciona savignyi)) were retrieved using EnsEMBL v.49, and the orthologous sequences from elephant shark (Callorhinchus milii) were retrieved using the homepage of the elephant shark genome project (http://esharkgenome.imcb.a-star.edu.sg) [51]. If no orthologous exon was annotated, tBLASTn was used to retrieve the corresponding regions. The sequences were translated in the reading frame corresponding to the human exon, and an alignment of the orthologous AA sequences was performed (CLUSTALW). For DOCK9 the 5′UTR was removed in all species analysed. The human TTC29 exon extends over two exons in the ciona lineage; thus the coding sequence of both ciona exons was fused to do the translation. In medaka, no ttc29 gene could be found.
Multiple alignments, percentage identity/similarity, and Ka/Ks
Sequences were retrieved as described above. The sequences missing from the multiz alignments were added subsequently by global alignment (cost matrix 65% similarity (5.0/−4.0), gap open/extension penalty: 12/3). The percentage identity/similarity to the human exon sequence was calculated using the alignments from pairwise BLASTn (for the nucleotide identity, default parameters) and tBLASTx (for the AA similarity, word size parameter set to 2). Percentages were calculated using the alignable length of the human exon as reference. The Ka/Ks ratio [33] was calculated using the alignable length of the human exon as reference sequence. Because the RRs contain elements that disrupt the ORF (see Assessment of Reading Frames), indels and stop codons were removed prior to calculating the Ka/Ks. Calculations were done using the PAML package included in the PAL2NAL tool of the Bork-Group at EMBL (http://www.bork.embl.de/pal2nal/#RunP2N) [52].
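As an illustration of the clean-up before the Ka/Ks calculation, the sketch below drops every codon column that contains a gap or a stop codon in either sequence and computes a simple percentage identity. It assumes a pairwise, codon-aligned input with `-` as the gap character; the exact removal procedure used with PAL2NAL is not described in the text, so this is one plausible reading, and the function names are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def clean_codon_alignment(ref, query):
    """Drop codon columns with a gap or stop codon in either aligned sequence."""
    if len(ref) != len(query) or len(ref) % 3:
        raise ValueError("sequences must be codon-aligned and of equal length")
    kept_ref, kept_query = [], []
    for i in range(0, len(ref), 3):
        a, b = ref[i:i + 3].upper(), query[i:i + 3].upper()
        if "-" in a or "-" in b or a in STOP_CODONS or b in STOP_CODONS:
            continue
        kept_ref.append(a)
        kept_query.append(b)
    return "".join(kept_ref), "".join(kept_query)


def percent_identity(ref, query):
    """Percentage of identical positions over the gap-free aligned columns."""
    pairs = [(a, b) for a, b in zip(ref.upper(), query.upper())
             if a != "-" and b != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(a == b for a, b in pairs) / len(pairs)


# Usage: the second codon has an indel and the third a stop codon in the query.
human, rr = clean_codon_alignment("ATGAAATGGCCC", "ATG---TAGCCA")
```

The cleaned pair could then be passed to a Ka/Ks tool; the Ka/Ks computation itself is deliberately left out of this sketch.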
Ancestral reconstruction and TFBS composition
Using the human exon coordinates (Table S1), we retrieved the 46-way multiz hg19 alignments. Missing sequences were manually added to the alignment as described above. From this alignment, the predicted ancestral sequence at the root of the bony vertebrates was reconstructed using the Prequel package (default parameters) [53]. Next, we searched for TFBSs in the medaka RR sequences using the Jaspar database (restricting to the Jaspar core vertebrata, 80% relative profile score threshold) [54] and kept only those binding sites that are specific to the teleosts and absent from all the other vertebrate sequences, including the predicted ancestral reconstruction.
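The thresholding and filtering logic can be sketched in plain Python. The relative score is taken here in its common form, (score - min) / (max - min), where min and max are the lowest and highest scores the matrix can produce; that this is the exact formula applied in the original analysis is an assumption. The matrix layout (one dictionary of per-base scores per motif position), the function names, and the comparison by factor name are likewise hypothetical simplifications.

```python
def relative_scores(seq, pwm):
    """Yield (position, relative score) for each window of seq on the forward strand.

    pwm: list of dicts, one per motif position, mapping A/C/G/T to a score.
    """
    max_score = sum(max(col.values()) for col in pwm)
    min_score = sum(min(col.values()) for col in pwm)
    span = max_score - min_score
    if span == 0:
        raise ValueError("matrix is uninformative (all scores equal)")
    width = len(pwm)
    seq = seq.upper()
    for pos in range(len(seq) - width + 1):
        window = seq[pos:pos + width]
        if any(base not in "ACGT" for base in window):
            continue  # skip windows containing ambiguous bases
        raw = sum(col[base] for col, base in zip(pwm, window))
        yield pos, (raw - min_score) / span


def sites(seq, pwm, threshold=0.8):
    """Positions whose relative score reaches the threshold (80% in the text)."""
    return [pos for pos, score in relative_scores(seq, pwm) if score >= threshold]


def teleost_specific(factors_in_rr, factors_elsewhere):
    """Factors with a predicted site in the RR but in none of the other sequences."""
    return set(factors_in_rr) - set(factors_elsewhere)


# Toy matrix preferring the motif ACG (illustrative values only).
toy_pwm = [
    {"A": 1.0, "C": -1.0, "G": -1.0, "T": -1.0},
    {"A": -1.0, "C": 1.0, "G": -1.0, "T": -1.0},
    {"A": -1.0, "C": -1.0, "G": 1.0, "T": -1.0},
]
```

With this toy matrix, a perfect `ACG` match has a relative score of 1.0, while a window with one mismatch scores about 0.67 and falls below the 80% threshold.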
Experimental Methods
Medaka stocks
Medaka (Oryzias latipes) strains CAB and Heino were kept in closed stocks at EMBL Heidelberg and University of Heidelberg as described [55]. In short, fish were maintained in a constant recirculating system at 28°C on a 14 h light/10 h dark cycle. Pairwise mating was performed and collected embryos were kept at room temperature until hatched.
Cloning of candidates and enhancer assay
Chromosomal coordinates and species (assembly) of all cloned and tested fragments are listed in Table S3. Genomic candidate regions (extended to a maximum of 200 bp on each side) tested in the enhancer assay were amplified from genomic DNA of medaka, inbred CAB strain (extraction described in [56]), mouse (C57BL/6 strain, kind gift from F. Spitz), and elephant shark (Callorhinchus milii, kind gift from B. Venkatesh) using standard PCR methods. For the dock9 mouse and shark exon constructs, only the exon and 200 bp downstream sequence could be cloned. The 200 bp upstream sequence corresponds to a repeat and could not be amplified. The deletion-constructs were generated by applying a PCR-driven “splicing by overlap extension” approach [57]. For the deletion constructs, in all reporter gene constructs the sequence corresponding to the human exon (the putative RR) was spliced and the flanking genomic sequences were fused. Coordinates of the fused fragments are given in Table S3.
The enhancer assay was performed as described in detail in [21]. In short, genomic sequences were cloned into a transgenesis vector upstream of a zebrafish hsp70 minimal promoter and GFP reporter gene flanked by I-SceI Meganuclease sites using standard cloning techniques [58]. The constructs were sequenced in order to verify the sequence and the orientation of the cloned regions. Deletion and orthologous constructs were cloned in the same orientation relative to the reporter gene as the RR constructs. Meganuclease-mediated transgenesis by injection into one-cell stage medaka embryos (Heino or CAB strains) was performed as described in [59]. The hsp70 core promoter triggers a strong and specific lens expression from stage 28 on [22], and this feature is used to calculate the percentage of specific expression (Table S3). Stable transgenic lines for all positive enhancer constructs were obtained. Images of transient/stable transgenic embryos were taken using an Olympus MVX10 fluorescence microscope with a Leica DC500 camera or a Leica SPE confocal microscope (10× dipping lens). Images were assembled and processed using ImageJ and Adobe Photoshop. All confocal images displayed are Z-projections of stacks.
Mouse transgenic enhancer assay
The mouse sequences orthologous to the RRs were cloned upstream of the human β-globin minimal promoter-driven LacZ reporter gene [60]. The constructs were sequenced in order to verify the sequence and the orientation of the cloned regions. The sequences were cloned in the same orientation relative to the reporter gene as the RR constructs. Chromosomal coordinates of the cloned and tested mouse fragments are listed in Table S3 (column 7). To determine which embryonic developmental stage to test for enhancer activity, we analyzed the expression pattern of the predicted target gene and compared those patterns with the enhancer activity of the RRs: For the ttc29 locus, the medaka enhancer is active in the retina and optic tectum. The putative target gene for this enhancer is Pou4f2. The mouse Pou4f2 is expressed in the hindbrain and retina from E10.5 to after birth [61]. We therefore assayed at embryonic stage E12.5. For the dock9 locus, the medaka enhancer is active in the cerebellum. The putative target genes for this enhancer are Zic2 and Zic5. The mouse Zic2 and Zic5 are expressed in the hindbrain from stage E10.5 to after birth [61]. We therefore looked at embryonic stage E12.5. For the ccdc46RR, the medaka enhancer is active in the forebrain. The putative target gene for this enhancer is axin2. The mouse Axin2 is expressed in the telencephalon at stage E14.5 [61]. We therefore looked at embryonic stage E14.5.
Generation of transgenic mice and embryo staining were carried out by Cyagen (Cyagen Bioscience Inc.). The dock9-exon-mouse construct resulted in eight transgenic embryos with two lacZ positive embryos in inconsistent embryonic domains. The ccdc46-exon-mouse construct resulted in 11 transgenic embryos with only one lacZ positive embryo. The ttc29-exon-mouse construct resulted in six transgenic embryos with no lacZ positive embryo.
Whole-mount in situ hybridization and double-fluorescent whole-mount in situ hybridization
Whole mount in situ hybridization using digoxigenin labelled antisense RNA probes followed by NBT/BCIP colour detection was performed as previously described [62]. Template cDNA clones were obtained from the medaka full-length cDNA expression library of the Wittbrodt group [62]. The following clones were used to generate the labelled riboprobes: FOE002-P00099-DPE-F_B12 (pou4f2, genomic location chr1:22399713-22401052), FOE002-P00076-DPE-F_B12 (zic2, genomic location chr21:9245812-9248089), FOE002-P00108-DPE-F_N08 (zic5, genomic location chr21:9252145-9254638), and FOE002-P00056-DPE-F_H05 (axin2 (1 of 2), genomic location chr1:4554420-4565844). For genes without a clone in the library, template fragments for in vitro transcription were directly amplified from generated cDNA and cloned into a pTOPO vector (Invitrogen). Total RNA was extracted from 5-d-old embryos using TRIZOL (Invitrogen), and reverse transcription was performed using the Superscript III enzyme (Invitrogen). The following primers were used to amplify cDNA fragments of the genes ednra (1 of 2) (fwd: TACAGGGCTGTAGCATCTTGGAGCAG, rev: CGTGTTGACGTTGTTGGGTTCTGG), clybl (fwd: GGTAGAAGAGCTCGCAGATGTCTATATG, rev: CTGGCGCAGAAGTCGTCTGAGCC), and frmpd1 (fwd: ACAGAGAATCCACTCTCCACGTCTACG, rev: TTGGATTTTGTGCTCTGCAGGGATG). In vitro transcription to generate antisense riboprobes was performed using sp6, T3, and T7 RNA polymerases (Roche). Images of whole-mount in situ hybridizations were taken using a Zeiss Axiophot Microscope with a Leica DC500 camera.
Double-fluorescent in situ hybridization using digoxigenin-labelled probes against the candidate gene (see above) and a fluorescein-labelled antisense RNA probe generated against GFP was performed as described in [62]. The probes were visualized using Fast Red staining (Roche) and the TSA-Kit (PerkinElmer) as in [62].
Imaging of double-fluorescent whole-mount in situ hybridizations was done using a Leica SPE confocal microscope with a 10× dipping lens. Images were assembled and processed using ImageJ and Adobe Photoshop. All confocal images displayed are Z-projections of stacks. Brightness and contrast were adjusted uniformly across the entire image.
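The two image-processing steps mentioned above can be sketched in a few lines. The text does not state which projection type was used, so the maximum-intensity projection below is an assumption. The function names `max_z_projection` and `adjust_uniform` and the 8-bit range are likewise illustrative, not a description of the ImageJ or Photoshop settings actually applied.

```python
def max_z_projection(stack):
    """Collapse a Z-stack (list of equally sized 2D slices) into one 2D image
    by taking the per-pixel maximum. Maximum intensity is an assumed choice."""
    if not stack:
        raise ValueError("stack must contain at least one slice")
    rows, cols = len(stack[0]), len(stack[0][0])
    return [
        [max(z_slice[r][c] for z_slice in stack) for c in range(cols)]
        for r in range(rows)
    ]


def adjust_uniform(image, gain=1.0, offset=0.0, max_value=255):
    """Apply one linear gain/offset to every pixel and clip to [0, max_value].
    Using the same parameters for all pixels is what makes the change uniform."""
    return [
        [min(max_value, max(0, round(gain * pixel + offset))) for pixel in row]
        for row in image
    ]


# Toy two-slice stack of 2x2 pixels (made-up values for illustration only).
stack = [
    [[0, 10], [20, 30]],
    [[5, 5], [25, 0]],
]
projection = max_z_projection(stack)
adjusted = adjust_uniform(projection, gain=2.0, offset=10.0)
print(projection)
print(adjusted)
```

Because `adjust_uniform` takes a single gain and offset for the whole image, relative intensities between regions are preserved except where values are clipped at the ends of the range.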
Supporting Information
Acknowledgments
We would like to thank Jochen Wittbrodt for in-depth discussion, advice, useful suggestions, and comments on the manuscript. We would like to thank François Spitz for discussion and comments on the manuscript and for providing the mouse transgenesis vector. We would also like to thank Lazaro Centanin, Thomas Auer, Thomas Sandmann, and Heiko Runz for comments on the manuscript and Emma Tallack for proofreading. Thanks to Aidan Budd for suggestions and comments. We would like to thank Byrappa Venkatesh for elephant shark genomic DNA; Vincent Laudet, Kiyoshi Naruse, and Rudolf Wicker for genomic DNA; and Franziska Gruhl for helping with the medaka injections. Finally, we would like to thank the fish room team.
Abbreviations
- AA: amino acids
- GFP: green fluorescent protein
- MYA: million years ago
- RR: recycled region
- TFBS: transcription factor binding sites
- WGD: whole genome duplication
Footnotes
The authors have declared that no competing interests exist.
European Community's Seventh Framework Programme (FP7/2007-2013) under the grant CISSTEM, DFG-SFB488. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
References
- 1. Wray G. A. The evolutionary significance of cis-regulatory mutations. Nat Rev Genet. 2007;8:206–216. doi: 10.1038/nrg2063.
- 2. Carroll S. B. Evo-devo and an expanding evolutionary synthesis: a genetic theory of morphological evolution. Cell. 2008;134:25–36. doi: 10.1016/j.cell.2008.06.030.
- 3. Chan Y. F, Marks M. E, Jones F. C, Villarreal G. J, Shapiro M. D, et al. Adaptive evolution of pelvic reduction in sticklebacks by recurrent deletion of a Pitx1 enhancer. Science. 2010;327:302–305. doi: 10.1126/science.1182213.
- 4. Ohno S. The role of gene duplication in vertebrate evolution. In: Bittar E. D, Bittar N, editors. The biological basis of medicine. 1969. pp. 109–132.
- 5. Gompel N, Prud'homme B, Wittkopp P. J, Kassner V. A, Carroll S. B. Chance caught on the wing: cis-regulatory evolution and the origin of pigment patterns in Drosophila. Nature. 2005;433:481–487. doi: 10.1038/nature03235.
- 6. Prud'homme B, Gompel N, Carroll S. B. Emerging principles of regulatory evolution. Proc Natl Acad Sci U S A. 2007;104(Suppl 1):8605–8612. doi: 10.1073/pnas.0700488104.
- 7. Prabhakar S, Visel A, Akiyama J. A, Shoukry M, Lewis K. D, et al. Human-specific gain of function in a developmental enhancer. Science. 2008;321:1346–1350. doi: 10.1126/science.1159974.
- 8. Rebeiz M, Pool J. E, Kassner V. A, Aquadro C. F, Carroll S. B. Stepwise modification of a modular enhancer underlies adaptation in a Drosophila population. Science. 2009;326:1663–1667. doi: 10.1126/science.1178357.
- 9. Frankel N, Erezyilmaz D. F, McGregor A. P, Wang S, Payre F, et al. Morphological evolution caused by many subtle-effect substitutions in regulatory DNA. Nature. 2011;474:598–603. doi: 10.1038/nature10200.
- 10. Rebeiz M, Jikomes N, Kassner V. A, Carroll S. B. Evolutionary origin of a novel gene expression pattern through co-option of the latent activities of existing regulatory sequences. Proc Natl Acad Sci U S A. 2011;108:10036–10043. doi: 10.1073/pnas.1105937108.
- 11. Bejerano G, Lowe C. B, Ahituv N, King B, Siepel A, et al. A distal enhancer and an ultraconserved exon are derived from a novel retroposon. Nature. 2006;441:87–90. doi: 10.1038/nature04696.
- 12. Sasaki T, Nishihara H, Hirakawa M, Fujimura K, Tanaka M, et al. Possible involvement of SINEs in mammalian-specific brain formation. Proc Natl Acad Sci U S A. 2008;105:4220–4225. doi: 10.1073/pnas.0709398105.
- 13. Bourque G, Leong B, Vega V. B, Chen X, Lee Y. L, et al. Evolution of the mammalian transcription factor binding repertoire via transposable elements. Genome Res. 2008;18:1752–1762. doi: 10.1101/gr.080663.108.
- 14. Schmidt D, Wilson M. D, Ballester B, Schwalie P. C, Brown G. D, et al. Five-vertebrate ChIP-seq reveals the evolutionary dynamics of transcription factor binding. Science. 2010;328:1036–1040. doi: 10.1126/science.1186176.
- 15. Kasowski M, Grubert F, Heffelfinger C, Hariharan M, Asabere A, et al. Variation in transcription factor binding among humans. Science. 2010;328:232–235. doi: 10.1126/science.1183621.
- 16. Blow M. J, McCulley D. J, Li Z, Zhang T, Akiyama J. A, et al. ChIP-Seq identification of weakly conserved heart enhancers. Nat Genet. 2010;42:806–810. doi: 10.1038/ng.650.
- 17. Romano L. A, Wray G. A. Conservation of Endo16 expression in sea urchins despite evolutionary divergence in both cis and trans-acting components of transcriptional regulation. Development. 2003;130:4187–4199. doi: 10.1242/dev.00611.
- 18. Brown C. D, Johnson D. S, Sidow A. Functional architecture and evolution of transcriptional elements that drive gene coexpression. Science. 2007;317:1557–1560. doi: 10.1126/science.1145893.
- 19. Kalay G, Wittkopp P. J. Nomadic enhancers: tissue-specific cis-regulatory elements of yellow have divergent genomic positions among Drosophila species. PLoS Genet. 2010;6:e1001222. doi: 10.1371/journal.pgen.1001222.
- 20. Jaillon O, Aury J-M, Brunet F, Petit J-L, Stange-Thomann N, et al. Genome duplication in the teleost fish Tetraodon nigroviridis reveals the early vertebrate proto-karyotype. Nature. 2004;431:946–957. doi: 10.1038/nature03025.
- 21. Mongin E, Auer T. O, Bourrat F, Gruhl F, Dewar K, et al. Combining computational prediction of cis-regulatory elements with a new enhancer assay to efficiently label neuronal structures in the medaka fish. PLoS One. 2011;6:e19747. doi: 10.1371/journal.pone.0019747.
- 22. Blechinger S. R, Evans T. G, Tang P. T, Kuwada J. Y, Warren J. T. J, et al. The heat-inducible zebrafish hsp70 gene is expressed during normal lens development under non-stress conditions. Mech Dev. 2002;112:213–215. doi: 10.1016/s0925-4773(01)00652-9.
- 23. Brown S. A, Warburton D, Brown L. Y, Yu C. Y, Roeder E. R, et al. Holoprosencephaly due to mutations in ZIC2, a homologue of Drosophila odd-paired. Nat Genet. 1998;20:180–183. doi: 10.1038/2484.
- 24. Behrens J, Jerchow B. A, Wurtele M, Grimm J, Asbrand C, et al. Functional interaction of an axin homolog, conductin, with beta-catenin, APC, and GSK3beta. Science. 1998;280:596–599. doi: 10.1126/science.280.5363.596.
- 25. Wang S. W, Gan L, Martin S. E, Klein W. H. Abnormal polarization and axon outgrowth in retinal ganglion cells lacking the POU-domain transcription factor Brn-3b. Mol Cell Neurosci. 2000;16:141–156. doi: 10.1006/mcne.2000.0860.
- 26. Dong X, Navratilova P, Fredman D, Drivenes O, Becker T. S, et al. Exonic remnants of whole-genome duplication reveal cis-regulatory function of coding exons. Nucleic Acids Res. 2010;38:1071–1085. doi: 10.1093/nar/gkp1124.
- 27. Neznanov N, Umezawa A, Oshima R. G. A regulatory element within a coding exon modulates keratin 18 gene expression in transgenic mice. J Biol Chem. 1997;272:27549–27557. doi: 10.1074/jbc.272.44.27549.
- 28. Zimmermann N, Colyer J. L, Koch L. E, Rothenberg M. E. Analysis of the CCR3 promoter reveals a regulatory region in exon 1 that binds GATA-1. BMC Immunol. 2005;6:7. doi: 10.1186/1471-2172-6-7.
- 29. Pierce R. A, Moore C. H, Arikan M. C. Positive transcriptional regulatory element located within exon 1 of elastin gene. Am J Physiol Lung Cell Mol Physiol. 2006;291:L391–L399. doi: 10.1152/ajplung.00441.2004.
- 30. Lin Z, Ma H, Nei M. Ultraconserved coding regions outside the homeobox of mammalian Hox genes. BMC Evol Biol. 2008;8:260. doi: 10.1186/1471-2148-8-260.
- 31. Tumpel S, Cambronero F, Sims C, Krumlauf R, Wiedemann L. M. A regulatory module embedded in the coding region of Hoxa2 controls expression in rhombomere 2. Proc Natl Acad Sci U S A. 2008;105:20077–20082. doi: 10.1073/pnas.0806360105.
- 32. Woltering J. M, Duboule D. Conserved elements within open reading frames of mammalian Hox genes. J Biol. 2009;8:17. doi: 10.1186/jbiol116.
- 33. Goldman N, Yang Z. A codon-based model of nucleotide substitution for protein-coding DNA sequences. Mol Biol Evol. 1994;11:725–736. doi: 10.1093/oxfordjournals.molbev.a040153.
- 34. Ritter D. I, Li Q, Kostka D, Pollard K. S, Guo S, et al. The importance of being cis: evolution of orthologous fish and mammalian enhancer activity. Mol Biol Evol. 2010;27:2322–2332. doi: 10.1093/molbev/msq128.
- 35. Kumar S, Hedges S. B. A molecular timescale for vertebrate evolution. Nature. 1998;392:917–920. doi: 10.1038/31927.
- 36. Birney E, Stamatoyannopoulos J. A, Dutta A, Guigó R, ENCODE Project Consortium. Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project. Nature. 2007;447:799–816. doi: 10.1038/nature05874.
- 37. Urbanek P, Fetka I, Meisler M. H, Busslinger M. Cooperation of Pax2 and Pax5 in midbrain and cerebellum development. Proc Natl Acad Sci U S A. 1997;94:5703–5708. doi: 10.1073/pnas.94.11.5703.
- 38. Visel A, Blow M. J, Li Z, Zhang T, Akiyama J. A, et al. ChIP-seq accurately predicts tissue-specific activity of enhancers. Nature. 2009;457:854–858. doi: 10.1038/nature07730.
- 39. Nagai T, Aruga J, Takada S, Gunther T, Sporle R, et al. The expression of the mouse Zic1, Zic2, and Zic3 gene suggests an essential role for Zic genes in body pattern formation. Dev Biol. 1997;182:299–313. doi: 10.1006/dbio.1996.8449.
- 40. Frankel N, Davis G. K, Vargas D, Wang S, Payre F, et al. Phenotypic robustness conferred by apparently redundant transcriptional enhancers. Nature. 2010;466:490–493. doi: 10.1038/nature09158.
- 41. Perry M. W, Boettiger A. N, Bothma J. P, Levine M. Shadow enhancers foster robustness of Drosophila gastrulation. Curr Biol. 2010;20:1562–1567. doi: 10.1016/j.cub.2010.07.043.
- 42. Dermitzakis E. T, Clark A. G. Evolution of transcription factor binding sites in Mammalian gene regulatory regions: conservation and turnover. Mol Biol Evol. 2002;19:1114–1121. doi: 10.1093/oxfordjournals.molbev.a004169.
- 43. Force A, Lynch M, Pickett F. B, Amores A, Yan Y. L, et al. Preservation of duplicate genes by complementary, degenerative mutations. Genetics. 1999;151:1531–1545. doi: 10.1093/genetics/151.4.1531.
- 44. Cai J, Zhao R, Jiang H, Wang W. De novo origination of a new protein-coding gene in Saccharomyces cerevisiae. Genetics. 2008;179:487–496. doi: 10.1534/genetics.107.084491.
- 45. Levine M. T, Jones C. D, Kern A. D, Lindfors H. A, Begun D. J. Novel genes derived from noncoding DNA in Drosophila melanogaster are frequently X-linked and exhibit testis-biased expression. Proc Natl Acad Sci U S A. 2006;103:9935–9939. doi: 10.1073/pnas.0509809103.
- 46. Knowles D. G, McLysaght A. Recent de novo origin of human protein-coding genes. Genome Res. 2009;19:1752–1759. doi: 10.1101/gr.095026.109.
- 47. Zinzen R. P, Girardot C, Gagneur J, Braun M, Furlong E. E. M. Combinatorial binding predicts spatio-temporal cis-regulatory activity. Nature. 2009;462:65–70. doi: 10.1038/nature08531.
- 48. Hubbard T. J. P, Aken B. L, Ayling S, Ballester B, Beal K, et al. Ensembl 2009. Nucleic Acids Res. 2009;37:D690–D697. doi: 10.1093/nar/gkn828.
- 49. Schwartz S, Kent W. J, Smit A, Zhang Z, Baertsch R, et al. Human-mouse alignments with BLASTZ. Genome Res. 2003;13:103–107. doi: 10.1101/gr.809403.
- 50. Smith B, Ashburner M, Rosse C, Bard J, Bug W, et al. The OBO Foundry: coordinated evolution of ontologies to support biomedical data integration. Nat Biotechnol. 2007;25:1251–1255. doi: 10.1038/nbt1346.
- 51. Venkatesh B, Kirkness E. F, Loh Y-H, Halpern A. L, Lee A. P, et al. Survey sequencing and comparative analysis of the elephant shark (Callorhinchus milii) genome. PLoS Biol. 2007;5:e101. doi: 10.1371/journal.pbio.0050101.
- 52. Suyama M, Torrents D, Bork P. PAL2NAL: robust conversion of protein sequence alignments into the corresponding codon alignments. Nucleic Acids Res. 2006;34:W609–W612. doi: 10.1093/nar/gkl315.
- 53. Hubisz M. J, Pollard K. S, Siepel A. PHAST and RPHAST: phylogenetic analysis with space/time models. Brief Bioinform. 2011;12:41–51. doi: 10.1093/bib/bbq072.
- 54. Sandelin A, Alkema W, Engström P, Wasserman W. W, Lenhard B. JASPAR: an open-access database for eukaryotic transcription factor binding profiles. Nucleic Acids Res. 2004;32:D91–D94. doi: 10.1093/nar/gkh012.
- 55. Koster R, Stick R, Loosli F, Wittbrodt J. Medaka spalt acts as a target gene of hedgehog signaling. Development. 1997;124:3147–3156. doi: 10.1242/dev.124.16.3147.
- 56. Martinez-Morales J-R, Naruse K, Mitani H, Shima A, Wittbrodt J. Rapid chromosomal assignment of medaka mutants by bulked segregant analysis. Gene. 2004;329:159–165. doi: 10.1016/j.gene.2003.12.028.
- 57. Heckman K. L, Pease L. R. Gene splicing and mutagenesis by PCR-driven overlap extension. Nat Protoc. 2007;2:924–932. doi: 10.1038/nprot.2007.132.
- 58. Sambrook J, Fritsch E. F, Maniatis T. Molecular cloning, a laboratory manual. Cold Spring Harbor: Cold Spring Harbor Press; 1989.
- 59. Thermes V, Grabher C, Ristoratore F, Bourrat F, Choulika A, et al. I-SceI meganuclease mediates highly efficient transgenesis in fish. Mech Dev. 2002;118:91–98. doi: 10.1016/s0925-4773(02)00218-6.
- 60. Yee S. P, Rigby P. W. The regulation of myogenin gene expression during the embryonic development of the mouse. Genes Dev. 1993;7:1277–1289. doi: 10.1101/gad.7.7a.1277.
- 61. Finger J. H, Smith C. M, Hayamizu T. F, McCright I. J, Eppig J. T, et al. The mouse Gene Expression Database (GXD): 2011 update. Nucleic Acids Res. 2011;39:D835–D841. doi: 10.1093/nar/gkq1132.
- 62. Souren M, Martinez-Morales J. R, Makri P, Wittbrodt B, Wittbrodt J. A global survey identifies novel upstream components of the Ath5 neurogenic network. Genome Biol. 2009;10:R92. doi: 10.1186/gb-2009-10-9-r92.
- 63. Christoffels A, Koh E. G, Chia J. M, Brenner S, Aparicio S, et al. Fugu genome analysis provides evidence for a whole-genome duplication early during the evolution of ray-finned fishes. Mol Biol Evol. 2004;21:1146–1151. doi: 10.1093/molbev/msh114.