Significance
The neocortex mediates complex cognitive and motor tasks in all mammals. A long-debated question is how this complex structure evolved in primitive mammals. Here we investigate the role of novel mammalian gene regulatory sequences in the emergence of the neocortex and the mechanisms by which these sequences emerged. We find that ∼20% of elements active during human and mouse neocortical development were born in early mammals. These novel mammalian elements enrich for cell migration, cell signaling, and axon guidance functions, implicating these processes in neocortical origins. In contrast to recent studies, we propose a model in which novel regulatory elements emerge as short sequences of minimal biological significance. Many disappear, but those that survive become increasingly complex over time.
Keywords: regulatory innovation, neocortical development, epigenetics, brain evolution
Abstract
Morphological innovations such as the mammalian neocortex may involve the evolution of novel regulatory sequences. However, de novo birth of regulatory elements active during morphogenesis has not been extensively studied in mammals. Here, we use H3K27ac-defined regulatory elements active during human and mouse corticogenesis to identify enhancers that were likely active in the ancient mammalian forebrain. We infer the phylogenetic origins of these enhancers and find that ∼20% arose in the mammalian stem lineage, coincident with the emergence of the neocortex. Implementing a permutation strategy that controls for the nonrandom variation in the ages of background genomic sequences, we find that mammal-specific enhancers are overrepresented near genes involved in cell migration, cell signaling, and axon guidance. Mammal-specific enhancers are also overrepresented in modules of coexpressed genes in the cortex that are associated with these pathways, notably ephrin and semaphorin signaling. Our results also provide insight into the mechanisms of regulatory innovation in mammals. We find that most neocortical enhancers did not originate by en bloc exaptation of transposons. Young neocortical enhancers exhibit smaller H3K27ac footprints and weaker evolutionary constraint in eutherian mammals than older neocortical enhancers. Based on these observations, we present a model of the enhancer life cycle in which neocortical enhancers initially emerge from genomic background as short, weakly constrained “proto-enhancers.” Many proto-enhancers are likely lost, but some may serve as nucleation points for complex enhancers to evolve.
The evolution of animal morphology requires changes in fundamental developmental processes. Recent studies suggest that altered gene regulation during development contributes to morphological differences between species (1–4). In several cases, individual regulatory changes have been shown to have strong effects on morphology, including reduction or loss of existing anatomical units (5–7). However, the mechanisms underlying morphological innovation, which includes the emergence of entirely novel anatomical structures and radical transformations of existing structures, remain unclear (see ref. 8).
One hypothesis is that morphological innovations are driven by the widespread emergence of new regulatory functions. These may arise through several potential mechanisms: modification of regulatory elements with ancestral functions, exaptation of specific classes of transposons to generate new regulatory sequences, and emergence of new regulatory elements in situ from nonfunctional, unconstrained genomic sequences. Although recent theoretical work in flies suggests that entire regulatory elements can evolve from genomic background on relatively short time scales (9), the de novo generation of regulatory elements by transposon exaptation is a particularly compelling mechanism. Many transposons include binding sites for multiple transcription factors, and transposition provides the means to deliver new regulatory functions to genes (10). Transposons were reported to have altered gene regulatory networks in human and mouse embryonic stem cells (ESCs) (11), and the origin and subsequent diversification of the placenta also likely involved widespread transposon exaptation (12–14). However, the process of de novo genesis of developmental regulatory elements during mammalian evolution, and the potential contribution of these elements to morphological innovations, has not been extensively investigated.
The mammalian neocortex is one of the most important innovations in vertebrate evolution and provides an experimental system to study de novo birth of developmental regulatory elements. In all extant mammals, the neocortex is organized into six layers, each comprising neurons with distinct identities and connectivities (15). The neocortex is derived from the dorsal pallium of the developing telencephalon, and its basic laminar architecture is specified during corticogenesis (15, 16). Nonmammalian vertebrates lack the six-layered forebrain architecture that defines the mature neocortex (17, 18). Adult structures derived from the dorsal pallium in birds and reptiles are vastly different from the neocortex at the structural, functional, and molecular level, complicating efforts to understand how the neocortex evolved (18–22). A recent study reported major transcriptomic divergence between the adult mouse neocortex and various chicken forebrain structures, supporting the hypothesis that mammal-specific regulatory functions contribute to the divergent morphology of the mammalian neocortex (20). Regulatory drivers of neocortical origins may therefore be found in the set of regulatory elements active during corticogenesis.
Recent studies have identified enhancers active during neocortical development in multiple mammals, making it possible to investigate the process and role of de novo enhancer genesis in this key mammalian innovation (23–25). Here, we began with sets of enhancers defined using epigenetic signatures of enhancer activity in the human and mouse developing neocortex. This contrasts with recent studies that attempt to identify and characterize genome-wide regulatory changes over vertebrate evolution using comparative genomics methods alone (26, 27). We identified neocortical enhancers that were likely active in the mammalian stem lineage and then inferred at what point these enhancer sequences emerged in vertebrate evolution. We found that ∼20% arose in the stem mammalian lineage, coincident with the emergence of the neocortex. These enhancers are overrepresented near genes involved in cell migration and axon guidance, most prominently in the ephrin and semaphorin signaling pathways. We did not find strong evidence for en bloc repeat exaptation as a mechanism for generating lineage-specific enhancers in the neocortex. Instead, our results evoke a model of enhancer evolution in which enhancers emerge from neutral background as simple regulatory sequences, or “proto-enhancers,” comprising a small number of sites under weak evolutionary constraint. Proto-enhancers that survive likely undergo substantial modification and over time become composites of younger and older functional segments. Thus, the emergence of the neocortex likely involved the utilization and modification of ancient regulatory functions in the forebrain, coupled with the emergence of novel regulatory functions in stem mammals associated with fundamental neurodevelopmental processes.
Results
Defining a Set of Neocortical Enhancers Exhibiting Conserved Epigenetic Marking in Mammals.
We used histone H3K27 acetylation data from human and mouse developing cortex to identify enhancers active during corticogenesis (Fig. 1A) (23). In each species, we examined three time points spanning critical events of corticogenesis, including the emergence of multiple transient proliferative embryonic zones in the dorsal pallium, which may be a mammal-specific characteristic; genesis and migration of cortical neurons; and initiation of neuronal connections (17, 22, 28). To infer neocortical enhancers that were likely active in the mammalian stem lineage, we identified enhancer regions in human and mouse that showed reproducible activity in both species based on epigenetic signatures (Fig. 1A and Dataset S1). We chose this stringent approach to minimize potential false-positive enhancer calls. For the purposes of our analysis, we defined an enhancer marked in both species as a reproducibly marked human enhancer from the midembryonic, late embryonic, or early fetal time point that overlaps a reproducibly marked mouse enhancer from any mouse time point. The term “neocortical enhancer” hereafter refers to a human enhancer that is also marked in mouse as defined above.
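The definition above, a human enhancer that overlaps a mouse enhancer from any time point, reduces to an interval-overlap filter. The sketch below is a minimal illustration, not the pipeline used in the study. It assumes that mouse enhancers have already been mapped to human coordinates upstream, and the names `overlaps`, `shared_enhancers`, and the toy coordinates are hypothetical.

```python
from collections import defaultdict

def overlaps(a_start, a_end, b_start, b_end):
    """True if half-open intervals [a_start, a_end) and [b_start, b_end) share a base."""
    return a_start < b_end and b_start < a_end

def shared_enhancers(human, mouse_in_human_coords):
    """Keep human enhancers that overlap at least one mouse enhancer.

    Both inputs are lists of (chrom, start, end) tuples. The mouse intervals
    are ASSUMED to be expressed in human coordinates already.
    """
    by_chrom = defaultdict(list)
    for chrom, start, end in mouse_in_human_coords:
        by_chrom[chrom].append((start, end))
    kept = []
    for chrom, start, end in human:
        if any(overlaps(start, end, s, e) for s, e in by_chrom[chrom]):
            kept.append((chrom, start, end))
    return kept

# Toy data (hypothetical coordinates).
human = [("chr1", 100, 500), ("chr1", 900, 1200), ("chr2", 50, 300)]
mouse = [("chr1", 450, 700), ("chr2", 300, 400)]
result = shared_enhancers(human, mouse)
print(result)
```

With half-open coordinates, the chr2 intervals only touch at position 300 and are therefore not counted as overlapping. A linear scan per chromosome is enough for illustration; a sorted sweep or interval tree would scale better to genome-wide peak sets.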
Dating the Sequence Origins of Neocortical Enhancers.
To identify neocortical enhancers that arose de novo in the mammalian stem lineage, we determined the vertebrate lineage on which each enhancer sequence emerged. First, we developed a method to assign inferred evolutionary ages across the alignable, nonexonic portion of the human genome (Fig. 1B and Fig. S1A). We identified sequence segments in the human genome that we inferred were specific to apes, primates, eutherians, therians, mammals, amniotes, tetrapods, gnathostomes, or vertebrates (Fig. 1C). These segments were then intersected with the set of neocortical enhancers as defined above. We determined the age of each enhancer based on the age of the oldest sequence segment within it. Using this approach, we found that ∼20% of neocortical enhancers were specific to mammals (Fig. 1 C and D, Fig. S1B, and Dataset S1). The oldest part of these enhancers, hereafter referred to as the enhancer “core,” likely originated in stem mammals as did the neocortex itself. We hypothesize that the majority of these mammal-specific sequences gained activity shortly after their birth in stem mammals. It is possible that some enhancers became active long after the underlying sequences were born but before divergence of the human and mouse lineages. However, we consider this to be less likely than the alternative hypothesis, as it would require the sequences to persist for long periods in the absence of purifying selection before gaining activity. We also examined a comprehensive adult liver H3K27ac enhancer dataset from placental mammals, and estimated that over 80% of mammal-specific enhancer sequences in this dataset were likely to be active in the placental mammal stem lineage (33) (see Materials and Methods).
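The dating rule described above, assigning each enhancer the age of the oldest sequence segment within it, can be sketched as follows. This is an illustrative sketch only: the clade ordering follows the list given in the text, while the function names and the toy enhancers are hypothetical.

```python
from collections import Counter

# Clades ordered from youngest to oldest, as listed in the text.
AGE_ORDER = ["ape", "primate", "eutherian", "therian", "mammal",
             "amniote", "tetrapod", "gnathostome", "vertebrate"]
RANK = {age: i for i, age in enumerate(AGE_ORDER)}

def enhancer_age(segment_ages):
    """Return the age of the oldest segment, or None if there are no segments."""
    if not segment_ages:
        return None
    return max(segment_ages, key=RANK.__getitem__)

def age_fractions(enhancers):
    """Map each age class to the fraction of dated enhancers assigned to it."""
    ages = [enhancer_age(segs) for segs in enhancers.values()]
    ages = [a for a in ages if a is not None]
    counts = Counter(ages)
    return {age: counts[age] / len(ages) for age in counts}

# Toy enhancers (hypothetical): each lists the ages of its overlapping segments.
enhancers = {
    "e1": ["primate", "mammal", "eutherian"],
    "e2": ["amniote", "therian"],
    "e3": ["eutherian"],
    "e4": ["mammal"],
    "e5": ["vertebrate", "mammal"],
}
fractions = age_fractions(enhancers)
print(fractions)
```

In this toy set, two of five enhancers have a mammal-aged oldest segment (their "core"), so the mammal-specific fraction is 0.4.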
We examined the age distribution of enhancers in other embryonic and adult tissues to understand if enhancer conservation patterns differ between tissues. The fraction of mammal-specific enhancers is roughly 20% in all of the tissues investigated (Fig. 1D and Fig. S1B), suggesting that the neocortex does not exhibit a disproportionately greater rate of mammal-specific regulatory innovation compared with other tissues. Adult tissues contain a higher proportion of recently evolved enhancers (eutherian- and therian-specific enhancers) than embryonic tissues. Conversely, embryonic tissues contain a higher proportion of ancient enhancers (amniote-, tetrapod-, gnathostome-, and vertebrate-specific enhancers) than adult tissues. Thus, active enhancers in developing tissues are more ancient than those in adult tissues, consistent with previous studies supporting the “developmental hourglass” model of gene regulatory evolution, in which embryonic stages of development exhibit higher conservation of gene expression than earlier or later stages (29, 30).
Mammal-Specific Neocortical Enhancers Are Enriched Near Genes with Ancient Functions.
We next explored the potential functional impact of mammal-specific neocortical enhancers. We developed a permutation strategy to identify biological processes and pathways in which these enhancers are overrepresented (Fig. S2A; see Materials and Methods for details). The key component of our approach is that we restricted permutation of enhancers to genome regions of the same phylogenetic age; for example, mammal-specific enhancers were shuffled to mammal-specific segments of the genome. The advantage of this approach is that it potentially minimizes confounding enrichments arising due to the ages of the sequences alone (independent of their neocortical enhancer functions). To identify enriched processes and pathways, we associated mammal-specific enhancers and the age-matched permuted enhancers with nearby genes and compared the number of enhancers in Gene Ontology (GO) Biological Processes and KEGG annotated pathways (Fig. S2A and Materials and Methods).
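The age-matched permutation idea can be illustrated with a small simulation. The sketch below is not the authors' implementation. It assumes that every background segment of a given age has already been associated with a nearby gene and flagged for membership in the category of interest; the function name, the flag list, and the toy numbers are hypothetical.

```python
import random

def age_matched_permutation_p(observed_hits, n_enhancers, background_flags,
                              n_perm=1000, seed=0):
    """Empirical one-sided P value for overrepresentation.

    background_flags: one boolean per same-age background segment, True if the
    segment's nearby gene belongs to the category (ASSUMED precomputed).
    Each permutation redraws n_enhancers segments from that same-age pool.
    """
    rng = random.Random(seed)
    at_least = 0
    for _ in range(n_perm):
        sample = rng.sample(background_flags, n_enhancers)
        if sum(sample) >= observed_hits:
            at_least += 1
    # Add-one correction so the estimate is never exactly zero.
    return (1 + at_least) / (1 + n_perm)

# Toy example: 5% of mammal-aged background segments lie near category genes,
# while 20 of 100 mammal-specific enhancers do.
background = [True] * 50 + [False] * 950
p_enriched = age_matched_permutation_p(20, 100, background)
p_null = age_matched_permutation_p(0, 100, background)
print(p_enriched, p_null)
```

Because the null draws come only from segments of the same phylogenetic age, any category enrichment that is a property of mammal-aged sequence in general appears in the permutations too and does not inflate the result.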
Mammal-specific neocortical enhancers are significantly overrepresented near genes involved in cell and neurite movement, including cell migration, the actin cytoskeleton, and axon guidance (Fig. 2A and Dataset S2). Transcription and TGF-β signaling are also enriched. Axon guidance is of particular interest: Mammal-specific enhancers are significantly overrepresented near axon guidance genes at all three stages of development (Fig. 2A and Dataset S2). Axons are guided to their targets during development by a variety of signaling molecules, including netrins, slits, ephrins, and semaphorins, which are received by specific receptors. These axon guidance subpathways are also enriched in mammal-specific enhancers: Ephrin signaling shows enrichment at all three stages of development, semaphorin signaling at two stages, and slit signaling at one stage (Dataset S2). Examination of the ephrin signaling subpathway reveals a trend seen in some enriched functional categories. Genes within this ancient pathway associate with ancient (amniote and older), intermediate (mammal-specific), and recently evolved (eutherian and therian) neocortical enhancers (Fig. 2B; see also Fig. S2B and Dataset S2). This suggests that some pathways active in the vertebrate forebrain have been recurrently modified by novel regulatory elements over the course of vertebrate forebrain evolution.
For comparison, we examined the potential functions of other clade-specific enhancers in the cortex and other tissues. Although mammal-specific and younger neocortical enhancers are overrepresented near axon guidance genes, recently evolved neocortical enhancers also exhibit a distinct set of enrichments, including intracellular signal transduction, response to stress, and protein ubiquitination (Dataset S2 and Fig. S3). Amniote-specific neocortical enhancers are overrepresented near axonogenesis genes, and the most ancient neocortical enhancers do not exhibit any enrichments. Although there is some overlap of enriched functions across tissues (e.g., many signaling pathways), there are also tissue-specific signatures (Dataset S2). For example, active enhancers in the liver are overrepresented near genes involved in blood coagulation, insulin signaling, and a number of metabolic processes, which are all functions related to liver biology (Dataset S2 and Fig. S3). In contrast, neocortical enhancers enrich near axon guidance genes and other brain-related processes (Dataset S2 and Fig. S3).
We also used an orthogonal method to investigate the function of mammal-specific neocortical enhancers by integrating our enhancer maps with gene coexpression analysis in the developing neocortex. Coexpressed genes likely share regulatory mechanisms. Therefore, sets of coexpressed genes that are enriched with mammal-specific neocortical enhancers may reveal biological pathways involved in neocortical origins. We generated a gene coexpression network using publicly available RNA-seq data acquired from multiple human neocortical regions from embryonic to early fetal development, which encompasses developmental stages comparable to those of the chromatin data we used to identify enhancers (Materials and Methods). This expression network consists of 96 “modules,” which are subsets of genes with highly correlated expression across developmental time and space. We used permutation analysis to determine which network modules are significantly enriched with mammal-specific neocortical enhancers. As described above, we associated enhancers with genes and permuted enhancers to age-matched genome sequence. We found that 13 modules are significantly enriched with mammal-specific neocortical enhancers (Fig. 3A and Dataset S3). We also examined overrepresentation of neocortical enhancers of other ages in the network. We found that some modules exhibit enrichment across most of the enhancer sets (e.g., enhancers specific to eutherians, therians, mammals, and amniotes are enriched in module 4), whereas some modules are only enriched with enhancers dating to a single clade (e.g., modules 8, 12, 27, 40, 47, and 81 are enriched only for mammal-specific neocortical enhancers) (Dataset S3).
The five largest modules enriched for mammal-specific neocortical enhancers (3–5, 7, 8) contain a sufficiently large number of well-annotated genes to detect potential functional enrichments (Fig. 3B and Dataset S3). Module 3, containing 4,619 genes, is enriched with mammal-specific neocortical enhancers at all stages of development (Fig. 3 A and B and Dataset S3). Mammal-specific enhancers in module 3 enrich for functions that emerged at the genome-wide level, including cell migration, axon guidance, transcription, and TGF-β signaling (Fig. 3B and Dataset S2). Consistent with this, axon guidance and transcription-related functions are enriched with mammal-specific enhancers in three modules each (Fig. 3 A and B and Dataset S2). Thus, coexpression network analysis independently points to many of the same functions revealed by genome-wide pathway analyses. Interestingly, the axon guidance subpathways mentioned above partition by module: Mammal-specific enhancers enrich near ephrin-signaling genes in modules 3 and 7 and near semaphorin-signaling genes in module 4 (Fig. 3 A and B and Dataset S2).
Neocortical Enhancers Do Not Exhibit Strong Evidence of Transposon Exaptation.
Our functional characterization of enhancers that emerged de novo in the mammalian stem lineage suggests that these enhancers may have modified ancient developmental processes in the forebrain. Understanding the molecular mechanisms that generated these enhancers would shed further light on neocortical evolution and the origins of novel regulatory elements in general. Because transposable element-repeat exaptation has been implicated in regulatory and morphological innovation, we first examined transposable element repeats in neocortical enhancers.
To understand the background distribution of repeats in the age-segmented genome, we calculated repeat content across the alignable, nonexonic portion of the human genome. We used repeat annotations from two well-annotated genomes, human and mouse, to identify as many repeats as possible (see Materials and Methods). We found that a high percentage of positions in the most recently evolved genomic sequences (ape, primate, and eutherian) overlap with repeat elements, but this percentage declines in sequences of progressively more ancient origin (Fig. 4A, Left). We next investigated whether neocortical enhancers exhibit the same pattern, focusing on repeat composition in enhancer cores (the oldest region within each enhancer). The trend is similar in neocortical enhancer cores: Roughly 30% of bases in eutherian-specific enhancer cores intersect a repeat element, whereas less than 6% do in mammal-specific and older enhancer cores (Fig. 4A, Right). This suggests either that the majority of mammal-specific and more ancient enhancers are not derived from repeats or that they are derived from repeats that have since lost their repeat signatures due to subsequent sequence changes. In either case, en bloc repeat exaptation, in which the entire repeat or most of it was exapted and maintained in the host genome, is not a likely mechanism for the emergence of these older enhancer functions. We also examined the enrichment of repeats in enhancers of different ages by developing and implementing a permutation test that controls for sequence age. Repeat classes, families, and species emerge at specific points in vertebrate evolution. Therefore, genome sequences of specific phylogenetic ages will be enriched for particular repeats simply because of their age, irrespective of whether they encode enhancers or not. The goal of our analysis was to identify repeats contributing to the evolution of novel neocortical enhancers specifically, rather than new sequences in general (Fig. S2A and Materials and Methods). Using this approach, we did not find strong enrichments (defined here as >threefold over background) of any repeat in neocortical enhancers of any age over what would be expected based on the ages of the sequences alone (Dataset S4 and Materials and Methods).
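Two quantities in this section are simple to state concretely: the fraction of bases in an enhancer core that intersect a repeat, and the fold enrichment of a repeat over age-matched background, with "strong" defined in the text as more than threefold. The sketch below is illustrative only. The function names, the use of the mean of permuted counts as the background, and the toy numbers are assumptions, not details taken from the study.

```python
def covered_bases(region, features):
    """Bases of half-open region (start, end) covered by at least one feature.

    Overlapping features are merged so that no base is counted twice.
    """
    start, end = region
    clipped = sorted((max(s, start), min(e, end))
                     for s, e in features if s < end and e > start)
    total = 0
    cur_end = start
    for s, e in clipped:
        if e > cur_end:
            total += e - max(s, cur_end)
            cur_end = e
    return total

def fold_enrichment(observed, permuted_counts):
    """Observed count divided by the mean count across age-matched permutations."""
    background = sum(permuted_counts) / len(permuted_counts)
    return observed / background

# Toy core with three repeat annotations, two of which overlap each other.
core = (0, 100)
repeats = [(10, 30), (20, 40), (90, 120)]
repeat_fraction = covered_bases(core, repeats) / (core[1] - core[0])

# Toy enrichment: 12 observed overlaps against permuted counts averaging 5.
fold = fold_enrichment(12, [5, 6, 4, 5])
is_strong = fold > 3  # threshold stated in the text
print(repeat_fraction, fold, is_strong)
```

Here 40 of 100 core bases are repeat-derived, and a 2.4-fold enrichment falls below the threefold threshold, so it would not be reported as strong.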
We considered two possible explanations for why eutherian-specific neocortical enhancers exhibit relatively high repeat content, whereas older neocortical enhancers exhibit low repeat content. One possibility is that repeats played a larger role in the emergence of novel enhancer functions in stem eutherians than in more ancient lineages. Alternatively, the large number of repeats found in eutherian-specific enhancers may simply be a consequence of the age of the underlying sequence, rather than indicating a specific role for repeats in the evolution of new regulatory functions. To investigate these hypotheses, we examined signatures of inferred evolutionary constraint within repeat elements in enhancers. Strong evidence of constraint in repeats would suggest they specifically contribute to enhancer function. We assessed constraint using eutherian mammal phastCons conserved elements, which are defined based on inferred sequence constraint in eutherian genomes. To allow for comparisons of constraint across enhancer sets of different ages, we used eutherian phastCons elements for all sets. In eutherian-specific enhancer cores and most of the other sets, we found that a median of <1% of repeat-derived positions overlap a phastCons element (Fig. 4B and Fig. S4A). Thus, repeats in eutherian-specific enhancer cores do not appear to encode strongly constrained regulatory functions.
Recently Evolved Neocortical Enhancers Exhibit Weak Constraint Compared with Older Enhancers.
The observation that unconstrained repeats make a substantial contribution to eutherian enhancer sequences suggests that enhancers of distinct ages may show categorically different degrees of constraint. We therefore considered the level of constraint, within eutherian mammals, in all neocortical enhancers. Although most eutherian-specific enhancer cores contain at least one eutherian phastCons element, these elements overlap a median of less than 5% of positions in those cores (Fig. 5A and Fig. S4B). In contrast, a median of ∼25% of positions in mammal-specific enhancer cores and ∼75% in the most ancient cores are constrained (Fig. 5A and Fig. S4B). In addition, the few bases that are constrained in eutherian enhancer cores exhibit weaker constraint scores (Fig. S4C). We obtained similar results using other constraint metrics (average per-base phastCons score, average per-base phyloP score; Fig. S4D), and the same pattern holds in other embryonic and adult tissues (Fig. S4E). Although enhancer cores of all ages exhibit significantly stronger constraint than age-matched genome background [Benjamini-Hochberg (BH) P < 0.00025 for all tests; Materials and Methods], eutherian enhancer cores clearly exhibit weaker constraint within eutherian mammals and have smaller constraint footprints than older enhancer cores. These results do not appear to be a product of poor alignability or lack of power to detect constraint in eutherian-specific sequences, as enhancer cores of all ages show high (>95%) and similar degrees of alignability in eutherian genomes (Fig. S4F) and some eutherian sequences exhibit extremely high constraint [log-odds (LOD) scores >1,000]. Thus, our results suggest that the proportion of nucleotides encoding regulatory information under constraint in eutherian-specific enhancers, and the magnitude of constraint at those positions, is relatively small. Many recently evolved neocortical enhancers, despite their conserved biochemical activity, may not exert strong effects on gene expression or developmental processes compared with ancient enhancers and may be more likely to turn over.
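The constraint comparison rests on two routine calculations: a per-age median of the fraction of core positions covered by phastCons elements, and Benjamini-Hochberg (BH) adjustment of the P values from the comparisons with age-matched background. The sketch below shows both under stated assumptions. The per-core counts and raw P values are invented toy numbers, and the function names are hypothetical; the standard BH step-up adjustment is used, which may differ in detail from the software used in the study.

```python
from statistics import median

def median_constrained_fraction(cores):
    """Median over cores of (constrained bases / core length).

    cores: list of (constrained_bases, core_length) pairs, ASSUMED precomputed.
    """
    return median(c / length for c, length in cores)

def benjamini_hochberg(pvalues):
    """BH-adjusted P values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted

# Toy per-core counts (hypothetical), grouped by enhancer age.
cores_by_age = {
    "eutherian": [(2, 100), (0, 80), (10, 200)],
    "mammal": [(30, 120), (20, 80), (50, 100)],
}
medians = {age: median_constrained_fraction(c) for age, c in cores_by_age.items()}

# Toy raw P values, one per test.
adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
print(medians, adjusted)
```

In the toy data the eutherian median is 0.02 and the mammal median is 0.25, mirroring the direction, though not the actual values, of the trend reported in the text.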
Neocortical Enhancers Are Composites of Functional Segments of Distinct Ages.
Consistent with our finding that recently evolved enhancer cores contain fewer constrained positions than older enhancers, we found that recently evolved enhancers exhibit smaller H3K27ac signatures (Fig. 5B). The evolutionary composition of neocortical enhancers provides a molecular basis for this phenomenon. As illustrated in Fig. 1B and summarized in Fig. 5B, many enhancers consist of a core that was likely the first functional unit of the enhancer under constraint as well as sequence segments of more recent origin. Younger and older sequence segments encompassed by the biochemical signatures of enhancer activity are not ordered in any particular arrangement. For example, in some cases, ancient segments are found in the center of the enhancer but in other cases are toward the edge, and ancient segments can be adjacent to recently evolved segments without intermediate-aged sequence in between (for one example, see Fig. S5A). Also, there is no clear difference in segment arrangement between enhancer sets of different ages. In all sets, the most constrained segment can be found anywhere along the length of the enhancer, and there is no difference in the median location across sets (Fig. S5B).
Older enhancers may have larger biochemical footprints because the derived sequences within them contain transcription factor binding sites that recruit chromatin modifiers over a wider area. Supporting this, derived sequences in neocortical enhancers also exhibit evidence of evolutionary constraint (Fig. 5C). There are more positions under constraint in therian, mammalian, and older enhancers than can be accounted for by the cores alone, suggesting that derived sequences near the core contribute to enhancer function (Fig. 5C). Moreover, the distribution of phastCons elements in derived sequences near enhancer cores is not just a property of the genome-wide background, as there are significantly more phastCons elements in derived segments of therian, mammalian, amniote, and older-than-amniote enhancers than expected (Fig. S5C). This supports the view that enhancers are composites of functional segments of distinct ages and that cores are nucleation points around which enhancers can be further modified (see Discussion).
Discussion
Our goal in this study was to characterize the role of lineage-specific regulatory elements in the emergence of morphological innovations and to gain insight into the evolutionary mechanisms that generated those novel regulatory functions. Starting with biochemically defined enhancers, we used a comparative genomic approach to identify the subset of active neocortical enhancer sequences that emerged during the same period of vertebrate phylogeny as the neocortex itself. We did not observe a “burst” of mammal-specific enhancer innovations in the neocortex compared with other tissues—roughly 20% of enhancers in all of the tissues we examined are mammal-specific. To explore the role mammal-specific neocortical enhancers might have played in neocortical origins, we used two complementary metrics: pathway enrichment and enrichment in gene coexpression networks. Both methods independently point to many of the same processes and pathways, notably those related to cell and neurite movement. Because correct positioning of neurons and establishment of precise synaptic connections during corticogenesis play a prominent role in the formation of a laminar neocortex during development, it is not surprising to see these same processes altered during evolution of the structure. The sets of genes and enhancers we identified in these pathways provide a biologically informed starting point for future studies aimed at dissecting the precise genetic mechanisms underlying neocortical origins.
Our results also provide insight into the evolutionary processes by which new enhancer functions arose in the cortex. We hypothesized that many of the lineage-specific neocortical enhancers we identified would be derived from repeat sequences, as repeats have been shown in other systems to drive regulatory innovation (11, 14). If the primary mechanism of enhancer innovation in the neocortex was en bloc repeat exaptation, we would expect to find constraint across entire repeat bodies as well as strong repeat signals at distant time scales. Instead, we found that mammal-specific and older enhancers in the developing cortex and other tissues contain few detectable repeats. Eutherian-specific enhancers do contain many repeats, but this reflects eutherian-aged genome background, and only a small fraction of each repeat exhibits evidence of sequence constraint in eutherian mammals (Fig. 4B). This is consistent with results from a recent study showing that many constrained eutherian elements are derived from repeats, but only a small fraction of bases within each repeat are under constraint (31). Thus, widespread, en bloc repeat exaptation was likely not responsible for the emergence of novel neocortical enhancers at any of the time scales we investigated. It is possible that this mechanism of regulatory innovation is more likely to occur in cell-autonomous regulatory networks (such as in ES cells) or in tissues that show large lineage-specific variability (such as the placenta) (32). Deeply conserved regulatory networks contributing to patterning and differentiation of morphological structures may be less flexible to such large-scale rewiring, as disrupting these networks would disrupt development.
Model of the Enhancer Life Cycle in the Vertebrate Forebrain
In the absence of widespread repeat exaptation, we propose an alternative model of the evolutionary life cycle of developmental enhancers in the forebrain (Fig. 6). We infer a mechanism of enhancer genesis based on characteristics of eutherian-specific enhancers and later stages of the enhancer life cycle based on older enhancers in our dataset.
Emergence of Proto-enhancers.
Many eutherian-specific neocortical enhancers have small biochemical footprints and contain few positions under constraint (Fig. 5). This suggests that eutherian enhancers encode less regulatory information maintained by purifying selection than older enhancers. Although it was surprising to find that enhancers shared at the biochemical level are not strongly constrained at the sequence level, two properties of eutherian genomes help explain this result. First, eutherian genomes are highly similar due to the eutherian radiation; second, it is well understood that unconstrained, biologically nonfunctional sequences are able to recruit transcriptional machinery. A region of the eutherian genome that is not strongly constrained at the sequence level may thus exhibit conserved “enhancer” activity simply because it contains similar information content.
Based on this, we infer that neutrally evolving, biochemically active binding sites, such as those present in the eutherian genome, provided the raw material in earlier lineages for minimally functional regulatory elements to emerge in the neocortex (Fig. 6). These minimal elements, which we term proto-enhancers, consist of a small number of sites under evolutionary constraint. Repeat expansions appear to have been a major source of new sequences in mammalian genomes (Fig. 4A); thus, many proto-enhancers may emerge out of transposable element repeats but without en bloc exaptation of entire repeat sequences. Not all eutherian-specific enhancers are weakly constrained; those under stronger constraint are likely past the proto-enhancer stage of their life cycle. However, based on the large number that are weakly constrained, we infer that at any given time in the vertebrate forebrain, there exists a large pool of proto-enhancers with biochemical activity. Many of these proto-enhancers likely have weak or no effects on gene expression and are subject to rapid turnover (33).
To be clear, our inference that proto-enhancers existed in ancient vertebrate lineages is based on observations of eutherian-specific neocortical enhancers active in human and mouse. We are able to make this inference because the property of the eutherian genome that likely enables proto-enhancers to arise—its large size—was also a property of more ancient vertebrate genomes (34–36). There is an increased likelihood in a larger sequence space that minimally functional elements will arise by chance out of nonfunctional DNA. Thus, any large genome is likely to contain thousands of proto-enhancers that are continuously turning over.
Proto-enhancers Evolve into Enhancer Cores.
In contrast to the majority of eutherian-specific enhancer cores, older neocortical enhancer cores exhibit moderate to high sequence constraint (Fig. 5 and Fig. S4). This suggests that proto-enhancers that are maintained in evolution may serve as nucleation points for the formation of complex enhancer cores, which contain more regulatory information under constraint than proto-enhancers (Fig. 6). If proto-enhancers in the forebrain emerged in past lineages from repeat-derived sequences as suggested above, then enhancer cores necessarily underwent substantial modification, as repeat signatures are minimal in older enhancer cores (37). It is possible that the initial nucleation event, whether it occurs in a repeat element or not, makes chromatin around the site more accessible and visible to selection. This would facilitate the transformation of small proto-enhancers into complex regulatory cores.
Enhancer Cores Evolve into Composite Enhancers.
Increased chromatin accessibility at an enhancer core may also facilitate the evolution of enhancer cores into composite enhancers (Fig. 6). Two general characteristics of the older enhancers in our dataset indicate that they are composites of older and younger functional segments: They have larger biochemical footprints, and they contain derived sequences that are under constraint (Fig. 5 and Fig. S5). We find much support for composite enhancers in the literature. For example, a Fugu cis-regulatory element for Sox21 consists of a core sequence that is deeply conserved in deuterostomes and a lineage-specific flanking region, both of which are required for its robust transgenic activity in the zebrafish lens (38). Experimental surveys of enhancers in transgenic mouse reporter assays also suggest that derived sequences contribute to enhancer function. For two ancient enhancers, a version that contains both ancient and derived sequences drove stronger lacZ reporter activity in transgenic mouse embryos than the ancient sequence alone, and in one of these cases, a novel expression domain was added (Fig. S5 D and E) (39, 40). Finally, our observations are consistent with results from an earlier study that used noncoding conservation to identify blocks of sequence constraint in vertebrates (41). Similar to our enhancers, the constrained elements in this study exhibited two characteristics: Elements that are deeply conserved in vertebrates are longer than those conserved only between humans and rodents, and deeply conserved elements consist of a small ancient core sequence and derived sequences.
Our model of neocortical enhancer evolution presented above, in which regulatory elements emerge as simple proto-enhancers and increase in complexity and size over time, resembles models put forth on the life cycle of genes. A number of studies suggest that genes often arise de novo from noncoding DNA, and in fact this mode of gene birth may be more prevalent than gene duplication (42–45). Recently evolved “proto-genes” are short and simple and have high turnover rates. Proto-genes that survive transform into bona fide genes, which are longer, more complex, and under strong purifying selection. These models of gene and enhancer evolution suggest that genomes are full of minimally functional elements that stochastically emerge de novo from neutral sequence. The emergence of proto-elements may be a universal feature of large genomes. Many of these elements eventually disappear, but some of them provide the substrate for complex, highly constrained, and biologically central genomic elements to emerge. Taken together, our results support the view that large vertebrate genomes contain thousands of sequences at various stages of the enhancer life cycle. These may provide the substrate necessary for widespread regulatory innovations to occur by modification of existing sequence elements.
Materials and Methods
Tissue and ChIP-Seq.
The cortical H3K27ac ChIP-seq data used in this study were generated previously (23). Briefly, these data were derived from human cortical tissue from 7 postconception weeks (p.c.w.; referred to as midembryonic), 8.5 p.c.w. (referred to as late embryonic), and 12 p.c.w. (referred to as early fetal). Mouse cortical tissue was collected from embryonic day (E) 11.5 (referred to as midembryonic), E14.5 (referred to as late embryonic), and E17.5 (referred to as early fetal). Two biological replicates were used for each time point in each species. For human samples from 12 p.c.w. and mouse samples from E17.5 cortex, ChIP-seq reads from the primitive frontal and occipital lobes (which were microdissected and treated separately in ref. 23) were merged before read alignment and peak detection. The use of human embryonic tissue was reviewed and approved by the Yale Human Investigation Committee. Mouse cortical tissues were harvested according to approved Yale Institutional Animal Care and Use Committee protocols.
We included enhancer maps from additional tissues in some of our analyses. Human H3K27ac data for ESCs, spleen, liver, heart, small intestine, and kidney are from the Roadmap Epigenomics Project (www.roadmapepigenomics.org); for developing limb from ref. 46; and for a second replicate of ESC from the ENCODE project. Mouse H3K27ac data for developing limb are from ref. 46 and for ESC, spleen, liver, heart, small intestine, and kidney are from ref. 47.
Identifying Enhancers Marked in Human and Mouse.
ChIP-seq reads were aligned and peaks were detected as in ref. 23. Reproducible peaks were defined as those having 1 bp minimum overlap between the two ChIP-seq replicates from the same time point and species. Reproducible peak boundaries were defined by merging the coordinates of each replicate. Reproducible peaks that did not overlap a 1-kb segment upstream of any transcriptional start site or any annotated exon in Gencode (v10) were called putative enhancers.
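As an illustration of the reproducibility and filtering rules above, the following Python sketch operates on peaks from a single chromosome. It is not the code used in this study (see Code); the function names and the half-open coordinate convention are assumptions made for the example.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) on a single chromosome


def overlap_bp(a: Interval, b: Interval) -> int:
    """Number of bases shared by two half-open intervals."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]))


def merge_intervals(intervals: List[Interval]) -> List[Interval]:
    """Merge overlapping intervals into their union."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start < merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def reproducible_peaks(rep1: List[Interval], rep2: List[Interval]) -> List[Interval]:
    """Keep peaks that overlap a peak in the other replicate by >= 1 bp,
    then merge the coordinates of the two replicates."""
    kept = [p for p in rep1 if any(overlap_bp(p, q) >= 1 for q in rep2)]
    kept += [q for q in rep2 if any(overlap_bp(p, q) >= 1 for p in rep1)]
    return merge_intervals(kept)


def putative_enhancers(peaks: List[Interval], excluded: List[Interval]) -> List[Interval]:
    """Drop peaks overlapping any excluded interval
    (1-kb segments upstream of start sites, or annotated exons)."""
    return [p for p in peaks if all(overlap_bp(p, x) == 0 for x in excluded)]
```

Here `excluded` stands in for the promoter-proximal segments and exons, which would be derived from the gene annotation.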
Putative enhancers were then compared between species using liftOver from the UCSC genome browser. An enhancer region was considered marked in human and mouse if the human region lifted over to mouse at a unique location and if it intersected a reproducible peak from any time point in mouse by at least 1 bp; the intersected mouse peak was then required to uniquely lift over to human and intersect with the original human region. Such two-way orthologous enhancers were identified as having conserved epigenetic marking in human and mouse and were used in all downstream analyses.
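The two-way orthology test can be sketched as follows. In this illustration the two `Lift` callables are hypothetical stand-ins for running liftOver in each direction and returning `None` when a region does not map to a unique location; they are not part of any real liftOver interface.

```python
from typing import Callable, Iterable, Optional, Tuple

Region = Tuple[str, int, int]  # (chromosome, start, end), half-open coordinates
Lift = Callable[[Region], Optional[Region]]  # None when no unique mapping exists


def overlaps(a: Region, b: Region) -> bool:
    """True if two regions on the same chromosome share at least 1 bp."""
    return a[0] == b[0] and min(a[2], b[2]) - max(a[1], b[1]) >= 1


def marked_in_both(human_region: Region,
                   mouse_peaks: Iterable[Region],
                   human_to_mouse: Lift,
                   mouse_to_human: Lift) -> bool:
    """Apply the reciprocal criterion: human -> mouse peak -> back to human."""
    lifted = human_to_mouse(human_region)
    if lifted is None:
        return False
    for peak in mouse_peaks:
        if not overlaps(lifted, peak):
            continue
        back = mouse_to_human(peak)
        if back is not None and overlaps(back, human_region):
            return True
    return False
```

Passing the mapping functions as arguments keeps the criterion itself independent of how the coordinate conversion is performed.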
Age Segmentation of the Human Genome and Inference of Phylogenetic Enhancer Ages.
We used the 46-way MultiZ alignment from the UCSC genome browser for our age segmentation analysis. Other recent large-scale comparative genomic studies have also used the MultiZ alignment (26, 31). We excluded species whose genomes have less than 5× sequencing coverage, as low coverage genomes are known to result in alignment artifacts (26). This left 25 high-quality vertebrate genomes for age segmentation (see Fig. S1A for a list of all of the species). Nonexonic regions of the human genome were profiled for age segmentation based on Gencode (v10) annotation. First, the human genome was split into 100-bp windows overlapping by 50 bp, and exons were subtracted from windows. In each window, the percentage of bases alignable to the human reference was calculated for each of the 25 vertebrate species. The most distantly related species with at least 50% alignable bases to human was identified for each window, and that identification was used to assign the window to one of the following clades: Ape, Primate, Eutheria, Theria, Mammalia, Amniota, Tetrapoda, Gnathostomata, or Vertebrata. Ref. 26 used a similar method to date conserved noncoding elements in the human genome, but in that analysis only ∼33% of each element was required to align to identify its most recent common ancestor. To further reduce the number of misalignments in our dataset, sequences assigned to clades Amniota, Tetrapoda, Gnathostomata, and Vertebrata were required to align to intermediate species (for example, an amniote sequence was required to have at least 50% alignability between human and chicken, zebra finch, or lizard, in addition to 50% alignability between human and platypus; see Dataset S5 for the specific alignability requirements). If a window did not align with intermediate species, it was considered to have a low confidence alignment and was not used to date enhancers.
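The basic window-dating rule can be sketched in Python as below. This is a simplified illustration under stated assumptions: the species-to-clade mapping passed in is supplied by the caller (the full species list is in Fig. S1A), returning `None` for windows with no qualifying alignment is our choice for the example, and the additional intermediate-species requirement (Dataset S5) is not implemented.

```python
from typing import Dict, List, Optional, Tuple

# Clades ordered from youngest to oldest, as listed in the text.
CLADES = ["Ape", "Primate", "Eutheria", "Theria", "Mammalia",
          "Amniota", "Tetrapoda", "Gnathostomata", "Vertebrata"]


def tile_windows(length: int, size: int = 100, step: int = 50) -> List[Tuple[int, int]]:
    """Half-open windows of `size` bp that overlap by `size - step` bp."""
    return [(s, min(s + size, length)) for s in range(0, max(length - step, 1), step)]


def window_age(aligned_fraction: Dict[str, float],
               species_clade: Dict[str, str],
               min_fraction: float = 0.5) -> Optional[str]:
    """Clade of the most distantly related species that aligns to at least
    `min_fraction` of the window; None if no species qualifies."""
    ranks = [CLADES.index(species_clade[sp])
             for sp, frac in aligned_fraction.items() if frac >= min_fraction]
    return CLADES[max(ranks)] if ranks else None
```

For example, a window that aligns well to chimpanzee, mouse, and opossum but poorly to platypus and chicken would be assigned to Theria.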
After inferring phylogenetic ages for each genome window, overlapping or adjacent windows of the same age were merged into longer segments. To infer phylogenetic ages of enhancers marked in human and mouse, the human enhancers were intersected with the age-segmented human genome. The oldest segment intersecting an enhancer by ≥100 bp was used to assign the enhancer to one of the groups mentioned above. Enhancers assigned to groups Tetrapoda, Gnathostomata, and Vertebrata were pooled into one group called “Older Than Amniota.”
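A minimal Python sketch of the merging and enhancer-dating steps follows. The names are illustrative, windows are merged separately within each age class (an assumption about how overlapping windows of different ages are handled), and `None` is returned when no segment meets the overlap threshold.

```python
from typing import List, Optional, Tuple

CLADES = ["Ape", "Primate", "Eutheria", "Theria", "Mammalia",
          "Amniota", "Tetrapoda", "Gnathostomata", "Vertebrata"]  # young -> old
POOLED = {"Tetrapoda", "Gnathostomata", "Vertebrata"}

Segment = Tuple[int, int, str]  # (start, end, clade), half-open, one chromosome


def merge_same_age(windows: List[Segment]) -> List[Segment]:
    """Merge overlapping or adjacent windows that share an age."""
    merged: List[Segment] = []
    for clade in CLADES:
        run: List[Segment] = []
        for start, end, _ in sorted(w for w in windows if w[2] == clade):
            if run and start <= run[-1][1]:
                run[-1] = (run[-1][0], max(run[-1][1], end), clade)
            else:
                run.append((start, end, clade))
        merged.extend(run)
    return sorted(merged)


def enhancer_age(enhancer: Tuple[int, int], segments: List[Segment],
                 min_overlap: int = 100) -> Optional[str]:
    """Age of the oldest segment intersecting the enhancer by >= min_overlap bp."""
    hits = [clade for start, end, clade in segments
            if min(enhancer[1], end) - max(enhancer[0], start) >= min_overlap]
    if not hits:
        return None
    oldest = max(hits, key=CLADES.index)
    return "Older Than Amniota" if oldest in POOLED else oldest
```

Requiring a 100-bp intersection means that a short sliver of older sequence at the edge of an enhancer does not determine its age.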
To infer the percentage of mammal-specific neocortical enhancers that were active in early mammals, we leveraged a comprehensive adult liver H3K27ac dataset from placental mammals (33). As argued in ref. 33, if the activity of a liver enhancer is shared in many lineages among several distinct clades of placental mammals, the enhancer was likely active in stem placentals. We defined a set of liver enhancers that exhibit shared activity in human and mouse, and then dated these enhancers using our enhancer dating approach. We determined that >80% of “mammal-specific” liver enhancers also have activity in at least one other placental mammal with a high-quality genome that is outside the human-mouse clade, suggesting they were likely active in early placental mammals.
Permutation Enrichment Analysis for KEGG Pathways and GO Terms.
To identify KEGG pathways and GO Biological Process terms enriched with neocortical enhancers, we developed a customized permutation test based on shuffling neocortical enhancers to age-matched genome segments. For each human time point, each set of neocortical enhancers assigned a given phylogenetic age as described above (Eutheria, Theria, Mammalia, Amniota, Tetrapoda, Gnathostomata, and Vertebrata) was randomly reassigned to genome segments of the same age (each shuffled enhancer had to intersect an age-matched genome segment by ≥100 bp and could not intersect any older segment). In addition, each reassignment was required to be on the same chromosome and at a similar distance to a gene (enhancers were categorized as being within 10 kb of a gene, within 100 kb of a gene, more than 100 kb from a gene, or within an intron) (12). This permutation was performed 20,000 times, and these permuted sets of enhancers were used in all permutation tests referred to in the manuscript.
The association of enhancers to genes was performed using the regulatory domain rules from GREAT (48). First, genes were assigned regulatory domains: the basal regulatory domain of a gene was defined as 5 kb upstream and 1 kb downstream of the transcriptional start site; the gene regulatory domain was extended up to 1,000 kb in both directions to the nearest gene’s basal domain. After regulatory domain assignment, each enhancer or shuffled enhancer was assigned to all genes whose regulatory domain it overlaps.
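The regulatory domain rules described above can be sketched for genes on a single chromosome as follows. This is an illustration rather than the GREAT implementation: the function names are ours, the extension is measured from the transcriptional start site (an assumption), and coordinates are clamped at zero.

```python
from typing import Dict, List, Tuple

Gene = Tuple[str, int, str]  # (name, start site position, strand "+" or "-")


def basal_domain(tss: int, strand: str,
                 upstream: int = 5_000, downstream: int = 1_000) -> Tuple[int, int]:
    """Basal domain: 5 kb upstream and 1 kb downstream, relative to strand."""
    if strand == "+":
        return (tss - upstream, tss + downstream)
    return (tss - downstream, tss + upstream)


def regulatory_domains(genes: List[Gene],
                       max_extension: int = 1_000_000) -> Dict[str, Tuple[int, int]]:
    """Extend each basal domain to the nearest neighboring basal domain,
    but no farther than `max_extension` from the start site."""
    basal = {name: basal_domain(tss, strand) for name, tss, strand in genes}
    domains = {}
    for name, tss, _ in genes:
        b_start, b_end = basal[name]
        left = [basal[g][1] for g, t, _ in genes if t < tss]
        right = [basal[g][0] for g, t, _ in genes if t > tss]
        start = max(left + [tss - max_extension])
        end = min(right + [tss + max_extension])
        # The basal domain is always retained, even if a neighbor overlaps it.
        domains[name] = (max(0, min(b_start, start)), max(b_end, end))
    return domains


def genes_for_enhancer(enhancer: Tuple[int, int],
                       domains: Dict[str, Tuple[int, int]]) -> List[str]:
    """All genes whose regulatory domain overlaps the enhancer."""
    return sorted(name for name, (s, e) in domains.items()
                  if enhancer[0] < e and enhancer[1] > s)
```

Because domains of neighboring genes can overlap, a single enhancer may be assigned to more than one gene.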
For each set of enhancers (e.g., mammal-specific enhancers from the late embryonic time point in human) and permuted enhancers, the number of enhancers associated with genes in each KEGG pathway (August 2012) or GO term (Ensembl v65) was counted. The permutation P value for each test was calculated as the number of permuted enhancer sets with an equal or higher number of enhancers in the tested KEGG pathway or GO term than the observed set of enhancers, divided by 20,000 (the total number of permutations). The permutation P values were then corrected for multiple testing using the Benjamini-Hochberg (BH) method (49). We report in the Results a subset of enriched categories that have a BH-corrected permutation P value of <0.05, that are enriched at two or more time points, and that contain ≥20 enhancers. The complete results are presented in Dataset S2.
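The permutation P value and the BH correction can be written compactly, as in the Python sketch below. The function names are illustrative; the P value follows the definition given above (the fraction of permuted sets with a count equal to or higher than the observed count), and the correction is the standard BH step-up adjustment.

```python
from typing import List, Sequence


def permutation_p(observed: int, permuted: Sequence[int]) -> float:
    """Fraction of permuted sets with a count >= the observed count."""
    return sum(1 for count in permuted if count >= observed) / len(permuted)


def benjamini_hochberg(pvals: Sequence[float]) -> List[float]:
    """BH-adjusted P values, returned in the input order."""
    n = len(pvals)
    order = sorted(range(n), key=lambda i: pvals[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * n / rank)
        adjusted[i] = running_min
    return adjusted
```

With 20,000 permutations, the smallest nonzero P value that this definition can yield is 1/20,000 = 5 × 10⁻⁵.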
We also divided the axon guidance KEGG pathway into subpathways based on KEGG annotations: Netrin signaling, Ephrin signaling, Semaphorin signaling, and Slit signaling. We performed permutation enrichment analysis of the subpathways as described above. For Fig. 2B, we pooled the eutherian and therian neocortical enhancers from the late embryonic stage into a set called “younger than mammal” and pooled the amniote and older than amniote enhancers into a set called “older than mammal.” For axon guidance subpathway permutation results, we did not require a minimum number of enhancers per category to report enrichment, as the subpathways contain many fewer genes.
Coexpression Network Analysis and Permutation Enrichment Analysis for Network Modules.
In this study, we used a previously generated human neocortical coexpression network (23). This network was constructed using RNA-seq data from human neocortical regions spanning 8–15 p.c.w., generated by the BrainSpan consortium (www.brainspan.org). Weighted Coexpression Network Analysis (WGCNA) was performed using the Bioconductor package WGCNA (50). See ref. 23 for a detailed description of network construction. Modules were visualized in Cytoscape using the organic layout option. For Fig. 3A, Right, genes in module 3 with an absolute correlation coefficient >0.85 with any other gene in the module were visualized on the figure.
The expression correlation of all of the modules in the network was determined by correlating module eigengene expression. Module eigengene expression is similar to the average expression profile of the module. See ref. 23 for details on using module eigengene expression to construct the network of modules. For Fig. 3A, Left, we visualized modules in the network with an absolute correlation coefficient >0.20 with any other module.
To test for significant enrichment of neocortical enhancers of a certain age (e.g., mammal-specific enhancers) in each module in the WGCNA network, we used the permutation test described above, shuffling neocortical enhancers to age-matched genome segments. The number of neocortical enhancers and shuffled enhancers that were associated with genes in each of the modules (using GREAT rules as described above) was counted. The permutation P value for each test was calculated as the number of permuted enhancer sets with an equal or higher number of enhancers in each network module than the observed set of enhancers, divided by 20,000 (the total number of permutations). The permutation P values were then corrected for multiple testing errors using the BH method (49). We note that mammal-specific enhancers in the cortex are likely to be correlated with genes expressed in the cortex, so there may be some background enrichment in the network that is “boosting” the observed enrichments compared with the set of all mammal-aged segments in the genome. However, we see enrichment in modules that are not highly expressed (e.g., modules 27, 54, 81, and 95), and we see a lack of enrichment in modules that are expressed (e.g., modules 2, 6, 9, and 10), which mitigates our concern.
For all network modules, we tested for overrepresentation of neocortical enhancers in KEGG pathways and GO terms as described above. We report in the Results a subset of enriched categories that have a BH-corrected permutation P value of <0.05, that are enriched at two or more time points, and that contain ≥10 enhancers. The complete results are presented in Dataset S2. For axon guidance subpathways, we did not require a minimum number of enhancers per category to report enrichment.
Repeat Analysis.
We performed two complementary analyses of repeat content in enhancers. In the first analysis, we measured repeat overlap in the nonexonic human genome and in the neocortical enhancer sets we defined (Fig. 4). Repetitive element annotations were downloaded from RepeatMasker tracks (May 2013 for human genome hg19 and mouse genome mm9) from the UCSC Genome Browser. To include ancient repeats that may not be annotated in the human genome, mouse annotated repeats were lifted over to hg19; those that lifted over were merged with repeats annotated in hg19 and included in the analyses below. For the intersection of repeats with aged genome segments (Fig. 4A, Left), we counted the percentage of bases in all nonexonic human genome segments in each age category that overlap with repetitive elements. For the intersection of repeats with neocortical enhancers of different ages (Fig. 4A, Right), we counted the percentage of bases in the enhancer cores (the union of all windows with the oldest age assignment in each enhancer) that overlap with repetitive elements.
In our second analysis, we examined enrichment of specific repeats in enhancer sets of different ages (Dataset S4). We were specifically interested in whether neocortical enhancers of a particular phylogenetic age are enriched for repeats over what would be expected due to the age of the underlying sequences. This would potentially identify specific repeats contributing to the evolution of novel neocortical enhancers in particular rather than new sequences in general. Other methods used to detect repeat enrichment (12, 14, 51, 52) were not suitable to use here, as these methods do not consider the nonrandom variation in the ages of background sequences in the genome when estimating the background expectation for enrichment tests. First, we estimated the phylogenetic ages of individual repeats in the human genome. Repeats annotated in hg19 were intersected with the age-segmented human genome, and the age of a repeat was determined as the age of the oldest segment intersecting with the repeat by ≥100 bp or containing >90% of the repeat. We then used permutation to test for the enrichment of specific repeats in neocortical enhancers of different ages (and those from other embryonic and adult tissues) (Dataset S4). Enhancers were shuffled to age-matched genome segments as described in previous sections. Only individual repeats with the same age as the enhancer cores were considered. The permutation P value for each test was calculated as the number of permuted enhancer sets with an equal or higher number of enhancers containing a specific repeat class, family, or species than the observed set of enhancers, divided by 20,000 (the total number of permutations). The permutation P values were then corrected for multiple testing errors using the BH method (49). We performed the permutation test for all annotated repeats in RepeatMasker but present a filtered set of the most informative results in Dataset S4 as follows. 
We did not report results for simple/low complexity repeats, any repeat species with an unknown family or class affiliation, or repeats that intersect fewer than 20 enhancers and do not exhibit enrichments in any of our datasets. Some repeat classes, families, and species were found to be enriched, especially in the eutherian neocortical enhancers (BH permutation P value of <0.05, a minimum enhancer count of 20, and >1% of the enhancer set contains the repeat of interest). However, they were weakly enriched (defined here as less than a threefold enrichment relative to background; Dataset S4). We also examined depletions of repeats in enhancers, but few repeats appear to be significantly depleted in enhancers, and those that are depleted are weakly depleted (defined here as less than a threefold depletion relative to background) or are present at low copy number in age-matched genome segments overall (Dataset S4).
Evolutionary Constraint Analysis.
Eutherian mammal phastCons elements and scores and per-base eutherian mammal phastCons and phyloP scores for the hg19 human genome assembly were downloaded from the UCSC genome browser (53, 54). To infer levels of evolutionary constraint in enhancers, neocortical enhancers of different ages were intersected with eutherian mammal phastCons elements. We calculated the percentage of bases in enhancer cores that overlap with a phastCons element (Fig. 5A and Fig. S4 B and E). We also calculated the percentage of bases in repeat elements (annotated on hg19 and mm9 as described above) in enhancer cores that overlap a phastCons conserved element (Fig. 4B and Fig. S4A). We also calculated the total number of bases in both the core and the whole enhancer that overlap a phastCons element (Fig. 5C). Finally, we considered the best scoring eutherian mammal phastCons element in enhancer cores (Fig. S4C), average per-base phastCons scores across enhancer cores (Fig. S4D), and average phyloP scores across enhancer cores (Fig. S4D). All methods for inferring constraint gave similar results.
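As an illustration of the first constraint measure, the fraction of bases in an enhancer core that overlap phastCons elements can be computed as in the Python sketch below. The function name and the half-open, single-chromosome coordinates are assumptions made for the example.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) on a single chromosome


def covered_fraction(core: Interval, elements: List[Interval]) -> float:
    """Fraction of bases in `core` covered by the union of `elements`."""
    clipped = sorted((max(s, core[0]), min(e, core[1]))
                     for s, e in elements if s < core[1] and e > core[0])
    covered = 0
    cursor = core[0]  # rightmost base already counted
    for s, e in clipped:
        s = max(s, cursor)
        if e > s:
            covered += e - s
            cursor = e
    return covered / (core[1] - core[0])
```

Overlapping elements are counted once, so the result never exceeds 1; multiplying by 100 gives the percentage.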
For comparing evolutionary constraint of enhancer cores to age-matched genome background, we implemented the same permutation strategy described above (Results). We calculated the percentage of bases of enhancer cores and shuffled enhancer cores that overlap a phastCons element. The permutation P value for each test was calculated as the number of permuted enhancer sets with an equal or higher median fraction of bases covered by phastCons elements than the observed set of enhancers, divided by 20,000 (the total number of permutations). The permutation P values were then corrected for multiple testing errors using the BH method (49).
To determine whether our finding of weak constraint in young enhancers was influenced by poor alignability in younger enhancer sequences, we examined sequence alignability in neocortical enhancer sets of all ages. For Fig. S4F, we looked at conserved 100-bp windows in enhancer cores, determined the percentage of aligned bases between human and each of the eutherian species in our dataset, and calculated the average percentage of aligned bases for each enhancer. Some values are above 1 because the human sequence may have deletions compared with the other species.
To test whether derived segments of enhancers are enriched with phastCons elements (Fig. S5C), we permuted therian, mammalian, amniote, and older than amniote neocortical enhancers from the human late embryonic time point to random locations on the same chromosome. A total of 10,000 permutations were performed for each enhancer set. Permuted cores were annotated at corresponding positions within permuted enhancers. Permuted enhancers were not allowed to overlap exons or 1 kb upstream of transcriptional start sites (as annotated in Ensembl release 67). Permuted sets of enhancers and cores were intersected with eutherian mammal phastCons conserved elements. The numbers of phastCons elements that intersected permuted enhancers and cores were summed across every chromosome for each individual permutation. We subtracted the number of phastCons elements that intersected permuted cores from the number that intersected permuted enhancers, which yielded the number of phastCons elements that fell outside of permuted cores. We compared the number of phastCons elements outside of permuted cores to that in therian, mammalian, amniote, and older than amniote neocortical enhancer sets. The z scores for the observed number of phastCons elements outside of cores were calculated based on the permutation distribution.
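The final two steps of this test reduce to simple arithmetic, sketched in Python below. The function names are illustrative, and the use of the sample standard deviation of the permutation distribution is an assumption.

```python
import statistics
from typing import Sequence


def outside_core_count(n_in_enhancers: int, n_in_cores: int) -> int:
    """phastCons elements in enhancers that fall outside the cores."""
    return n_in_enhancers - n_in_cores


def permutation_z(observed: float, permuted: Sequence[float]) -> float:
    """z score of the observed count relative to the permutation distribution."""
    mean = statistics.mean(permuted)
    sd = statistics.stdev(permuted)  # sample standard deviation (assumed)
    return (observed - mean) / sd
```

A large positive z score indicates more phastCons elements in the derived (noncore) segments than expected from randomly placed enhancers.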
LacZ Mouse Transgenic Enhancer Assay.
Images from previously tested enhancers hs322 and hs174 (Fig. S5D) were obtained from the VISTA Enhancer Browser (39). Images for the previously tested enhancer HACNS1 (HACNS1 “full length” in Fig. S5E) are from ref. 40. The HACNS1 core (Fig. S5E) sequence (chr2:236773664–236774216 in hg19) was cloned into a previously described Hsp68-lacZ reporter vector. Generation of transgenic mice and embryo staining were performed as previously described (55). One injection series was performed.
Code.
Scripts used to conduct all analyses can be found here: https://github.com/yinjun111/emera2016pnas.git.
Acknowledgments
We thank Heather Adinolfi for technical assistance, Albert Ayoub and Pasko Rakic for insightful discussions on neocortical development, and Tim Nottoli and Carole Pease at the Yale Animal Genomics Service for generating transgenic mice. This work was supported by National Institutes of Health Grants GM094780 (to J.P.N.) and F32 GM106628 (to D.E.), a Brown Coxe Fellowship in the Medical Sciences (to J.Y.), and a National Science Foundation Graduate Research Fellowship (to S.K.R.).
Footnotes
The authors declare no conflict of interest.
This article is a PNAS Direct Submission.
This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1603718113/-/DCSupplemental.
References
- 1.Jeong S, et al. The evolution of gene regulation underlies a morphological difference between two Drosophila sister species. Cell. 2008;132(5):783–793. doi: 10.1016/j.cell.2008.01.014. [DOI] [PubMed] [Google Scholar]
- 2.McGregor AP, et al. Morphological evolution through multiple cis-regulatory mutations at a single gene. Nature. 2007;448(7153):587–590. doi: 10.1038/nature05988. [DOI] [PubMed] [Google Scholar]
- 3.Frankel N, et al. Morphological evolution caused by many subtle-effect substitutions in regulatory DNA. Nature. 2011;474(7353):598–603. doi: 10.1038/nature10200. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 4.Cleves PA, et al. Evolved tooth gain in sticklebacks is associated with a cis-regulatory allele of Bmp6. Proc Natl Acad Sci USA. 2014;111(38):13912–13917. doi: 10.1073/pnas.1407567111. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 5.Sagai T, Hosoya M, Mizushina Y, Tamura M, Shiroishi T. Elimination of a long-range cis-regulatory module causes complete loss of limb-specific Shh expression and truncation of the mouse limb. Development. 2005;132(4):797–803. doi: 10.1242/dev.01613. [DOI] [PubMed] [Google Scholar]
- 6.Chan YF, et al. Adaptive evolution of pelvic reduction in sticklebacks by recurrent deletion of a Pitx1 enhancer. Science. 2010;327(5963):302–305. doi: 10.1126/science.1182213. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 7.Cooper KL, et al. Patterning and post-patterning modes of evolutionary digit loss in mammals. Nature. 2014;511(7507):41–45. doi: 10.1038/nature13496. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 8.Wagner GP. Evolutionary innovations and novelties: Let us get down to business! Zool Anz. 2015;256:75–81. [Google Scholar]
- 9.Duque T, Sinha S. What does it take to evolve an enhancer? A simulation-based study of factors influencing the emergence of combinatorial regulation. Genome Biol Evol. 2015;7(6):1415–1431. doi: 10.1093/gbe/evv080. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 10.Feschotte C. Transposable elements and the evolution of regulatory networks. Nat Rev Genet. 2008;9(5):397–405. doi: 10.1038/nrg2337. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 11.Kunarso G, et al. Transposable elements have rewired the core regulatory network of human embryonic stem cells. Nat Genet. 2010;42(7):631–634. doi: 10.1038/ng.600. [DOI] [PubMed] [Google Scholar]
- 12.Chuong EB, Rumi MAK, Soares MJ, Baker JC. Endogenous retroviruses function as species-specific enhancer elements in the placenta. Nat Genet. 2013;45(3):325–329. doi: 10.1038/ng.2553. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 13.Lynch VJ, Leclerc RD, May G, Wagner GP. Transposon-mediated rewiring of gene regulatory networks contributed to the evolution of pregnancy in mammals. Nat Genet. 2011;43(11):1154–1159. doi: 10.1038/ng.917. [DOI] [PubMed] [Google Scholar]
- 14.Lynch VJ, et al. Ancient transposable elements transformed the uterine regulatory landscape and transcriptome during the evolution of mammalian pregnancy. Cell Reports. 2015;10(4):551–561. doi: 10.1016/j.celrep.2014.12.052. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 15.Rakic P. Evolution of the neocortex: A perspective from developmental biology. Nat Rev Neurosci. 2009;10(10):724–735. doi: 10.1038/nrn2719. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 16.Molnár Z, et al. Evolution and development of the mammalian cerebral cortex. Brain Behav Evol. 2014;83(2):126–139. doi: 10.1159/000357753. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 17.Cheung AFP, et al. The subventricular zone is the developmental milestone of a 6-layered neocortex: Comparisons in metatherian and eutherian mammals. Cereb Cortex. 2010;20(5):1071–1081. doi: 10.1093/cercor/bhp168. [DOI] [PubMed] [Google Scholar]
- 18.Cheung AFP, Pollen AA, Tavare A, DeProto J, Molnár Z. Comparative aspects of cortical neurogenesis in vertebrates. J Anat. 2007;211(2):164–176. doi: 10.1111/j.1469-7580.2007.00769.x. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 19.Jarvis ED, et al. Avian Brain Nomenclature Consortium Avian brains and a new understanding of vertebrate brain evolution. Nat Rev Neurosci. 2005;6(2):151–159. doi: 10.1038/nrn1606. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 20.Belgard TG, et al. Adult pallium transcriptomes surprise in not reflecting predicted homologies across diverse chicken and mouse pallial sectors. Proc Natl Acad Sci USA. 2013;110(32):13150–13155. doi: 10.1073/pnas.1307444110. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 21.Striedter GF. The telencephalon of tetrapods in evolution. Brain Behav Evol. 1997;49(4):179–213. doi: 10.1159/000112991. [DOI] [PubMed] [Google Scholar]
- 22.Montiel JF, Vasistha NA, García-Moreno F, Molnár Z. From sauropsids to mammals and back: New approaches to comparative cortical development. J Comp Neurol. 2015;524(3):630–645. doi: 10.1002/cne.23871. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 23.Reilly SK, et al. Evolutionary genomics. Evolutionary changes in promoter and enhancer activity during human corticogenesis. Science. 2015;347(6226):1155–1159. doi: 10.1126/science.1260943. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 24.Visel A, et al. A high-resolution enhancer atlas of the developing telencephalon. Cell. 2013;152(4):895–908. doi: 10.1016/j.cell.2012.12.041. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 25.Nord AS, et al. Rapid and pervasive changes in genome-wide enhancer usage during mammalian development. Cell. 2013;155(7):1521–1531. doi: 10.1016/j.cell.2013.11.033. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 26.Lowe CB, et al. Three periods of regulatory innovation during vertebrate evolution. Science. 2011;333(6045):1019–1024. doi: 10.1126/science.1202702. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 27.Lowe CB, Clarke JA, Baker AJ, Haussler D, Edwards SV. Feather development genes and associated regulatory innovation predate the origin of Dinosauria. Mol Biol Evol. 2015;32(1):23–28. doi: 10.1093/molbev/msu309. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 28.Molyneaux BJ, Arlotta P, Menezes JRL, Macklis JD. Neuronal subtype specification in the cerebral cortex. Nat Rev Neurosci. 2007;8(6):427–437. doi: 10.1038/nrn2151. [DOI] [PubMed] [Google Scholar]
- 29.Irie N, Kuratani S. Comparative transcriptome analysis reveals vertebrate phylotypic period during organogenesis. Nat Commun. 2011;2:248. doi: 10.1038/ncomms1248. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 30.Kalinka AT, et al. Gene expression divergence recapitulates the developmental hourglass model. Nature. 2010;468(7325):811–814. doi: 10.1038/nature09634. [DOI] [PubMed] [Google Scholar]
- 31.Lindblad-Toh K, et al. Broad Institute Sequencing Platform and Whole Genome Assembly Team; Baylor College of Medicine Human Genome Sequencing Center Sequencing Team; Genome Institute at Washington University. A high-resolution map of human evolutionary constraint using 29 mammals. Nature. 2011;478(7370):476–482. doi: 10.1038/nature10530.
- 32.Emera D, Wagner GP. Transposable element recruitments in the mammalian placenta: Impacts and mechanisms. Brief Funct Genomics. 2012;11(4):267–276. doi: 10.1093/bfgp/els013.
- 33.Villar D, et al. Enhancer evolution across 20 mammalian species. Cell. 2015;160(3):554–566. doi: 10.1016/j.cell.2015.01.006.
- 34.Amemiya CT, et al. The African coelacanth genome provides insights into tetrapod evolution. Nature. 2013;496(7445):311–316. doi: 10.1038/nature12027.
- 35.Smith JJ, et al. Sequencing of the sea lamprey (Petromyzon marinus) genome provides insights into vertebrate evolution. Nat Genet. 2013;45(4):415–421, e1–e2. doi: 10.1038/ng.2568.
- 36.Gregory TR, et al. Eukaryotic genome size databases. Nucleic Acids Res. 2007;35(Database issue):D332–D338. doi: 10.1093/nar/gkl828.
- 37.Emera D, Wagner GP. Transformation of a transposon into a derived prolactin promoter with function during human pregnancy. Proc Natl Acad Sci USA. 2012;109(28):11246–11251. doi: 10.1073/pnas.1118566109.
- 38.Pauls S, Goode DK, Petrone L, Oliveri P, Elgar G. Evolution of lineage-specific functions in ancient cis-regulatory modules. Open Biol. 2015;5(11):150079. doi: 10.1098/rsob.150079.
- 39.Visel A, Minovitsky S, Dubchak I, Pennacchio LA. VISTA Enhancer Browser--A database of tissue-specific human enhancers. Nucleic Acids Res. 2007;35(Database issue):D88–D92. doi: 10.1093/nar/gkl822.
- 40.Prabhakar S, et al. Human-specific gain of function in a developmental enhancer. Science. 2008;321(5894):1346–1350. doi: 10.1126/science.1159974.
- 41.Prabhakar S, et al. Close sequence comparisons are sufficient to identify human cis-regulatory elements. Genome Res. 2006;16(7):855–863. doi: 10.1101/gr.4717506.
- 42.Carvunis A-R, et al. Proto-genes and de novo gene birth. Nature. 2012;487(7407):370–374. doi: 10.1038/nature11184.
- 43.Neme R, Tautz D. Phylogenetic patterns of emergence of new genes support a model of frequent de novo evolution. BMC Genomics. 2013;14:117. doi: 10.1186/1471-2164-14-117.
- 44.Neme R, Tautz D. Evolution: Dynamics of de novo gene emergence. Curr Biol. 2014;24(6):R238–R240. doi: 10.1016/j.cub.2014.02.016.
- 45.Knowles DG, McLysaght A. Recent de novo origin of human protein-coding genes. Genome Res. 2009;19(10):1752–1759. doi: 10.1101/gr.095026.109.
- 46.Cotney J, et al. The evolution of lineage-specific regulatory activities in the human embryonic limb. Cell. 2013;154(1):185–196. doi: 10.1016/j.cell.2013.05.056.
- 47.Shen Y, et al. A map of the cis-regulatory sequences in the mouse genome. Nature. 2012;488(7409):116–120. doi: 10.1038/nature11243.
- 48.McLean CY, et al. GREAT improves functional interpretation of cis-regulatory regions. Nat Biotechnol. 2010;28(5):495–501. doi: 10.1038/nbt.1630.
- 49.Benjamini Y, Hochberg Y. Controlling the false discovery rate: A practical and powerful approach to multiple testing. J R Statist Soc B. 1995;57(1):289–300. doi: 10.2307/2346101.
- 50.Langfelder P, Horvath S. WGCNA: An R package for weighted correlation network analysis. BMC Bioinformatics. 2008;9(1):559. doi: 10.1186/1471-2105-9-559.
- 51.Kapusta A, et al. Transposable elements are major contributors to the origin, diversification, and regulation of vertebrate long noncoding RNAs. PLoS Genet. 2013;9(4):e1003470. doi: 10.1371/journal.pgen.1003470.
- 52.Notwell JH, Chung T, Heavner W, Bejerano G. A family of transposable elements co-opted into developmental enhancers in the mouse neocortex. Nat Commun. 2015;6:6644. doi: 10.1038/ncomms7644.
- 53.Siepel A, et al. Evolutionarily conserved elements in vertebrate, insect, worm, and yeast genomes. Genome Res. 2005;15(8):1034–1050. doi: 10.1101/gr.3715005.
- 54.Pollard KS, Hubisz MJ, Rosenbloom KR, Siepel A. Detection of nonneutral substitution rates on mammalian phylogenies. Genome Res. 2010;20(1):110–121. doi: 10.1101/gr.097857.109.
- 55.Kothary R, et al. Inducible expression of an hsp68-lacZ hybrid gene in transgenic mice. Development. 1989;105(4):707–714. doi: 10.1242/dev.105.4.707.
Associated Data