Genome Biology and Evolution. 2024 Mar 19;16(4):evae052. doi: 10.1093/gbe/evae052

Conserved Noncoding Elements Evolve Around the Same Genes Throughout Metazoan Evolution

Paul Gonzalez, Quinn C Hauck, Andreas D Baxevanis
Editor: Selene Fernandez-Valverde
PMCID: PMC10988421  PMID: 38502060

Abstract

Conserved noncoding elements (CNEs) are DNA sequences located outside of protein-coding genes that can remain under purifying selection for up to hundreds of millions of years. Studies in vertebrate genomes have revealed that most CNEs carry out regulatory functions. Notably, many of them are enhancers that control the expression of homeodomain transcription factors and other genes that play crucial roles in embryonic development. To further our knowledge of CNEs in other parts of the animal tree, we conducted a large-scale characterization of CNEs in more than 50 genomes from three of the main branches of the metazoan tree: Cnidaria, Mollusca, and Arthropoda. We identified hundreds of thousands of CNEs and reconstructed the temporal dynamics of their appearance in each lineage, as well as determining their spatial distribution across genomes. We show that CNEs evolve repeatedly around the same genes across the Metazoa, including around homeodomain genes and other transcription factors; they also evolve repeatedly around genes involved in neural development. We also show that transposons are a major source of CNEs, confirming previous observations from vertebrates and suggesting that they have played a major role in wiring developmental gene regulatory mechanisms since the dawn of animal evolution.

Keywords: comparative genomics, conserved noncoding elements, homeobox genes, cis-regulatory elements


Significance.

In vertebrates, conserved noncoding elements (CNEs) tend to cluster around homeodomain transcription factors and regulate the expression of these genes. As a result, they can be used to infer the location of hundreds of potential cis-regulatory elements with crucial roles for embryonic development and body plan evolution. Whether this is true outside of vertebrates remains an open question. Here, we identify CNEs in dozens of genomes from three invertebrate phyla, providing genome-wide maps of potential cis-regulatory elements. In addition to providing a wealth of candidate regulatory sequences for future studies, our data uncover general trends in the dynamics of CNE evolution. Importantly, we show that CNEs do cluster around the same genes in all these groups, including homeodomain transcription factors. Our data also reveal an underappreciated role for transposable elements as a major source of CNEs.

Introduction

Animal genomes are mostly comprised of noncoding sequences with no known function. Scattered throughout this sea of noncoding DNA are sparsely distributed islands of functional sequences that regulate gene expression. Importantly, enhancers, silencers, and insulators orchestrate the complex spatiotemporal patterns of gene expression necessary for embryonic development. Unlike genes, these regulatory sequences cannot be identified based on sequence alone. Consequently, they are mostly skipped over by genome annotation pipelines, limiting our ability to investigate many important evolutionary and developmental questions that often rely on understanding how changes in expression of highly conserved genes can result in the emergence of new traits. While the DNA sequence of a single species is insufficient to identify cis-regulatory elements and other functional noncoding sequences, they can be identified indirectly by detecting conservation across species, thereby discerning sequences under negative (purifying) selection from the neutrally evolving background. This method, called phylogenetic footprinting (Tagle et al. 1988; Duret and Bucher 1997), was first validated in targeted studies that identified conserved noncoding elements (CNEs) in short genomic regions around genes of interest (Aparicio et al. 1995; Hardison et al. 1997; Shashikant et al. 1998; Brickner et al. 1999; Dubchak et al. 2000; Loots et al. 2000; Wasserman et al. 2000; Bagheri-Fam et al. 2001; Frazer et al. 2001; DeSilva et al. 2002; Cooper et al. 2003; Ghanem et al. 2003; Nobrega et al. 2003; Sabarinadh et al. 2003; Santini et al. 2003; Spitz et al. 2003; Thomas et al. 2003; Frazer et al. 2004; Cooper et al. 2005; King et al. 2005), followed by whole-chromosome (Dermitzakis et al. 2002; Dermitzakis et al. 2003; Dermitzakis et al. 2004) and whole-genome comparisons (Chiaromonte et al. 2003; Bejerano et al. 2004; Sandelin et al. 2004; Woolfe et al. 2004; Shin et al. 2005). 
These studies focused almost entirely on vertebrates, ranging from intraprimate to human–fish comparisons. Among the variety of CNEs identified to date, ultraconserved elements (Bejerano et al. 2004)—sequences of more than 200 bp having perfect identity between human and mouse—have attracted significant attention because the underlying causes of such extreme conservation are still unclear. Overall, ∼5% of the human genome is under selective constraint, with only 1.5% representing coding sequences (Chiaromonte et al. 2003; Cooper et al. 2004).

Noncoding conservation may, in theory, reveal the location of a wide range of functional elements, including various types of cis-regulatory elements, noncoding RNAs, and structural elements. However, studies have shown that most CNEs are enhancers that are disproportionately involved in regulating developmental genes, notably homeodomain transcription factors. This was based on the observation that CNEs tend to be preferentially located around these genes (Bejerano et al. 2004; Sandelin et al. 2004; Woolfe et al. 2004), a finding later confirmed using functional assays (Boffelli et al. 2003; Frazer et al. 2004; Woolfe et al. 2004; De La Calle-Mustienes et al. 2005; Shin et al. 2005; Pennacchio et al. 2006; Prabhakar et al. 2006).

Targeted searches for known vertebrate CNEs in a variety of invertebrate genomes revealed that small numbers of them are conserved between distantly related phyla and that they act as enhancers for the same developmental gene (Maeso et al. 2013). However, exhaustive whole-genome searches for CNEs outside the vertebrates are limited and have focused exclusively on small-scale comparisons between different species of Drosophila (Bergman and Kreitman 2001; Papatsenko et al. 2006), Drosophila and mosquitoes (Glazov et al. 2005; Siepel et al. 2005; Engström et al. 2007), bees and ants (Rubin et al. 2019; Brody et al. 2020), and nematodes of a single genus, Caenorhabditis (Shabalina and Kondrashov 1999; Siepel et al. 2005; Vavouri et al. 2007). As a result, general trends of CNE evolution across the Metazoa cannot be deduced based solely on these data. Further, while much more whole-genome sequence data from a wide variety of species has become available over the last several years, the sheer amount of computational power required to perform these analyses at scale is a significant limiting factor that has heretofore hindered a more systematic analysis of CNE evolution.

Here, we present a large-scale identification of CNEs in dozens of genomes from three of the most diverse animal phyla: Cnidaria, Mollusca, and Arthropoda. This dataset constitutes a catalog of potential cis-regulatory sequences, providing a foundation for uncovering general trends in the evolutionary dynamics of CNEs and potential lineage-specific departures from these trends. We show that CNEs evolve around the same genes across the Metazoa, including around homeodomain genes and other transcription factors with crucial roles in embryonic development, as well as around genes involved in axon guidance and neural development. We also show that transposons are a major source of CNEs, confirming previous observations from vertebrates and suggesting that they have played a major role in wiring developmental gene regulatory mechanisms since the dawn of animal evolution. Together, these results reveal that similar evolutionary processes in distantly related lineages have led to similar patterns of noncoding conservation across the Metazoa.

Results

CNE Identification in 51 Cnidarian, Mollusc, and Arthropod Genomes

We identified CNEs in 13 cnidarian, 14 mollusc, and 24 arthropod genomes (Supplementary Table 1). These phyla were chosen because they span three of the main branches of the animal tree (the nonbilaterians, Spiralia, and Ecdysozoa) and because they are some of the most diverse and most studied animal groups, as reflected in the availability of published genomes from these phyla. We also sought to represent the diversity found within each phylum, including species pairs of varying distance, from congeneric species separated by tens of millions of years to species that diverged more than 500 Mya. The k-mer-based tool CNEFinder (Ayad et al. 2018) was used to identify CNEs sharing more than 75% similarity over a range of 50 to 2,000 bp between any two species. For each phylum, we searched for CNEs between each possible species pair, resulting in 276, 78, and 91 whole-genome pairwise comparisons for Arthropoda, Cnidaria, and Mollusca, respectively. Potential coding and ribosomal sequences were filtered out, and CNEs from multiple species linked by chains of pairwise matches were clustered into groups of potentially homologous sequences (supplementary fig. S1, Supplementary Material online). The common ancestor of each CNE was inferred using an ancestral character state reconstruction algorithm (Fig. 1) and potentially convergent sequences were filtered out. The range of final CNE counts per genome was 93 to 25,687 (mean 5,180) for arthropods, 2,049 to 110,941 (mean 25,036) for cnidarians, and 1,164 to 96,983 (mean 26,819) for molluscs. The number of CNEs at each filtering step is summarized in Supplementary Table 2. In all three phyla, ∼90% of CNEs are roughly equally divided between introns and intergenic regions; the remaining CNEs are located mostly around transcription start sites (Fig. 2a and supplementary fig. S2, Supplementary Material online).
The phylogenetic distribution of CNEs illustrates a hierarchical pattern of noncoding conservation, with ancient sequences that survived since the birth of these phyla, as well as a wealth of more recent, lineage-specific sequences that may be relevant to some defining features of these specific taxa.
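Grouping CNEs that are linked by chains of pairwise matches amounts to finding connected components in a graph whose edges are the pairwise hits. The following is a minimal sketch of that idea under stated assumptions, not the pipeline used in this study: the `cluster_cnes` helper and the `<species>:<index>` identifiers are hypothetical.

```python
from collections import defaultdict


def cluster_cnes(pairwise_matches):
    """Group CNE identifiers into clusters linked by chains of pairwise matches.

    pairwise_matches: iterable of (cne_a, cne_b) tuples, one per pairwise hit.
    Returns a sorted list of sorted clusters, for reproducibility.
    """
    parent = {}

    def find(x):
        # Locate the representative of x's cluster, with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in pairwise_matches:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b  # merge the two clusters

    clusters = defaultdict(list)
    for cne in list(parent):
        clusters[find(cne)].append(cne)
    return sorted(sorted(members) for members in clusters.values())


# Hypothetical identifiers of the form "<species>:<CNE index>".
matches = [
    ("spA:1", "spB:7"),
    ("spB:7", "spC:3"),  # chains spA:1 to spC:3 through spB:7
    ("spA:2", "spC:9"),
]
clusters = cluster_cnes(matches)
```

A union-find structure is used here because it merges clusters incrementally as matches are read, so the pairwise output never needs to be held as an explicit graph.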

Fig. 1.

Phylogenetic trees of the a) 24 arthropods, b) 13 cnidarians, and c) 14 molluscs included in the analysis. The numbers at each node represent the number of CNEs (multispecies clusters) inferred to have evolved at that node.

Fig. 2.

Overview of CNE characteristics. a) Location of CNEs relative to genes, introns, transcription start sites and transcription termination sites. b) Scatterplot showing the relationship between CNE count and divergence time. Each dot represents a comparison between two genomes. c) Scatterplot showing the relationship between CNE length and divergence time. Each dot represents a CNE. A jitter representation is used to show all CNEs of a given species pair comparison.

The Rate of Noncoding Element Conservation is Similar Between Cnidarians and Molluscs, But Lower in Arthropods

To investigate whether the number of CNEs between any two species is inversely correlated with their evolutionary distance and whether the slope of this relationship would be uniform across different phyla, we collected molecular clock estimates of divergence times from the literature for all nodes in our dataset (Supplementary Table 3), plotting the number of CNEs between every genome pair relative to their evolutionary distance (Fig. 2b). As expected, there is a negative correlation between the number of CNEs and evolutionary distance in cnidarians (R = −0.84, P < 2 × 10^−16) and molluscs (R = −0.58, P < 1.5 × 10^−9), with the slope of this relationship being similar for both phyla. Surprisingly, this inverse relationship was not observed in arthropods: the deepest nodes of the tree have at least as many CNEs as more recent ones, and there is no correlation between CNE count and divergence time (R = 0.096, P = 0.12). Additionally, arthropod CNE counts are generally lower than in the other two phyla, even when comparing species pairs with similar evolutionary distances. Interestingly, the numbers of CNEs reported from published studies in humans and other vertebrates, as well as between two echinoid echinoderms, are in the same range as those of molluscs and cnidarians (Supplementary Table 4). While these studies are not directly comparable with ours or with each other because different methods, filtering steps, lengths, and similarity thresholds were used (Leypold and Speicher 2021), they nevertheless suggest that the rate of noncoding conservation is remarkably similar across animals as distantly related as corals, snails, sea urchins, and humans, while arthropods show a singular departure from this shared trend.
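The correlation between CNE count and divergence time can be illustrated with a short sketch. It assumes that R denotes a Pearson coefficient, which the text does not specify, and the input values are made up for illustration; they are not taken from the study.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Made-up values for illustration only: one entry per genome pair.
divergence_mya = [50, 100, 200, 400, 500]
cne_counts = [40000, 22000, 9000, 2500, 1200]
r = pearson_r(divergence_mya, cne_counts)  # negative when counts drop with time
```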

While the number of CNEs drops with evolutionary time, the length of these CNEs remains relatively constant, with most falling within a range of approximately 75 to 125 bp regardless of the evolutionary distance considered (Fig. 2c; supplementary fig. S3, Supplementary Material online; supplementary note S1, Supplementary Material online; and Supplementary Table 5). This suggests that they are selected as units of relatively constant size that either remain conserved as a discrete unit or disappear entirely over the course of evolution.

Association With Homeodomain Transcription Factors is a Metazoan-Wide Characteristic of CNEs

In vertebrates, CNEs are located preferentially near homeodomain transcription factor genes, presumably because many of them function as cis-regulatory elements with crucial roles in early development. To determine whether CNEs are also associated with developmental transcription factors and homeodomain genes in other branches of the metazoan tree, we assigned each CNE to its closest gene and ranked genes according to the number of CNEs associated with them (Fig. 3a). The percentage of CNEs closest to a homeodomain gene (Supplementary Table 6) averages 3.3% for Arthropoda, 1.4% for Cnidaria, and 2.6% for Mollusca. The highest percentage of CNEs found near homeodomain genes is 13.3% in arthropods (Apis mellifera), 2.8% in cnidarians (Pocillopora damicornis), and 8.4% in molluscs (Crassostrea gigas). For comparison, the proportion of ultraconserved elements between human and mouse (divergence time ∼70 Mya) located closest to a homeobox gene is around 8% (21 out of 256, as estimated from Fig. 2 in Bejerano et al. 2004). The proportion of CNEs assigned to a homeodomain gene is higher than the proportion of homeodomain genes in the genome (∼0.5% to 1% of genes in most species) in 19 out of 24 arthropod species, 10 out of 13 cnidarian species, and 12 out of 14 molluscan species (Fig. 3b), suggesting that CNE distribution is skewed toward the vicinity of these genes as a result of a nonneutral evolutionary process. To test this hypothesis, we examined whether genes with more CNEs than average showed overrepresentation of any protein domain (Fisher’s exact test). We then aggregated the results by ranking protein domains according to the number of species in which they were significantly overrepresented. Homeobox is the top-ranked domain in arthropods (10 out of 24 species) and molluscs (8 out of 14 species), and is one of the top domains in cnidarians (5 out of 13 species) (Fig. 4a–c and Supplementary Tables 7 and 8); it is also one of just a few domain families to be overrepresented in all three phyla (Fig. 4d and Supplementary Table 9). The link with homeodomain genes is most apparent in species with large numbers of CNEs (supplementary fig. S4, Supplementary Material online).
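A domain overrepresentation test of this kind can be sketched as a one-sided Fisher's exact test on a 2×2 table of genes. The sketch below is an illustration under stated assumptions: the table layout, the `fisher_right_tail` helper, and the example counts are hypothetical, and multiple-testing adjustment is not shown.

```python
from math import comb


def fisher_right_tail(a, b, c, d):
    """One-sided Fisher's exact test P-value for overrepresentation.

    2x2 table:           high-CNE genes   other genes
      with domain              a               b
      without domain           c               d
    Returns P(X >= a) under the hypergeometric null distribution.
    """
    row1 = a + b            # genes carrying the domain
    col1 = a + c            # genes with more CNEs than average
    total = a + b + c + d
    upper = min(row1, col1)
    numerator = sum(
        comb(row1, k) * comb(total - row1, col1 - k)
        for k in range(a, upper + 1)
    )
    return numerator / comb(total, col1)


# Made-up counts for illustration only: 12 of 100 genes with the domain are
# CNE-rich, against 400 of 9,900 genes without it.
p_value = fisher_right_tail(12, 88, 400, 9500)
```

Exact integer arithmetic with `math.comb` avoids floating-point underflow for genome-sized gene counts, at the cost of speed; a statistics library would normally be used in practice.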

Fig. 3.

CNEs are preferentially associated with homeodomain genes. a) Overview of the method illustrating the association of CNEs with their closest gene. b) Log odds ratio of the proportion of CNEs associated with a homeodomain gene as compared to the proportion of homeodomain genes in the genome. Positive values indicate that CNEs are associated with homeodomain genes more frequently than if they were distributed evenly across all genes. Asterisks indicate significant (Fisher’s test adjusted P-value < 0.05) overrepresentation of homeobox in genes with more CNEs than average.

Fig. 4.

Protein domain overrepresentation in genes with high numbers of associated CNEs. a–c) Protein domains ranked by number of species in which they are overrepresented in each phylum. d) Venn diagram showing the overlap between overrepresented protein domains in each phylum. Only protein domains overrepresented in at least two species for a given phylum are considered.

We confirmed these results using additional approaches. As regulatory elements are not necessarily located closest to their target gene, we counted CNEs within various size windows (10 to 500 kb) around each gene, testing for protein domain overrepresentation in genes with more CNEs than average within those windows (supplementary fig. S5, Supplementary Material online), and found homeobox to be the most overrepresented domain in all three phyla (supplementary note S2, Supplementary Material online and Supplementary Table 10). Additionally, we calculated the distance between every gene and its closest CNE (Fig. 5), then compared the mean distance to the closest CNE between homeodomain genes and all other genes in the genome (Mann–Whitney U test). The average distance to the closest CNE is significantly lower for homeodomain genes than for the rest of the genome in 11 out of 24 species in arthropods, 6 out of 13 species in cnidarians, and 8 out of 14 species in molluscs (Supplementary Table 11).
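The two measurements described above, the distance from a gene to its closest CNE and the number of CNEs within a window around a gene, can be sketched as follows. This is an illustration only: it assumes each gene and CNE is reduced to a single integer coordinate on one sequence, and it treats the window as extending a fixed distance on each side of the gene, which the text does not specify. Both helper names are hypothetical.

```python
from bisect import bisect_left


def distance_to_closest_cne(gene_positions, cne_positions):
    """Return, for each gene position, the distance to the nearest CNE position."""
    cnes = sorted(cne_positions)
    if not cnes:
        raise ValueError("at least one CNE position is required")
    distances = []
    for gene in gene_positions:
        i = bisect_left(cnes, gene)
        candidates = []
        if i < len(cnes):
            candidates.append(cnes[i] - gene)      # nearest CNE at or after the gene
        if i > 0:
            candidates.append(gene - cnes[i - 1])  # nearest CNE before the gene
        distances.append(min(candidates))
    return distances


def count_cnes_in_window(gene_position, cne_positions, half_window):
    """Count CNEs within +/- half_window bp of a gene position."""
    return sum(1 for p in cne_positions if abs(p - gene_position) <= half_window)


# Made-up coordinates for illustration only.
cne_sites = [50, 4000, 8990, 20000]
gene_sites = [100, 5000, 9000]
closest = distance_to_closest_cne(gene_sites, cne_sites)
```

The resulting per-gene distances for homeodomain genes and for all other genes would then be the two samples passed to a rank-based test such as Mann–Whitney U.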

Fig. 5.

Distance between genes and their closest CNE (as illustrated in a) in the genomes of b) the honeybee Apis mellifera, c) the ant Ooceraea biroi, d) the stony corals Pocillopora damicornis and e) Stylophora pistillata; f) the oyster Crassostrea virginica, and g) the scallop Pecten maximus. Each dot represents a homeobox gene (yellow) or a nonhomeobox gene (gray). The genes are ordered along the x axis in ascending order of distance.

CNEs Evolve in Association With the Same Developmental Transcription Factors and Nervous System-Related Ig-Containing Genes Across Animal Phyla

In addition to their widespread link with homeodomain genes, CNEs are generally associated with the same classes of genes in distinct lineages (Fig. 4 and Supplementary Table 9), with these functional classes falling into two categories. The first includes transcription factors with known roles in development. Overrepresented protein domains characteristic of this category include the helix–loop–helix DNA-binding domain, the Myc-type basic helix–loop–helix domain, and a variety of zinc finger domains, helix–turn–helix motifs, Forkhead, Paired, T-box, and ETS domains. The second category comprises immunoglobulin-like domain-containing genes with functions in axon guidance and in neural and neuronal development. Analysis of the genes with the most CNEs in this category reveals that, in all three phyla, islands of CNEs evolved near the same orthologous genes, including semaphorin, netrin, neurotrimin, lachesin, hemicentin, fasciclin, zig-8, and collier. (For lists of genes ranked by number of associated CNEs, see Supplementary Dataset 1.) We speculate that some of these CNEs are under strong purifying selection because they regulate finely tuned mechanisms of nervous system organization.

An intriguing example of CNEs evolving in relation to the same gene in different phyla involves Meis/homothorax. Meis is associated with the largest CNE cluster in both vertebrates (Sandelin et al. 2004) and flies (Glazov et al. 2005). A small-scale case study also identified Meis as the most CNE-rich gene between two sea urchins (Tan et al. 2019). Unexpectedly, Meis also stands out as one of the most CNE-rich genes in our dataset. Very dense clusters of CNEs fill out the intronic space of Meis in arthropods, cnidarians, and molluscs, but these CNEs are only detectable between closely related species (Fig. 6). Meis is the most CNE-rich gene in the two hymenopterans among arthropods; it is also the most CNE-rich gene between the two closely related pairs of scallops and oysters within the molluscs. Among cnidarians, Meis ranks first and second in two of the sea anemones and ranks between 30th and 40th in the stony corals, with dozens of associated CNEs. The presence of islands of CNEs found around the same gene in different lineages suggests that some metazoan-wide shared feature of this genomic region caused it to come under selective constraint in lineages as different as corals, insects, oysters, and humans. The selective pressure responsible for the prevalence of CNEs at this locus may be an ancient feature dating back to their common ancestor, with slow sequence turnover over time resulting in a pattern where only closely related extant species maintain a high enough level of sequence similarity to allow for the detection of CNEs, as proposed by Harmston et al. (2013).

Fig. 6.

Distribution of CNEs around Meis/Homothorax in groups of closely related species within the a) Arthropoda (ants and bees), b) Cnidaria (stony corals), and c) Mollusca (scallops and oysters).

While CNEs tend to cluster near the same genes across phyla, our analysis also revealed phylum-specific trends (see Supplementary Table 9 for lists of overrepresented domains specific to each phylum). In cnidarians, many CNEs cluster near transposon-related domains, and we discuss this observation in more detail below. In molluscs, CNEs are found around components of the Wnt signaling pathway, as well as around unexpected genes such as the motor proteins kinesin and myosin. In some cases, clustering of CNEs near certain genes can only be interpreted in the context of the specific genomic evolutionary history of a particular lineage. For example, CNEs are abundant near nicotinic acetylcholine receptors in our mollusc dataset, most notably in the bivalves. Interestingly, these receptors have undergone a massive expansion in molluscs (particularly in bivalves) through successive episodes of retroposition and tandem duplications (Jiao et al. 2019). This expansion may have been accompanied by expansion of associated noncoding elements involved in regulating their expression.

Overall, our results suggest that the functional categories most widely associated with CNEs in metazoans are related to the regulation of embryonic development, neural development, and axon guidance, while genes associated with CNEs only in specific lineages may belong to a wider array of functional classes, reflecting lineage-specific aspects of their genome's evolutionary history.

Transposable Elements are a Major Source of CNEs

Transposable element (TE)-related protein domains (including integrase, reverse transcriptase, and ribonuclease H domains) are among the most common domains overrepresented in cnidarians, where 4.7% of CNEs are closest to a gene containing a TE-related domain. While we cannot unequivocally determine why these CNEs are closer to TE-related domains than expected by chance, one possible explanation is that, in some cases, these CNEs may represent fragments of the noncoding portion of a TE, leading us to further explore the relationship between CNEs and TEs. To that end, we conducted BLAST similarity searches against Repbase and NCBI's nt and nr databases. Total evidence from BLAST searches against all databases suggests that an average of 11% of CNEs in arthropods, 7% of CNEs in cnidarians, and 5% of CNEs in molluscs may be derived from a TE (Fig. 7a). The percentage of CNEs with evidence of TE ancestry is very high in specific species within each phylum. In Arthropoda, it reaches 28% in Apolygus lucorum and 22% in Anopheles aegypti. In Cnidaria, it amounts to 36% in Nematostella vectensis and 30% in Hydra vulgaris, while in Mollusca, it stands at 11% in Lottia gigantea and 10% in Aplysia californica (Supplementary Table 12). Most of these species correspond to ones in which large numbers of TE sequences are available in Repbase, suggesting that these high numbers may reflect their true prevalence.

Fig. 7.

Transposable elements are a major source of CNEs. a) Percentage of CNEs with evidence of transposon origin. Each dot represents a genome. b) Proportion of CNEs with Repbase matches to each class of TE. c) Taxonomic distribution of Repbase matches.

Matches resulting from searches against Repbase belong to a wide variety of TE families, with hAT, DIRS, and mariner/Tc1 being the most common (Fig. 7b). The taxonomic distribution of Repbase matches (Fig. 7c) shows that CNEs match most often to TEs from their own phylum, but many of them match TEs from different animal phyla, and even from plants and fungi. In all three phyla, the second-ranked taxon is Vertebrata, perhaps due to better availability of reference TE sequences for that group in Repbase.

Highly Similar Noncoding Sequences Across Phyla: Ancient CNEs?

As some of the CNEs in our dataset can be traced back to the common ancestor of their phylum, we investigated whether any of them was even more ancient, and perhaps could even be present in other phyla. We conducted a BLAST similarity search between each CNE in our dataset and the nucleotide database of 21 other metazoan phyla. Surprisingly, a small percentage of CNEs in each of the three phyla has a BLAST hit with at least 75% similarity over more than 50 bp in almost every other animal phylum (Fig. 8), averaging 1.10%, 0.07%, and 0.39% of CNEs for arthropods, cnidarians, and molluscs, respectively. The lower number in cnidarians is consistent with their more distant relationships to most other animals. Conversely, arthropods have a higher percentage of their CNEs mapping to other phyla than cnidarians and molluscs. Strikingly, 6.9% of their CNEs have a match to a chordate sequence, compared to 0.35% for cnidarians and 2.27% for molluscs. This is surprising given that they have an overall lower level of noncoding conservation within the phylum, as shown above. This may be related to the fact that even though arthropods have fewer CNEs overall, their CNEs are older, many of them tracing back to nodes as far back as 500 Mya.
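The hit criterion used here (at least 75% similarity over more than 50 bp) can be sketched as a filter over BLAST tabular output. This is an illustration under stated assumptions: it presumes the default 12-column tabular format (`-outfmt 6`) and uses percent identity as a stand-in for similarity, neither of which is stated in the text. The identifiers in the example are made up.

```python
import csv
import io

# Column order of BLAST+ tabular output (-outfmt 6) with default fields.
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def filter_hits(handle, min_identity=75.0, min_length=50):
    """Yield hits with identity >= min_identity over more than min_length bp."""
    reader = csv.DictReader(handle, fieldnames=FIELDS, delimiter="\t")
    for row in reader:
        if float(row["pident"]) >= min_identity and int(row["length"]) > min_length:
            yield row


# Two made-up hits for illustration; the second is too short to pass.
example = io.StringIO(
    "cne_1\tsubj_A\t82.5\t64\t11\t0\t1\t64\t1000\t1063\t1e-10\t70.1\n"
    "cne_2\tsubj_B\t91.0\t40\t3\t0\t1\t40\t500\t539\t1e-08\t60.2\n"
)
kept = [row["qseqid"] for row in filter_hits(example)]
```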

Fig. 8.

Heatmap showing the number of CNEs in each phylum (Arthropoda, Cnidaria, and Mollusca) with a BLAST match of 75% similarity over at least 50 bp to a sequence in one of the 21 phyla on the y axis.

Discussion

In this study, we identified hundreds of thousands of CNEs distributed through the genomes of more than 50 species in three phyla. These elements represent some of the sequences that have evolved under the strongest selection pressure in animal evolution, suggesting that they fulfill important functions. Using a unique approach where we identified CNEs in all possible pairs of genomes within each phylum—an approach that allowed us to have many more data points than reported in previous studies that typically align several genomes to a single (human) reference genome—we were able to demonstrate that, despite occupying very distant branches of the metazoan tree, cnidarians, molluscs, and vertebrates have highly similar rates of noncoding conservation, while arthropod genomes distinguish themselves by being relatively CNE-poor. This difference may be caused by variation in branch length across these lineages, which may better correlate with noncoding conservation than chronological time.

Previous studies in vertebrates and in small subsets of insects and nematodes have shown that CNEs are associated with homeodomain transcription factors. By greatly expanding the taxonomic breadth being analyzed, we show that homeodomain consistently stands out as the gene class most strongly associated with CNEs in all animals studied to date. Therefore, this CNE-homeodomain association may reasonably be viewed as a general feature of genomic evolution in metazoans and as an example of recurrent evolution (Maeso et al. 2012). While this association is a conserved feature of animal genomes, most homeodomain-associated CNEs are themselves relatively short-lived. The steep decay of CNE number over time described here implies that the vast majority of CNEs, including homeodomain-associated ones, do not survive longer than the first 200 million years of divergence between any two species. The strong negative selection acting on noncoding sequences associated with these genes results in a continuously renewed pattern where islands of CNEs appear in their vicinity as new branches of the phylogenetic tree originate, taking longer than any other region of the genome to succumb to the constant assault of mutations.

Our results strengthen the expanding body of evidence that transposable elements are a major source of CNEs. An analysis of the opossum genome (Mikkelsen et al. 2007) showed that as many as 16% of all CNEs may have evolved from transposons, speculating that actual rates may be higher. Another study comparing 29 genomes revealed more than 280,000 instances of transposon-derived CNEs, explaining 19% of conserved elements arising since the origins of eutherian mammals 90 million years ago (Lindblad-Toh et al. 2011). Two possibilities can explain the presence of TEs in our CNE dataset. First, some may not represent true CNEs, but instead may have been acquired through recent lateral transfer. Our pipeline includes a step that filters out sequences when their phylogenetic distribution is more congruent with independent evolution than with inheritance from a common ancestor. However, we cannot rule out that this step failed to filter out some independently acquired CNEs. Alternatively, these TE-derived CNEs may have originated through cooption of a TE to fulfill regulatory roles after insertion into their host genome, coming under purifying selection as a result. In addition to the coding fraction of their genome, TEs contain noncoding regulatory sequences necessary for their own expression (Chuong et al. 2017). As a result, TE insertions often result in gene expression changes that in rare cases may be beneficial, resulting in permanent exaptation (Etchegaray et al. 2021). The high prevalence of TE-derived CNEs in our dataset suggests that transposable elements may have a crucial role in spreading sequences with regulatory potential within the genomes of their host and, in rare cases, perhaps even facilitating their jump across species through lateral transfer.

Finally, we observed that some CNEs may be shared between distantly related phyla, including between bilaterians and early branching metazoans such as sponges and ctenophores. Additional studies would be required to determine whether some (or all) of these highly similar sequences are the result of lateral transfer, other sources of convergent evolution, or even cross-species contamination or spurious hits from accidentally similar repetitive sequences, as opposed to true conservation. However, these results are consistent with previous studies that showed that some vertebrate CNEs are shared with distantly related invertebrate phyla (Maeso et al. 2013), and point tantalizingly to the potential existence of CNEs that have remained almost unchanged since the origin of the Metazoa in the pre-Cambrian.

Methods

Genome Sampling

We conducted a CNE search within the Cnidaria (13 genomes), Mollusca (14 genomes), and Arthropoda (24 genomes). These taxa were chosen because of the availability of published genomes and as representatives of the three main branches of the animal tree where CNEs have not received much attention: the nonbilaterians, spiralians, and ecdysozoans. Within each phylum, specific genomes were chosen to best represent the diversity of the clade. Species and accession numbers are listed in Supplementary Table 1.

Genome Preprocessing

CNEs were identified using CNEFinder (Ayad et al. 2018). CNEFinder is designed to conduct pairwise comparisons between chromosomes. Most genome assemblies in our dataset are not chromosome-scale and are often fragmented into dozens to thousands of scaffolds. As a result, conducting CNEFinder runs for each possible pair of scaffolds would be impractical. In order to minimize the number of CNEFinder runs, each genome was concatenated into one or several sequences before being used as input for CNEFinder. One hundred “N” characters were added at the junction between scaffolds in order to avoid detecting erroneous CNEs overlapping a scaffold boundary. The reverse complement of each genome was also generated and run separately.
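The concatenation step can be illustrated with a minimal Python sketch. This is not the authors' script: the function names, the tuple-based input, and the returned offset table are assumptions made for illustration only. The 100-"N" spacer and the separate reverse-complement sequence follow the description above.

```python
# Illustrative sketch only; helper names are hypothetical, not from the published pipeline.
SPACER = "N" * 100  # 100 "N" characters inserted between consecutive scaffolds

_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def concatenate_scaffolds(scaffolds):
    """Join (name, sequence) tuples into one sequence separated by N spacers.

    Returns the concatenated sequence and a dict mapping each scaffold name
    to its 0-based start offset within the concatenated sequence.
    """
    offsets = {}
    parts = []
    position = 0
    for index, (name, sequence) in enumerate(scaffolds):
        if index > 0:
            parts.append(SPACER)
            position += len(SPACER)
        offsets[name] = position
        parts.append(sequence)
        position += len(sequence)
    return "".join(parts), offsets


def reverse_complement(sequence):
    """Return the reverse complement of a DNA sequence (N stays N)."""
    return sequence.translate(_COMPLEMENT)[::-1]


scaffolds = [("scaf1", "ACGTAC"), ("scaf2", "GGCC")]
genome, offsets = concatenate_scaffolds(scaffolds)
genome_rc = reverse_complement(genome)
```

Keeping the offset table makes it possible to map any coordinate in the concatenated sequence back to its scaffold of origin.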

CNEFinder takes as input an “exon” file containing the coordinates of all coding sequences to be excluded from the search. Exon files were generated by retrieving relevant columns from gff files, removing duplicate exon coordinates from redundant annotations when necessary, and converting exon coordinates to match the concatenated FASTA files.
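As a rough sketch of the coordinate conversion, the example below shifts scaffold-relative exon coordinates by the start offset of each scaffold in the concatenated sequence and removes duplicate coordinates. The input layout (scaffold, start, end) and the offset dictionary are assumptions for illustration; the exact column layout of the CNEFinder exon file is not reproduced here.

```python
# Illustrative sketch only; the data layout is assumed, not taken from the published scripts.
def convert_exons(exons, offsets):
    """Convert (scaffold, start, end) exon records to concatenated-genome coordinates.

    `offsets` maps each scaffold name to its 0-based start position in the
    concatenated sequence. Adding the offset preserves whatever coordinate
    convention (e.g. 1-based GFF) the input uses. Duplicate records arising
    from redundant annotations collapse because a set is used.
    """
    converted = set()
    for scaffold, start, end in exons:
        shift = offsets[scaffold]
        converted.add((start + shift, end + shift))
    return sorted(converted)


offsets = {"scaf1": 0, "scaf2": 106}
exons = [("scaf1", 10, 50), ("scaf1", 10, 50), ("scaf2", 5, 20)]
converted = convert_exons(exons, offsets)
```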

CNEFinder Runs

CNEFinder was run for each possible pairwise comparison of species within each of the three phyla, resulting in 78, 91, and 276 whole-genome pairwise comparisons for Cnidaria, Mollusca, and Arthropoda, respectively. CNEFinder was set to identify CNEs with a minimum similarity of 75% and length >50 bp with the parameters: -y 1 -z 1 -a 0 -b 0 -c 0 -d 0 -t 0.75 -l 50 -T 56. We chose these criteria because they correspond to those of sequences under selection between human and mouse as determined by Chiaromonte et al. (2003). It should be noted that, using these criteria, CNEFinder accepts sequences shorter than 50 bp as long as one CNE in the pair is longer than 50 bp and the 75% similarity threshold is satisfied. CNEFinder output corresponding to each species pair was combined when needed (for large genomes that were run in several fragments), and outputs from forward and reverse complement runs were merged.
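The number of pairwise comparisons follows from the number of genomes per phylum, since n genomes yield n(n − 1)/2 unordered pairs. The short Python check below reproduces the three counts; the species names are placeholders, not the species used in the study.

```python
from itertools import combinations
from math import comb

# Genome counts per phylum as stated in Genome Sampling.
genomes_per_phylum = {"Cnidaria": 13, "Mollusca": 14, "Arthropoda": 24}

# n choose 2 unordered species pairs per phylum.
pair_counts = {phylum: comb(n, 2) for phylum, n in genomes_per_phylum.items()}

# Enumerating the pairs themselves, using placeholder species names.
species = [f"species_{i}" for i in range(genomes_per_phylum["Cnidaria"])]
pairs = list(combinations(species, 2))
```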

When the same CNE is identified across multiple CNEFinder runs (between different pairs of species), the precise boundaries of the conserved sequence may vary between comparisons. CNEs with overlapping coordinates across multiple CNEFinder runs were aggregated together in order to generate a set of unique, nonoverlapping CNE sequences for each genome. The final boundary of each CNE was determined by finding the longest sequence that includes all CNEs originating from that locus.
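The aggregation of overlapping coordinates can be sketched as a standard interval merge. This is a generic illustration rather than the authors' implementation; it assumes closed intervals on a single sequence and treats intervals that share at least one position as overlapping.

```python
# Illustrative sketch only; assumes closed (start, end) intervals on one sequence.
def merge_overlapping(intervals):
    """Merge overlapping intervals so each locus spans all CNEs found there."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Overlaps the current locus: extend its right boundary if needed.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(interval) for interval in merged]


# Hypothetical CNE coordinates from several pairwise runs against one genome.
hits = [(100, 180), (150, 240), (400, 460), (230, 260)]
loci = merge_overlapping(hits)
```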

BLAST Filtering

A sequence similarity search was conducted for each CNE using BLASTN against nt, with the option -max_target_seqs 20 and an E-value threshold of 0.01. Additionally, a separate BLASTN search was conducted against each of 21 animal phyla; these data were also used to search for potential ancient CNEs conserved between different phyla. Ribosomal RNA sequences were filtered out from the CNE dataset by removing sequence matches containing the terms “ribosom” and “rRNA”. ORF prediction was conducted using TransDecoder.LongOrfs from TransDecoder (version 5.5.0) using option -m 15. For each CNE, the protein sequence of the best ORF was used as query for a BLASTP search against nr. CNEs with a BLASTP match below an E-value threshold of 0.01 were filtered out.
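The two filters can be expressed as a simple predicate over parsed BLAST results. The sketch below assumes that hit descriptions and E-values have already been parsed into dictionaries keyed by CNE identifier and that term matching is case-insensitive; both are assumptions for illustration and not details taken from the published scripts.

```python
# Illustrative sketch only; input structures and case-insensitive matching are assumed.
RRNA_TERMS = ("ribosom", "rrna")
EVALUE_THRESHOLD = 0.01


def filter_cnes(cne_ids, blastn_descriptions, blastp_evalues):
    """Keep CNEs with no rRNA-like BLASTN hit and no significant BLASTP hit.

    blastn_descriptions: dict of CNE id -> list of BLASTN hit descriptions.
    blastp_evalues: dict of CNE id -> list of BLASTP E-values for the best ORF.
    """
    kept = []
    for cne_id in cne_ids:
        descriptions = blastn_descriptions.get(cne_id, [])
        is_rrna = any(
            term in description.lower()
            for description in descriptions
            for term in RRNA_TERMS
        )
        has_protein_match = any(
            evalue < EVALUE_THRESHOLD for evalue in blastp_evalues.get(cne_id, [])
        )
        if not (is_rrna or has_protein_match):
            kept.append(cne_id)
    return kept


# Hypothetical parsed results for three CNEs.
blastn = {"cne1": ["28S ribosomal RNA gene"], "cne2": ["uncharacterized locus"]}
blastp = {"cne2": [0.5], "cne3": [1e-10]}
kept = filter_cnes(["cne1", "cne2", "cne3"], blastn, blastp)
```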

Multispecies CNE Clustering

CNEFinder was run for all possible pairs of species within each phylum. As a result, it is possible to cluster CNEs in groups spanning multiple species, assuming that if a CNE is conserved between species A and B, and between species B and C, it is also conserved between A and C. While in some cases, the CNEFinder run between species A and C may identify this conservation, it is also possible that conservation of this locus between these species is lower than the CNEFinder threshold of 75%. As a result, not all CNEs within a multispecies cluster are within 75% similarity of each other. It is also important to note that a CNE in species A may be conserved with multiple loci in species B. As a result, clusters may contain more than one sequence from each species. Clustering was conducted first by finding all overlapping CNE coordinates in pairs of CNEFinder runs with one species in common, and then by recursively clustering CNEs linked by chains of CNEFinder matches.
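Grouping CNEs linked by chains of pairwise matches amounts to finding connected components in a graph whose nodes are CNEs and whose edges are CNEFinder matches. The sketch below uses a union-find structure in place of the recursive procedure described above; the result is the same set of components, but the data structure and the identifier format ("species:CNE") are choices made for this illustration only.

```python
# Illustrative sketch only; union-find is swapped in for the recursive clustering.
def cluster_cnes(matches):
    """Group CNE identifiers connected by chains of pairwise matches.

    `matches` is an iterable of (cne_a, cne_b) pairs. Returns a sorted list
    of clusters, each a sorted list of CNE identifiers.
    """
    parent = {}

    def find(node):
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    for cne_a, cne_b in matches:
        root_a, root_b = find(cne_a), find(cne_b)
        if root_a != root_b:
            parent[root_b] = root_a

    clusters = {}
    for node in list(parent):
        clusters.setdefault(find(node), set()).add(node)
    return sorted(sorted(members) for members in clusters.values())


# Hypothetical matches: A:1 matches two loci in species B, and B:1 matches C:1.
matches = [("A:1", "B:1"), ("B:1", "C:1"), ("A:2", "B:2"), ("A:1", "B:3")]
clusters = cluster_cnes(matches)
```

In this example A:1 and C:1 end up in the same cluster even though no direct A–C match is listed, and the cluster contains two sequences from species B, mirroring the two caveats noted above.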

Ancestral Character Reconstruction and Filtering of Potentially Convergent Sequences

Most studies of CNEs assume that noncoding sequences with high levels of sequence similarity between two genomes separated by sufficient evolutionary time are conserved. Our approach, where CNEs are identified between all possible pairs of genomes and clustered across multiple species, allows us to use ancestral character reconstruction methods to attempt to determine if each CNE is truly inherited from a common ancestor or if it was instead acquired through other means such as convergent evolution or lateral transfer.

CNEs present in only two sister species and absent in every other species were assumed to have originated from their last common ancestors. For all other CNEs, presence or absence at each node of the tree was reconstructed using the maximum-likelihood tool PASTML (Ishikawa et al. 2019). Each CNE was treated as a discrete character coded either as present or absent in any species. We used the output of PASTML to group CNEs into three groups: (i) CNEs present in at least one node of the tree and only in descendants of that node (unambiguously conserved); (ii) CNEs present in at least one node of the tree but also present at one or more tips of the tree which are not descendants of that node (both conserved and potentially convergent); or (iii) CNEs only present at tips, absent at all nodes of the tree (convergent). CNEs identified as convergent (the third group) were excluded from the rest of the analysis. Sequences and coordinates of CNEs after filtering are available in Supplementary Datasets 2 and 3.
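The three-way grouping can be sketched as follows, taking as input a tree and the reconstructed presence states. The sketch does not call PASTML; it assumes the reconstruction has already been summarized as a set of internal nodes and a set of tips where the CNE is present. It also encodes one possible reading of group (i), namely that a single internal node with the CNE present must have every CNE-bearing tip among its descendants. The tree encoding and label strings are hypothetical.

```python
# Illustrative sketch only; tree encoding, labels, and the reading of group (i) are assumptions.
def descendant_tips(tree, node):
    """Return the tips below `node`; `tree` maps each internal node to its children."""
    children = tree.get(node)
    if not children:
        return {node}
    tips = set()
    for child in children:
        tips |= descendant_tips(tree, child)
    return tips


def classify_cne(tree, present_nodes, present_tips):
    """Assign a CNE to one of the three groups from its reconstructed states."""
    present_tips = set(present_tips)
    if not present_nodes:
        return "convergent"  # group (iii): present only at tips
    for node in present_nodes:
        if present_tips <= descendant_tips(tree, node):
            return "conserved"  # group (i): all tips descend from one present node
    return "conserved_and_potentially_convergent"  # group (ii)


# Hypothetical four-species tree: ((A,B)n1,(C,D)n2)root
tree = {"root": ["n1", "n2"], "n1": ["A", "B"], "n2": ["C", "D"]}
group_i = classify_cne(tree, {"n1"}, {"A", "B"})
group_ii = classify_cne(tree, {"n1"}, {"A", "B", "D"})
group_iii = classify_cne(tree, set(), {"A", "D"})
```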

Determining the Location of CNEs

The HOMER (v4.11; Heinz et al. 2010) script annotatePeaks.pl was used to determine the location of CNEs relative to genes, classifying them as being either intronic, intergenic, near the transcription start site (≤1 kb 5′ of transcription start), or near the transcription termination site (≤1 kb 3′ of transcription stop). Mytilus galloprovincialis and Cloeon dipterum were not included in this analysis because the formatting of their GFF annotation files was not compatible with the analysis script.

CNE Numbers and Lengths Relative to Evolutionary Distance

We conducted a literature search to determine approximate divergence times between every pair of species in our dataset. Distances and references are summarized in Supplementary Table 3. We used these data to determine the relationships between number of CNEs, CNE length, and evolutionary distance.

Association Between CNEs and Genes and Identification of Protein Domains Preferentially Associated With High Numbers of CNEs

CNEs were associated with the closest gene (either 5′ or 3′) in the genome. Protein domain annotations were generated using InterProScan (v 5.60-92.0). Protein domains present in genes with more associated CNEs than average were tested for overrepresentation using a one-tailed Fisher’s exact test and Bonferroni P-value adjustment. Domains with an adjusted P-value < 0.05 were considered significantly overrepresented (Supplementary Dataset 4). Additionally, CNEs were counted in windows of various sizes around each gene. We used window sizes of 10, 25, 50, 75, 100, 200, 300, 400, and 500 kb. Protein domain overrepresentation analysis was conducted as described above. The distance between each gene and its closest CNE was calculated by identifying the CNE closest to any exon of the gene. A Mann–Whitney U test was used to determine whether the distance of homeodomain genes to their closest CNE differed from that of all other genes in the genome.
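For illustration, the overrepresentation test can be written with the standard library alone. The sketch below computes a one-tailed (enrichment) Fisher's exact P-value as the upper tail of the hypergeometric distribution and applies a Bonferroni adjustment. The layout of the 2 × 2 table is an assumption: a and b count genes with the domain that have more or not more CNEs than average, and c and d count the same for genes without the domain. The function names are hypothetical.

```python
from math import comb


# Illustrative sketch only; table layout and function names are assumptions.
def fisher_exact_greater(a, b, c, d):
    """One-tailed Fisher's exact test P-value for enrichment in cell `a`.

    The table is [[a, b], [c, d]]; the P-value is the hypergeometric
    probability of observing `a` or more in the top-left cell given
    fixed row and column totals.
    """
    row1, col1, total = a + b, a + c, a + b + c + d
    upper = min(row1, col1)
    tail = sum(
        comb(col1, k) * comb(total - col1, row1 - k) for k in range(a, upper + 1)
    )
    return tail / comb(total, row1)


def bonferroni(p_values):
    """Multiply each P-value by the number of tests, capping at 1."""
    n_tests = len(p_values)
    return [min(1.0, p * n_tests) for p in p_values]


# Toy table: all 3 genes with the domain are CNE-rich, none of the other 3 are.
p_toy = fisher_exact_greater(3, 0, 0, 3)
adjusted = bonferroni([0.01, 0.5])
```

In the toy table, only one of the 20 possible arrangements is as extreme as the observed one, which gives P = 1/20.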

TE Analysis

Evidence for similarity between CNE sequences and transposable elements was gathered from three different sources. First, a BLASTN search was conducted between every CNE and Repbase (Bao et al. 2015) v26.08. Additionally, the BLAST results used for CNE filtering (see above) were queried for the terms “transposable element”, “transposon”, “mobile element”, “reverse transcriptase”, “transposase”, and “integrase”. Finally, CNEs located closest to a TE-associated protein domain were identified. The number of CNEs identified using each method is shown in Supplementary Table 12.
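The keyword query over BLAST hit descriptions can be sketched as below. The list of terms is taken from the text; the input structure (pairs of CNE identifier and hit description) and the case-insensitive substring matching are assumptions made for this illustration.

```python
# Illustrative sketch only; input structure and case-insensitive matching are assumed.
TE_TERMS = (
    "transposable element",
    "transposon",
    "mobile element",
    "reverse transcriptase",
    "transposase",
    "integrase",
)


def is_te_hit(description):
    """Return True if a hit description contains any TE-related term."""
    text = description.lower()
    return any(term in text for term in TE_TERMS)


def te_associated_cnes(blast_hits):
    """Return the sorted CNE ids with at least one TE-related hit description."""
    return sorted(
        {cne_id for cne_id, description in blast_hits if is_te_hit(description)}
    )


# Hypothetical hit descriptions.
blast_hits = [
    ("cne1", "Mariner-like Transposase gene, partial cds"),
    ("cne2", "hypothetical protein"),
    ("cne3", "LTR retrotransposon, complete sequence"),
    ("cne1", "putative integrase"),
]
te_cnes = te_associated_cnes(blast_hits)
```

Because matching is by substring, a description containing “retrotransposon” also matches the term “transposon”.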

Acknowledgments

This work utilized the Biowulf high-performance supercomputing resource of the Center for Information Technology at the National Institutes of Health (https://hpc.nih.gov). We thank Michael Connelly, Travis Moreland, Tyra Wolfsberg, and Suiyuan Zhang for developing and implementing the CNE track hubs and the accompanying web interface.

Contributor Information

Paul Gonzalez, Center for Genomics and Data Science Research, Division of Intramural Research, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD 20892, USA.

Quinn C Hauck, Center for Genomics and Data Science Research, Division of Intramural Research, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD 20892, USA.

Andreas D Baxevanis, Center for Genomics and Data Science Research, Division of Intramural Research, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD 20892, USA.

Supplementary Material

Supplementary material is available at Genome Biology and Evolution online.

Funding

This research was supported by the Intramural Research Program of the National Human Genome Research Institute, National Institutes of Health (ZIA HG000140 to A.D.B.).

Data Availability

The data (including supplementary tables) and code underlying this article are available on Figshare (https://figshare.com/projects/CNEs_across_the_Metazoa/172623) and GitHub (https://github.com/paulgzlz/CNE_scripts). The distribution of CNEs for 31 metazoan genomes, with links to their respective UCSC track hubs, is publicly available through a web portal located at https://research.nhgri.nih.gov/manuscripts/Gonzalez/index.shtml.

Literature Cited

  1. Aparicio S, Morrison A, Gould A, Gilthorpe J, Chaudhuri C, Rigby P, Krumlauf R, Brenner S. Detecting conserved regulatory elements with the model genome of the Japanese puffer fish, Fugu rubripes. Proc Natl Acad Sci. 1995:92(5):1684–1688. 10.1073/pnas.92.5.1684.
  2. Ayad LA, Pissis SP, Polychronopoulos D. CNEFinder: finding conserved non-coding elements in genomes. Bioinformatics. 2018:34(17):i743–i747. 10.1093/bioinformatics/bty601.
  3. Bagheri-Fam S, Ferraz C, Demaille J, Scherer G, Pfeifer D. Comparative genomics of the SOX9 region in human and Fugu rubripes: conservation of short regulatory sequence elements within large intergenic regions. Genomics. 2001:78(1–2):73–82. 10.1006/geno.2001.6648.
  4. Bao W, Kojima KK, Kohany O. Repbase update, a database of repetitive elements in eukaryotic genomes. Mob DNA. 2015:6(1):1–6. 10.1186/s13100-015-0041-9.
  5. Bejerano G, Pheasant M, Makunin I, Stephen S, Kent WJ, Mattick JS, Haussler D. Ultraconserved elements in the human genome. Science. 2004:304(5675):1321–1325. 10.1126/science.1098119.
  6. Bergman CM, Kreitman M. Analysis of conserved noncoding DNA in Drosophila reveals similar constraints in intergenic and intronic sequences. Genome Res. 2001:11(8):1335–1345. 10.1101/gr.178701.
  7. Boffelli D, McAuliffe J, Ovcharenko D, Lewis KD, Ovcharenko I, Pachter L, Rubin EM. Phylogenetic shadowing of primate sequences to find functional regions of the human genome. Science. 2003:299(5611):1391–1394. 10.1126/science.1081331.
  8. Brickner AG, Koop BF, Aronow BJ, Wiginton DA. Genomic sequence comparison of the human and mouse adenosine deaminase gene regions. Mamm Genome. 1999:10(2):95–101. 10.1007/s003359900951.
  9. Brody T, Yavatkar A, Kuzin A, Odenwald WF. Ultraconserved non-coding DNA within Diptera and Hymenoptera. G3 (Bethesda). 2020:10(9):3015–3024. 10.1534/g3.120.401502.
  10. Chiaromonte F, Weber RJ, Roskin KM, Diekhans M, Kent WJ, Haussler D. The share of human genomic DNA under selection estimated from human-mouse genomic alignments. Cold Spring Harb Symp Quant Biol. 2003:68:245–254. 10.1101/sqb.2003.68.245.
  11. Chuong EB, Elde NC, Feschotte C. Regulatory activities of transposable elements: from conflicts to benefits. Nat Rev Genet. 2017:18(2):71–86. 10.1038/nrg.2016.139.
  12. Cooper GM, Brudno M, Green ED, Batzoglou S, Sidow A, NISC Comparative Sequencing Program. Quantitative estimates of sequence divergence for comparative analyses of mammalian genomes. Genome Res. 2003:13(5):813–820. 10.1101/gr.1064503.
  13. Cooper GM, Brudno M, Stone EA, Dubchak I, Batzoglou S, Sidow A. Characterization of evolutionary rates and constraints in three mammalian genomes. Genome Res. 2004:14(4):539–548. 10.1101/gr.2034704.
  14. Cooper GM, Stone EA, Asimenos G, Green ED, Batzoglou S, Sidow A. Distribution and intensity of constraint in mammalian genomic sequence. Genome Res. 2005:15(7):901–913. 10.1101/gr.3577405.
  15. De La Calle-Mustienes E, Feijóo CG, Manzanares M, Tena JJ, Rodríguez-Seguel E, Letizia A, Allende ML, Gómez-Skarmeta JL. A functional survey of the enhancer activity of conserved non-coding sequences from vertebrate Iroquois cluster gene deserts. Genome Res. 2005:15(8):1061–1072. 10.1101/gr.4004805.
  16. Dermitzakis ET, Kirkness E, Schwarz S, Birney E, Reymond A, Antonarakis SE. Comparison of human chromosome 21 conserved nongenic sequences (CNGs) with the mouse and dog genomes shows that their selective constraint is independent of their genic environment. Genome Res. 2004:14(5):852–859. 10.1101/gr.1934904.
  17. Dermitzakis ET, Reymond A, Lyle R, Scamuffa N, Ucla C, Deutsch S, Stevenson BJ, Flegel V, Bucher P, Jongeneel CV. Numerous potentially functional but non-genic conserved sequences on human chromosome 21. Nature. 2002:420(6915):578–582. 10.1038/nature01251.
  18. Dermitzakis ET, Reymond A, Scamuffa N, Ucla C, Kirkness E, Rossier C, Antonarakis SE. Evolutionary discrimination of mammalian conserved non-genic sequences (CNGs). Science. 2003:302(5647):1033–1035. 10.1126/science.1087047.
  19. DeSilva U, Elnitski L, Idol JR, Doyle JL, Gan W, Thomas JW, Schwartz S, Dietrich NL, Beckstrom-Sternberg SM, McDowell JC. Generation and comparative analysis of 3.3 Mb of mouse genomic sequence orthologous to the region of human chromosome 7q11.23 implicated in Williams syndrome. Genome Res. 2002:12(1):3–15. 10.1101/gr.214802.
  20. Dubchak I, Brudno M, Loots GG, Pachter L, Mayor C, Rubin EM, Frazer KA. Active conservation of noncoding sequences revealed by three-way species comparisons. Genome Res. 2000:10(9):1304–1306. 10.1101/gr.142200.
  21. Duret L, Bucher P. Searching for regulatory elements in human noncoding sequences. Curr Opin Struct Biol. 1997:7(3):399–406. 10.1016/S0959-440X(97)80058-9.
  22. Engström PG, Sui SJH, Drivenes Ø, Becker TS, Lenhard B. Genomic regulatory blocks underlie extensive microsynteny conservation in insects. Genome Res. 2007:17(12):1898–1908. 10.1101/gr.6669607.
  23. Etchegaray E, Naville M, Volff J-N, Haftek-Terreau Z. Transposable element-derived sequences in vertebrate development. Mob DNA. 2021:12(1):1–24. 10.1186/s13100-020-00229-5.
  24. Frazer KA, Sheehan JB, Stokowski RP, Chen X, Hosseini R, Cheng J-F, Fodor SP, Cox DR, Patil N. Evolutionarily conserved sequences on human chromosome 21. Genome Res. 2001:11(10):1651–1659. 10.1101/gr.198201.
  25. Frazer KA, Tao H, Osoegawa K, de Jong PJ, Chen X, Doherty MF, Cox DR. Noncoding sequences conserved in a limited number of mammals in the SIM2 interval are frequently functional. Genome Res. 2004:14(3):367–372. 10.1101/gr.1961204.
  26. Ghanem N, Jarinova O, Amores A, Long Q, Hatch G, Park BK, Rubenstein JL, Ekker M. Regulatory roles of conserved intergenic domains in vertebrate Dlx bigene clusters. Genome Res. 2003:13(4):533–543. 10.1101/gr.716103.
  27. Glazov EA, Pheasant M, McGraw EA, Bejerano G, Mattick JS. Ultraconserved elements in insect genomes: a highly conserved intronic sequence implicated in the control of homothorax mRNA splicing. Genome Res. 2005:15(6):800–808. 10.1101/gr.3545105.
  28. Hardison RC, Oeltjen J, Miller W. Long human–mouse sequence alignments reveal novel regulatory elements: a reason to sequence the mouse genome. Genome Res. 1997:7(10):959–966. 10.1101/gr.7.10.959.
  29. Harmston N, Barešić A, Lenhard B. The mystery of extreme non-coding conservation. Philos Trans R Soc B Biol Sci. 2013:368(1632):20130021. 10.1098/rstb.2013.0021.
  30. Heinz S, Benner C, Spann N, Bertolino E, Lin YC, Laslo P, Cheng JX, Murre C, Singh H, Glass CK. Simple combinations of lineage-determining transcription factors prime cis-regulatory elements required for macrophage and B cell identities. Mol Cell. 2010:38(4):576–589. 10.1016/j.molcel.2010.05.004.
  31. Ishikawa SA, Zhukova A, Iwasaki W, Gascuel O. A fast likelihood method to reconstruct and visualize ancestral scenarios. Mol Biol Evol. 2019:36(9):2069–2085. 10.1093/molbev/msz131.
  32. Jiao Y, Cao Y, Zheng Z, Liu M, Guo X. Massive expansion and diversity of nicotinic acetylcholine receptors in lophotrochozoans. BMC Genomics. 2019:20(1):1–15. 10.1186/s12864-019-6278-9.
  33. King DC, Taylor J, Elnitski L, Chiaromonte F, Miller W, Hardison RC. Evaluation of regulatory potential and conservation scores for detecting cis-regulatory modules in aligned mammalian genome sequences. Genome Res. 2005:15(8):1051–1060. 10.1101/gr.3642605.
  34. Leypold NA, Speicher MR. Evolutionary conservation in noncoding genomic regions. Trends Genet. 2021:37(10):903–918. 10.1016/j.tig.2021.06.007.
  35. Lindblad-Toh K, Garber M, Zuk O, Lin MF, Parker BJ, Washietl S, Kheradpour P, Ernst J, Jordan G, Mauceli E. A high-resolution map of human evolutionary constraint using 29 mammals. Nature. 2011:478(7370):476–482. 10.1038/nature10530.
  36. Loots GG, Locksley RM, Blankespoor CM, Wang Z-E, Miller W, Rubin EM, Frazer KA. Identification of a coordinate regulator of interleukins 4, 13, and 5 by cross-species sequence comparisons. Science. 2000:288(5463):136–140. 10.1126/science.288.5463.136.
  37. Maeso I, Irimia M, Tena JJ, Casares F, Gómez-Skarmeta JL. Deep conservation of cis-regulatory elements in metazoans. Philos Trans R Soc B Biol Sci. 2013:368(1632):20130020. 10.1098/rstb.2013.0020.
  38. Maeso I, Roy SW, Irimia M. Widespread recurrent evolution of genomic features. Genome Biol Evol. 2012:4(4):486–500. 10.1093/gbe/evs022.
  39. Mikkelsen TS, Wakefield MJ, Aken B, Amemiya CT, Chang JL, Duke S, Garber M, Gentles AJ, Goodstadt L, Heger A. Genome of the marsupial Monodelphis domestica reveals innovation in non-coding sequences. Nature. 2007:447(7141):167–177. 10.1038/nature05805.
  40. Nobrega MA, Ovcharenko I, Afzal V, Rubin EM. Scanning human gene deserts for long-range enhancers. Science. 2003:302(5644):413. 10.1126/science.1088328.
  41. Papatsenko D, Kislyuk A, Levine M, Dubchak I. Conservation patterns in different functional sequence categories of divergent Drosophila species. Genomics. 2006:88(4):431–442. 10.1016/j.ygeno.2006.03.012.
  42. Pennacchio LA, Ahituv N, Moses AM, Prabhakar S, Nobrega MA, Shoukry M, Minovitsky S, Dubchak I, Holt A, Lewis KD. In vivo enhancer analysis of human conserved non-coding sequences. Nature. 2006:444(7118):499–502. 10.1038/nature05295.
  43. Prabhakar S, Poulin F, Shoukry M, Afzal V, Rubin EM, Couronne O, Pennacchio LA. Close sequence comparisons are sufficient to identify human cis-regulatory elements. Genome Res. 2006:16(7):855–863. 10.1101/gr.4717506.
  44. Rubin BE, Jones BM, Hunt BG, Kocher SD. Rate variation in the evolution of non-coding DNA associated with social evolution in bees. Philos Trans R Soc B. 2019:374(1777):20180247. 10.1098/rstb.2018.0247.
  45. Sabarinadh C, Subramanian S, Mishra RK. Extreme conservation of non-repetitive non-coding regions near HoxD complex of vertebrates. Genome Biol. 2003:4(4):1–14. 10.1186/gb-2003-4-4-p2.
  46. Sandelin A, Bailey P, Bruce S, Engström PG, Klos JM, Wasserman WW, Ericson J, Lenhard B. Arrays of ultraconserved non-coding regions span the loci of key developmental genes in vertebrate genomes. BMC Genomics. 2004:5(1):99. 10.1186/1471-2164-5-99.
  47. Santini S, Boore JL, Meyer A. Evolutionary conservation of regulatory elements in vertebrate Hox gene clusters. Genome Res. 2003:13(6a):1111–1122. 10.1101/gr.700503.
  48. Shabalina SA, Kondrashov AS. Pattern of selective constraint in C. elegans and C. briggsae genomes. Genet Res. 1999:74(1):23–30. 10.1017/S0016672399003821.
  49. Shashikant CS, Kim CB, Borbély MA, Wang WC, Ruddle FH. Comparative studies on mammalian Hoxc8 early enhancer sequence reveal a baleen whale-specific deletion of a cis-acting element. Proc Natl Acad Sci. 1998:95(26):15446–15451. 10.1073/pnas.95.26.15446.
  50. Shin JT, Priest JR, Ovcharenko I, Ronco A, Moore RK, Burns CG, MacRae CA. Human-zebrafish non-coding conserved elements act in vivo to regulate transcription. Nucleic Acids Res. 2005:33(17):5437–5445. 10.1093/nar/gki853.
  51. Siepel A, Bejerano G, Pedersen JS, Hinrichs AS, Hou M, Rosenbloom K, Clawson H, Spieth J, Hillier LW, Richards S. Evolutionarily conserved elements in vertebrate, insect, worm, and yeast genomes. Genome Res. 2005:15(8):1034–1050. 10.1101/gr.3715005.
  52. Spitz F, Gonzalez F, Duboule D. A global control region defines a chromosomal regulatory landscape containing the HoxD cluster. Cell. 2003:113(3):405–417. 10.1016/S0092-8674(03)00310-6.
  53. Tagle DA, Koop BF, Goodman M, Slightom JL, Hess DL, Jones RT. Embryonic ε and γ globin genes of a prosimian primate (Galago crassicaudatus): nucleotide and amino acid sequences, developmental regulation and phylogenetic footprints. J Mol Biol. 1988:203(2):439–455. 10.1016/0022-2836(88)90011-3.
  54. Tan G, Polychronopoulos D, Lenhard B. CNEr: a toolkit for exploring extreme noncoding conservation. PLoS Comput Biol. 2019:15(8):e1006940. 10.1371/journal.pcbi.1006940.
  55. Thomas J, Touchman J, Blakesley R, Bouffard G, Beckstrom-Sternberg S, Margulies E, Blanchette M, Siepel A, Thomas P, McDowell J. Comparative analyses of multi-species sequences from targeted genomic regions. Nature. 2003:424(6950):788–793. 10.1038/nature01858.
  56. Vavouri T, Walter K, Gilks WR, Lehner B, Elgar G. Parallel evolution of conserved non-coding elements that target a common set of developmental regulatory genes from worms to humans. Genome Biol. 2007:8(2):R15. 10.1186/gb-2007-8-2-r15.
  57. Wasserman WW, Palumbo M, Thompson W, Fickett JW, Lawrence CE. Human–mouse genome comparisons to locate regulatory sites. Nat Genet. 2000:26(2):225–228. 10.1038/79965.
  58. Woolfe A, Goodson M, Goode DK, Snell P, McEwen GK, Vavouri T, Smith SF, North P, Callaway H, Kelly K. Highly conserved non-coding sequences are associated with vertebrate development. PLoS Biol. 2004:3(1):e7. 10.1371/journal.pbio.0030007.

Articles from Genome Biology and Evolution are provided here courtesy of Oxford University Press
