Abstract
There is currently little information about the evolution of gene clusters, genome architectures and karyotypes in early branching animals. Slowly evolving anthozoan cnidarians can be particularly informative about the evolution of these genome features. Here we report chromosome-level genome assemblies of two related anthozoans, the sea anemones Nematostella vectensis and Scolanthus callimorphus. We find a robust set of 15 chromosomes with a clear one-to-one correspondence between the two species. Both genomes show chromosomal conservation, allowing us to reconstruct ancestral cnidarian and metazoan chromosomal blocks, consisting of at least 19 and 16 ancestral linkage groups, respectively. We show that, in contrast to Bilateria, the Hox and NK clusters of investigated cnidarians are largely disintegrated, despite the presence of staggered hox/gbx expression in Nematostella. This loss of microsynteny conservation may be facilitated by shorter distances between cis-regulatory sequences and their cognate transcriptional start sites. We find no clear evidence for topologically associated domains, suggesting fundamental differences in long-range gene regulation compared to vertebrates. These data suggest that large sets of ancestral metazoan genes have been retained in ancestral linkage groups of some extant lineages; yet, higher order gene regulation with associated 3D architecture may have evolved only after the cnidarian-bilaterian split.
Subject terms: Evolutionary genetics, Genome, Molecular evolution, Genome evolution
Slowly evolving cnidarians are useful models to study genome architecture. This study shows that sea anemones have a high degree of chromosomal macrosynteny, but poor microsynteny conservation. This is correlated with a small genome size and short distances of cis-regulatory elements to genes.
Introduction
Comparative genomics and epigenomics have provided fundamental insights into the evolution of gene regulation in multicellular organisms. Among basally branching animals, this led to the surprising discovery of conserved gene repertoires1–3, of microsyntenic gene clusters and patterns of DNA methylation and histone modification codes4–6. Recent applications of long-read sequencing technologies and proximity ligation methods such as Hi-C have facilitated chromosome-level genome assembly for many organisms. These chromosome-level genome assemblies have made it possible to compare the content and localization of homologous genes between distantly related species and to reconstruct ancestral linkage groups. In vertebrates and several other bilaterians, topological analysis has revealed three-dimensional chromosomal architecture7, organized as chromosomal compartments and topologically associating domains (TADs)8. The boundaries of TADs have been shown to act as barriers of gene regulation9–12. Analyses of chromosome evolution and three-dimensional genome structure have mainly focused on bilaterians, and it remains unclear whether non-bilaterian animals share the topological signatures of bilaterian chromosomes. In this regard, studying Cnidaria, the sister clade to Bilateria, is a crucial step to understanding the evolution of animal genomes.
Cnidaria constitutes a large clade of basally branching Metazoa, dating back between 590 and 690 Mya13–15. Their robust phylogenetic position as the sister group to Bilateria makes them the key group to study the evolution of bilaterian features, such as axis organization, mesoderm formation and central nervous system development16. The starlet sea anemone, Nematostella vectensis (Fig. 1a—Hexacorallia; Actiniaria; Edwardsiidae), has been developed into an important model organism17–19 and in 2007 became the first non-bilaterian animal to have a draft scaffold-level genome assembled1. The Nematostella genome revealed uncanny conservation of gene content to vertebrates as well as the first observations of large-scale macrosyntenic conservation by way of comparisons with human chromosomes to Nematostella scaffolds1.
Since the release of the Nematostella genome, genomes of representatives of all five cnidarian classes have become available2,20–27, providing valuable insight into various aspects of the cnidarian gene complement and genome organization. However, these genomes originated from distantly-related species, and few genomic studies of cnidarians have sought to search for genomic conservation signals of recently-diverged species, a cornerstone of comparative genomics28. For example, no genome sequence of an edwardsiid sea anemone has been reported, aside from that of Nematostella. Another sea anemone in the family Edwardsiidae is the “worm sea anemone” Scolanthus callimorphus (Fig. 1b), dwelling in European intertidal zones29,30, which according to our molecular clock calculations separated from Nematostella vectensis ~174 million years ago and is to date its closest sequenced relative (Supplementary Fig. 1, see Materials and Methods for details).
Here, we report chromosome-level genome assemblies and new gene models for the model organism Nematostella vectensis and the closely related species Scolanthus callimorphus. We show that the macrosyntenic localization of orthologous genes across the chromosomes is highly conserved across many cnidarians and bilaterians, allowing us to reconstruct the minimal set of ancestral eumetazoan chromosomal linkage blocks. Notably, the macrosyntenic conservation contrasts with the loss of most of the microsyntenic blocks, including the integrity of the Hox cluster. Moreover, in Nematostella, which like all non-bilaterians lacks the TAD boundary protein CTCF (refs. 4, 31), Hi-C analyses did not reveal any obvious TAD-like structures, as known in Bilateria. We propose that the proximity of most cis-regulatory regions to their target genes is an ancestral metazoan feature, whereas large-scale three-dimensional structures, such as TADs, only became necessary in bilaterians with larger genomes and more distantly located enhancers.
Results
High quality chromosome-level assemblies of two Edwardsiid genomes
Using short-read sequencing and a k-mer coverage model, we estimated the genome length of Nematostella at 244 Mb (Supplementary Fig. 2), which is substantially shorter than the previously suggested 450 Mb (ref. 1). This discrepancy could be partly attributed to the previous use of four haplotypes in sequencing, as well as to lower coverage and shorter read length. The genome of the sea anemone Exaiptasia pallida is similar in length to Nematostella32, while the estimated 414 Mb of the Scolanthus genome is at present the largest sequenced sea anemone genome (Fig. 1c). Using PacBio long-read sequencing and high-throughput conformation capture (Hi-C), we then assembled chromosome-level Nematostella and Scolanthus genomes, which surpass the quality of the original Nematostella genome in terms of contiguity, accuracy and mappability (see Supplementary Notes, Supplementary Figs. 2,3 for details).
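The k-mer coverage model mentioned above can be illustrated with a minimal sketch. It assumes a k-mer depth histogram is already available and uses the simplest form of the estimate, the total number of counted k-mers divided by the depth of the main peak. The function name, the `min_depth` error cutoff and the toy histogram are illustrative assumptions and do not reproduce the model actually fitted in this study.

```python
def estimate_genome_size(histogram, min_depth=5):
    """Estimate genome length (in bases) from a k-mer depth histogram.

    histogram: dict mapping k-mer depth -> number of distinct k-mers at that depth.
    min_depth: depths below this are treated as sequencing errors (assumed cutoff).
    """
    # Discard the low-depth spike that is dominated by erroneous k-mers.
    kept = {depth: n for depth, n in histogram.items() if depth >= min_depth}
    if not kept:
        raise ValueError("no k-mers at or above min_depth")
    # The most populated depth approximates the sequencing depth of the genome.
    peak_depth = max(kept, key=kept.get)
    # Total k-mer occurrences divided by depth approximates the genome length.
    total_kmers = sum(depth * n for depth, n in kept.items())
    return total_kmers / peak_depth


# Toy histogram (hypothetical values): an error spike plus a peak at depth 20.
toy = {1: 1000, 2: 100, 3: 10, 18: 50, 19: 80, 20: 100, 21: 80, 22: 50}
print(estimate_genome_size(toy))
```

In practice, heterozygosity and repeats add further peaks to the histogram, so a fitted mixture model is usually preferred over this single-peak shortcut.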
Gene completeness, as measured by alignment of single-copy pan-metazoan genes to the assemblies using the BUSCO method33, was comparable to the previous assembly (Supplementary Fig. 2i). To assess this more fully, we generated new gene models. Using a combination of IsoSeq and RNASeq data, we identified 24,525 gene models and 36,280 transcripts (see Supplementary Notes for details). BUSCO analysis showed that the gene set was more complete than a previously-generated gene set34. In addition, the new gene models showed a better mappability to a publicly available single-cell RNA-Seq data set35,36 (Supplementary Fig. 4).
To facilitate the usage of the newly assembled genomes, we established a publicly accessible genome browser. Both new genome assemblies and associated data are available for browsing, downloading, and BLAST at SIMRbase (https://simrbase.stowers.org). The Nematostella vectensis genome assembly, referred to as Nvec200, has an abundance of aligned track data, including the newly generated gene models, a large collection of published RNAseq and ChIP-seq analyses, as well as 145 ultra-conserved non-coding elements (UCNEs) shared between Nematostella and Scolanthus (Supplementary Notes; Supplementary data file 8).
Comparison of Nematostella chromosomes to metazoan genomes
We identified 15 chromosomes in the new genome assemblies of Nematostella and Scolanthus, and we numbered these chromosomes from 1 to 15 according to decreasing size (Fig. 1d, e). This is in line with the previous estimates based on the number of Nematostella metaphase plates1 and the analysis of Nematostella chromosome spreads37. Most chromosomes corresponded in length to their homologous counterparts, but were much larger in Scolanthus (Fig. 4d). This was accounted for by a large fraction of unclassified and potentially lineage-specific repeat sequences (Supplementary data file 1, Supplementary Notes). No obvious heteromorphic sex chromosomes could be identified based on read mapping depth, or from the level of heterochromatin usually associated with non-recombinant sex chromosomes. Ascertaining the sex-determining regions of the genome will require more data and detailed analyses.
Next we wished to determine the extent to which the Nematostella and Scolanthus chromosomes exhibit conservation of gene content and order (micro- and macrosynteny). Indeed, each of the 15 chromosomes of both species shared a majority of orthologous genes with a single corresponding chromosome in the other species (Fig. 2a). We found that 8117 of 8692 mutual best BLAST hits between Nematostella and Scolanthus were retained on their respective chromosomes, implying a one-to-one homology between all 15 chromosomes. However, gene order was largely lost from the most recent common ancestor (MRCA), which we estimate to have diverged approximately 174 Mya (Supplementary Fig. 1).
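The figures above correspond to 8117/8692, or roughly 93%, of mutual best hits lying on corresponding chromosomes. The following minimal sketch shows how such a count could be obtained. The input layout (a dictionary of bit scores per query), the function names and the majority-partner rule for pairing chromosomes are illustrative assumptions, not the procedure used in this study.

```python
from collections import Counter, defaultdict


def best_hit(hits):
    """Map each query gene to its highest-scoring target gene."""
    return {q: max(targets, key=targets.get) for q, targets in hits.items() if targets}


def mutual_best_hits(a_to_b, b_to_a):
    """Return gene pairs (a, b) that are each other's best hit."""
    best_ab = best_hit(a_to_b)
    best_ba = best_hit(b_to_a)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)


def retained_on_homolog(pairs, chrom_a, chrom_b):
    """Count pairs that sit on the majority partner chromosome.

    Each chromosome of species A is paired with the species B chromosome
    that shares most of its mutual best hits (a simplifying assumption).
    """
    counts = defaultdict(Counter)
    for a, b in pairs:
        counts[chrom_a[a]][chrom_b[b]] += 1
    retained = sum(max(partners.values()) for partners in counts.values())
    return retained, len(pairs)


# Fraction reported in the text for Nematostella versus Scolanthus.
print(round(8117 / 8692, 3))
```

The majority-partner rule does not by itself enforce a one-to-one pairing of chromosomes, so the resulting pairing should be checked separately before a one-to-one homology is claimed.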
To assess the macrosyntenic conservation between anthozoans we compared the Nematostella chromosomes to those of the sea anemone Exaiptasia pallida, the stony coral Acropora millepora and the soft coral Xenia sp. (Fig. 2a). While both Exaiptasia and Acropora are only assembled on the scaffold level, we observed that the gene content of the scaffolds suggests a similar karyotype to Nematostella. The chromosome-level genome assembly of the octocoral Xenia sp. also appears to have 15 chromosome-scale scaffolds20. However, despite the apparently identical number of chromosomes, the chromosome-scale scaffolds of Xenia showed surprisingly little homology to those of the hexacoral Nematostella. We discovered many translocations and fusions, with only three chromosomes demonstrating a one-to-one relationship (Fig. 2a).
Notably, the hydrozoan Hydra vulgaris, which belongs to the clade of Medusozoa, also has 15 chromosomes38. While our analyses revealed many ancestral linkage groups in the comparison with Hydra, we also detected numerous chromosomal rearrangements, including splits and fusions (Fig. 2a, Supplementary Fig. 5). Specifically, we observe that Nematostella chromosomes 2 and 3 are partially or completely split in other cnidarian genomes. By comparison, the scyphozoan jellyfish Rhopilema has 21 chromosomes, supported with karyotype images21,39. We observed that all Rhopilema and Nematostella chromosomes show a clear 1-to-1, 1-to-2 or, in a single case, a 1-to-3 macrosyntenic correspondence, which can be traced to the ancestral linkage groups (Supplementary Figs. 5, 7). These comparisons among cnidarians suggest that the history of cnidarian chromosomes may be more complex than previously envisioned. This is in line with recent interpretations of cnidarian chromosome comparisons40.
We then compared the chromosome-scale scaffolds of Nematostella with those of a bilaterian, the cephalochordate Branchiostoma floridae. Cephalochordates are early branching chordates lacking the two rounds of whole-genome duplication and allotetraploidization suggested for vertebrates41,42, as represented by their single Hox cluster43 (Fig. 2a). Strikingly, Branchiostoma and Nematostella chromosomes retain extensive macrosynteny since their divergence from the bilaterian-cnidarian MRCA. The large-scale macrosynteny retention detected in Branchiostoma becomes less obvious once we focus on more recently-branching bilaterian clades. Vertebrates, such as the early branching actinopterygian fish Lepisosteus oculatus44 and humans, underwent two or more rounds of genome duplications and showed additional translocation events from the MRCA (Fig. 2a). Among protostomes, we observed that the lophotrochozoan Ram’s horn worm Streblospio benedicti and the sea scallop Patinopecten yessoensis45, as well as the ecdysozoan millipede Trigoniulus corallinus46, retained many macrosyntenic links (Fig. 2a).
The high degree of macrosynteny conservation with Branchiostoma motivated us to compare Nematostella chromosomes to those of the representative of an even older lineage, the Porifera (sponges), which branched off prior to the bilaterian-cnidarian split (Supplementary Fig. 1) and is considered by many to be the sister group to all other animals47,48. Strikingly, the comparison with the recently-assembled chromosomes of the sponge Ephydatia muelleri revealed many shared ancestral linkage groups (Fig. 2a).
Reconstruction of ancestral linkage groups
The observed patterns of conserved macrosynteny between chromosomes of multiple distantly-related species and the varying degrees of gene shuffling and chromosomal rearrangements in others (e.g., Drosophila, Caenorhabditis, humans) prompted us to reconstruct the set of predicted ancestral linkage groups (ALGs) for the MRCA of cnidarians, bilaterians and metazoans (Fig. 2b). First, we inferred gene orthology across all genomes investigated. For each of Metazoa, Bilateria and Cnidaria, we then determined ALGs by identifying genes with high chromosomal linkage throughout each clade. Our method evaluates multiple scenarios and selects the ALGs that maximize “modularity”, i.e., high chromosomal linkage within ALGs and low linkage between ALGs (see Methods and Supplementary notes for details). We can then color the Oxford plots by the genes in these blocks, which suggests a pattern of splits and fusions from ancestral to extant chromosomes.
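The idea of scoring a candidate set of ALGs by modularity can be sketched as follows. This is a generic illustration that assumes the standard Newman modularity on a weighted gene graph, in which the weight between two orthologous genes is the number of species where they share a chromosome. The actual scoring and search procedure are described in the Methods and Supplementary notes and may differ; all names and inputs here are hypothetical.

```python
from itertools import combinations


def linkage_weights(chrom_by_species):
    """Weight each gene pair by the number of species in which it is linked.

    chrom_by_species: list of dicts (one per species) mapping an orthologous
    gene family to the chromosome carrying it in that species.
    """
    genes = sorted(set.intersection(*(set(d) for d in chrom_by_species)))
    weights = {}
    for g, h in combinations(genes, 2):
        shared = sum(1 for d in chrom_by_species if d[g] == d[h])
        if shared:
            weights[(g, h)] = shared
    return genes, weights


def modularity(genes, weights, group):
    """Newman modularity of a partition `group` (gene -> candidate ALG)."""
    degree = {g: 0.0 for g in genes}
    for (g, h), w in weights.items():
        degree[g] += w
        degree[h] += w
    two_m = sum(degree.values())
    if two_m == 0:
        return 0.0
    q = 0.0
    for g in genes:
        for h in genes:
            if group[g] != group[h]:
                continue
            # Edge weight between g and h (zero on the diagonal).
            a = weights.get((g, h), 0) + weights.get((h, g), 0)
            q += a - degree[g] * degree[h] / two_m
    return q / two_m
```

Competing scenarios, for example partitions with different numbers of ALGs, could then be ranked by this score, with higher values indicating tighter within-group and weaker between-group linkage.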
We identified 19 ALGs present in the ancestor of cnidarians, although we could find similar modularity in scenarios of 16–18 ALGs (see Supplementary notes for details). These correspond well to the extant cnidarian chromosomes (Fig. 2c, Supplementary Fig. 5). However, while the Rhopilema and edwardsiid chromosomes appear highly representative of the ancestral cnidarian karyotype, the soft coral Xenia chromosomes appear to have undergone more chromosomal translocation events. Our results also indicate that although both Hydrozoa and Edwardsiidae have a clear 15 chromosome karyotype, their chromosomes originate from distinct fusions of cnidarian ALGs. For the ancestor of Bilateria, we identified 16 ALGs (Fig. 2c). In a previous study 17 ALGs were proposed49, which resulted in similarly high modularity in our analysis (see Supplementary notes for details). By comparison with the sponge Ephydatia, we reconstructed a minimum of 16 metazoan ALGs for the metazoan ancestor, which were maintained in the cnidarian MRCA, Bilateria and the sponge (Fig. 2b, c). We also compare this to a recent reconstruction of the ancestor of bilaterians, cnidarians, and sponges (ref. 40) in Supplementary data file 13.
In order to visualize how the chromosomes of extant species have undergone major splits and fusions, we projected the reconstructed ALGs of the respective cnidarian, bilaterian and metazoan ancestor to the chromosomes of the extant species. We found that many ALGs correspond to the same chromosomes across multiple species and lineages (Fig. 2c; Supplementary Figs. 5–7). To determine the extent of this, we further explored the relationships between metazoan ALGs and those of cnidarians and bilaterians (Fig. 2c). Remarkably, many ancestral chromosomes exhibited a 1-to-1 correspondence across all predicted ancestral lineages (Fig. 2c) and carried through to the extant lineages (Fig. 2d).
Chromosomal organization of the NK and extended Hox gene clusters
The chromosome-level assembly of the Nematostella genome allowed us to address the evolution of specific gene clusters. Prominent examples of clusters of homeodomain transcription factor coding genes ancestral for Bilateria include the SuperHox cluster, the ParaHox cluster, the NK/NK-like cluster as well as NK2 group genes located separately43,50,51. It has been hypothesized that all of them originated from a single gene cluster, which then disintegrated during evolution51. Our analysis revealed that Nematostella possesses a separate ParaHox cluster of two genes (Gsx and Xlox/Cdx) on chromosome 10, and a SuperHox cluster on chromosome 2 containing Hox, Evx, Mnx, and Rough, as well as more distant Mox and Gbx52 (Fig. 3, Supplementary Fig. 8, Supplementary data file 6). We identified an NK cluster on chromosome 5 containing NK1, NK5, Msx, NK4, NK3, NK7, NK6, a more distant Lbx, a possible highly derived Tlx-like gene and, intriguingly, Hex, which is also linked to the NK cluster in the hemichordate Saccoglossus kowalevskii53 and in the cephalochordate Branchiostoma floridae. Similar to Bilateria, the NK2 genes were clustered separately and found on chromosome 2 (Fig. 3, Supplementary Figs. 8, 9, Supplementary data file 6). In contrast, in the earlier-branching sponges, neither ParaHox nor extended Hox cluster genes exist, and only the NK cluster is present with a single NK2/3/4 gene, two NK5/6/7 genes, an Msx ortholog, as well as possible Hex and Tlx orthologs54 (Fig. 3, Supplementary Fig. 10). Taken together, this allows us to propose that the bilaterian-cnidarian MRCA possessed an NK-cluster on a chromosome different from the one carrying the SuperHox cluster, and a separate NK2 cluster, possibly on the same chromosome as the SuperHox cluster (Fig. 3).
The hypothesized SuperHox-NK Megacluster43, if it ever existed, must have both formed and broken apart during the time after the separation of the sponge lineage, but before the origin of the bilaterian-cnidarian ancestor (Fig. 3a, Supplementary notes). The lack of selection pressure in favor of microsynteny conservation is clearly illustrated by the comparison of the divergent Hox clusters of Scolanthus and Nematostella. Although located on homologous chromosomes, the gene order, orientation, and the number of intervening genes differ substantially between these two species (Fig. 3b). In contrast to the atomized Hox clusters of Nematostella and Scolanthus, the Hox cluster of their very distant anthozoan relative Xenia is compact and contains “anterior” Hox genes HoxA and HoxB immediately next to the non-anterior Hox gene HoxE (Fig. 3b). Aside from the tandem duplications, this compact state likely represents the ancestral organization. Similarly, HoxE and HoxB are immediate neighbors in the genome of the jellyfish Rhopilema, although the Rhopilema Hox cluster shows some evidence of disintegration (Fig. 3b).
Topologically associating domains are not detected in either sea anemone genome
In the past decade, high-resolution chromosome conformation capture has increased interest in topologically associating domains (TADs), recurring chromosomal-folding motifs evidenced by signals in Hi-C contact maps55. Flanking regions of TADs are positively correlated with CCCTC-binding factor (CTCF) binding sites. Interestingly, no CTCF ortholog has been detected in non-bilaterian animals31,55, but previous studies of non-bilaterian animals have proposed the existence of TAD-like structures47.
Similar to previous studies, we sought to identify putative TADs within the edwardsiid genomes. Initial inspections of the Hi-C maps of both Scolanthus and Nematostella, however, revealed that both species appeared to lack TAD-like structures (Fig. 4a; Supplementary Fig. 11) and instead the Hi-C contact frequency decayed smoothly as a function of read pair distance along the chromosomes (Supplementary Fig. 11). We quantified these findings by measuring the strength of topological boundaries using the insulation score, and found the topological boundaries to be weaker in the Nematostella genome than in the fly (Supplementary Fig. 12, p = 8.38 × 10^−18, Mann-Whitney U test) and mouse genomes (p = 2.93 × 10^−55, Mann-Whitney U test). We also verified that there was a lack of bilaterian-like TADs and weak topological domain boundaries in an independent Nematostella chromosome-scale genome assembly and Hi-C dataset produced in parallel to ours (ref. 56; Supplementary Fig. 11).
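As an illustration of the quantities named above, the sketch below computes a simple insulation score from a binned contact matrix and the Mann-Whitney U statistic for two samples of scores. It assumes one common definition of insulation, the mean contact frequency between the `window` bins upstream and the `window` bins downstream of each position; the exact definition, window size and normalization used in this study are not reproduced here, and all names are hypothetical.

```python
def insulation_scores(matrix, window):
    """Mean contact frequency across each bin boundary.

    matrix: square list of lists of contact counts.
    Returns one score per position i in range(window, n - window + 1),
    averaging contacts between bins [i - window, i) and [i, i + window).
    Low values indicate that few contacts cross the position (a boundary).
    """
    n = len(matrix)
    scores = []
    for i in range(window, n - window + 1):
        total = 0.0
        for row in range(i - window, i):
            for col in range(i, i + window):
                total += matrix[row][col]
        scores.append(total / (window * window))
    return scores


def mann_whitney_u(x, y):
    """U statistic for sample x versus y (ties count one half)."""
    u = 0.0
    for a in x:
        for b in y:
            if a > b:
                u += 1.0
            elif a == b:
                u += 0.5
    return u


# Toy matrix with two 3-bin domains and no contacts between them.
block = [[1 if (r < 3) == (c < 3) else 0 for c in range(6)] for r in range(6)]
print(insulation_scores(block, window=2))
```

In the toy matrix the score drops to zero exactly at the position separating the two domains. A smoothly decaying contact map, as described for the sea anemone genomes, would instead give a score profile without pronounced minima.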
TAD structures are often well-conserved around gene clusters that control spatio-temporal gene expression, especially of developmental regulators, such as the TADs that separate the anterior and posterior Hox genes in the mouse HoxD cluster57. We therefore explored whether TAD-like structures existed around Hox clusters in Nematostella. Nematostella has six Hox genes: one is on chromosome 5, while the other five are located on chromosome 2 in two clusters separated by hundreds of genes (Fig. 4). We analyzed the genomic regions surrounding both partial Hox clusters, but again we could not detect any significant TAD-like structure.
Correlation of gene regulation with three-dimensional genome organization
We note that the Nematostella genome is fairly gene-dense (10 ± 4 genes per 100 kb), leaving relatively little intergenic sequence. It is conceivable that most relevant cis-regulatory elements are located in close proximity to the regulated gene. This is supported by the observation that many Nematostella transgenic reporter constructs faithfully mimic endogenous expression patterns with only 1–3 kb of upstream promoter sequence. Regions of open chromatin are reflected by ATAC-seq peaks, many of which contain cis-regulatory regions. We therefore wished to assess whether the distance between ATAC-seq peaks and the closest gene (i.e., transcriptional start or end site) correlates with genome size. We compared several species for which chromosome assemblies and ATAC-seq data are available and, as expected, we found that there is indeed a correlation between the distance of ATAC-seq peaks to the nearest gene and the genome size of the organism. However, there is also a correlation between small genomes and the absence of clear TAD structures, for instance in C. elegans, sponges and cnidarians. Notably, at least one cnidarian genome, Hydra, is relatively large (1.27 Gb), yet does not show typical TAD structures, although smaller-scale 3D structures could be observed58. Thus, within bilaterians, species with small genomes may show a tendency to lack TAD structures, while in non-bilaterians, which lack CTCF, genomes may not organize into stable TAD-like structures. At present, Hydra is the only available large non-bilaterian genome with relatively short distances between ATAC-seq peaks and the nearest gene.
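The distance measure described above, from an ATAC-seq peak to the closest transcriptional start or end site, can be sketched in a few lines. The example assumes peaks are represented by a single coordinate (for example the summit) on one chromosome and that genes are given as (start, end) intervals; the function name and the toy coordinates are hypothetical.

```python
from bisect import bisect_left
from statistics import median


def distance_to_nearest_gene_end(peak, boundaries):
    """Distance from a peak position to the closest gene start or end.

    boundaries: sorted, non-empty list of all gene start and end coordinates
    on the same chromosome as the peak.
    """
    i = bisect_left(boundaries, peak)
    # Only the boundary just left and just right of the peak can be closest.
    candidates = boundaries[max(i - 1, 0): i + 1]
    return min(abs(peak - b) for b in candidates)


# Toy data: two genes and four peak summits on one chromosome.
genes = [(1000, 2000), (5000, 7000)]
boundaries = sorted(coord for gene in genes for coord in gene)
peaks = [900, 2600, 4900, 8000]
distances = [distance_to_nearest_gene_end(p, boundaries) for p in peaks]
print(distances, median(distances))
```

A per-species summary such as the median distance could then be compared against genome size across species.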
TADs not only assure maintenance and directionality of enhancers to the regulated gene, they frequently also encompass more than one gene. In vertebrates, many TADs are conserved between species, suggesting that their maintenance is under selection pressure. This predicts that in organisms with TADs the number of microsyntenic genes is higher than in organisms lacking TADs. To test this, we compared the number of microsyntenies in different cnidarians, chordates and protostomes, roughly representing similar times of divergence. We find that, while the number of microsyntenies consisting mostly of two consecutive orthologous genes did not differ between clades, microsyntenies consisting of at least three or four orthologous genes were reduced in cnidarians. No microsyntenies of four or more genes were found in any pair of species spanning the anthozoan-medusozoan split (e.g., Nematostella-Rhopilema), whereas around a dozen such regions are shared among distantly branching spiralian and chordate-vertebrate (“deuterostome”) genomes (e.g., 11 blocks between the scallop Pecten maximus and the annelid Streblospio benedicti, and 14 blocks between the amphioxus Branchiostoma floridae and the chicken Gallus gallus) (Fig. 4f). A similar trend was observed for microsyntenies where intervening genes were allowed to be part of the blocks. For a commonly used threshold of up to 5 intervening genes49, the Nematostella-Rhopilema species pair showed only 7 blocks, whereas Pecten-Streblospio had 82, and amphioxus to chicken had 45 microsyntenic blocks retained (Fig. 4f, Supplementary data file 12). Together, these data indicate higher retention and evolutionary exploration of longer syntenic regions in bilaterian genomes, potentially allowed for by the maintained distal topological architecture.
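The block counts above depend on how many intervening genes are tolerated between consecutive orthologs. A minimal sketch of such a block finder is given below. It assumes one-to-one orthologs, chains ortholog pairs greedily along the gene order of the first genome, and ignores gene orientation; the function name and data layout are hypothetical, and published microsynteny tools use more elaborate criteria.

```python
def microsyntenic_blocks(genome_a, genome_b, orthologs, max_intervening=0, min_genes=3):
    """Find runs of orthologous genes that are neighbors in both genomes.

    genome_a, genome_b: dict chromosome -> list of gene ids in genomic order.
    orthologs: dict mapping genes of genome_a to their ortholog in genome_b.
    max_intervening: genes allowed between consecutive block members.
    min_genes: minimum number of ortholog pairs for a block to be reported.
    """
    pos_b = {g: (chrom, i) for chrom, genes in genome_b.items() for i, g in enumerate(genes)}
    blocks = []
    for genes in genome_a.values():
        current, last_a, last_b = [], None, None
        for i, gene in enumerate(genes):
            partner = orthologs.get(gene)
            if partner not in pos_b:
                continue  # genes without an ortholog count as intervening genes
            here_b = pos_b[partner]
            adjacent = (
                current
                and i - last_a - 1 <= max_intervening
                and here_b[0] == last_b[0]
                and abs(here_b[1] - last_b[1]) - 1 <= max_intervening
            )
            if not adjacent:
                # Close the running block and start a new one at this gene.
                if len(current) >= min_genes:
                    blocks.append(current)
                current = []
            current.append((gene, partner))
            last_a, last_b = i, here_b
        if len(current) >= min_genes:
            blocks.append(current)
    return blocks
```

Calling the function with `max_intervening=0` corresponds to strictly consecutive orthologs, while `max_intervening=5` mirrors the threshold of up to 5 intervening genes mentioned in the text.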
Discussion
Here we report the assembly of two high quality, chromosome-level edwardsiid sea anemone genomes. By comparing them with other cnidarian, bilaterian and poriferan genomes, we have illuminated several intriguing aspects of early animal chromosomal evolution, the origin of NK and extended Hox clusters, the conservation of non-coding elements and the status of topologically associating domains in the bilaterian-cnidarian MRCA. In addition, the highly improved Nematostella genome and manually curated gene annotations will prove to be an invaluable resource for future studies of both coding and non-coding regions, structural variants among populations and continued development of functional tools for this model organism.
Nearly all members of the extended Hox cluster were distributed among distant, isolated microsyntenic blocks on chromosome 2 of Nematostella (pseudo-chromosome 4 in Scolanthus), with the single exception of HoxF/Anthox1, located on chromosome 5 (Fig. 3; Supplementary notes). This indicates a lack of proximity constraint on the Hox genes in cnidarians, contrasting with the situation in Bilateria. In addition, while a staggered spatiotemporal pattern of Hox expression along the secondary, directive axis of the Nematostella larva and polyp can be observed59, unlike Bilateria, there is no correlation between expression and cluster position60. Notably, HoxF/Anthox1 is not only located on a different chromosome but it is also the only Hox gene expressed in the (aboral) ectoderm, while all other Hox genes are expressed in the inner endomesodermal cell layer61,62.
The dispersed NK and extended Hox clusters may be due to the diminishment, or possible absence, of higher-order chromosome organization at the level of microsynteny. In line with this, it was recently observed that the HoxD cluster boundaries in the mouse genome are marked by two TAD boundaries57, and the cluster’s intra-TAD gene order is deemed to be under selective pressure63. The lack of a CTCF gene in the Nematostella genome led us to hypothesize that the cnidarians might lack TADs, as TAD presence has been attributed to the appearance of CTCF63. Moreover, CTCF is absent not only in cnidarians but also in earlier branching ctenophores and sponges, which raises the possibility that the existence of TADs might represent a bilaterian-specific feature. While we were unable to detect any noticeable structure resembling the current definitions of TADs, it remains an open question as to whether larger or smaller structures, beyond the resolution of our data, could yet be detected. One study has suggested evidence for higher-order chromosomal organization in a non-bilaterian, the sponge Ephydatia47. However, the contact maps resemble patterns we observe in our assemblies at the boundary of scaffolds or contigs, which can be the result of differential mappability from repetitive content or assembly issues. We therefore deliberately do not report any results from a TAD finder, since, after multiple rigorous rounds of manual assembly update, we can assert that the data we have generated do not qualitatively represent TAD boundaries per se, and most results would likely be false positives. While the precise definition of a TAD is still evolving64,65, both data sets lack many characteristics of TADs identified in CTCF-containing genomes: hierarchical compartments, mammalian-specific “corner peaks” indicating strong interactions, and in our case, loop peaks and inter-contig compartments. This suggests that the presence of CTCF is necessary for the formation of TADs.
The apparent absence of TADs in our analysis could also be explained by a higher degree of cell type variability. Therefore, we cannot exclude the possibility that performing the experiment with a more homogeneous cell population, or sequencing at a higher resolution, would reveal a signal on a smaller scale.
While microsynteny analyses reveal little conservation of the local gene order in the genomes of Nematostella and Scolanthus, macrosyntenic analysis of the edwardsiid chromosomes compared to available cnidarian genomes revealed a high level of conservation. We identified a stable set of 19 ALGs across all clades of sequenced cnidarian genomes. When compared to extant genomes, we can trace a small number of recombination events from the ALGs since the common cnidarian ancestor split an estimated 580 Mya. This stands in stark contrast to the history of, for example, the 326-million-year-old ancestral genome of Amniota, which is estimated to have had 49 distinct units, while the karyotypes of the extant amniote taxa consist of multiple translocated segments and variable chromosomes66. However, far more remarkable is the macrosynteny maintained between the edwardsiids, the early branching chordate Branchiostoma, and the sponge Ephydatia. Our analyses revealed clear one-to-one, one-to-few or few-to-one conservation of the chromosome-level linkages between cnidarians, sponges and early chordates, which suggests a striking retention of macrosyntenies throughout evolution of these animal lineages. This in turn allowed us to deduce a set of 16 ALGs of the last common metazoan ancestor, which was maintained in the cnidarian-bilaterian ancestor and gave rise to the 19 ancestral cnidarian chromosomes and the 17 ancestral bilaterian chromosomes49. The observed conservation of macrosyntenic linkages can result from a strong selective pressure to maintain intact chromosomes during meiosis, so that only local translocations within the chromosomes, scrambling local gene order, are favored. Changes in population sizes or asexual reproduction or selfing may facilitate chromosomal unit breakages.
While such global chromosomal rearrangements have been observed for some clades (most notably dipterans but also nematodes and cephalopods), it still remains unclear whether additional gene regulatory constraints may exist at the whole-chromosomal level67. It is tempting to speculate that the emergence of the TADs in Bilateria may have restricted local rearrangements and at the same time released the constraints on maintaining the ancestral macrosyntenies conserved all the way back to the origin of multicellular animals.
The high level of macrosynteny maintaining ancestral chromosomal blocks over hundreds of millions of years contrasts with the low level of microsynteny conservation, even between more closely related cnidarians. Despite this reshuffling of genes within chromosomes, many developmental regulators involved in axial patterning show a conserved expression pattern, suggesting that their cis-regulatory elements have been maintained. For instance, in Nematostella and Hydra, which have been separated for over 500 million years, all 12 Wnt genes, brachyury, and foxA are expressed at the oral pole, while foxQ2 and six3 are expressed aborally68–71. To ensure conservation of spatio-temporal gene expression patterns, cis-regulatory elements must remain located in close proximity to the respective gene even when translocated to a new genomic position. This is supported by our distance analyses of ATAC-seq peaks. By contrast, in bilaterians with large genomes, such as vertebrates, enhancers can be located up to several megabases away from the gene they regulate, often with several genes in between. To maintain gene regulation by long-distance enhancers, we predicted that microsynteny conservation should be under higher selective pressure in such bilaterians than in non-bilaterians. Indeed, microsynteny conservation is higher in vertebrates than in cnidarians, in line with recent comparisons between skates, mouse and garfish72. We therefore propose a hypothetical evolutionary scenario, where close cis-regulation is ancestral to metazoans and maintained in non-bilaterians, which lack CTCF. It follows that CTCF evolved in the bilaterian common ancestor and was recruited to bind to the genome to act as an insulator and, together with cohesin, as a structural component defining recombination boundaries. This preserved complex gene regulation by distant enhancers, with CTCF sites acting as a physical boundary for recombination events as well as an insulator in TADs.
Bilaterians that evolve a more compact genome would have a tendency to lose TADs and the dependence on CTCF. This seems to be the case in the C. elegans autosomes73,74 and is also predicted for other small genomes, such as in urochordates.
Methods
Animal care and source
Nematostella vectensis animals (F1 of CH2xCH6, originally collected by Cadet Hand) were cultured at 18 °C under dim light conditions and fed daily with Artemia brine shrimp. Adult male and female individuals were verified by induction of spawning in isolation75. Spawning was induced by a combination of white light and a temperature of 25 °C for 10 h. Scolanthus callimorphus animals were collected at the Île Callot, Carantec, France within the framework of the Assemble grant 227799 to U.T. After transport, they were kept in seawater at 20 °C and fed freshly hatched Artemia salina weekly or biweekly. Spawning could not be induced in the laboratory and the sex of the polyps was unknown. Nematostella vectensis has been maintained as a laboratory strain for several decades. Neither Nematostella vectensis nor Scolanthus callimorphus is an endangered species, and both are simple invertebrates. Therefore, no ethical approval was necessary.
Sequencing
Short read DNA-Seq
Genomic DNA samples were extracted from individual adult male and female Nematostella using the DNeasy Blood and Tissue Kit (Qiagen). After purification, ~5 µg of genomic DNA was recovered from each sample. Following DNA extraction, samples were sheared and size selected for ~500 bp using a Blue Pippin Prep machine (Sage Science). Following size selection, sequencing libraries were created using a KAPA HTP Library Prep kit (Roche) and subjected to paired-end sequencing on an Illumina NextSeq 500. Scolanthus DNA samples for library preparation were aliquoted from high molecular weight extractions, described below.
High molecular weight DNA extraction and library prep
Nematostella high molecular weight DNA was extracted at Dovetail Genomics. Samples were quantified using Qubit 2.0 Fluorometer (Life Technologies, Carlsbad, CA, USA). The PacBio SMRTbell library (~20 kb) for PacBio Sequel was constructed using SMRTbell Template Prep Kit 1.0 (PacBio, Menlo Park, CA, USA) using the manufacturer recommended protocol. The pooled library was bound to polymerase using the Sequel Binding Kit 2.0 (PacBio) and loaded onto PacBio Sequel using the MagBead Kit V2 (PacBio). Sequencing was performed on the PacBio Sequel SMRT cell, using Instrument Control Software Version 5.0.0.6235, Primary analysis software Version 5.0.0.6236 and SMRT Link Version 5.0.0.6792, yielding 24.67 Gb over 3,050,403 subreads.
High molecular weight DNA from a single Scolanthus callimorphus adult animal was extracted using a modified Urea-based DNA extraction protocol76,77. A whole animal was flash frozen and ground with mortar and pestle. While frozen, drops of buffer UEB1 (7 M Urea, 312.5 mM NaCl, 50 mM Tris-HCl pH 8, 20 mM EDTA pH 8, 1% w:v N-Lauroylsarcosine sodium salt) were added and crushed with the tissue. Tissue was incubated in a final volume of 10 mL UEB1 at RT for 10 min. Three rounds of phenol-chloroform extraction were performed, followed by DNA precipitation by addition of 0.7 volume isopropanol. The pellet was transferred to a fresh tube and washed twice in 70% EtOH and twice more in 100% EtOH, dried, and resuspended in TE buffer.
A library for PacBio sequencing was then prepared from the high molecular weight sample using the SMRTbell® Express Template Prep Kit v1. The libraries were then sequenced on a PacBio Sequel machine over 3 SMRT Cells, yielding a total of 22.85 Gb over 1,474,285 subreads. An aliquot of the same sample was used to prepare a library using the NEBNext® Ultra™ II DNA Library Prep Kit for Illumina. This was then subjected to 50 cycles of single-end sequencing in one flow cell lane using an Illumina HiSeq 2500 system.
Chicago libraries
Two Chicago libraries were prepared as described previously78. For each library, ~500 ng of HMW gDNA (mean fragment length = 100 kbp) was reconstituted into chromatin in vitro and fixed with formaldehyde. Fixed chromatin was digested with DpnII, the 5’ overhangs filled in with biotinylated nucleotides, and then free blunt ends were ligated. After ligation, crosslinks were reversed and the DNA purified from protein. Purified DNA was treated to remove biotin that was not internal to ligated fragments. The DNA was then sheared to ~350 bp mean fragment size and sequencing libraries were generated using NEBNext Ultra enzymes and Illumina-compatible adapters. Biotin-containing fragments were isolated using streptavidin beads before PCR enrichment of each library. The libraries were sequenced on an Illumina HiSeq 2500 (rapid run mode). The number and length of read pairs produced for each library was: 116 million, 2 × 101 bp for library 1; 35 million, 2 × 101 bp for library 2. Together, these Chicago library reads provided 125 × sequence coverage of the genome (1–100 kb pairs).
Chromatin was extracted from nuclei of a single Nematostella vectensis adult male and a single Scolanthus callimorphus adult (unknown sex) using the Phase Genomics Proximo Hi-C animal protocol. After proximity ligation and purification, 16 ng and 9 ng of DNA were recovered, respectively. For library preparation, 1 µl of Library Reagent 1 was added and 12 PCR cycles were performed. The final library was subjected to 150 total cycles of paired-end sequencing using an Illumina NextSeq 550 machine, yielding a total of 13.5 gigabases.
Hi-C sequencing, Scolanthus callimorphus PacBio library preparation and sequencing, Scolanthus Illumina DNA library preparation and sequencing, and adult Nematostella vectensis RNA library preparation and sequencing were performed at the Vienna Biocenter Core Facility (VBCF) NGS Unit (https://www.viennabiocenter.org/facilities). Nematostella vectensis DNA size selection, library preparation, and sequencing were performed by the Molecular Biology Core at the Stowers Institute for Medical Research.
Developmental and adult Nematostella RNA sequencing was performed as follows. Nematostella were spawned and eggs were de-jellied and fertilized as previously described75. Spawning and embryo development took place at 18 °C. Eggs and embryos from different stages were collected (300 per sample) in duplicate as indicated: eggs (within 30 min of spawn), blastula (7.5 hpf), gastrula (23.5 hpf) and planula (72 hpf). Eggs and embryos were collected in Eppendorf tubes and centrifuged to a pellet at 21,000 × g for 1 min. All seawater was quickly removed and pellets were resuspended in 150 µl lysis buffer (RLT buffer supplied by the Qiagen RNeasy kit (#74104), supplemented with β-mercaptoethanol). The samples were homogenized with an electric pestle (1 min continuous drilling) and further supplemented with 200 µl of the above lysis buffer. Homogenized samples were then transferred into QIAshredder columns (Qiagen #79654) and centrifuged at 21,000 × g for 2 min. The flow-throughs were supplemented with 1 ml 70% ethanol, transferred to RNeasy columns and processed according to the Qiagen RNeasy protocol. Quality and integrity of the RNA were evaluated using the Agilent RNA 6000 Pico kit (Agilent Technologies) and RNA samples were stored at −80 °C until further processing. cDNA libraries were then constructed for polyA stranded sequencing. The resulting libraries were sequenced on Illumina HiSeq using paired-end runs (RapidSeq, 2 × 150 bp).
Genome Assembly
Size estimates for Nematostella vectensis and Scolanthus callimorphus were derived using Genomescope79, taking the result of the highest k (56 and 18) which converged under the model.
Initial assemblies based on PacBio sequencing of Nematostella and Scolanthus were generated using canu version 1.880 with the parameters rawErrorRate=0.3 correctedErrorRate=0.045.
Nematostella haplotigs were removed using Purge Haplotigs81. First, the source PacBio reads were aligned onto the canu assembly using minimap282 using the parameters -ax map-pb --secondary=no. Following this a coverage histogram was generated using the Purge Haplotigs script readhist. Per the documented Purge Haplotigs protocol, lower, mid, and high coverage limits were found by manual inspection of the plotted histogram to be 12, 57, and 130, respectively. All initial contigs marked as suspect or artifactual were removed from further analysis with the Purge Haplotigs script purge.
Due to lower sequencing coverage of Scolanthus, diploid per-scaffold coverage could not be deconvolved from haploid, and therefore Purge Haplotigs could not be used. Removal of redundant contigs was performed with Redundans version 0.14a83 using the parameters --noscaffolding --nogapclosing --overlap 0.66. Only contigs marked in the reduced version of the genome were used in further analysis.
The input de novo assembly, shotgun reads, and Chicago library reads were used as input data for HiRise, a software pipeline designed specifically for using proximity ligation data to scaffold genome assembly78. Shotgun and Chicago library sequences were aligned to the draft input assembly using a modified SNAP read mapper (http://snap.cs.berkeley.edu). The separations of Chicago read pairs mapped within draft scaffolds were analyzed by HiRise to produce a likelihood model for genomic distance between read pairs, and the model was used to identify and break putative misjoins, to score prospective joins, and make joins above a threshold. After scaffolding, shotgun sequences were used to close gaps between contigs.
Repetitive DNA and Hi-C scaffolding
Repetitive DNA was found using two strategies. First, known repeats found in repbase84 were searched in the assemblies using RepeatMasker85 using the parameters -s -align -e ncbi in addition to -species nematostella for Nematostella and -species edwardsiidae for Scolanthus. Second, novel repeat sequences were found using RepeatModeler version 2.086. After generating the repeat library, genomes’ repeat regions were detected with the corresponding library using the same parameters in RepeatMasker.
Hi-C sequences were aligned to the reduced and repbase masked genomes of Nematostella and Scolanthus using bwa mem87,88 using the parameters -5SP. For Nematostella, an additional candidate assembly was generated by mapping Hi-C sequences to the Chicago library scaffolded sequences using repbase masking (dovetail_standardmask) in addition to the contig-based scaffolding (contig_standardmask). Duplicate reads were marked with the samblaster utility89, and duplicate, secondary and supplementary mappings were removed with samtools. These mappings were used to generate initial chromosomal assemblies using Lachesis90, specifying the restriction site GATC. Assemblies were manually reviewed using Juicebox Assembly Tools version 1.11.08 (ref. 91). Candidate assemblies were compared using the nucmer aligner with default parameters and visualized using mummerplot92. Assemblies were converted over to Juicebox format using juicebox_scripts (https://github.com/phasegenomics/juicebox_scripts). In the case of Scolanthus, duplicate regions were clipped, and the resulting contigs were subjected to another round of alignment, assembly and review.
Nematostella scaffold correctness was assessed using REAPR93. Nematostella assembly nemVec1 was downloaded from the JGI website1. Sequences from the adult male and adult female (see Sequencing) were aligned to nemVec1 and the Nematostella genome after scaffolding with Chicago libraries using SMALT as well as the REAPR tool perfectmap using an expected insert size of 400, as determined from fragment analysis. Error-free bases and contiguity after breaking the genome were extracted from the results.
Completeness of the genome assemblies and gene model sets was assessed using BUSCO version 3.0.2 (ref. 33), with the gene set metazoa_odb9 as the standard.
Gene models
Nematostella, Scolanthus and M. senile sequences obtained from previous studies34,94 and publicly available data (see Data Availability for details) were used to generate de novo assembled transcripts.
Trinity version 5.0.2 (ref. 95) was run on each library using the flags --min_contig_length 200 --min_kmer_cov 2. For those which had a strand-specific library preparation, the flag --SS_lib_type RF was applied. To reduce redundancy, cd-hit version 4.6.8 (refs. 96,97) was applied with the flags -M 0 -c 1. Transdecoder version 5.0.6 (ref. 98) was used to detect open reading frames in the resulting reduced set of transcripts. Transcript abundance was quantified using salmon version 1.2.1 (ref. 99) using the flags --seqBias --useVBOpt --discardOrphansQuasi --softclip.
For PacBio Iso-seq, 12 Nematostella RNA samples were collected over the course of multiple developmental stages, adult tissues and regeneration time points. For developmental stages, zygotes spawned by a single batch of a wildtype colony were kept at 22 °C and collected at 0, 24, 48 and 72 hpf and at 7 dpf. Adult tissues were collected from sex-sorted, sexually mature wildtype individuals kept at 22 °C. The male and female mesenteries were harvested separately by surgically opening the body column and carefully peeling off the attached body column tissues. Adult oral discs were collected by surgical removal of tentacles as well as the attached pharyngeal regions. Regeneration was induced by amputating the oral part of a sexually mature individual at the mid-pharyngeal level. Regenerating tissues close to the wound were collected at 4 hpa and 12 hpa, respectively. All the samples were deep-frozen and lysed using TRIzol™ reagent (Invitrogen). Phenol-chloroform extraction was performed to remove undissolved mesoglea from adult tissues. The Direct-zol™ RNA Miniprep Plus Kit (Zymo) was then used to purify total RNA from the aqueous phase. For each sample, 2 μg of total RNA with RIN > 7 was submitted to UC Berkeley for Iso-seq library construction.
RNA libraries were sequenced at UC Berkeley using the PacBio Sequel-II system. Raw subread BAM files were processed and demultiplexed using PacBio’s isoseq v3.2 conda pipeline. The steps include consensus generation, primer demultiplexing, polyA refinement and data clustering using default parameters. This resulted in 406,317 high-quality HiFi reads, which were used to build the Nvec200 transcriptome.
HiFi reads were mapped to the Nematostella genome using minimap282 with the parameters -ax splice -uf --secondary=no to obtain the primary best alignments. Reads were then grouped and collapsed down to potential transcripts using PacBio’s cDNA_Cupcake toolkit and TAMA100. Based on PacBio’s guidelines, transcripts with degraded 5’ reads and fewer than 10 FL counts were removed. Chimeric transcripts were then analyzed to find potential fusion genes. For reads that did not map to the genome, de novo transcriptome assembly was performed using the graph-based tool Cogent with a k-mer size of 30. Cupcake and TAMA results were merged into non-redundant gene models using stringtie v2 (ref. 101). Deep RNA-seq reads from 4 developmental stages (egg, gastrula, post-gastrula and planula) were aligned to the genome using STAR v2.7.3a (ref. 102). Read alignments outside of the Iso-seq gene models were extracted with bedtools v2.29.2 (ref. 103) and used for reference-guided transcriptome assembly using Stringtie2. Final gene models were obtained by merging Iso-seq models and RNA-seq models and were manually corrected using previously cloned full-length CDS from Nematostella vectensis in NCBI (Supplementary data file 7). Finally, transdecoder v5.5.0 was used to produce the CDS annotation using a minimum protein length of 50 amino acids and prioritizing ORFs with significant similarity to any family in the PFAM database104. Alignment of the protein candidates to the PFAM database was done using Hmmer v3.1b2 (ref. 105). RNA-seq libraries from the N. vectensis developmental time series were downloaded and aligned to the new genome using STAR v2.7.3a (ref. 102) with standard parameters. Mapping and assignment efficiency was measured using featureCounts from the subread package106 with the “-p” flag for paired-end libraries.
Evidence for Scolanthus gene models was taken from RNA-sequencing and repeats. Scolanthus RNA-seq reads (see Sequencing) were mapped to the Scolanthus contigs using STAR version 2.7.3a (ref. 102). These mappings were used as evidence for intron junctions to generate putative gene models and to estimate hidden Markov model parameters using BRAKER2 (refs. 107,108). Gene models were then refined using Augustus version 3.3.3 (ref. 109) using extrinsic evidence from STAR splice junctions and the location of repeats from RepBase (see Genome Assembly) as counter-evidence for transcription. These models were filtered with the following criteria: 1) genes completely covered by RepeatModeler repeats (see Genome Assembly) were removed; 2) predicted gene models were required to be either supported by external RNA-seq evidence as reported by Augustus or to have a predicted ortholog as reported by Eggnog-mapper110. This resulted in a set of 24,625 gene models. Transcription factor identity was inferred by aligning the predicted protein sequences to Pfam A domains version 32.0 (ref. 111) using hmmer version 3.3 (ref. 105). Transcription factor families were based on domains curated in a previous work112.
Extended Hox cluster, NK cluster and ParaHox genes were found with BLAT113 matches of published models52,61,114–119 to the nv1 genome, taking the best hits. If an NVE gene model34 corresponded to the matched genomic region, its location in the nv2 genome was then determined for macrosynteny analysis. In cases where no published gene was known, reciprocal BLAST hits between the bilaterian and cnidarian counterpart were taken as evidence for orthology.
Divergence estimates
Single copy orthologs were detected by collecting common complete and duplicated BUSCO genes present in the Scolanthus and Nematostella genomes. Where duplicated BUSCOs were present, the transcript with the highest score was taken. This resulted in a total of 541 orthologs. BUSCOs found in genomes obtained from previous studies2,21,22,26,27,32,41,44,47,120–123 were used to generate multiple alignments. Genes were aligned with mafft version 7.427 using the E-INS-i model and a maximum 1000 refinement iterations124. Alignments were trimmed using trimAl version 1.4.rev15 using the “gappyout” criteria125. A maximum likelihood tree was inferred using iqtree version 2.0.6, using the model finder partitioned on each gene, constrained to nuclear protein models126. Divergence estimates were determined using r8s version 1.8.1 using the Langley-Fitch likelihood method127. Age ranges were estimated by fixing the split between Bilateria and Cnidaria at 595.7 and 688.3 Mya128.
Single copy orthologs were detected by collecting common complete BUSCO genes present in the A. millepora, A. digitifera, E. pallida, M. senile, Scolanthus and Nematostella genomes. This resulted in a total of 229 orthologs.
Ultraconserved elements
In order to determine noncoding elements conserved between Scolanthus and Nematostella, genomes repeat-masked from both de novo and repbase repeats were blasted using NCBI BLAST+ version 2.10.0 (ref. 129), using the flags -evalue 1E-10 -max_hsps 100000000 -max_target_seqs 100000000 -task megablast -perc_identity 0 -template_length 16 -penalty -2 -word_size 11 -template_type coding_and_optimal. Additionally, the -dbsize parameter was set to the estimated genome size. Candidate hits were then filtered using criteria loosely based on previous work130: for each high-scoring pair, a sliding window method was used to determine subsections of the alignment with at least 95% identity, and these windows were extended as long as the identity remained at this level. Nematostella elements mapping to more than one locus in the Scolanthus genome were reduced to the longest locus pair in both genomes. Elements mostly mapping to coding sequence were removed, and the remaining elements were classified as intron or non-coding, depending on location. Recurring UCE sequences that were not identified by RepeatModeler or RepeatMasker were detected with blastclust version 2.2.26, requiring the length of the hit to cover at least 90% of either sequence for linkage.
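The sliding-window filter described above can be illustrated with a minimal Python sketch. This is not the code used in the study: the function names, the window size of 20 alignment columns and the greedy rightward extension are assumptions made for illustration, and only the 95% identity threshold is taken from the text.

```python
def identity(query, subject):
    """Fraction of alignment columns that are identical non-gap characters."""
    matches = sum(1 for q, s in zip(query, subject) if q == s and q != "-")
    return matches / len(query)


def conserved_segments(query, subject, window=20, min_identity=0.95):
    """Return half-open (start, end) column ranges of an alignment.

    A window is seeded wherever its identity reaches min_identity and is
    then extended to the right for as long as the whole segment stays at
    or above that threshold. Window size and greedy extension are
    illustrative assumptions, not parameters reported in the paper.
    """
    if len(query) != len(subject):
        raise ValueError("aligned sequences must have equal length")
    n = len(query)
    segments = []
    start = 0
    while start + window <= n:
        end = start + window
        if identity(query[start:end], subject[start:end]) >= min_identity:
            while end < n and identity(
                query[start:end + 1], subject[start:end + 1]
            ) >= min_identity:
                end += 1
            segments.append((start, end))
            start = end  # continue the scan after the accepted segment
        else:
            start += 1
    return segments
```

Because the extension is greedy, a segment may end on a mismatching column; a production filter would probably trim such ends before classifying elements.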
Macrosynteny analysis
Branchiostoma floridae gene models and sequences were retrieved from the published study41. All against all comparisons were performed with OMA standalone version 2.5 (ref. 131). Genomes were downloaded from previous studies2,3,20,22,26,32,44–47,120–123,132–136.
Ancestral genome reconstruction was carried out using a graph based approach. In brief, genes were summarized into multi-species orthologous groups, and these comprised the nodes, and orthology groups occurring on the same chromosome or scaffold of two different species were linked together. A consensus approach to community detection based on the Leiden algorithm137 was used to determine ancestral linkage groups from this graph. For details, see the Supplementary Notes.
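As a rough illustration of the graph construction, the following Python sketch links two orthogroups when they share a chromosome in at least two species. This reading of the linking rule, the input layout and all function names are assumptions. Connected components are used here only as a simple stand-in for the consensus Leiden community detection, which is described in the Supplementary Notes and is not reproduced.

```python
from collections import defaultdict
from itertools import combinations


def build_linkage_graph(placements, min_species=2):
    """Build an undirected graph of orthogroups.

    placements: {species: {chromosome: set of orthogroup ids}}
    Two orthogroups are linked if they co-occur on one chromosome in at
    least min_species species (an assumed reading of the linking rule).
    """
    support = defaultdict(set)
    for species, chromosomes in placements.items():
        for groups in chromosomes.values():
            for a, b in combinations(sorted(groups), 2):
                support[(a, b)].add(species)
    graph = defaultdict(set)
    for (a, b), species in support.items():
        if len(species) >= min_species:
            graph[a].add(b)
            graph[b].add(a)
    return graph


def connected_components(graph):
    """Group linked orthogroups; a stand-in for Leiden community detection."""
    seen = set()
    components = []
    for node in sorted(graph):
        if node in seen:
            continue
        stack, component = [node], set()
        while stack:
            current = stack.pop()
            if current in seen:
                continue
            seen.add(current)
            component.add(current)
            stack.extend(graph[current] - seen)
        components.append(component)
    return components
```

Enumerating all pairs per chromosome is quadratic in the number of genes, so this form is only practical for small examples. Orthogroups without any supported link do not appear in the output.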
TAD sliding window analysis
Hi-C maps were generated for multiple species’ genomes (g:) and reads (r:), including those of the ctenophore Hormiphora californensis (g: GCA_020137815.1, r: SRR13784181, SRR13784182)138, the cnidarian Hydra vulgaris strain 105 (g: GCA_022113875.1, r: SRR14099165)40, the cnidarian Haliclystus octoradiatus (g: GCA_916610825.1, r: ERR6745733)139, the Nematostella assembly and reads presented in this manuscript (also, Hi-C reads from the Darwin Tree of Life project, ERR8571699 (ref. 56)), the cnidarian Diadumene lineata (g: GCA_918843875.1, r: ERR6688655)140, the fire jellyfish Rhopilema esculentum (g: 10.5524/100720 (ref. 39), r: SRR11649085 (ref. 21)), the fly Drosophila melanogaster (g: assembly Release 6 (ref. 122), r: SRR10512944 (ref. 141)), the scallop Pecten maximus (g: 10.6084/m9.figshare.10311068, r: SRR10119404)142, the cephalochordate Branchiostoma floridae (g: GCA_000003815.2 Bfl_VNyyK, r: SRR12007919, SRR12059951)41, and Mus musculus (g: GRCm39 (ref. 143), r: SRR1771322-SRR1771324 (ref. 144)).
The Hi-C maps were generated by mapping the raw paired-end Hi-C reads to the genome assemblies using chromap v0.2.3 (ref. 145). The pairs files were normalized using hicExplorer v3.6 (ref. 146), and balanced using Cooler v0.8.10 (ref. 147). The insulation scores were then calculated using FanC v 0.9.23b73 using 100 kb bin sizes. The peaks and valleys in the insulation scores were also called with FanC v0.9.24 (ref. 148). We defined the delta between each peak and valley, that is, the transition from a highly interacting region to a poorly interacting region, as the strength of the transition at a topological boundary. We compared the distributions of these deltas with an uncorrected two-sided Mann-Whitney U test to test for significant differences in the median values.
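The peak-to-valley delta can be sketched in a few lines of Python. This is an illustrative reimplementation under stated assumptions, not the FanC peak caller: peaks and valleys are taken as strict local extrema of a per-bin insulation score track, and a delta is the absolute difference between each pair of neighbouring extrema of opposite type. The function names are hypothetical.

```python
def local_extrema(scores):
    """Return (index, kind) for strict local maxima and minima.

    Plateaus and missing values are not handled in this sketch.
    """
    extrema = []
    for i in range(1, len(scores) - 1):
        left, mid, right = scores[i - 1], scores[i], scores[i + 1]
        if mid > left and mid > right:
            extrema.append((i, "peak"))
        elif mid < left and mid < right:
            extrema.append((i, "valley"))
    return extrema


def boundary_deltas(scores):
    """Absolute score difference between neighbouring peak/valley pairs."""
    extrema = local_extrema(scores)
    deltas = []
    for (i, kind_i), (j, kind_j) in zip(extrema, extrema[1:]):
        if kind_i != kind_j:
            deltas.append(abs(scores[i] - scores[j]))
    return deltas
```

The delta lists of two species could then be compared with a two-sided Mann-Whitney U test, for example with `scipy.stats.mannwhitneyu(a, b, alternative="two-sided")`.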
Phylogenetic analysis of NK-like and SuperHox genes
Sequences of the NK class proteins were MUSCLE-aligned with default settings in MEGA11 (ref. 149), and trimmed using the Automated1 setting in TrimAl (v. 1.3)125. Trimmed sequences were used for calculating the NJ and ML trees. The NJ tree was calculated in MEGA11 (ref. 149), and the ML tree was calculated in IQ-TREE2 (ref. 126) using the automatic model selection algorithm. Sequences of the SuperHox cluster genes were aligned as above but not trimmed. NJ and ML trees were calculated as above.
Nematostella gastrula ATAC-seq
Embryos were raised at 21 °C until gastrula stage (24 hpf). Ten embryos were collected and washed with cold PBS, then cold lysis buffer (10 mM Tris-HCl, pH 7.4, 10 mM NaCl, 3 mM MgCl2, 0.1% NP-40, 0.1% Tween-20, 0.01% Digitonin) was added. Embryos were disintegrated by pipetting and incubated in lysis buffer for 6.5 minutes on ice. Nuclei were then pelleted by centrifugation (0.5 rcf, 10 min, 4 °C). The pellet was resuspended in 50 µl tagmentation mix (5× TD buffer, 0.1% Tween-20, 0.01% Digitonin, 2.5 µl Tn5 transposase). TD buffer and Tn5 transposase were kindly provided by the David Garfield lab. The transposition reaction was incubated at 37 °C for 30 min with agitation at 500 × g and then cooled down on ice for 5 min. DNA purification was carried out with the QIAquick PCR kit (QIAGEN, #28104) according to the manufacturer’s instructions, with an elution in 20 µl elution buffer. PCR amplification was performed for 12 cycles according to Buenrostro150. A final purification step was performed with the QIAquick PCR kit (QIAGEN, #28104), elution in 15 µl EB. The quality of the library was validated with an agarose gel, from which DNA was extracted with a peqlab gel extraction kit, eluted in 20 µl EB, and stored at −20 °C. Library preparation and sequencing were done at Novogene and VBCF.
Comparative ATAC-seq analysis
ATAC-seq peak calling was performed for the following species using publicly available datasets (see Supplementary data file 14 for details).
All peak calling followed the ENCODE guidelines for ATAC-seq. Owenia and Nematostella peak calls were done following the unreplicated data guidelines (https://www.encodeproject.org/pipelines/ENCPL344QWT/), while Lytechinus and Patinopecten peak calls followed the replicated data guidelines (https://www.encodeproject.org/pipelines/ENCPL787FUN/). Briefly, raw reads were trimmed, aligned to their reference genome and filtered for mapq >=20. Pseudoreplicates were produced for each library as well as for the merged alignment of multiple libraries, and peak calling was done on the initial alignment, the merged alignment and all the pseudoreplicates. For unreplicated data, peaks were filtered based on p-signal <=0.05 and similar peak calls between the original alignment and at least one pseudoreplicate. For replicated data, final peaks were selected based on IDR scores between the merged library peaks and each replicate library.
ATAC-seq peaks for human (hg19), mouse (mm10), Caenorhabditis (Ce11), and Drosophila (dm6) were downloaded from the ChIP-atlas database (https://chip-atlas.org/)151,152. Only embryonic stage datasets were used in this study. Repetitive regions were identified using RepeatModeler and RepeatMasker for all genomes, and peaks overlapping repeat regions were removed before downstream analysis. A custom R script was used to find the closest gene for each ATAC-seq peak and measure the distances. Peaks overlapping genes were excluded from downstream analysis. Data points 1.5 times the interquartile range above the third quartile or below the first quartile were removed as outliers before plotting and regression. A simple linear regression was used to correlate genome size and mean intergenic distance for all species.
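The distance analysis was done with a custom R script that is not shown here. The following Python sketch only illustrates the three steps named above, under assumptions: intervals are half-open (start, end) pairs on one chromosome, quartiles use the "inclusive" method of `statistics.quantiles`, and all function names are hypothetical.

```python
from statistics import mean, quantiles


def gap(peak, gene):
    """Distance in bp between two half-open intervals, or None if they overlap."""
    peak_start, peak_end = peak
    gene_start, gene_end = gene
    if peak_end <= gene_start:
        return gene_start - peak_end
    if gene_end <= peak_start:
        return peak_start - gene_end
    return None


def closest_gene_distances(peaks, genes):
    """Distance from each peak to its nearest gene; overlapping peaks are skipped."""
    distances = []
    for peak in peaks:
        gaps = [gap(peak, gene) for gene in genes]
        if any(g is None for g in gaps):
            continue  # peak overlaps a gene and is excluded
        distances.append(min(gaps))
    return distances


def drop_outliers(values):
    """Remove points beyond 1.5 x IQR below Q1 or above Q3."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [v for v in values if low <= v <= high]


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx
```

In this form, `fit_line` would take one genome size and one mean filtered distance per species. The nearest-gene search is a plain loop over all genes, which is adequate for a sketch but slow for whole genomes.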
Microsynteny analysis
Orthofinder version 2.5.4 was run to obtain orthogroups for the selected species. The microsynteny pipeline from ref. 49 was run for each of the three clades (deuterostomes, spiralians, cnidarians) separately. For this, a subset of orthogroups that had at least one gene per species in a given clade (e.g., every cnidarian species had an ortholog for the cnidarian clade micro-synteny analysis) was generated to correct for any missing genes. The micro-synteny analysis was thus based on 6016, 4832, and 6539 orthogroups for cnidarians, spiralians, and deuterostomes, respectively. For the strictest profiling of gene block length, we ran the micro-synteny pipeline without allowance for any intervening genes. The resulting blocks were then filtered to remove micro-syntenies composed of paralogous genes. The resulting total numbers of micro-syntenic blocks given different minimal required lengths (at least 2, 3, or 4 genes) are listed in Supplementary data file 12.
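The strict setting without intervening genes can be illustrated for two species with a short Python sketch. It is a simplification and not the pipeline from ref. 49: each genome is assumed to be a list of chromosomes, each chromosome an ordered list of orthogroup identifiers with every orthogroup occurring once per genome, and the function names are hypothetical.

```python
def adjacent_pairs(genome):
    """Unordered pairs of orthogroups that are direct neighbours."""
    pairs = set()
    for chromosome in genome:
        for a, b in zip(chromosome, chromosome[1:]):
            if a != b:  # ignore neighbours from the same orthogroup (paralogs)
                pairs.add(frozenset((a, b)))
    return pairs


def conserved_blocks(genome_a, genome_b, min_len=2):
    """Runs of neighbouring orthogroups in genome_a that are also
    neighbours in genome_b, with no intervening genes allowed."""
    shared = adjacent_pairs(genome_b)
    blocks = []
    for chromosome in genome_a:
        run = chromosome[:1]
        for a, b in zip(chromosome, chromosome[1:]):
            if a != b and frozenset((a, b)) in shared:
                run.append(b)
            else:
                if len(run) >= min_len:
                    blocks.append(run)
                run = [b]
        if len(run) >= min_len:
            blocks.append(run)
    return blocks
```

Raising `min_len` to 3 or 4 mirrors the minimal block lengths reported in the text; gene orientation is ignored in this sketch.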
Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.
Acknowledgements
We thank Matthew Nicotra for providing us with the HMW DNA extraction protocol used for Scolanthus. We thank Robert Reischl for the photo of Scolanthus and Patrick R.H. Steinmetz and Hanna Kraus for the photo of Nematostella vectensis (Fig. 1). Special thanks to Tatiana Lebedeva for the cartoon drawings of Nematostella (Fig. 4). We are grateful to the Stowers Institute Molecular Biology Core facility, particularly Amanda Lawlor, Michael Peterson and Anoja Perera. This work was supported by grants of the Austrian Science Fund FWF (P24858; P21108) to U.T., support from the Stowers Institute for Medical Research to M.G. and an NIH Ruth L. Kirschstein NRSA (F32 GM131522) to E.M.H. We are also grateful for the support of the CNRS Marine Station in Roscoff and the Assemble grant 227799 to U.T. for collecting Scolanthus.
Author contributions
B.Z. generated the DNA libraries for PacBio sequencing and for HiC, carried out most of the bioinformatic analyses and wrote the paper. J.D.M. generated the gene models and carried out other bioinformatics analyses. SMCR and WJF set up the SIMR base browser and carried out bioinformatic analyses. L.W., D.F., D.S., O.S., J.L.-W. carried out bioinformatic analyses. S.H., S.C., E.M.H., C.C., K.R., D.P., Y.M., G.G. generated experimental data and contributed to the bioinformatic analyses. MCG oversaw the project, contributed Isoseq data and edited the paper. U.T. designed the study, collected Scolanthus polyps, contributed various sequencing data, and wrote the paper, which was further edited by all authors.
Peer review
Peer review information
Nature Communications thanks the anonymous reviewer(s) for their contribution to the peer review of this work. A peer review file is available.
Code availability
The tools and algorithms used for this work are described in the Methods section and the supplement. Custom code used to analyze the data is available at https://github.com/nijibabulu/cnidariangenomes (refs. 153,154).
Data availability
All raw data and assembled genomes are available via the National Center for Biotechnology Information under the accessions PRJNA667495, PRJNA1036184 and PRJNA430035. The assembled genomes can be downloaded, browsed and searched on publicly available browsers at https://simrbase.stowers.org/starletseaanemone and https://simrbase.stowers.org/wormanemone. Publicly available data that were used in this manuscript include SRR1771322-SRR1771324 [https://www.ncbi.nlm.nih.gov/bioproject/PRJNA273476] (mouse Hi-C), SRR10512944 (D. melanogaster Hi-C), and ERR8571699 (Wellcome Sanger N. vectensis Hi-C). Data necessary to reproduce the analyses together with the code is available for download via figshare [10.6084/m9.figshare.24258598.v2].
Competing interests
The authors declare no competing interests.
Footnotes
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
These authors contributed equally: Bob Zimmermann, Juan D. Montenegro.
Contributor Information
Matthew C. Gibson, Email: mg2@stowers.org
Ulrich Technau, Email: ulrich.technau@univie.ac.at
Supplementary information
The online version contains supplementary material available at 10.1038/s41467-023-44080-7.
References
- 1. Putnam NH, et al. Sea anemone genome reveals ancestral eumetazoan gene repertoire and genomic organization. Science. 2007;317:86–94. doi: 10.1126/science.1139158.
- 2. Chapman JA, et al. The dynamic genome of Hydra. Nature. 2010;464:592–596. doi: 10.1038/nature08830.
- 3. Srivastava M, et al. The Amphimedon queenslandica genome and the evolution of animal complexity. Nature. 2010;466:720–726. doi: 10.1038/nature09201.
- 4. Schwaiger M, et al. Evolutionary conservation of the eumetazoan gene regulatory landscape. Genome Res. 2014;24:639–650. doi: 10.1101/gr.162529.113.
- 5. Gaiti F, et al. Landscape of histone modifications in a sponge reveals the origin of animal cis-regulatory complexity. Elife. 2017;6:e22194. doi: 10.7554/eLife.22194.
- 6. Nanty L, et al. Comparative methylomics reveals gene-body H3K36me3 in Drosophila predicts DNA methylation and CpG landscapes in other invertebrates. Genome Res. 2011;21:1841–1850. doi: 10.1101/gr.121640.111.
- 7. Lieberman-Aiden E, et al. Comprehensive mapping of long-range interactions reveals folding principles of the human genome. Science. 2009;326:289–293. doi: 10.1126/science.1181369.
- 8. Dixon JR, et al. Topological domains in mammalian genomes identified by analysis of chromatin interactions. Nature. 2012;485:376–380. doi: 10.1038/nature11082.
- 9. Acemel RD, Maeso I, Gómez-Skarmeta JL. Topologically associated domains: a successful scaffold for the evolution of gene regulation in animals. WIREs Dev. Biol. 2017;6:e265. doi: 10.1002/wdev.265.
- 10. Anderson E, Devenney PS, Hill RE, Lettice LA. Mapping the Shh long-range regulatory domain. Development. 2014;141:3934–3943. doi: 10.1242/dev.108480.
- 11. Krefting J, Andrade-Navarro MA, Ibn-Salem J. Evolutionary stability of topologically associating domains is associated with conserved gene regulation. BMC Biol. 2018;16:87. doi: 10.1186/s12915-018-0556-x.
- 12. Rodríguez-Carballo E, Lopez-Delisle L, Yakushiji-Kaminatsui N, Ullate-Agote A, Duboule D. Impact of genome architecture on the functional activation and repression of Hox regulatory landscapes. BMC Biol. 2019;17:55. doi: 10.1186/s12915-019-0677-x.
- 13. Cartwright P, et al. Exceptionally preserved jellyfishes from the Middle Cambrian. PLoS One. 2007;2:e1121. doi: 10.1371/journal.pone.0001121.
- 14. Technau, U., Genikhovich, G. & Kraus, J. E. M. Cnidaria. in Evolutionary Developmental Biology of Invertebrates 1 (ed. Wanninger, A.) 115–163 (Springer Vienna, 2015).
- 15. Zapata F, et al. Phylogenomic analyses support traditional relationships within Cnidaria. PLoS One. 2015;10:e0139068. doi: 10.1371/journal.pone.0139068.
- 16. Technau U, Steele RE. Evolutionary crossroads in developmental biology: Cnidaria. Development. 2011;138:1447–1458. doi: 10.1242/dev.048959.
- 17. Röttinger, E. Nematostella vectensis, an emerging model for deciphering the molecular and cellular mechanisms underlying whole-body regeneration. Cells 10, 2692 (2021).
- 18. Rentzsch F, Technau U. Genomics and development of Nematostella vectensis and other anthozoans. Curr. Opin. Genet. Dev. 2016;39:63–70. doi: 10.1016/j.gde.2016.05.024.
- 19. Layden MJ, Rentzsch F, Röttinger E. The rise of the starlet sea anemone Nematostella vectensis as a model system to investigate development and regeneration: overview of starlet sea anemone Nematostella vectensis. Wiley Interdiscip. Rev. Dev. Biol. 2016;5:408–428. doi: 10.1002/wdev.222.
- 20. Hu M, Zheng X, Fan C-M, Zheng Y. Lineage dynamics of the endosymbiotic cell type in the soft coral Xenia. Nature. 2020;582:534–538. doi: 10.1038/s41586-020-2385-7.
- 21. Nong W, et al. Jellyfish genomes reveal distinct homeobox gene clusters and conservation of small RNA processing. Nat. Commun. 2020;11:3051. doi: 10.1038/s41467-020-16801-9.
- 22. Leclère L, et al. The genome of the jellyfish Clytia hemisphaerica and the evolution of the cnidarian life-cycle. Nat. Ecol. Evol. 2019;3:801–810. doi: 10.1038/s41559-019-0833-2.
- 23. Khalturin K, et al. Medusozoan genomes inform the evolution of the jellyfish body plan. Nat. Ecol. Evol. 2019;3:811–822. doi: 10.1038/s41559-019-0853-y.
- 24. Ohdera, A. et al. Box, stalked, and upside-down? Draft genomes from diverse jellyfish (Cnidaria, Acraspeda) lineages: Alatina alata (Cubozoa), Calvadosia cruxmelitensis (Staurozoa), and Cassiopea xamachana (Scyphozoa). Gigascience 8, giz069 (2019).
- 25. Kim H-M, et al. The genome of the giant Nomura’s jellyfish sheds light on the early evolution of active predation. BMC Biol. 2019;17:28. doi: 10.1186/s12915-019-0643-7.
- 26. Gold DA, et al. The genome of the jellyfish Aurelia and the evolution of animal complexity. Nat. Ecol. Evol. 2019;3:96. doi: 10.1038/s41559-018-0719-8.
- 27. Shinzato C, et al. Using the Acropora digitifera genome to understand coral responses to environmental change. Nature. 2011;476:320–323. doi: 10.1038/nature10249.
- 28. Wilson, T. J., Lazner, F., Kola, I. & Hertzog, P. J. The mouse and the genomic era. in Comparative Genomics (ed. Clark, M. S.) 97–121 (Springer US, 2000).
- 29. Gosse, P. H. Notes on some new or little known marine animals. The Annals and Magazine of Natural History, ser. 2. 12, 153–159 (1853).
- 30. Wilson, E. Scolanthus callimorphus Worm anemone. in Marine Life Information Network: Biology and Sensitivity Key Information Reviews, [on-line] (2005).
- 31. Heger P, Marin B, Bartkuhn M, Schierenberg E, Wiehe T. The chromatin insulator CTCF and the emergence of metazoan diversity. Proc. Natl Acad. Sci. 2012;109:17507–17512. doi: 10.1073/pnas.1111941109.
- 32. Baumgarten S, et al. The genome of Aiptasia, a sea anemone model for coral symbiosis. Proc. Natl Acad. Sci. USA. 2015;112:11893–11898. doi: 10.1073/pnas.1513318112.
- 33. Seppey, M., Manni, M. & Zdobnov, E. M. BUSCO: assessing genome assembly and annotation completeness. in Gene Prediction: Methods and Protocols (ed. Kollmar, M.) 227–245 (Springer, 2019).
- 34. Fredman, D., Schwaiger, M., Rentzsch, F. & Technau, U. Nematostella vectensis transcriptome and gene models v2.0. Preprint at https://figshare.com/articles/Nematostella_vectensis_transcriptome_and_gene_models_v2_0/807696 (2013).
- 35. Cole AG, et al. Muscle cell-type diversification is driven by bHLH transcription factor expansion and extensive effector gene duplications. Nat. Commun. 2023;14:1747. doi: 10.1038/s41467-023-37220-6.
- 36. Steger J, et al. Single-cell transcriptomics identifies conserved regulators of neuroglandular lineages. Cell Rep. 2022;40:111370. doi: 10.1016/j.celrep.2022.111370.
- 37. Guo L, et al. An adaptable chromosome preparation methodology for use in invertebrate research organisms. BMC Biol. 2018;16:25. doi: 10.1186/s12915-018-0497-4.
- 38. Zacharias H, Anokhin B, Khalturin K, Bosch TCG. Genome sizes and chromosomes in the basal metazoan Hydra. Zoology. 2004;107:219–227. doi: 10.1016/j.zool.2004.04.005.
- 39. Li, Y. et al. Chromosome-level reference genome of the jellyfish Rhopilema esculentum. Gigascience 9, giaa036 (2020).
- 40. Simakov O, et al. Deeply conserved synteny and the evolution of metazoan chromosomes. Sci. Adv. 2022;8:eabi5884. doi: 10.1126/sciadv.abi5884.
- 41. Simakov O, et al. Deeply conserved synteny resolves early events in vertebrate evolution. Nat. Ecol. Evol. 2020;4:820–830. doi: 10.1038/s41559-020-1156-z.
- 42. Dehal P, Boore JL. Two rounds of whole genome duplication in the ancestral vertebrate. PLoS Biol. 2005;3:e314. doi: 10.1371/journal.pbio.0030314.
- 43. Pollard SL, Holland PWH. Evidence for 14 homeobox gene clusters in human genome ancestry. Curr. Biol. 2000;10:1059–1062. doi: 10.1016/S0960-9822(00)00676-X.
- 44. Braasch I, et al. The spotted gar genome illuminates vertebrate evolution and facilitates human-teleost comparisons. Nat. Genet. 2016;48:427–437. doi: 10.1038/ng.3526.
- 45. Wang S, et al. Scallop genome provides insights into evolution of bilaterian karyotype and development. Nat. Ecol. Evol. 2017;1:1–12. doi: 10.1038/s41559-017-0120.
- 46. Qu Z, et al. Millipede genomes reveal unique adaptations during myriapod evolution. PLoS Biol. 2020;18:e3000636. doi: 10.1371/journal.pbio.3000636.
- 47. Kenny NJ, et al. Tracing animal genomic evolution with the chromosomal-level assembly of the freshwater sponge Ephydatia muelleri. Nat. Commun. 2020;11:3676. doi: 10.1038/s41467-020-17397-w.
- 48. Kapli P, Telford MJ. Topology-dependent asymmetry in systematic errors affects phylogenetic placement of Ctenophora and Xenacoelomorpha. Sci. Adv. 2020;6:eabc5162. doi: 10.1126/sciadv.abc5162.
- 49. Simakov O, et al. Insights into bilaterian evolution from three spiralian genomes. Nature. 2013;493:526–531. doi: 10.1038/nature11696.
- 50. Butts T, Holland PWH, Ferrier DEK. The urbilaterian Super-Hox cluster. Trends Genet. 2008;24:259–262. doi: 10.1016/j.tig.2007.09.006.
- 51. Ferrier, D. E. K. Evolution of homeobox gene clusters in animals: the giga-cluster and primary vs. secondary clustering. Front. Ecol. Evol. 4, 36 (2016).
- 52. Chourrout D, et al. Minimal ProtoHox cluster inferred from bilaterian and cnidarian Hox complements. Nature. 2006;442:684–687. doi: 10.1038/nature04863.
- 53. Simakov O, et al. Hemichordate genomes and deuterostome origins. Nature. 2015;527:459–465. doi: 10.1038/nature16150.
- 54. Larroux C, et al. The NK homeobox gene cluster predates the origin of hox genes. Curr. Biol. 2007;17:706–710. doi: 10.1016/j.cub.2007.03.008.
- 55. Friedman N, Rando OJ. Epigenomics and the structure of the living genome. Genome Res. 2015;25:1482–1490. doi: 10.1101/gr.190165.115.
- 56. Fletcher C, et al. The genome sequence of the starlet sea anemone, Nematostella vectensis (Stephenson, 1935). Wellcome Open Res. 2023;8:79. doi: 10.12688/wellcomeopenres.18991.1.
- 57. Rodríguez-Carballo E, et al. The HoxD cluster is a dynamic and resilient TAD boundary controlling the segregation of antagonistic regulatory landscapes. Genes Dev. 2017;31:2264–2281. doi: 10.1101/gad.307769.117.
- 58. Cazet, J. F. et al. A chromosome-scale epigenetic map of the Hydra genome reveals conserved regulators of cell state. Genome Res. 33, 283–298 (2023).
- 59. He S, et al. An axial Hox code controls tissue segmentation and body patterning in Nematostella vectensis. Science. 2018;361:1377–1380. doi: 10.1126/science.aar8384.
- 60. McGinnis W, Krumlauf R. Homeobox genes and axial patterning. Cell. 1992;68:283–302. doi: 10.1016/0092-8674(92)90471-N.
- 61. Ryan JF, et al. Pre-bilaterian origins of the Hox cluster and the Hox code: evidence from the sea anemone, Nematostella vectensis. PLoS One. 2007;2:e153. doi: 10.1371/journal.pone.0000153.
- 62. Finnerty JR, Pang K, Burton P, Paulson D, Martindale MQ. Origins of bilateral symmetry: Hox and Dpp expression in a sea anemone. Science. 2004;304:1335–1337. doi: 10.1126/science.1091946.
- 63. Lazar NH, et al. Epigenetic maintenance of topological domains in the highly rearranged gibbon genome. Genome Res. 2018;28:983–997. doi: 10.1101/gr.233874.117.
- 64. Rowley MJ, Corces VG. Organizational principles of 3D genome architecture. Nat. Rev. Genet. 2018;19:789–800. doi: 10.1038/s41576-018-0060-8.
- 65. Szabo Q, Bantignies F, Cavalli G. Principles of genome folding into topologically associating domains. Sci. Adv. 2019;5:eaaw1668. doi: 10.1126/sciadv.aaw1668.
- 66. Sacerdot C, Louis A, Bon C, Berthelot C, Roest Crollius H. Chromosome evolution at the origin of the ancestral vertebrate genome. Genome Biol. 2018;19:166. doi: 10.1186/s13059-018-1559-1.
- 67. Clarence T, et al. Robust 3D modeling reveals spatiosyntenic properties of animal genomes. iScience. 2023;26:106136. doi: 10.1016/j.isci.2023.106136.
- 68. Sinigaglia C, Busengdal H, Leclère L, Technau U, Rentzsch F. The bilaterian head patterning gene six3/6 controls aboral domain development in a cnidarian. PLoS Biol. 2013;11:e1001488. doi: 10.1371/journal.pbio.1001488.
- 69. Schwaiger M, et al. An ancestral Wnt-Brachyury feedback loop in axial patterning and recruitment of mesoderm-determining target genes. Nat. Ecol. Evol. 2022;6:1921–1939. doi: 10.1038/s41559-022-01905-w.
- 70. Kusserow A, et al. Unexpected complexity of the Wnt gene family in a sea anemone. Nature. 2005;433:156–160. doi: 10.1038/nature03158.
- 71. Lengfeld T, et al. Multiple Wnts are involved in Hydra organizer formation and regeneration. Dev. Biol. 2009;330:186–199. doi: 10.1016/j.ydbio.2009.02.004.
- 72. Marlétaz, F. et al. The little skate genome and the evolutionary emergence of wing-like fin appendages. bioRxiv 2022.03.21.485123 (2022). doi: 10.1101/2022.03.21.485123.
- 73. Crane E, et al. Condensin-driven remodelling of X chromosome topology during dosage compensation. Nature. 2015;523:240–244. doi: 10.1038/nature14450.
- 74. Kim, J. et al. Condensin DC loads and spreads from recruitment sites to create loop-anchored TADs in C. elegans. Elife 11, e68745 (2022).
- 75. Fritzenwanker JH, Technau U. Induction of gametogenesis in the basal cnidarian Nematostella vectensis (Anthozoa). Dev. Genes Evol. 2002;212:99–103. doi: 10.1007/s00427-002-0214-7.
- 76. Chen, J. & Dellaporta, S. Urea-based plant DNA Miniprep. in The Maize Handbook (eds. Freeling, M. & Walbot, V.) 526–527 (Springer, 1994).
- 77. Sanders SM, et al. CRISPR/Cas9-mediated gene knockin in the hydroid Hydractinia symbiolongicarpus. BMC Genom. 2018;19:649. doi: 10.1186/s12864-018-5032-z.
- 78. Putnam NH, et al. Chromosome-scale shotgun assembly using an in vitro method for long-range linkage. Genome Res. 2016;26:342–350. doi: 10.1101/gr.193474.115.
- 79. Vurture GW, et al. GenomeScope: fast reference-free genome profiling from short reads. Bioinformatics. 2017;33:2202–2204. doi: 10.1093/bioinformatics/btx153.
- 80. Koren S, et al. Canu: scalable and accurate long-read assembly via adaptive k-mer weighting and repeat separation. Genome Res. 2017;27:722–736. doi: 10.1101/gr.215087.116.
- 81. Roach MJ, Schmidt SA, Borneman AR. Purge Haplotigs: allelic contig reassignment for third-gen diploid genome assemblies. BMC Bioinform. 2018;19:460. doi: 10.1186/s12859-018-2485-7.
- 82. Li H. Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics. 2018;34:3094–3100. doi: 10.1093/bioinformatics/bty191.
- 83. Pryszcz LP, Gabaldón T. Redundans: an assembly pipeline for highly heterozygous genomes. Nucleic Acids Res. 2016;44:e113–e113. doi: 10.1093/nar/gkw294.
- 84. Bao W, Kojima KK, Kohany O. Repbase update, a database of repetitive elements in eukaryotic genomes. Mob. DNA. 2015;6:11. doi: 10.1186/s13100-015-0041-9.
- 85. Smit, A., Hubley, R. & Green, P. RepeatMasker Open-4.0. (2013).
- 86. Flynn JM, et al. RepeatModeler2 for automated genomic discovery of transposable element families. Proc. Natl Acad. Sci. 2020;117:9451–9457. doi: 10.1073/pnas.1921046117.
- 87. Li, H. Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. arXiv:1303.3997 [q-bio] (2013).
- 88. Li H, Durbin R. Fast and accurate short read alignment with Burrows–Wheeler transform. Bioinformatics. 2009;25:1754–1760. doi: 10.1093/bioinformatics/btp324.
- 89. Faust GG, Hall IM. SAMBLASTER: fast duplicate marking and structural variant read extraction. Bioinformatics. 2014;30:2503–2505. doi: 10.1093/bioinformatics/btu314.
- 90. Burton JN, et al. Chromosome-scale scaffolding of de novo genome assemblies based on chromatin interactions. Nat. Biotechnol. 2013;31:1119–1125. doi: 10.1038/nbt.2727.
- 91. Durand NC, et al. Juicebox provides a visualization system for Hi-C contact maps with unlimited zoom. Cell Syst. 2016;3:99–101. doi: 10.1016/j.cels.2015.07.012.
- 92. Kurtz, S. et al. Versatile and open software for comparing large genomes. Genome Biol. 9, R12 (2004).
- 93. Hunt M, et al. REAPR: a universal tool for genome assembly evaluation. Genome Biol. 2013;14:R47. doi: 10.1186/gb-2013-14-5-r47.
- 94. Praher D, et al. Conservation and turnover of miRNAs and their highly complementary targets in early branching animals. Proc. R. Soc. B Biol. Sci. 2021;288:20203169. doi: 10.1098/rspb.2020.3169.
- 95. Grabherr MG, et al. Full-length transcriptome assembly from RNA-Seq data without a reference genome. Nat. Biotechnol. 2011;29:644–652. doi: 10.1038/nbt.1883.
- 96. Fu L, Niu B, Zhu Z, Wu S, Li W. CD-HIT: accelerated for clustering the next-generation sequencing data. Bioinformatics. 2012;28:3150–3152. doi: 10.1093/bioinformatics/bts565.
- 97. Li W, Godzik A. Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences. Bioinformatics. 2006;22:1658–1659. doi: 10.1093/bioinformatics/btl158.
- 98. Haas, B. J. et al. De novo transcript sequence reconstruction from RNA-Seq: reference generation and analysis with Trinity. Nat. Protoc. 8, 1494–1512 (2013).
- 99. Srivastava, A., Malik, L., Sarkar, H. et al. Alignment and mapping methodology influence transcript abundance estimation. Genome Biol. 21, 239 (2020).
- 100. Kuo RI, et al. Illuminating the dark side of the human transcriptome with long read transcript sequencing. BMC Genom. 2020;21:751. doi: 10.1186/s12864-020-07123-7.
- 101. Kovaka S, et al. Transcriptome assembly from long-read RNA-seq alignments with StringTie2. Genome Biol. 2019;20:278. doi: 10.1186/s13059-019-1910-1.
- 102. Dobin A, et al. STAR: ultrafast universal RNA-seq aligner. Bioinformatics. 2013;29:15–21. doi: 10.1093/bioinformatics/bts635.
- 103. Quinlan AR, Hall IM. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics. 2010;26:841–842. doi: 10.1093/bioinformatics/btq033.
- 104. Mistry J, et al. Pfam: the protein families database in 2021. Nucleic Acids Res. 2021;49:D412–D419. doi: 10.1093/nar/gkaa913.
- 105. Eddy SR. Accelerated profile HMM searches. PLoS Comput. Biol. 2011;7:e1002195. doi: 10.1371/journal.pcbi.1002195.
- 106. Liao Y, Smyth GK, Shi W. featureCounts: an efficient general purpose program for assigning sequence reads to genomic features. Bioinformatics. 2014;30:923–930. doi: 10.1093/bioinformatics/btt656.
- 107. Hoff KJ, Lange S, Lomsadze A, Borodovsky M, Stanke M. BRAKER1: unsupervised RNA-seq-based genome annotation with GeneMark-ET and AUGUSTUS. Bioinformatics. 2016;32:767–769. doi: 10.1093/bioinformatics/btv661.
- 108. Hoff, K. J., Lomsadze, A., Stanke, M. & Borodovsky, M. BRAKER2: incorporating protein homology information into gene prediction with GeneMark-EP and AUGUSTUS. 1 (2018).
- 109. Stanke M, Schöffmann O, Morgenstern B, Waack S. Gene prediction in eukaryotes with a generalized hidden Markov model that uses hints from external sources. BMC Bioinforma. 2006;7:62. doi: 10.1186/1471-2105-7-62.
- 110. Huerta-Cepas J, et al. Fast genome-wide functional annotation through orthology assignment by eggNOG-mapper. Mol. Biol. Evol. 2017;34:2115–2122. doi: 10.1093/molbev/msx148.
- 111. El-Gebali S, et al. The Pfam protein families database in 2019. Nucleic Acids Res. 2019;47:D427–D432. doi: 10.1093/nar/gky995.
- 112. de Mendoza A, et al. Transcription factor evolution in eukaryotes and the assembly of the regulatory toolkit in multicellular lineages. Proc. Natl Acad. Sci. 2013;110:E4858–E4866. doi: 10.1073/pnas.1311818110.
- 113. Kent WJ. BLAT—The BLAST-like alignment tool. Genome Res. 2002;12:656–664. doi: 10.1101/gr.229202.
- 114. Matus DQ, Pang K, Daly M, Martindale MQ. Expression of Pax gene family members in the anthozoan cnidarian, Nematostella vectensis: Pax gene expression in Nematostella vectensis. Evol. Dev. 2007;9:25–38. doi: 10.1111/j.1525-142X.2006.00135.x.
- 115. Ryan JF, et al. The cnidarian-bilaterian ancestor possessed at least 56 homeoboxes: evidence from the starlet sea anemone, Nematostella vectensis. Genome Biol. 2006;7:R64. doi: 10.1186/gb-2006-7-7-r64.
- 116. Mazza ME, Pang K, Martindale MQ, Finnerty JR. Genomic organization, gene structure, and developmental expression of three clustered otx genes in the sea anemone Nematostella vectensis. J. Exp. Zool. B Mol. Dev. Evol. 2007;308B:494–506. doi: 10.1002/jez.b.21158.
- 117. Matus DQ, Thomsen GH, Martindale MQ. Dorso/ventral genes are asymmetrically expressed and involved in germ-layer demarcation during cnidarian gastrulation. Curr. Biol. 2006;16:499–505. doi: 10.1016/j.cub.2006.01.052.
- 118. Hudry B, et al. Molecular insights into the origin of the Hox-TALE patterning system. Elife. 2014;3:e01939. doi: 10.7554/eLife.01939.
- 119. Mazza ME, Pang K, Reitzel AM, Martindale MQ, Finnerty JR. A conserved cluster of three PRD-class homeobox genes (homeobrain, rx and orthopedia) in the Cnidaria and Protostomia. Evodevo. 2010;1:3. doi: 10.1186/2041-9139-1-3.
- 120. Ying H, et al. The whole-genome sequence of the coral Acropora millepora. Genome Biol. Evol. 2019;11:1374–1379. doi: 10.1093/gbe/evz077.
- 121. The C. elegans Sequencing Consortium. Genome sequence of the nematode C. elegans: a platform for investigating biology. Science. 1998;282:2012–2018. doi: 10.1126/science.282.5396.2012.
- 122. Hoskins RA, et al. The Release 6 reference sequence of the Drosophila melanogaster genome. Genome Res. 2015;25:445–458. doi: 10.1101/gr.185579.114.
- 123. Hydra 2.0 Web Portal. https://research.nhgri.nih.gov/hydra.
- 124. Katoh K, Standley DM. MAFFT multiple sequence alignment software version 7: improvements in performance and usability. Mol. Biol. Evol. 2013;30:772–780. doi: 10.1093/molbev/mst010.
- 125. Capella-Gutiérrez S, Silla-Martínez JM, Gabaldón T. trimAl: a tool for automated alignment trimming in large-scale phylogenetic analyses. Bioinformatics. 2009;25:1972–1973. doi: 10.1093/bioinformatics/btp348.
- 126. Minh BQ, et al. IQ-TREE 2: new models and efficient methods for phylogenetic inference in the genomic era. Mol. Biol. Evol. 2020;37:1530–1534. doi: 10.1093/molbev/msaa015.
- 127. Sanderson MJ. r8s: inferring absolute rates of molecular evolution and divergence times in the absence of a molecular clock. Bioinformatics. 2003;19:301–302. doi: 10.1093/bioinformatics/19.2.301.
- 128. dos Reis M, et al. Uncertainty in the timing of origin of animals and the limits of precision in molecular timescales. Curr. Biol. 2015;25:2939–2950. doi: 10.1016/j.cub.2015.09.066.
- 129. Camacho C, et al. BLAST+: architecture and applications. BMC Bioinforma. 2009;10:421. doi: 10.1186/1471-2105-10-421.
- 130. Dimitrieva S, Bucher P. UCNEbase—a database of ultraconserved non-coding elements and genomic regulatory blocks. Nucleic Acids Res. 2013;41:D101–D109. doi: 10.1093/nar/gks1092.
- 131. Roth ACJ, Gonnet GH, Dessimoz C. Algorithm of OMA for large-scale orthology inference. BMC Bioinforma. 2008;9:518. doi: 10.1186/1471-2105-9-518.
- 132. Moya A, et al. Whole transcriptome analysis of the coral Acropora millepora reveals complex responses to CO2-driven acidification during the initiation of calcification. Mol. Ecol. 2012;21:2440–2454. doi: 10.1111/j.1365-294X.2012.05554.x.
- 133. Zakas, C., Harry, N. D., Scholl, E. H. & Rockman, M. V. The Genome of the Poecilogonous Annelid Streblospio benedicti. Genome Biol. Evol. 14, evac008 (2022).
- 134. Shingate P, et al. Chromosome-level assembly of the horseshoe crab genome provides insights into its genome evolution. Nat. Commun. 2020;11:2322. doi: 10.1038/s41467-020-16180-1.
- 135. Harris TW, et al. WormBase: a modern model organism information resource. Nucleic Acids Res. 2020;48:D762–D767. doi: 10.1093/nar/gkz920.
- 136. Srivastava M, et al. The Trichoplax genome and the nature of placozoans. Nature. 2008;454:955–960. doi: 10.1038/nature07191.
- 137. Traag VA, Waltman L, van Eck NJ. From Louvain to Leiden: guaranteeing well-connected communities. Sci. Rep. 2019;9:5233. doi: 10.1038/s41598-019-41695-z.
- 138. Schultz DT, et al. A chromosome-scale genome assembly and karyotype of the ctenophore Hormiphora californensis. G3. 2021;11:jkab302. doi: 10.1093/g3journal/jkab302.
- 139. Blaxter M, et al. The genome sequence of the spotted kaleidoscope jellyfish, Haliclystus octoradiatus (James-Clark, 1863). Wellcome Open Res. 2023;8:6. doi: 10.12688/wellcomeopenres.18669.1.
- 140. Wood C, et al. The genome sequence of the orange-striped anemone, Diadumene lineata (Verrill, 1869). Wellcome Open Res. 2022;7:93. doi: 10.12688/wellcomeopenres.17763.1.
- 141. Adams M, et al. One fly-one genome: chromosome-scale genome assembly of a single outbred Drosophila melanogaster. Nucleic Acids Res. 2020;48:e75. doi: 10.1093/nar/gkaa450.
- 142. Kenny, N. J. et al. The gene-rich genome of the scallop Pecten maximus. Gigascience 9, giaa037 (2020).
- 143. Mouse Genome Sequencing Consortium et al. Initial sequencing and comparative analysis of the mouse genome. Nature. 2002;420:520–562. doi: 10.1038/nature01262.
- 144. Vietri Rudan M, et al. Comparative Hi-C reveals that CTCF underlies evolution of chromosomal domain architecture. Cell Rep. 2015;10:1297–1309. doi: 10.1016/j.celrep.2015.02.004.
- 145. Zhang H, et al. Fast alignment and preprocessing of chromatin profiles with Chromap. Nat. Commun. 2021;12:1–6. doi: 10.1038/s41467-021-26865-w.
- 146. Ramírez F, et al. High-resolution TADs reveal DNA sequences underlying genome organization in flies. Nat. Commun. 2018;9:189. doi: 10.1038/s41467-017-02525-w.
- 147. Abdennur N, Mirny LA. Cooler: scalable storage for Hi-C data and other genomically labeled arrays. Bioinformatics. 2020;36:311–316. doi: 10.1093/bioinformatics/btz540.
- 148. Kruse K, Hug CB, Vaquerizas JM. FAN-C: a feature-rich framework for the analysis and visualisation of chromosome conformation capture data. Genome Biol. 2020;21:303. doi: 10.1186/s13059-020-02215-9.
- 149. Tamura K, Stecher G, Kumar S. MEGA11: molecular evolutionary genetics analysis version 11. Mol. Biol. Evol. 2021;38:3022–3027. doi: 10.1093/molbev/msab120.
- 150. Buenrostro JD, Wu B, Chang HY, Greenleaf WJ. ATAC-seq: a method for assaying chromatin accessibility genome-wide. Curr. Protoc. Mol. Biol. 2015;109:21.29.1–21.29.9. doi: 10.1002/0471142727.mb2129s109.
- 151. Zou Z, Ohta T, Miura F, Oki S. ChIP-Atlas 2021 update: a data-mining suite for exploring epigenomic landscapes by fully integrating ChIP-seq, ATAC-seq and Bisulfite-seq data. Nucleic Acids Res. 2022;50:W175–W182. doi: 10.1093/nar/gkac199.
- 152. Oki, S. et al. ChIP-Atlas: a data-mining suite powered by full integration of public ChIP-seq data. EMBO Rep. 19, e46255 (2018).
- 153. Zimmermann, B., Simakov, O. & Montenegro, J. D. Nijibabulu/cnidariangenomes: v1.2.1. Zenodo. doi: 10.5281/ZENODO.8407554 (2023).
- 154. Zimmermann, B. Source data for sea anemone genomes paper. doi: 10.6084/m9.figshare.24258598.v1 (2023).
- 155. Voolstra CR, et al. Comparative analysis of the genomes of Stylophora pistillata and Acropora digitifera provides evidence for extensive differences between species of corals. Sci. Rep. 2017;7:1–14. doi: 10.1038/s41598-017-17484-x.
- 156. Cunning R, Bay RA, Gillette P, Baker AC, Traylor-Knowles N. Comparative analysis of the Pocillopora damicornis genome highlights role of immune system in coral evolution. Sci. Rep. 2018;8:16134. doi: 10.1038/s41598-018-34459-8.
- 157. Jiang, J. B. et al. A hybrid de novo assembly of the sea pansy (Renilla muelleri) genome. Gigascience 8, giz026 (2019).
- 158. Jeon Y, et al. The draft genome of an octocoral, Dendronephthya gigantea. Genome Biol. Evol. 2019;11:949–953. doi: 10.1093/gbe/evz043.
- 159. Marlétaz F, et al. Amphioxus functional genomics and the origins of vertebrate gene regulation. Nature. 2018;564:64–70. doi: 10.1038/s41586-018-0734-6.