Abstract
Deuterostomes are a monophyletic group of animals that includes Hemichordata, Echinodermata (together called Ambulacraria), and Chordata. The diversity of deuterostome body plans has made it challenging to reconstruct their ancestral condition and to decipher the genetic changes that drove the diversification of deuterostome lineages. Here, we generate chromosome-level genome assemblies of 2 hemichordate species, Ptychodera flava and Schizocardium californicum, and use comparative genomic approaches to infer the chromosomal architecture of the deuterostome common ancestor and delineate lineage-specific chromosomal modifications. We show that hemichordate chromosomes (1N = 23) exhibit remarkable chromosome-scale macrosynteny when compared to other deuterostomes and can be derived from 24 deuterostome ancestral linkage groups (ALGs). These deuterostome ALGs in turn match previously inferred bilaterian ALGs, consistent with a relatively short transition from the last common bilaterian ancestor to the origin of deuterostomes. Based on this deuterostome ALG complement, we deduced chromosomal rearrangement events that occurred in different lineages. For example, a fusion-with-mixing event produced an Ambulacraria-specific ALG that subsequently split into 2 chromosomes in extant hemichordates, while this homologous ALG further fused with another chromosome in sea urchins. Orthologous genes located on these rearranged chromosomes are enriched for functions in various developmental processes. We found that the deeply conserved Hox clusters are located on highly rearranged chromosomes and that maintenance of the clusters is likely due to lower densities of transposable elements within the clusters. We also provide evidence that the deuterostome-specific pharyngeal gene cluster was established via the combination of 3 pre-assembled microsyntenic blocks.
Because chromosomal rearrangement events and the formation of new gene clusters may change the regulatory control of developmental genes, we suggest that these events may have contributed to the evolution of diverse body plans among deuterostomes.
The diversity of deuterostome body plans has made it challenging to reconstruct their ancestral condition and to understand their diversification. This study uses chromosome-level genome assemblies of two hemichordates to help infer the genomic architecture of the deuterostome common ancestor and subsequent lineage-specific rearrangement events.
Introduction
The evolutionary events that gave rise to the diverse body plans of deuterostomes remain one of the major mysteries in biology. It is widely accepted that Deuterostomia includes Echinodermata, Hemichordata, and Chordata, as these animals are characterized by several unique developmental and morphological features, including radial cleavage, deuterostomy, enterocoelic formation of the mesoderm, mesoderm-derived skeletal tissues, and pharyngeal openings/slits [1–3]. Despite these common characters, the different deuterostome lineages have evolved distinct body plans. Chordates are defined by their dorsal tubular central nervous system, notochord, and segmented somites [4]; echinoderms evolved a pentaradially symmetrical adult body, a calcitic endoskeleton, and a water vascular system [5]; and hemichordates are characterized by a tripartite body organization comprising a proboscis, collar, and trunk [6]. Molecular phylogenetic analyses have supported a sister group relationship between Echinodermata and Hemichordata, which together form a clade called Ambulacraria [3,7,8] (Fig 1A). While subsequent phylogenomic studies have reinforced support for the ambulacrarian clade, some have suggested a sister group relationship between Ambulacraria and Xenacoelomorpha (a group of marine worms lacking definitive coeloms) and have even questioned the monophyly of Deuterostomia [9–12]. Because of the long evolutionary history of deuterostome lineages and the difficulty of assigning definitive stem fossils to the early diversification of the group, it remains challenging to infer the condition of their common ancestor, let alone to decipher the genomic basis underlying the origins of their diverse body plans and their phylogenetic affiliations.
To address these issues, it is helpful to reconstruct the ancestral genome architectures at major nodes of the animal tree using species that occupy key phylogenetic positions, and trace the subsequent evolutionary trajectories along each lineage.
Comparison of diverse metazoan genomes has revealed extensive conservation of chromosome-scale linkage (i.e., “macrosynteny”) across animals [13–17] and enabled the reconstruction of ancestral chromosome-scale units (chromosomes or chromosome arms) [18–21]. These reconstructions have been used to identify shared and derived synteny patterns that can help to resolve long-standing evolutionary questions, infer lineage-specific chromosomal rearrangements, and clarify animal phylogenetic relationships that have been difficult to resolve using conventional phylogenetic approaches [18–23]. For example, the identification of synapomorphic chromosomal fusion-with-mixing events shared by sponge, cnidarian, and bilaterian genomes, but not by ctenophores, provides strong evidence to support the hypothesis that ctenophores are the sister group to all other animals [18].
Among deuterostomes, vertebrates show extensive genome duplications [20]. Comparisons of the sea urchin genome with those of other bilaterians [19], together with analysis of sub-chromosomal hemichordate assemblies [15], (1) implied that the chromosomes of the deuterostome ancestor retained the 24 bilaterian ancestral linkage groups (BALGs) and (2) identified subsequent rearrangements in the sea urchin and chordate lineages [19]. Assembling a complete picture of deuterostome genome evolution, however, requires comparisons that include chromosome-scale assemblies of hemichordates. Analyses of karyotype evolution spanning all phylum-level deuterostome lineages could yield important insights into deuterostome ancestry and the evolution of their diverse body plans.
Hemichordates comprise 2 groups, the solitary enteropneusts and the colonial pterobranchs. In this study, we generated chromosome-level genome assemblies for 2 enteropneusts, the ptychoderid Ptychodera flava and the spengelid Schizocardium californicum. Phylogenomic data have shown that Ptychoderidae and Spengelidae are sister groups and that, together with Harrimaniidae, they constitute Enteropneusta [7,24]. Our comparative genomic analysis revealed remarkable macrosyntenic conservation among deuterostome species. Based on the principle of parsimony and comparative analyses with outgroups, we deduced that the last common ancestor of deuterostomes possessed 24 ancestral linkage groups (ALGs) that match the BALGs as previously proposed [19]. We also discovered lineage-specific rearrangements that reflect the temporal progression towards the chromosomal architectures of extant deuterostomes. While our phylogenetic analysis using synteny-based characters supports a monophyletic Deuterostomia, we did not identify shared derived macrochromosomal rearrangements that distinguish deuterostomes from other bilaterians. Our results confirm that the genomic architectures of deuterostomes retain more ancestral traits than those of protostomes, consistent with a very short evolutionary distance from the last common ancestor of bilaterians to the origin of deuterostomes. Our study thus provides a roadmap for understanding chromosomal evolution and contributes to deciphering the possible developmental genetic changes underlying the emergence of diverse body plans in deuterostomes.
Results and discussion
Chromosome-level genome assemblies of 2 hemichordates
Deuterostomes comprise 3 major phyla, hemichordates, echinoderms, and chordates, the first 2 of which constitute a group called Ambulacraria (Fig 1A). Previous short read-based genome sequencing of 2 hemichordate species, Saccoglossus kowalevskii and Ptychodera flava, provided a cornerstone for studies on deuterostome evolution [15]. The fragmented nature of these genome assemblies, however, limits our understanding of chromosome evolution among deuterostome lineages. To address this issue, we employed PacBio long-read and HiC technologies to sequence the genomes of 2 enteropneust hemichordates, P. flava (PFL) and Schizocardium californicum (SCA) (S1 Fig). The long read-based genome assemblies of PFL and SCA total 1.16 Gbp and 0.93 Gbp, respectively (S1 Fig). After scaffolding with HiC contacts (S2 Fig), 23 chromosome-scale scaffolds were obtained for each genome, matching the 2N = 46 karyotype of PFL [15]. Protein-coding genes were annotated in the 2 genome assemblies using transcriptome data and ab initio prediction approaches, resulting in 35,856 (PFL) and 27,463 (SCA) annotated genes with high BUSCO scores (S1 Fig). Both hemichordate genome assemblies therefore reach chromosome level, with highly complete gene annotations.
The 23 chromosomes of the 2 hemichordate species generally exhibit a one-to-one correspondence based on pairwise comparisons of the positions of orthologous genes (Figs 1B and S3A). This correspondence further supports the chromosome-scale accuracy of the independently generated genome assemblies, since conserved syntenies are unlikely to arise spuriously from assembly errors. Extending this analysis to the sea urchin (Strongylocentrotus purpuratus, SPU) and amphioxus (Branchiostoma floridae, BFL), as representative echinoderm and chordate species, we confirmed chromosome-scale syntenic conservation (macrosynteny) among deuterostomes (Figs 1B and S3B). Given that macrosyntenic conservation has been used to reconstruct ancestral genome architectures and identify lineage-specific chromosomal rearrangement events [19,20], we broadened the synteny analysis by including additional species within and outside the deuterostome superphylum. This approach allowed us to confirm the genomic architecture of the last common ancestor (LCA) of deuterostomes and explore how it evolved among deuterostome lineages.
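The idea of a one-to-one chromosome correspondence can be illustrated with a minimal sketch, assuming each ortholog has already been reduced to a (chromosome in species 1, chromosome in species 2) pair. The function names, the reciprocal-best-partner rule, and the 0.8 cut-off are hypothetical simplifications; the statistical criterion actually used in this study is described in the next section and in Methods, and the chromosome names X1, Y1, etc. are placeholders rather than real data.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

Pair = Tuple[str, str]  # (chromosome in species 1, chromosome in species 2) of one ortholog


def ortholog_counts(pairs: List[Pair]) -> Dict[str, Counter]:
    """Count shared orthologs for every chromosome pair."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for chrom1, chrom2 in pairs:
        counts[chrom1][chrom2] += 1
    return counts


def best_partner(counts: Dict[str, Counter]) -> Dict[str, Tuple[str, float]]:
    """Top partner of each chromosome and the fraction of its orthologs found there."""
    result = {}
    for chrom, partners in counts.items():
        top, n = partners.most_common(1)[0]
        result[chrom] = (top, n / sum(partners.values()))
    return result


def one_to_one(pairs: List[Pair], min_fraction: float = 0.8) -> List[Pair]:
    """Reciprocal best chromosome partners that each hold >= min_fraction of the orthologs."""
    fwd_best = best_partner(ortholog_counts(pairs))
    rev_best = best_partner(ortholog_counts([(b, a) for a, b in pairs]))
    return sorted(
        (c1, c2)
        for c1, (c2, frac) in fwd_best.items()
        if frac >= min_fraction
        and rev_best[c2][0] == c1
        and rev_best[c2][1] >= min_fraction
    )


# Toy data: X1<->Y1 and X2<->Y2 are one-to-one; X3 is split evenly between Y3 and Y4.
toy = ([("X1", "Y1")] * 9 + [("X1", "Y2")] + [("X2", "Y2")] * 10
       + [("X3", "Y3")] * 5 + [("X3", "Y4")] * 5)
print(one_to_one(toy))  # [('X1', 'Y1'), ('X2', 'Y2')]
```

Requiring the match to be reciprocal keeps a chromosome that is split between 2 partners (X3 here) out of the one-to-one set, which is the situation that the outgroup comparisons below are designed to polarize.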
Reduction of chromosome numbers during deuterostome evolution
To reconstruct the ancestral chromosomal architectures at key phylogenetic nodes in deuterostomes and investigate the evolutionary history of chromosomal changes, we carried out pairwise genome comparisons of multiple deuterostomes (S4 Fig). To identify orthologous chromosomes between species in an unbiased fashion, we used Fisher’s exact test with Bonferroni correction, together with the risk difference, to designate chromosome pairs significantly enriched in shared orthologous genes (see Methods). Following refs. 19 and 20, we reasoned that syntenic units conserved between 2 genomes most likely descend from a common ALG in the LCA of the 2 species under investigation. We used the scallop (Patinopecten yessoensis, PYE) genome as an outgroup (S5 Fig) because it has evolved slowly compared with other protostome genomes [25] and has previously been shown to retain conserved syntenies with other animals [19]. Using this comparative approach, we inferred ancestral chromosomal architectures at major nodes of the deuterostome phylogeny.
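A minimal sketch of how Fisher’s exact test, Bonferroni correction, and the risk difference can be combined to call orthologous chromosome pairs. It assumes a one-sided (enrichment) test on the 2×2 table of orthologs (on chromosome a or not, on chromosome b or not), defines the risk difference as P(on b | on a) minus P(on b | not on a), and uses placeholder thresholds (alpha = 0.05, minimum risk difference 0). The exact settings used in this study are given in Methods; the function names and toy data are hypothetical.

```python
from collections import Counter
from itertools import product
from math import comb
from typing import List, Tuple


def fisher_greater(k: int, n_a: int, n_b: int, total: int) -> float:
    """One-sided Fisher's exact p-value: P(>= k shared orthologs) if the n_a orthologs
    on chromosome a were a random draw from all `total` orthologs (hypergeometric tail)."""
    tail = sum(comb(n_b, x) * comb(total - n_b, n_a - x)
               for x in range(k, min(n_a, n_b) + 1))
    return tail / comb(total, n_a)


def risk_difference(k: int, n_a: int, n_b: int, total: int) -> float:
    """P(ortholog on b | on a) - P(ortholog on b | not on a)."""
    rest = total - n_a
    return k / n_a - ((n_b - k) / rest if rest else 0.0)


def orthologous_pairs(pairs: List[Tuple[str, str]], alpha: float = 0.05,
                      min_rd: float = 0.0) -> List[Tuple[str, str, float, float]]:
    """Chromosome pairs with Bonferroni-adjusted p < alpha and risk difference > min_rd."""
    shared = Counter(pairs)
    on_a = Counter(a for a, _ in pairs)
    on_b = Counter(b for _, b in pairs)
    total = len(pairs)
    n_tests = len(on_a) * len(on_b)  # Bonferroni: every chromosome pair is one test
    hits = []
    for a, b in product(sorted(on_a), sorted(on_b)):
        k = shared[(a, b)]
        p_adj = min(1.0, fisher_greater(k, on_a[a], on_b[b], total) * n_tests)
        rd = risk_difference(k, on_a[a], on_b[b], total)
        if p_adj < alpha and rd > min_rd:
            hits.append((a, b, p_adj, rd))
    return hits


toy = [("X1", "Y1")] * 20 + [("X2", "Y2")] * 20 + [("X1", "Y2")] * 2 + [("X2", "Y1")] * 2
for a, b, p_adj, rd in orthologous_pairs(toy):
    print(a, b, f"p_adj={p_adj:.1e}", f"RD={rd:.2f}")
```

The p-value alone can flag very large chromosomes that share many orthologs simply by size, so pairing it with an effect-size filter such as the risk difference keeps only pairs where the association is also substantial.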
To reconstruct the ambulacrarian ancestral chromosomes, we compared the hemichordate PFL genome with the genomes of 2 echinoderm species (sea urchin SPU and sea star Pisaster ochraceus, POC), with the amphioxus or scallop genome serving as an outgroup (S6–S9 Figs). The dot plot between the hemichordate (PFL) and sea urchin (SPU) genomes showed 17 one-to-one corresponding chromosome pairs (S4A Fig), suggesting that (1) these chromosome pairs are homologous and (2) the LCA of PFL and SPU (i.e., the ambulacrarian LCA) already possessed these 17 ALGs. We also identified several one-to-two and one-to-three correspondences between PFL and SPU chromosomes, implying that large-scale chromosomal rearrangement events occurred after the lineages diverged from the ambulacrarian LCA. We polarized the direction of each chromosomal change and identified the likely ancestral state by comparison with the outgroup species. For example, P. flava PFL11 and PFL17 together correspond to S. purpuratus SPU8 (S8D Fig), implying that either PFL11 and PFL17 arose by a split of an ancestral ambulacrarian chromosome or SPU8 arose by the fusion of 2 ancestral chromosomes. Comparison with amphioxus chromosomes, however, showed that PFL11 and PFL17 correspond to amphioxus BFL18 and BFL17, respectively (S8G Fig), indicating that these 2 chromosome pairs evolved from 2 distinct ALGs in the deuterostome LCA. Based on the parsimony principle, we reasoned that hemichordates inherited the 2 ALGs directly as PFL11 and PFL17, whereas sea urchin SPU8 arose by fusion of the 2 distinct ancestral chromosomes, as also noted in ref. 19 using a different sea urchin species, Lytechinus variegatus (S8A Fig). By repeating such comparisons (S6–S9 Figs), we inferred that the LCA of deuterostomes possessed 24 deuterostome ALGs (DALGs). Importantly, these 24 DALGs correspond to the 24 BALGs deduced by Simakov and colleagues [19], confirming that the deuterostome LCA and the bilaterian LCA possessed very similar chromosomal architectures.
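The outgroup-based polarization applied to PFL11/PFL17 versus SPU8 can be written as a small parsimony rule. This is a sketch under simplifying assumptions: a single outgroup, and each part of the one-to-many correspondence already assigned to one significantly matching outgroup chromosome. The function name is hypothetical, and real cases can be confounded by independent rearrangements in the outgroup itself.

```python
from typing import Dict, List, Tuple


def polarize(merged: str, parts: List[str],
             outgroup_match: Dict[str, str]) -> Tuple[str, object]:
    """Most parsimonious explanation for one chromosome (`merged`) in species X that
    corresponds to several chromosomes (`parts`) in species Y.

    outgroup_match maps each part to the outgroup chromosome it significantly matches.
    """
    outgroup_chroms = {outgroup_match[p] for p in parts}
    if len(outgroup_chroms) == len(parts):
        # Parts are also separate in the outgroup: the ancestral state is "separate",
        # so a single fusion occurred on the branch leading to `merged`.
        return ("fusion", merged)
    if len(outgroup_chroms) == 1:
        # All parts map to one outgroup chromosome: the ancestral state is "single",
        # so a split occurred on the branch leading to species Y.
        return ("split", tuple(parts))
    return ("ambiguous", None)


# Correspondences reported in this section (S8D and S8G Figs).
print(polarize("SPU8", ["PFL11", "PFL17"], {"PFL11": "BFL18", "PFL17": "BFL17"}))
# ('fusion', 'SPU8')
```

Each outcome requires only 1 change on 1 branch, which is why the outgroup state decides between the 2 otherwise equally valid scenarios.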
Our notation for the deuterostome ALGs therefore follows that of the bilaterian ALGs [19]. Among the 24 DALGs, 9 remain intact in all 5 deuterostome species we investigated, while 15 have undergone lineage-specific changes (Fig 2).
Fig 2 illustrates chromosomal rearrangement events with colored boxes: interspersed boxes represent chromosomal fusions followed by translocations, while checkerboards depict chromosomal fusions followed by extensive mixing, a common feature of deep chromosome evolution [19]. Rearrangements were determined based on pairwise conserved syntenies between target species (S6–S9 Figs). These illustrations correspond to the chromosomal rearrangement events defined by Simakov and colleagues [19], whose algebraic symbols denote end-end fusion (●), centric insertion (↘), and fusion-with-mixing (⊗). Notably, 4 interspersed boxes correspond to end-end fusions and 5 correspond to centric insertions followed by chromosomal translocations (e.g., BFL4 and BFL2 in Fig 2B). From the 24 DALGs, we inferred that chromosome numbers were reduced in a lineage-specific manner. In the lineage leading to ambulacrarians (node A in Fig 2B), DALGs B2 and C2 fused and mixed extensively to become the ambulacrarian ALG B2⊗C2, while the other DALGs remained relatively intact, resulting in 23 ambulacrarian ALGs (AALGs). In the hemichordate lineage (node H in Fig 2B), AALG B2⊗C2 split into 2 chromosomes (B2⊗C2-a and B2⊗C2-b), while AALGs R and B1 fused and mixed (AALG R⊗B1) to become a single chromosome (PFL9 and SCA5, respectively), resulting in 1N = 23 chromosomes in both hemichordate species. The split of AALG B2⊗C2 may represent a Robertsonian (i.e., centric) fission, in which a presumably metacentric chromosome is transformed into 2 acrocentrics. Whether the shared chromosomal linkages of PFL and SCA represent the ancestral hemichordate state can only be determined by analysis of pterobranch hemichordate genomes, but the pairwise comparison (S3A Fig) clearly shows that no large-scale macrosyntenic changes have occurred since the last common ancestor of PFL and SCA, which lived more than 370 mya [15].
Similarly, the echinoderm LCA (node E in Fig 2B) likely possessed 23 ALGs (EALGs), with the same chromosomal architecture as the ambulacrarian LCA; subsequently, different fusion events occurred in the sea star and sea urchin lineages. In the sea star, EALGs O2 and B3 fused (O2⊗B3) and evolved into POC6, resulting in a 1N = 22 karyotype. In the sea urchin S. purpuratus, chromosome SPU8 arose through fusion of EALGs J1 and B3 via centric insertion (J1↘B3), while SPU1 arose by fusion with extensive mixing of EALGs E and B2⊗C2, denoted E⊗(B2⊗C2), resulting in 1N = 21 chromosomes. The three-way fusion E⊗(B2⊗C2) is also shared by Lytechinus variegatus [19] and Paracentrotus lividus [26] and is therefore likely a shared derived character of the superorder Echinacea, a hypothesis that can be tested by sequencing other members of this group.
In the chordate lineage (node C in Fig 2B), orthologous genes located on DALG R were dispersed into many chromosomes [19], leaving 23 chordate ALGs (CALGs). This dispersion was inferred from the observation that no amphioxus chromosome shows significant enrichment of syntenic blocks corresponding to the DALG R-derived chromosomes of echinoderms (SPU3 or POC12, Figs 2C, S4E and S4F). Similarly, no concentration of R-derived genes was found in vertebrates or ascidians [19]. In the amphioxus B. floridae, 4 chromosomal fusion events occurred (J2↘C1, A1⊗A2, O1●I, and C2●Q), reducing the number of chromosomes to 1N = 19 [20]. The inferred chordate-specific chromosomal dispersion and the 4 chromosomal fusion events in amphioxus BFL are consistent with previous findings [19]. One of these fusion events (A1⊗A2) was also observed in the sea urchin Paracentrotus lividus [26], suggesting that A1 and A2 were arms of a metacentric chromosome that fused independently in urchin and amphioxus. From the 23 chordate ALGs, previous studies [19,20] deduced that the lineage leading to vertebrates had undergone 4 chromosomal fusion events (J1⊗J2, C1⊗C2, O1⊗O2, and B1⊗B2⊗B3), reducing the 23 CALGs to 18 vertebrate ALGs (the three-way fusion B1⊗B2⊗B3 removes 2 chromosomes). These chromosomal rearrangement events and the evolutionary history of genomic architectures among deuterostomes are summarized in Fig 2.
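The haploid chromosome numbers at each node follow from simple bookkeeping over the events described above. The sketch below treats each chromosome as a set of ancestral ALG labels and applies the fusions, the hemichordate split, and the chordate dispersal of R; it tracks only which ALGs share a chromosome, not gene order or the type of fusion (●, ↘, ⊗). The labels M, N, and P are not mentioned in this section and are assumed here to complete the 24 ALGs of ref. 19; the function names are hypothetical.

```python
from typing import FrozenSet, Iterable, List, Sequence

Karyotype = List[FrozenSet[str]]  # one frozenset of ALG labels per chromosome

# M, N and P are not named in this section; assumed to complete the 24 ALGs of ref. 19.
DALGS = ["A1", "A2", "B1", "B2", "B3", "C1", "C2", "D", "E", "F", "G", "H",
         "I", "J1", "J2", "K", "L", "M", "N", "O1", "O2", "P", "Q", "R"]


def fuse(k: Karyotype, *labels: str) -> Karyotype:
    """Merge every chromosome carrying one of `labels` into a single chromosome."""
    wanted = set(labels)
    hit = [c for c in k if c & wanted]
    assert len(hit) >= 2, "a fusion needs at least two distinct chromosomes"
    return [c for c in k if not (c & wanted)] + [frozenset().union(*hit)]


def split(k: Karyotype, label: str, parts: Iterable[Iterable[str]]) -> Karyotype:
    """Replace the chromosome carrying `label` with the chromosomes listed in `parts`."""
    target = next(c for c in k if label in c)
    return [c for c in k if c != target] + [frozenset(p) for p in parts]


def disperse(k: Karyotype, label: str) -> Karyotype:
    """Remove an ALG whose genes were scattered across many chromosomes."""
    return [c for c in k if label not in c]


def apply_fusions(k: Karyotype, groups: Sequence[Sequence[str]]) -> Karyotype:
    for group in groups:
        k = fuse(k, *group)
    return k


deut = [frozenset({alg}) for alg in DALGS]                          # deuterostome LCA
amb = fuse(deut, "B2", "C2")                                        # node A: B2⊗C2
hemi = fuse(split(amb, "B2", [{"B2C2-a"}, {"B2C2-b"}]), "R", "B1")  # node H
sea_star = fuse(amb, "O2", "B3")                                    # POC6
sea_urchin = apply_fusions(amb, [("J1", "B3"), ("E", "B2")])        # SPU8, SPU1
chordate = disperse(deut, "R")                                      # node C
amphioxus = apply_fusions(chordate, [("J2", "C1"), ("A1", "A2"), ("O1", "I"), ("C2", "Q")])
vertebrate = apply_fusions(chordate, [("J1", "J2"), ("C1", "C2"), ("O1", "O2"), ("B1", "B2", "B3")])

for name, k in [("deuterostome LCA", deut), ("ambulacrarian LCA", amb),
                ("hemichordates", hemi), ("sea star", sea_star),
                ("sea urchin", sea_urchin), ("chordate LCA", chordate),
                ("amphioxus", amphioxus), ("vertebrate ALGs", vertebrate)]:
    print(f"{name}: 1N = {len(k)}")
# prints 1N = 24, 23, 23, 22, 21, 23, 19, 18 for these nodes, in this order
```

The hemichordate count illustrates why 1N stays at 23 despite 2 events: the split of B2⊗C2 adds a chromosome and the R⊗B1 fusion removes one.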
Stepwise changes in chromosomal architectures within the sea urchin lineage
We expect chromosomal fusion-with-mixing to proceed stepwise over evolutionary time. Two distinct chromosomes (at t0) would fuse (at t1), either by end-end fusion or by centric insertion, and this event would be followed by rounds of intrachromosomal inversions and translocations (at t2) until the fused chromosome became scrambled (at ts) (as illustrated in S10 Fig). We therefore postulated that comparing chromosome architectures between species with relatively short divergence times should allow us to identify the evolutionary state of individual chromosomes during this stepwise process. We thus analyzed 2 additional sea urchin species, L. variegatus (LVA) and L. pictus (LPI), for which chromosome-level genome assemblies are available for syntenic comparison [16,27]. Both belong to the genus Lytechinus, which shared a common ancestor with S. purpuratus about 50 million years ago (mya) [28]. By analyzing syntenic conservation among these 3 sea urchin species (S11 Fig), we inferred that their LCA (tentatively taken to be the sea urchin LCA) possessed 21 ALGs (SALGs) owing to 2 shared chromosomal fusion events, J1↘B3 and E⊗(B2⊗C2) (node S in S10 Fig). These 2 fusions were also observed in the recently sequenced genome of the sea urchin P. lividus [26], indicating that they are a common genomic trait of currently available sea urchin genomes. We also deduced 20 ALGs (LALGs) in the Lytechinus LCA, owing to a Lytechinus-specific chromosomal fusion event (G●D) (node L in S10 Fig). Descending from the Lytechinus LCA, L. variegatus and L. pictus each underwent a distinct chromosomal fusion event, F●(J1⊗B3) producing L. variegatus LVA1 and F●C1 producing L. pictus LPI5, independently resulting in 1N = 19 chromosomes in both species.
Based on the phylogenetic relationships and deduced chromosomal architectures (S10 Fig), we reconstructed a putative history of several chromosomal fusion events. For example, 2 echinoderm ALGs (J1 and B3, at t0) fused via centric insertion, after which a translocation event produced the sea urchin ALG J1↘B3 (at t2). This chromosome then underwent extensive intrachromosomal rearrangements to become the Lytechinus ALG J1⊗B3 (at ts). In the lineage leading to L. variegatus, but not L. pictus, end-end fusion of Lytechinus ALGs F and J1⊗B3 produced the extant LVA1 chromosome (at t1). Within LVA1, we observed no obvious translocation between the regions descended from LALGs J1⊗B3 and F, suggesting that this end-end fusion occurred recently in the lineage leading to L. variegatus. In L. pictus, chromosome LPI5 was derived from end-end fusion of LALGs F and C1 followed by a translocation event. Intriguingly, the independent, species-specific fusion events in the 2 Lytechinus species both involved the same chromosome (LALG F). Such recent chromosomal fusions may alter recombination rates and cause reproductive isolation, as observed during nematode speciation [29]. Together, the fusion events in sea urchins illustrate how chromosomal architectures may change stepwise.
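The stepwise states t0, t1, t2, and ts can be roughly approximated from the order of genes along a chromosome, each labeled by its ancestral ALG. This is a hypothetical heuristic, not the procedure used in this study: it collapses consecutive genes of the same ALG into blocks and assigns a state from the block pattern, with an arbitrary threshold (mixed_fraction = 0.2) separating a few translocated blocks from extensive mixing.

```python
from itertools import groupby
from typing import List


def blocks(labels: List[str]) -> List[str]:
    """Collapse runs of neighbouring genes that descend from the same ALG."""
    return [label for label, _ in groupby(labels)]


def fusion_state(labels: List[str], mixed_fraction: float = 0.2) -> str:
    """Heuristic state of a chromosome, given the ALG of each gene in positional order."""
    b = blocks(labels)
    if len(set(b)) == 1:
        return "t0: no fusion"
    if len(b) == 2:
        return "t1: end-end fusion"
    if len(b) == 3 and b[0] == b[2]:
        return "t1: centric insertion"
    if len(b) < mixed_fraction * len(labels):
        return "t2: fusion followed by translocations"
    return "ts: fusion-with-mixing"


print(fusion_state(["F"] * 6 + ["J1"] * 6))                # t1: end-end fusion
print(fusion_state(["J1"] * 4 + ["B3"] * 4 + ["J1"] * 4))  # t1: centric insertion
print(fusion_state((["J1"] * 10 + ["B3"] * 10) * 2 + ["J1"] * 10))
# t2: fusion followed by translocations
print(fusion_state(["E", "B2"] * 10))                      # ts: fusion-with-mixing
```

Real gene orders contain isolated genes that break long runs (annotation or orthology errors), so in practice some smoothing or a statistical test of mixing would be needed before counting blocks.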
In several fusion-with-mixing cases, we did not observe transitional states (e.g., SALG E⊗(B2⊗C2), which resulted from EALGs E and B2⊗C2; S10 Fig), implying that these fusion events occurred relatively long ago. Assuming that intrachromosomal rearrangements occur at a constant rate, we inferred the relative order of fusion events from their synteny patterns. For example, whereas SALG J1↘B3 retains a centric insertion pattern, SALG E⊗(B2⊗C2) exhibits fusion-with-mixing, suggesting that the fusion of EALGs E and B2⊗C2 occurred earlier than that of EALGs J1 and B3. Therefore, between the echinoderm LCA, which possessed 23 ALGs, and the sea urchin LCA (or, more specifically, the LCA of the 3 sea urchin species under investigation), which contained 21 ALGs, there may have been a transitional state with 1N = 22 chromosomes, in which EALGs E and B2⊗C2 were already fused but J1 and B3 remained separate. Intriguingly, the haploid genomes of Cidaris cidaris and Arbacia punctulata, which belong to an early branching sea urchin group and to a euechinoid outgroup of Lytechinus and S. purpuratus, respectively, have been reported to each contain 22 chromosomes [30,31], suggesting that only 1 of these fusion events occurred in early branching sea urchins. Thus, we hypothesize that EALGs E and B2⊗C2 fused before the divergence of the 2 sister subclasses of sea urchins, cidaroids and euechinoids, at least 268 mya [32]. The second fusion event, involving EALGs J1 and B3, possibly occurred later, after the emergence of Arbacia and before the divergence of Lytechinus and S. purpuratus (i.e., between ~185 and 50 mya) [33]. If so, the LCA of all living sea urchins would have possessed 1N = 22 chromosomes, rather than the 21 ancestral chromosomes illustrated in S10 Fig. Future synteny analyses and chromosomal architecture reconstructions using genomes of early branching sea urchins will help to resolve this question.
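The argument that a more mixed fusion is older, assuming intrachromosomal rearrangements accumulate at a constant rate, can be sketched with a simple mixing index: the observed fraction of neighboring genes that come from different ALGs, divided by the fraction expected if the ALGs were randomly interleaved. The index, the function names, and the 2 toy chromosomes are hypothetical; they only illustrate how such a relative ranking could be computed, not how the fusions in S10 Fig were ordered.

```python
from collections import Counter
from typing import Dict, List


def mixing_index(labels: List[str]) -> float:
    """Observed fraction of neighbouring genes from different ALGs, divided by the
    fraction expected if the ALGs were randomly interleaved (1 - sum of p_i^2)."""
    switches = sum(a != b for a, b in zip(labels, labels[1:]))
    observed = switches / (len(labels) - 1)
    freqs = [n / len(labels) for n in Counter(labels).values()]
    expected = 1 - sum(p * p for p in freqs)
    return observed / expected


def oldest_first(chromosomes: Dict[str, List[str]]) -> List[str]:
    """Rank fused chromosomes from most to least mixed, i.e. oldest fusion first
    under the constant-rate assumption."""
    return sorted(chromosomes, key=lambda name: mixing_index(chromosomes[name]), reverse=True)


toy = {
    "toy_J1_B3": ["J1"] * 10 + ["B3"] * 10 + ["J1"] * 10,                     # insertion-like
    "toy_E_B2C2": ["E", "B2C2", "B2C2", "E", "B2C2", "E", "E", "B2C2"] * 3,  # well mixed
}
print(oldest_first(toy))  # ['toy_E_B2C2', 'toy_J1_B3']
```

Normalizing by the expected switch fraction keeps a fusion between a large and a small ALG from looking artificially "young" just because few switches are possible; the constant-rate assumption itself remains the main caveat.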
Lineage-specific chromosomal fusion events in major animal groups
To understand whether deuterostome chromosomal architectures differ from those of protostomes, we extended our analysis to several recently published chromosome-level protostome genome assemblies. Consistent with previous observations [17,25], we found that the chromosomes of most protostome species are highly rearranged. Nevertheless, we identified 5 spiralian genomes [25,34–37], from 3 bivalves (2 clam species, Ruditapes philippinarum and Sinonovacula constricta, and the aforementioned scallop P. yessoensis) and 2 polychaete annelids (Paraescarpia echinospica and Streblospio benedicti), that are more conserved and comparable to the presumed bilaterian ALGs and extant deuterostome genomes. Our syntenic analysis shows that all 5 spiralian species share 4 specific fusion-with-mixing events (S12 and S13 Figs), as predicted previously from 4 syntenic synapomorphies of spiralians identified using different datasets [19,38]. In contrast, comparisons of 6 chromosome-scale ecdysozoan genomes showed that they are highly reorganized relative to the bilaterian ancestor [19], making it difficult to reconstruct the chromosomal architecture of their LCA. Nevertheless, the 4 spiralian fusions are clearly absent in ecdysozoans (for example, in 2 butterfly genomes; S14 Fig), consistent with their status as spiralian syntenic synapomorphies [19]. Based on these pairwise syntenic comparisons, we inferred that the LCA of protostomes most likely also possessed 24 ALGs corresponding to the 24 BALGs (S12 Fig). This correspondence suggests that neither the deuterostome LCA nor the protostome LCA underwent large-scale inter-chromosomal fusions when the 2 lineages initially diverged from the bilaterian LCA. During subsequent evolution, however, protostome lineages appear to have accumulated much more extensive changes in their chromosomal architectures than deuterostome lineages.
After a chromosomal fusion with extensive mixing, it is unlikely that the genes of the fused chromosome would be re-sorted to reconstitute the original chromosomes [18,19]; such irreversible fusion-with-mixing events can therefore be used as polarized characters for probing deep phylogenetic relationships among animals [18,19]. Recent phylogenomic studies have provided evidence supporting a sister group relationship between Ambulacraria and Xenacoelomorpha, and some have even questioned the monophyly of Deuterostomia [9–12] (Fig 3A). We asked whether the identified chromosomal fusion-with-mixing traits could help to resolve this issue. We coded chromosomal status as categorical data, which were then converted into a binary matrix (Fig 3B and 3C). Bayesian phylogenetic and clustering analyses based on these synteny-based characters united the deuterostomes as a clade to the exclusion of other animals (Fig 3D and 3E). Notably, all 5 deuterostome species we analyzed retain 9 one-to-one matching chromosomes corresponding to the ancestral deuterostome state; however, no shared chromosomal fusion (i.e., syntenic synapomorphy [18]) was identified.
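The conversion of chromosomal status into a binary matrix can be illustrated with a one-hot encoding, assuming each categorical character is expanded into one presence/absence column per observed state. The 2 characters and their states below are taken from the rearrangements described in this section for PFL, SCA, POC, SPU, and BFL; they are an illustrative subset, not the full matrix of Fig 3B, and the Hamming distance shown is only a simple input for clustering, not a substitute for the Bayesian analysis.

```python
from typing import Dict, List, Tuple

# Illustrative subset of 2 characters; states follow the rearrangements described
# in this section. This is not the full character matrix of Fig 3B.
STATES: Dict[str, Dict[str, str]] = {
    "PFL": {"B2/C2": "split", "R": "fused_with_B1"},
    "SCA": {"B2/C2": "split", "R": "fused_with_B1"},
    "POC": {"B2/C2": "fused", "R": "intact"},
    "SPU": {"B2/C2": "fused_with_E", "R": "intact"},
    "BFL": {"B2/C2": "separate", "R": "dispersed"},
}


def to_binary(states: Dict[str, Dict[str, str]]) -> Tuple[List[Tuple[str, str]], Dict[str, List[int]]]:
    """One-hot encode categorical characters: one 0/1 column per (character, state)."""
    characters = sorted({c for row in states.values() for c in row})
    columns = [(c, s) for c in characters for s in sorted({row[c] for row in states.values()})]
    matrix = {taxon: [int(row[c] == s) for c, s in columns] for taxon, row in states.items()}
    return columns, matrix


def hamming(u: List[int], v: List[int]) -> int:
    """Number of binary columns in which two taxa differ."""
    return sum(a != b for a, b in zip(u, v))


columns, matrix = to_binary(STATES)
for taxon, row in matrix.items():
    print(taxon, row)
print(hamming(matrix["PFL"], matrix["BFL"]))  # 4
```

With one-hot coding, a taxon that differs in one character differs in exactly 2 columns, so distances scale uniformly across characters regardless of how many states each has.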
Regarding derived chromosomal changes within deuterostomes, we identified an ambulacrarian-specific chromosomal fusion (B2⊗C2) and a chordate-specific chromosomal dispersion (of ALG R). Four spiralian-specific chromosomal fusion events have been described (L⊗J2, O2⊗K, Q⊗H, and O1⊗R) (S16 Fig). We also noted that these bilaterian chromosomal rearrangement events were not observed in the jellyfish (Rhopilema esculentum, RES) genome [19] (S15 and S16 Figs). Therefore, the 5 major extant animal groups (i.e., ambulacrarians, chordates, spiralians, ecdysozoans, and cnidarians) do not share common derived traits in terms of inter-chromosomal rearrangement events, and the observed chromosomal fusion events appear to be lineage-specific, having occurred before the diversification of each of these major animal groups.
Xenacoelomorpha, a group comprising xenoturbellids and acoelomorphs, has been placed either as early branching bilaterians (Nephrozoa hypothesis) or as the sister group of ambulacrarians (Xenambulacraria hypothesis) [11,12,39,40]. To test these hypotheses, we examined the recently released chromosome-level genome assembly of the xenoturbellid Xenoturbella bocki [41]. We found no evidence of the ambulacrarian-specific chromosomal fusion (B2⊗C2) in the X. bocki genome. This fusion event therefore appears to be specific to ambulacrarians and does not provide evidence supporting the Xenambulacraria hypothesis. However, the current data cannot rule out the Xenambulacraria hypothesis, as the fusion could have occurred in the ambulacrarian lineage after Ambulacraria diverged from Xenacoelomorpha. Overall, our results reinforce the idea that the branch between the bilaterian LCA and the deuterostome LCA is likely very short [9], and our analyses also show that deuterostome lineages experienced fewer chromosomal fusion events than protostomes during early bilaterian evolution.
GO enrichment analyses of lineage-specific chromosomal rearrangement events
Chromosomal fusion-with-mixing has the potential to disrupt long-range promoter-enhancer interactions and/or topologically associating domains (TADs) and thereby change gene regulation. The genes located on chromosomes that underwent lineage-specific fusions could therefore provide hints about the origins of lineage-specific novelties. To assess the potential biological consequences of specific chromosomal changes in deuterostomes, we performed gene ontology (GO) enrichment analyses on genes located on the corresponding chromosomes of extant deuterostomes. The ambulacrarian-specific chromosomal fusion-with-mixing produced the inferred AALG B2⊗C2, which has remained a single chromosome, POC9, in the sea star (Figs 2B and 4). We found that genes located on POC9 are enriched in several GO terms related to development, including germ layer formation, neural development, axial patterning, gastrulation, and regulation of the BMP and Wnt signaling pathways (Figs 4A and S17E). This observation suggests that, in the lineage leading to ambulacrarians, many developmental regulatory genes were extensively shuffled in their relative positions by the B2⊗C2 fusion-with-mixing, which could have altered their expression patterns. The fused AALG B2⊗C2 subsequently underwent distinct chromosomal fusion and splitting events in sea urchins and hemichordates, respectively (Figs 2B and 4).
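A per-chromosome GO enrichment test can be sketched as a hypergeometric (one-sided Fisher) test: given a gene universe, the genes on one chromosome, and the genes annotated with one GO term, it asks whether the term is over-represented on that chromosome. The function names and toy gene sets are hypothetical, the test and fold-enrichment definitions are standard choices assumed here, and a real analysis would also correct for testing many terms and chromosomes.

```python
from math import comb
from typing import Iterable, Tuple


def hypergeom_sf(k: int, population: int, successes: int, draws: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(population, successes, draws)."""
    tail = sum(comb(successes, x) * comb(population - successes, draws - x)
               for x in range(k, min(successes, draws) + 1))
    return tail / comb(population, draws)


def go_enrichment(term_genes: Iterable[str], chrom_genes: Iterable[str],
                  universe: Iterable[str]) -> Tuple[int, float, float]:
    """Overlap, fold enrichment, and one-sided p-value of a GO term on one chromosome."""
    background = set(universe)
    term = set(term_genes) & background
    chrom = set(chrom_genes) & background
    k = len(term & chrom)
    expected = len(term) * len(chrom) / len(background)
    p = hypergeom_sf(k, len(background), len(term), len(chrom))
    return k, (k / expected if expected else float("nan")), p


# Toy example: 100 genes, 20 on the chromosome, 15 annotated with the term.
universe = [f"g{i}" for i in range(100)]
chromosome = universe[:20]
term = universe[:10] + universe[50:55]
k, fold, p = go_enrichment(term, chromosome, universe)
print(k, round(fold, 2), f"{p:.1e}")
```

Restricting both gene sets to the same background universe matters: genes without GO annotation or outside the assembled chromosomes would otherwise inflate or deflate the expected overlap.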
In all 3 sea urchin genomes we analyzed, a single chromosome (e.g., SPU1) was derived from the fusion of EALGs E and B2⊗C2 (S10 Fig). GO analysis revealed that genes related to development are also enriched on SPU1 (Figs 4B and S18E). Intriguingly, genes involved in bone and otolith development are also enriched on this sea urchin-specific fusion chromosome. Further analysis of the genomes of other sea urchin species, together with functional experiments, will be required to determine whether the rearrangement of these genes is related to the emergence of the unique skeletogenic lineage of sea urchins.
In both hemichordate species, we inferred that 2 chromosomes (PFL18 and PFL23 of P. flava and SCA11 and SCA23 of S. californicum) were split from the fused AALG B2⊗C2, resulting in HALGs B2⊗C2-a and B2⊗C2-b in the LCA of hemichordates (Fig 2B). GO enrichment analysis revealed that genes located on PFL18 (descendant of either HALG B2⊗C2-a or B2⊗C2-b) are enriched in biological processes associated with immune response and chemotaxis, suggesting that distinct interactions with environmental factors could have emerged during hemichordate evolution via chromosomal rearrangement (Figs 4C and S19C). Additional lineage-specific fusion events observed in deuterostomes include the echinoderm O2⊗B3 and J1↘B3 (resulting in the sea star POC6 and the sea urchin SPU8, respectively) and the hemichordate-specific fusion R⊗B1 (corresponding to PFL9 and SCA5) (Fig 2B). The top GO terms enriched in POC6, SPU8, and PFL9 include neuronal regulation, thyroid hormone transport, and germ cell migration, respectively (Figs 4D and S17–S19).
All chordates appear to share a dispersal of deuterostome/bilaterian ALG R [19], but this ALG is retained as individual chromosomes in ambulacrarians (e.g., POC12 and SPU3) (Fig 2B and 2C). Intriguingly, we found that POC12 and SPU3 are enriched for genes involved in DNA integration, including several transposase genes (Figs 4E, S17A and S18A). This result suggests that the dispersion of DALG R in the chordate lineage could have been due to the misregulation of transposase genes or rearrangements induced by such sequences. Taken together, our GO enrichment analyses provide a global view of possible regulatory and functional changes related to the lineage-specific chromosomal rearrangements. Such rearrangement events are in agreement with levels of divergence in gene expression profiles [42], supporting the hypothesis that at least some of these potential changes are plausibly associated with the evolution of distinct lineage-specific features and diverse body plans in deuterostomes.
Hox clusters in rearranged chromosomes
Hox genes are typically arranged in clusters and specify bilaterian body regions along the anteroposterior axis [43]. Despite their structural and functional conservation, we found that, among the 10 bilaterian species we examined, Hox clusters are located on chromosomes that underwent fusion with extensive mixing, with the sole exception of amphioxus BFL16 (S16 Fig). In the LCA of bilaterians, the Hox cluster was inferred to be positioned in BALG B2. The descendant of this ALG (DALG B2) contributed to the ambulacrarian-specific fusion with DALG C2 to form AALG B2⊗C2. Subsequently, its descendant in echinoderms underwent an additional fusion-with-mixing with ALG E to give rise to a chromosome resembling SPU1 in sea urchins. Meanwhile, in hemichordates, AALG B2⊗C2 split into HALGs B2⊗C2-a and B2⊗C2-b (represented by the extant PFL18 and PFL23; S16 Fig). Intriguingly, this splitting event in the hemichordate ancestor separated the Hox cluster and the distalless gene, which are commonly linked in vertebrate genomes [44]. This feature appears to be unique to hemichordates, as the Hox cluster and distalless gene are located on the same chromosome in all other deuterostome species we examined (i.e., amphioxus BFL16, sea star POC9, and sea urchin SPU1). It remains unclear, however, whether the separation of the Hox cluster and distalless during the hemichordate-specific chromosomal split had functional consequences related to the origin of the hemichordate body plan. BALG B2 is also involved in different fusion-with-mixing events in the 5 spiralian species, whose Hox clusters are located on the highly rearranged chromosomes RPH14, SCO9, PYE1, PEC4, and SBE9 (S16 Fig). It is tempting to speculate that these chromosomal rearrangement events may have changed the regulatory landscape of Hox genes and contributed to the evolution of lineage-specific body plans; further studies will be required to test this hypothesis.
While intrachromosomal rearrangements are strongly associated with the accumulation of transposable elements (TEs) [45,46], Hox clusters are known to be largely devoid of TEs in chordates [47,48]. This exclusion of TEs from Hox clusters has been thought to be chordate-specific, because the trend was not detected in the 5 protostome species analyzed previously (4 insects and the nematode Caenorhabditis elegans) [48]. Because most Hox clusters are situated on chromosomes that underwent fusion-with-mixing, we analyzed TE densities along the Hox-bearing chromosomes. In all 9 bilaterian species we examined, TE densities (including DNA transposons (DNA), long terminal repeats (LTR), long interspersed nuclear elements (LINE), and short interspersed nuclear elements (SINE)) dropped clearly within Hox clusters compared with the non-Hox regions of the same chromosomes (Figs 5A and S20–S22). The overall TE densities of Hox-bearing chromosomes were similar to those observed across entire genomes (S23 Fig). TE exclusion is particularly apparent in amphioxus BFL and hemichordate PFL (approximately 77% lower than in non-Hox regions), whose Hox clusters are relatively intact (Fig 4). Therefore, lower TE density in Hox clusters is broadly observed across bilaterians and is not limited to chordates. Whatever mechanism suppresses TE invasion (either selection against insertions or inhibition of such mutations) appears to remain in effect even when Hox clusters are situated on otherwise highly rearranged chromosomes.
We also noticed that many genes neighboring the Hox clusters, except for the evx genes, are highly rearranged, and their orthologs are commonly found on different chromosomes (S24 and S25 Figs). This result is consistent with the higher TE densities outside Hox clusters, where TEs can promote intrachromosomal rearrangements. Further characterization of TE distributions within Hox clusters revealed a higher TE density around the posterior Hox genes (between Hox9 and Hox15) in the amphioxus BFL Hox cluster. This is consistent with a previous report of repeat islands between the amphioxus posterior Hox genes, which may have contributed to the highly derived posterior region of the amphioxus Hox cluster [47,49,50]. Despite the generally low TE density across the hemichordate PFL Hox cluster, we noticed that the inversion of Hox13b and Hox13c coincides with the presence of more TEs near the posterior end of the cluster (Fig 5B, PFL). Similarly, the numbers, positions, and orientations of the Hox genes between Hox5 and Hox11/13 have changed notably among the 3 sea urchin species (SPU, LVA, and LPI), in line with the higher TE densities detected in these regions (Fig 5B).
Taken together, these results indicate that the exclusion of TEs from Hox clusters is a conserved feature of bilaterians. Nevertheless, TEs occasionally invade the posterior regions of deuterostome Hox clusters, and these invasions have likely contributed to local rearrangements of Hox genes. Our observations are reminiscent of the proposed “deuterostome posterior flexibility” model, which posits that posterior Hox genes evolved faster in deuterostomes than in protostomes [50,51]. In conclusion, the distribution of TEs both outside Hox clusters and within certain regions of them coincides with intrachromosomal gene rearrangements, which may modify the TAD structures of Hox clusters and alter the transcriptional regulation of Hox genes.
Evolutionary history of the pharyngeal gene cluster
The pharyngeal gene cluster contains 4 transcription factor genes (in the order nkx2.1, nkx2.2, pax1/9, and foxa) and 2 non-transcription factor genes (slc25a21 and mipol1), and their expression in the pharyngeal slits and surrounding endoderm is considered a deuterostome-specific feature [15]. Three additional genes, msx, cnga, and egln3, which respectively encode a homeobox transcription factor, a subunit of cyclic nucleotide-gated channels, and Egl-9 family hypoxia-inducible factor 3, are also linked to the cluster in some deuterostome species [9,15,52]. The complete pharyngeal gene cluster has so far been found only in deuterostomes, but some of its genes are also linked in protostomes [9]. It has therefore been proposed that, rather than being a deuterostome-specific trait, an intact cluster may already have been present in the LCA of bilaterians and was later dispersed in protostome lineages [9].
To gain insight into the evolutionary history of the pharyngeal cluster, we analyzed its gene complement in several bilaterian and non-bilaterian genomes (Fig 6A and 6B). In all the deuterostome genomes we analyzed, xrn2, which encodes a 5′-to-3′ exoribonuclease, is associated with the pharyngeal genes described above and is usually located upstream of and adjacent to nkx2.1. Based on the gene repertoires and linkage relationships in deuterostome genomes, we deduced that the complete pharyngeal cluster of the deuterostome LCA comprised 10 genes: xrn2, followed by 3 transcription factor genes (nkx2.1, nkx2.2, and msx), then cnga, pax1/9, slc25a21, mipol1, and foxa, and finally egln3. Several lineage-specific changes subsequently took place within the pharyngeal clusters of deuterostomes (Figs 6B and S26). In the hemichordate PFL, cnga was duplicated, and ghrA genes invaded the cluster between cnga and pax1/9. In the sea urchin SPU, the cluster is broken into 3 parts, although all 3 remain on the same chromosome (SPU5), and the second part (including msx, cnga, pax1/9, and slc25a21) is inverted.
In all 6 spiralian genomes we analyzed, orthologs of xrn2 were adjacent to nkx2.1, and mipol1 and foxa were also linked (Figs 6B and S26). A previous study [9] also identified paired linkages of nkx2.1 and nkx2.2, pax1/9 and slc25a21, and mipol1 and foxa in various protostomes. These results support the existence of 3 microsyntenic blocks as conserved features of bilaterians: (1) xrn2 and the nkx2 genes; (2) pax1/9 and slc25a21; and (3) mipol1 and foxa. Intriguingly, most orthologs of the pharyngeal cluster genes are located on the same chromosome, regardless of whether these microsyntenic relationships are maintained.
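The adjacency reasoning above can be made concrete with a small sketch. The Python function below is our own illustration, not the pipeline used in this study: it walks along a reference gene order (here the 10-gene deuterostome LCA order inferred above) and reports runs of neighboring genes that remain adjacent in another gene order. The second order is a made-up toy chromosome with hypothetical filler genes g1 to g7, chosen so that the 3 bilaterian microsyntenic blocks are recovered; it is not real spiralian data, and the `max_gap` parameter is an assumption of the sketch.

```python
from typing import Dict, List, Sequence

# Deuterostome LCA order of the pharyngeal cluster, as inferred in the text.
DEUTEROSTOME_LCA_ORDER = [
    "xrn2", "nkx2.1", "nkx2.2", "msx", "cnga",
    "pax1/9", "slc25a21", "mipol1", "foxa", "egln3",
]


def microsyntenic_blocks(reference: Sequence[str],
                         species_order: Sequence[str],
                         max_gap: int = 0) -> List[List[str]]:
    """Return runs of reference neighbours that stay adjacent in species_order.

    Two reference neighbours count as linked if both are present and at most
    `max_gap` other genes separate them. Orientation is ignored, so an
    inverted pair still counts as linked.
    """
    pos: Dict[str, int] = {gene: i for i, gene in enumerate(species_order)}
    blocks: List[List[str]] = []
    current: List[str] = [reference[0]]
    for a, b in zip(reference, reference[1:]):
        linked = a in pos and b in pos and abs(pos[a] - pos[b]) - 1 <= max_gap
        if linked:
            current.append(b)
        else:
            if len(current) >= 2:
                blocks.append(current)
            current = [b]
    if len(current) >= 2:
        blocks.append(current)
    return blocks


# Hypothetical toy chromosome: unrelated genes g1..g7 interleaved with the
# pharyngeal genes (invented for illustration only).
toy_order = ["xrn2", "nkx2.1", "nkx2.2", "g1", "g2", "msx", "g3", "cnga",
             "g4", "g5", "pax1/9", "slc25a21", "g6", "mipol1", "foxa", "g7", "egln3"]

for block in microsyntenic_blocks(DEUTEROSTOME_LCA_ORDER, toy_order):
    print(" - ".join(block))
```

Raising `max_gap` tolerates a few intervening genes, which makes the notion of "microsyntenic" looser; the choice of threshold is a judgment call.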
Based on these observations, we considered 2 scenarios for the evolution of the pharyngeal gene cluster: (1) the LCA of bilaterians (like the LCA of deuterostomes) possessed a complete pharyngeal gene cluster that later broke up into 3 microsyntenic blocks in protostomes; or (2) the LCA of bilaterians (like the LCA of protostomes) had the pharyngeal genes arranged in 3 microsyntenic blocks on the same chromosome, which later became closely linked to form a compact cluster in deuterostomes. To find evidence supporting or excluding these scenarios, we analyzed the genomic positions of the orthologous genes in bilaterian outgroups, including several cnidarians and sponges (S26 Fig). In the coral AMI, we observed a syntenic block containing xrn2, nkx2, msx-related, and cnga genes. Other cnidarian species either preserved parts of this block (e.g., xrn2 and nkx2 are adjacent in the coral XSP, and msx-related and cnga are linked in the sea anemone SCAL) or lacked these syntenic relationships (S26 Fig). Additionally, slc25a21 was absent from all 6 cnidarian genomes we analyzed; this gene was likely lost in cnidarians, because an slc25a21 ortholog was identified in the sponge genomes. Moreover, except for the pax genes, orthologs of the other pharyngeal genes are located on the same chromosome in most of the cnidarian genomes we analyzed. In the 2 sponge genomes, orthologs of the pharyngeal genes are mostly located on different chromosomes or scaffolds, and no microsyntenic blocks were identified. Applying the principle of parsimony, we therefore infer that one microsyntenic block (composed of xrn2, nkx2, msx-related, and cnga genes) was already present in the LCA of bilaterians and cnidarians, whereas the other pharyngeal genes were located on the same chromosome but had not yet formed microsyntenic blocks.
The 2 additional microsyntenic pairs (pax1/9-slc25a21 and mipol1-foxa) had been established by the time of the bilaterian LCA and persist in extant protostomes and deuterostomes. In the lineage leading to the examined spiralian species, the more ancient syntenic block was likely partially disrupted, leaving only xrn2 and nkx2 tightly associated. During deuterostome evolution, the 3 microsyntenic blocks became linked and egln3 was added at the end, forming the complete pharyngeal gene cluster.
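The parsimony argument in the preceding paragraphs can be illustrated with a Fitch downpass on a binary character. The sketch below rests on simplifying assumptions: it collapses each lineage into a single tip (sponges, cnidarians, protostomes, deuterostomes), codes xrn2-nkx2 microsynteny as present (1) or absent (0) per clade, and counts the minimum number of state changes. It is not the species-level analysis performed in this study.

```python
from typing import Dict, Set, Tuple, Union

# A tree is either a tip name or a pair of subtrees.
Tree = Union[str, Tuple["Tree", "Tree"]]


def fitch_downpass(tree: Tree, tips: Dict[str, int],
                   node_sets: Dict[Tree, Set[int]]) -> Tuple[Set[int], int]:
    """Fitch downpass: return (candidate state set, minimum changes) for a subtree."""
    if isinstance(tree, str):
        states = {tips[tree]}
        node_sets[tree] = states
        return states, 0
    left, right = tree
    left_set, left_cost = fitch_downpass(left, tips, node_sets)
    right_set, right_cost = fitch_downpass(right, tips, node_sets)
    shared = left_set & right_set
    if shared:
        states, cost = shared, left_cost + right_cost
    else:
        # No shared state: take the union and count one extra change.
        states, cost = left_set | right_set, left_cost + right_cost + 1
    node_sets[tree] = states
    return states, cost


bilateria = ("protostomes", "deuterostomes")
cnidaria_bilateria = ("cnidarians", bilateria)
metazoan_tree = ("sponges", cnidaria_bilateria)

# Simplified clade-level coding (1 = xrn2-nkx2 microsynteny observed in the clade).
tip_states = {"sponges": 0, "cnidarians": 1, "protostomes": 1, "deuterostomes": 1}

sets: Dict[Tree, Set[int]] = {}
root_set, changes = fitch_downpass(metazoan_tree, tip_states, sets)
# Only the downpass is shown. Both children of the cnidarian+bilaterian node
# are {1}, so a Fitch uppass cannot change that node's state in this toy tree.
print("root:", root_set, "changes:", changes)
print("cnidarian+bilaterian LCA:", sets[cnidaria_bilateria])
```

In this toy coding, a single loss or gain on the sponge branch explains the data, and the cnidarian-bilaterian ancestor is reconstructed with the block present, mirroring the inference in the text.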
Our data therefore support a scenario in which the compact pharyngeal gene cluster of deuterostomes was gradually assembled from preexisting bilaterian microsyntenic blocks on the deuterostome stem. We cannot, however, rule out the alternative scenario in which individual genes or small blocks distributed along an ancestral chromosome assembled into an ordered cluster in the bilaterian ancestor before breaking into 3 microsyntenic blocks in protostomes. Assembly of the 3 microsyntenic blocks into the deuterostome pharyngeal gene cluster may have facilitated co-regulation of these genes. Consistent with this idea, the pharyngeal cluster genes show similar temporal expression profiles among deuterostomes, whereas their orthologs in protostome and non-bilaterian species display more divergent expression profiles [42].
Conclusions
In this study, we generated chromosome-level genome assemblies for 2 hemichordate species. The hemichordate chromosomes (1N = 23) exhibit remarkable chromosome-scale macrosynteny with those of other deuterostomes, including several echinoderm and chordate species. This high level of conservation allowed us to infer that the LCA of deuterostomes possessed 24 ALGs, the same complement inferred for the bilaterian ancestor [19]. We further deduced lineage-specific chromosomal rearrangement events that reduced chromosome numbers during deuterostome evolution. Genes located on chromosomes that underwent lineage-specific fusions are enriched for functions in developmental processes, immune responses, and chemotaxis, and changes in the regulatory control of these genes may be related to the evolution of distinct features in deuterostome lineages. One example is the deeply conserved Hox cluster, which is commonly situated on a highly rearranged chromosome. Nevertheless, Hox genes in deuterostomes generally remain tightly linked, with the posterior Hox genes showing greater flexibility, consistent with the distribution of TEs within Hox clusters. Another conserved gene cluster, the deuterostome pharyngeal gene cluster, appears to have been established gradually by combining 3 pre-assembled microsyntenic blocks present in the LCA of bilaterians. This clustering likely contributes to the co-regulation of the pharyngeal genes. In summary, these results showcase how the global view provided by comparative genomics can contribute to our understanding of genome evolution. Moreover, the lineage-specific genomic changes identified herein may help to delineate molecular mechanisms driving the evolution of the diverse body plans of deuterostomes.
Methods
Sample preparation and sequencing
High molecular weight (HMW) genomic DNA of Ptychodera flava (PFL) was extracted using DNAzol (Thermo Fisher Scientific) from the sperm of a single male collected from the Penghu Islands, Taiwan. The size of the purified HMW genomic DNA was examined using a pulsed-field gel electrophoresis system (Bio-Rad). The genomic DNA was then sequenced by the Dresden Genome Center on the PacBio platform at 60× coverage. For Schizocardium californicum (SCA), HMW DNA was extracted from the sperm of a ripe male. To minimize mucus secretion, animals were washed several times and kept in ice-cold seawater during sperm collection. The male sperm ducts were opened with forceps, and sperm was collected with a glass Pasteur pipette and transferred to an Eppendorf tube. Tubes were spun down, and excess seawater was removed before the tubes were placed on ice. The DNA extraction protocol was adapted from Stefanik and colleagues [53], with 2 modifications to minimize DNA shearing: solutions were poured between Eppendorf tubes instead of being pipetted, and vortexing was avoided. The genomic DNA was then sequenced on the PacBio platform at 63× coverage.
Chromosome-level genome assembly
For PFL, the initial genome assembly was generated from PacBio reads using the MARVEL assembler [54]. Purge Haplotigs (version 1.1.0) [55] was used to remove allelic contigs (haplotigs) from the diploid assembly, yielding a haploid assembly. This haploid assembly was then scaffolded with HiRISE using a Hi-C library (Dovetail Genomics) and further polished with Illumina short reads using Pilon (version 1.23.2) [56]. For SCA, raw-read error correction, read trimming, and assembly were performed with the Canu assembler (version 1.5) [57]. Canu was run with the genomeSize parameter set to approximately 1.8 Gbp, roughly twice the expected genome size, because of high heterozygosity. After assembly, 2 rounds of polishing were performed with the Arrow consensus-calling algorithm [58]. The completeness of the polished assemblies was evaluated with BUSCO (version 5.1.2) [59] using the metazoa_odb10 dataset, which contains 954 BUSCO groups.
Gene prediction and functional annotation
For Ptychodera flava, gene models were predicted by combining ab initio prediction, homology evidence, and transcriptome sequencing. First, ab initio gene prediction was conducted using the MAKER2 pipeline [60] (Dovetail Genomics). Second, protein sequences from other species, including mouse, chick, zebrafish, spotted gar, sea lamprey, amphioxus, ascidian, sea urchin, and sea anemone, were aligned to the PFL genome assembly using GeMoMa (version 1.7) [61]. Third, Illumina RNA-seq short reads from 16 PFL stages [42] were mapped using the STAR aligner (version 2.7.6a) [62], and genome-guided transcript reconstruction was conducted with StringTie (version 2.1.4) [63] and CLASS2 (version 2.1.7) [64]. Transcripts were also assembled de novo using Trinity (version 2.11.0) [65] and mapped to the genome assembly with the minimap2 aligner (version 2.17-r941) [66]. Fourth, full-length transcripts were generated with PacBio Iso-Seq, and IsoSeq3 (version 3.3.0, https://github.com/PacificBiosciences/IsoSeq) was used to cluster them. LoRDEC (version 0.9) [67] was used to correct the Iso-Seq transcripts with the Illumina RNA-seq short reads, and the polished transcripts were mapped to the genome assembly using minimap2. Gene models based on Iso-Seq data were then reconstructed with cDNA_Cupcake (version 9.1.1, https://github.com/Magdoll/cDNA_Cupcake). Finally, the transcripts reconstructed from the different lines of evidence were merged and filtered with EvidenceModeler (version 1.1.1) [68], and the combined gene models were further updated with PASA (version 2.4.1) [69]. Amino acid sequences were predicted from the transcripts using TransDecoder (version 5.5.0, https://github.com/TransDecoder/TransDecoder). For gene descriptions, each amino acid sequence was aligned against the metazoan subset of the NCBI nr database with blastp-fast in Blast2GO/OmicsBox (version 1.3.11) [70].
The GO (gene ontology) term for each gene was annotated using Blast2GO/OmicsBox [70–72].
For S. californicum, gene prediction was performed as described by Marlétaz and colleagues [26]. Briefly, hints for de novo prediction with Augustus [73] were derived from transcriptome and protein alignments; in particular, proteins from S. kowalevskii were aligned using Exonerate (version 2.2.0) [74]. A custom repeat library was constructed and annotated using RepeatModeler and then used to mask repeated regions in the S. californicum genome with RepeatMasker (v.4.0.7, http://www.repeatmasker.org). Gene models that extensively overlapped mobile elements were filtered out. Isoforms and UTRs were added using PASA [69] based on alignments of the assembled transcriptome.
The genomic datasets for other species
The genome assemblies and gene annotation files of other metazoans were obtained from public databases: human Homo sapiens (HSA); amphioxus Branchiostoma floridae (BFL); sea urchins Strongylocentrotus purpuratus (SPU), Lytechinus pictus (LPI), and Lytechinus variegatus (LVA); sea stars Patiria miniata (PMI), Acanthaster planci (APL), and Pisaster ochraceus (POC); scallop Patinopecten yessoensis (PYE); clams Ruditapes philippinarum (RPH) and Sinonovacula constricta (SCO); oyster Crassostrea gigas (CGI); annelids Streblospio benedicti (SBE) and Paraescarpia echinospica (PEC); argus butterflies Erebia aethiops (EAE) and Aricia agestis (AAG); prawn Penaeus chinensis (PCH); horseshoe crabs Tachypleus tridentatus (TTR) and Carcinoscorpius rotundicauda (CRO); nematode Heterodera glycines (HGL); corals Acropora millepora (AMI) and Xenia sp. (XSP); jellyfish Rhopilema esculentum (RES), Sanderia malayensis (SMA), and Clytia hemisphaerica (CHE); sea anemones Nematostella vectensis (NVE) and Scolanthus callimorphus (SCAL); and sponges Ephydatia muelleri (EMU) and Amphimedon queenslandica (AQU). S1 Table lists the sources of and other information on the genome data used in this study. For genomes lacking gene model annotations, genes were predicted with the Braker2 pipeline (version 2.1.6) [75–81], including GeneMark (version 3.62) [82] and AUGUSTUS (version 3.4.0) [73].
Genome comparison
Pairwise syntenic comparisons between species were conducted using the Python version of MCscan in JCVI (version 1.0.9) [83,84]. The jcvi.compara.catalog module of MCscan, with the LAST aligner, was used to identify orthologous gene pairs between 2 species. The C-score cutoff was set to 0.99 so that only reciprocal best LAST hits were retained. The minimum number of gene pairs in a cluster was set to 1, with no restriction on window size. Synteny dot plots were visualized using the jcvi.graphics.dotplot module. Chromosomes used in the syntenic comparisons were labeled with an abbreviation of the species name and either ordered by size (BFL, PFL, SCA, SPU, POC, PYE, RPH, SBE, and PEC) or kept under their existing names (LPI, LVA, EAE, AAG, PCH, TTR, CRO, HGL, SCO, and RES).
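The C-score filter can be pictured with a short sketch. We assume the common definition of the C-score as a hit's score divided by the larger of the 2 genes' best-hit scores, so a value of 1.0 marks a reciprocal best hit and a cutoff of 0.99 tolerates only near-ties. This is a conceptual illustration, not the JCVI source code; the gene names and scores are invented.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Hit = Tuple[str, str, float]  # (query gene, subject gene, alignment score)


def cscore_filter(hits: List[Hit], cutoff: float = 0.99) -> List[Hit]:
    """Keep hits whose C-score (score / max of both genes' best scores) >= cutoff."""
    best_query: Dict[str, float] = defaultdict(float)
    best_subject: Dict[str, float] = defaultdict(float)
    for query, subject, score in hits:
        best_query[query] = max(best_query[query], score)
        best_subject[subject] = max(best_subject[subject], score)
    kept: List[Hit] = []
    for query, subject, score in hits:
        cscore = score / max(best_query[query], best_subject[subject])
        if cscore >= cutoff:
            kept.append((query, subject, cscore))
    return kept


# Hypothetical LAST hits between PFL and SPU genes (scores invented).
hits = [
    ("PFL_g1", "SPU_g1", 500.0),  # best hit for both genes -> C-score 1.0
    ("PFL_g1", "SPU_g2", 300.0),  # weaker secondary hit -> 0.6
    ("PFL_g2", "SPU_g1", 480.0),  # SPU_g1 has a better partner -> 0.96
    ("PFL_g2", "SPU_g3", 400.0),  # not PFL_g2's best hit -> about 0.83
]
print(cscore_filter(hits))
```

Lowering the cutoff (for example to 0.9) would also admit the near-best PFL_g2/SPU_g1 hit, trading specificity for sensitivity.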
To assign corresponding chromosome pairs between species, Fisher’s exact test with Bonferroni correction, run in the R software environment (version 3.6.3), was used to assess the statistical significance of the number of orthologs located on each chromosome pair. The risk difference was used to determine whether this number was higher or lower than expected from the other chromosomes. For example, in S3A Fig, the number of ortholog pairs on PFL1 and SPU15 is 202 (a); on PFL1 and non-SPU15 chromosomes, 99 (b); on non-PFL1 and SPU15 chromosomes, 45 (c); and on non-PFL1 and non-SPU15 chromosomes, 8,528 (d). These 4 numbers were subjected to Fisher’s exact test; all chromosome pairs were tested in this way, and the p-values were Bonferroni-corrected for multiple comparisons. The risk difference was then calculated as a/(a+b) − c/(c+d). Chromosomes of 2 species were considered a corresponding pair if the adjusted p-value was smaller than 1E-10 and the risk difference was greater than 0. Pairs with adjusted p-values between 1E-2 and 1E-10 and positive risk differences were considered small-scale chromosomal rearrangement events and are not presented in figures describing the evolutionary history of chromosomal architectures.
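The test and thresholds above can be traced step by step with the PFL1/SPU15 counts from S3A Fig. The sketch below uses only the Python standard library: a two-sided Fisher’s exact test computed in log space (summing all tables with the same margins that are no more likely than the observed one, the convention used by R’s fisher.test), a Bonferroni adjustment, and the risk difference. The number of tests (23 PFL × 21 SPU chromosome pairs) is an assumption for illustration; this is not the R code used in the study.

```python
import math
from typing import Tuple


def _log_comb(n: int, k: int) -> float:
    """Natural log of the binomial coefficient C(n, k)."""
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)


def fisher_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, col1, n = a + b, a + c, a + b + c + d
    lo, hi = max(0, row1 + col1 - n), min(row1, col1)
    # Hypergeometric log-probabilities for every feasible value of cell a.
    log_probs = [
        _log_comb(col1, x) + _log_comb(n - col1, row1 - x) - _log_comb(n, row1)
        for x in range(lo, hi + 1)
    ]
    # Small relative tolerance so that ties with the observed table are included.
    cutoff = log_probs[a - lo] + math.log1p(1e-7)
    extreme = [lp for lp in log_probs if lp <= cutoff]
    top = max(extreme)
    # p-values below about 1e-308 underflow to 0.0, harmless for these thresholds.
    return min(1.0, math.exp(top) * sum(math.exp(lp - top) for lp in extreme))


def classify_pair(a: int, b: int, c: int, d: int, n_tests: int) -> Tuple[float, float, str]:
    p_adj = min(1.0, fisher_two_sided(a, b, c, d) * n_tests)  # Bonferroni
    risk_diff = a / (a + b) - c / (c + d)
    if risk_diff > 0 and p_adj < 1e-10:
        label = "corresponding chromosome pair"
    elif risk_diff > 0 and p_adj < 1e-2:
        label = "small-scale rearrangement"
    else:
        label = "no assignment"
    return p_adj, risk_diff, label


N_TESTS = 23 * 21  # assumed number of PFL x SPU chromosome-pair tests
p_adj, rd, label = classify_pair(202, 99, 45, 8528, N_TESTS)
print(f"adjusted p = {p_adj:.3g}, risk difference = {rd:.4f}, {label}")
```

For these counts, the risk difference is 202/301 − 45/8573, or about 0.666, and the adjusted p-value lies far below 1E-10, so PFL1 and SPU15 are classified as a corresponding pair.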
Macrosyntenic conservation among the 4 deuterostome species (BFL, PFL, SCA, and SPU) shown in Fig 1B was visualized using the jcvi.graphics.karyotype module of MCscan. Syntenic blocks were required to contain at least 4 gene pairs, with a maximum distance of 75 genes between 2 matches.
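The 2 block parameters (at least 4 gene pairs; at most 75 genes between matches) can be pictured with a simplified single-linkage grouping of anchor gene pairs. This sketch is our own stand-in, not MCscan’s implementation, which chains and scores anchors differently; the anchor coordinates are invented gene indices on one hypothetical chromosome pair, and measuring distance as the gene-index gap on both chromosomes is an assumption.

```python
from typing import Dict, List, Tuple

Anchor = Tuple[int, int]  # (gene index on chromosome A, gene index on chromosome B)


def cluster_anchors(anchors: List[Anchor], max_dist: int = 75,
                    min_size: int = 4) -> List[List[Anchor]]:
    """Single-linkage grouping of anchors; keep groups with >= min_size anchors."""
    parent = list(range(len(anchors)))

    def find(i: int) -> int:
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(anchors)):
        for j in range(i + 1, len(anchors)):
            (ai, bi), (aj, bj) = anchors[i], anchors[j]
            if abs(ai - aj) <= max_dist and abs(bi - bj) <= max_dist:
                parent[find(i)] = find(j)

    groups: Dict[int, List[Anchor]] = {}
    for i, anchor in enumerate(anchors):
        groups.setdefault(find(i), []).append(anchor)
    return [sorted(g) for g in groups.values() if len(g) >= min_size]


# Hypothetical anchors: one collinear run of 4, one isolated hit, one run of 3.
anchors = [(1, 1), (2, 2), (3, 3), (4, 4), (200, 50),
           (300, 300), (310, 305), (320, 310)]
print(cluster_anchors(anchors))
```

With the stated minimum of 4 gene pairs, only the first run qualifies as a block; the run of 3 would be kept only under a looser minimum.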
Clustering and Bayesian phylogenetic analyses
Distinct chromosomal rearrangement events in the 10 bilaterian species were manually recorded as categorical data, based on deviations from the 1N = 24 bilaterian ancestral chromosomes (ALGs). The categorical data were then converted into a binary data matrix (S1 Data) and visualized using the heatmap.2 function of the gplots R package (version 3.1.3). Notably, most species have only 1 category per ALG. However, in some species, an earlier fusion event was also recorded because of the stepwise nature of chromosomal evolution. Taking PEC chromosome 2 as an example, protostome ALGs L and J2 fused to form spiralian ALG L⊗J2, which subsequently fused with spiralian ALG C2 to form PEC chromosome 2. As a result, both categories, L⊗J2 and C2⊗(L⊗J2), were recorded as “1” for PEC. Redundant categories were then removed before clustering to avoid double counting. The distance matrix among the 10 bilaterian species was calculated from the binary data matrix using the dist function with the binary method in R, and the clustering result was visualized with the pheatmap R package (version 1.0.12). Bayesian phylogenetic analysis was conducted using BEAST (version 1.10.4) [85]. First, the manually converted NEXUS file of the binary matrix (S1 Data) was transformed into an XML file using BEAUti with default parameters. After 10,000 trees were sampled with BEAST, a consensus tree was generated using TreeAnnotator with 25% burn-in and visualized using FigTree (version 1.4.4, https://github.com/rambaut/figtree).
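For readers unfamiliar with R’s binary distance, the sketch below reimplements its definition in Python: among positions where at least one species has a given rearrangement category, it is the fraction where only one of them does (equivalent to the Jaccard distance). The 3 species and their category vectors are hypothetical placeholders, not rows of S1 Data.

```python
from itertools import combinations
from typing import Dict, List


def binary_distance(x: List[int], y: List[int]) -> float:
    """Distance as in R's dist(method = "binary") for two 0/1 vectors."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    at_least_one = sum(1 for a, b in zip(x, y) if a or b)
    exactly_one = sum(1 for a, b in zip(x, y) if bool(a) != bool(b))
    # Following R's convention, two all-zero vectors get distance 0.
    return exactly_one / at_least_one if at_least_one else 0.0


# Hypothetical rearrangement categories (columns) for 3 made-up species.
categories: Dict[str, List[int]] = {
    "species_A": [1, 1, 0, 0, 1],
    "species_B": [1, 0, 1, 0, 1],
    "species_C": [0, 0, 0, 1, 0],
}
for s1, s2 in combinations(categories, 2):
    print(s1, s2, round(binary_distance(categories[s1], categories[s2]), 3))
```

Because shared absences are ignored, 2 species are not judged similar merely because both lack many rearrangements; only shared events reduce the distance.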
GO enrichment analysis
The gene list for each selected chromosome was subjected to GO enrichment analysis using Blast2GO/OmicsBox (version 1.3.11) with a false discovery rate (FDR) threshold of 0.05. Redundant GO terms were then removed based on semantic similarity using the REVIGO algorithm (http://revigo.irb.hr/) [86]. Finally, the enriched GO terms were clustered and visualized with Gephi (version 0.9.5, https://gephi.org/).
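To show what an FDR threshold of 0.05 means in practice, the sketch below applies the Benjamini–Hochberg procedure to a handful of p-values. That Blast2GO/OmicsBox uses exactly this adjustment is an assumption here, and the GO term labels and raw p-values are hypothetical placeholders rather than results from this study.

```python
from typing import Dict, List


def benjamini_hochberg(pvalues: List[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values (FDR), returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, keeping adjusted values monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values for five placeholder GO terms.
raw: Dict[str, float] = {"term_1": 0.01, "term_2": 0.045, "term_3": 0.02,
                         "term_4": 0.005, "term_5": 0.5}
fdr = dict(zip(raw, benjamini_hochberg(list(raw.values()))))
enriched = [term for term, q in fdr.items() if q <= 0.05]
print(enriched)
```

Note that term_2 (raw p = 0.045) is not retained after adjustment, illustrating how an FDR threshold is stricter than an unadjusted p-value cutoff when many terms are tested.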
Hox gene cluster
The genome assemblies and gene model files of bilaterians used for Hox gene analysis were downloaded from public databases (S1 Table), and some misannotated Hox genes were manually curated. Repetitive elements in each species were identified de novo using RepeatModeler (version 2.0.1) [87]. RepeatMasker (version 4.1.2-p1, http://www.repeatmasker.org) was then used to locate and quantify the identified repeats in each genome assembly, including 4 classes of TEs: DNA transposons (DNA), LTR, LINE, and SINE. TE counts were calculated in 10-kb or 50-kb bins using BEDTools (version 2.30.0) [88] and deepTools (version 3.5.1) [89]. Genome sequences and TE tracks were visualized in a local installation of the JBrowse genome browser (version 1.16.10) [90]. Silhouettes were downloaded from PhyloPic (https://www.phylopic.org/).
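The binned TE counting described above (performed in the study with BEDTools and deepTools) can be sketched in plain Python to show how a Hox-versus-background depletion value, such as the approximately 77% reduction reported for BFL and PFL, is computed. The TE start coordinates, chromosome length, and Hox interval are invented, and assigning a bin to the Hox cluster when its midpoint falls inside the cluster is an assumption of this sketch, not the study’s stated rule.

```python
import math
from typing import List, Tuple


def te_counts_per_bin(te_starts: List[int], chrom_len: int, bin_size: int) -> List[int]:
    """Count TE start positions falling in consecutive bins along a chromosome."""
    counts = [0] * math.ceil(chrom_len / bin_size)
    for start in te_starts:
        if 0 <= start < chrom_len:
            counts[start // bin_size] += 1
    return counts


def hox_depletion(counts: List[int], bin_size: int,
                  hox_start: int, hox_end: int) -> Tuple[float, float, float]:
    """Return (mean TEs per Hox bin, mean per non-Hox bin, % reduction in Hox bins)."""
    inside, outside = [], []
    for i, n in enumerate(counts):
        midpoint = i * bin_size + bin_size / 2
        (inside if hox_start <= midpoint < hox_end else outside).append(n)
    if not inside or not outside:
        raise ValueError("need bins both inside and outside the Hox interval")
    mean_in = sum(inside) / len(inside)
    mean_out = sum(outside) / len(outside)
    return mean_in, mean_out, 100.0 * (1.0 - mean_in / mean_out)


# Invented toy chromosome: 100 kb, 10-kb bins, Hox cluster at 40-60 kb.
BIN = 10_000
te_starts: List[int] = []
for b in range(10):
    per_bin = 2 if b in (4, 5) else 10  # fewer TEs inside the toy Hox interval
    te_starts += [b * BIN + k * 500 for k in range(per_bin)]

counts = te_counts_per_bin(te_starts, 100_000, BIN)
print(counts, hox_depletion(counts, BIN, 40_000, 60_000))
```

Larger bins (for example 50 kb) smooth the profile but can blur the boundaries of a compact Hox cluster, which is one reason to inspect more than one bin size.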
Pharyngeal gene cluster
The genome assemblies across metazoans were obtained from public databases (S1 Table). For genomes lacking annotations, gene models were predicted with the Braker2 pipeline (version 2.1.6), including GeneMark (version 3.62) and AUGUSTUS (version 3.4.0). Protein sequences of known pharyngeal-related genes were used as BLAST queries against the genome assemblies, and the hits were further confirmed by searching the NCBI nr database.
Supporting information
Acknowledgments
The authors wish to thank the staff at the core facility of the Institute of Cellular and Organismic Biology, and NGS Genomics core facility of the Biodiversity Research Center, Academia Sinica for technical assistance. We appreciate the valuable discussions with Dr. Mei-Yeh Lu. We also thank Marcus Calkins for English editing. We thank Dr. Sanjit Singh Batra for assistance with the S. californicum genome assembly.
Abbreviations
- AAG
Aricia agestis
- ALG
ancestral linkage group
- AMI
Acropora millepora
- APL
Acanthaster planci
- AQU
Amphimedon queenslandica
- BALG
bilaterian ancestral linkage group
- BFL
Branchiostoma floridae
- CALG
chordate ALG
- CGI
Crassostrea gigas
- CHE
Clytia hemisphaerica
- CRO
Carcinoscorpius rotundicauda
- EAE
Erebia aethiops
- EMU
Ephydatia muelleri
- GO
gene ontology
- HGL
Heterodera glycines
- HMW
high molecular weight
- HSA
Homo sapiens
- LCA
last common ancestor
- LINE
long interspersed nuclear elements
- LPI
Lytechinus pictus
- LTR
long terminal repeats
- LVA
Lytechinus variegatus
- mya
million years ago
- NVE
Nematostella vectensis
- PCH
Penaeus chinensis
- PEC
Paraescarpia echinospica
- PFL
Ptychodera flava
- PMI
Patiria miniata
- POC
Pisaster ochraceus
- PYE
Patinopecten yessoensis
- RES
Rhopilema esculentum
- RPH
Ruditapes philippinarum
- SBE
Streblospio benedicti
- SCA
Schizocardium californicum
- SCAL
Scolanthus callimorphus
- SCO
Sinonovacula constricta
- SINE
short interspersed nuclear elements
- SMA
Sanderia malayensis
- SPU
Strongylocentrotus purpuratus
- TAD
topologically associating domain
- TE
transposable element
- TTR
Tachypleus tridentatus
- XSP
Xenia sp.
Data Availability
The P. flava genome assembly used in this work is publicly available at https://www.ncbi.nlm.nih.gov/bioproject/PRJNA747109. The version described in this paper is JASXRY010000000 (https://submit.ncbi.nlm.nih.gov/api/2.0/files/z1apzwkx/po1410_ptychodera_flava.repeatmasked.fasta/?format=attachment). Genome assembly and gene annotation files can be downloaded from https://figshare.com/projects/Hemichordate_Genomes/168110.
Funding Statement
This work was supported by grants 112-2326-B-001-004 (Y.H.S.) and 110-2621-B-001-001-MY3 (J.K.Y.) from the National Science and Technology Council, Taiwan (https://www.nstc.gov.tw/?l=en), grant AS-GC-111-L01 from Academia Sinica, Taiwan (https://www.sinica.edu.tw/en/) (Y.H.S. and J.K.Y.), and grant PID2019-103921GB-I00 from Ministerio de Economía y Competitividad, Spain (https://portal.mineco.gob.es/en-us/Pages/index.aspx) (J.J.T.). P.M.M.G. was funded by a postdoctoral fellowship from Junta de Andalucía (https://www.juntadeandalucia.es/) (DOC_00397). F.M. is supported by the Royal Society Fellowship (https://royalsociety.org/) URF\R1\191161 and the BBSRC grant BB/V01109X/1 (https://www.ukri.org/councils/bbsrc/). D.S.R. was supported by the Molecular Genetics Unit at the Okinawa Institute of Science and Technology (https://www.oist.jp/) and is grateful for support from the Marthella Foskett Brown Chair in Biological Sciences at UC Berkeley (https://www.berkeley.edu/). D.S.R. and C.J.L. were supported by the Chan Zuckerberg BioHub (https://www.czbiohub.org/). The sponsors and funders played no role in the study design, data collection and analysis, decision to publish, or preparation of the manuscript.
References
- 1.Lowe CJ, Clarke DN, Medeiros DM, Rokhsar DS, Gerhart J. The deuterostome context of chordate origins. Nature. 2015;520(7548):456–65. doi: 10.1038/nature14434 . [DOI] [PubMed] [Google Scholar]
- 2.Nanglu K, Cole SR, Wright DF, Souto C. Worms and gills, plates and spines: the evolutionary origins and incredible disparity of deuterostomes revealed by fossils, genes, and development. Biol Rev Camb Philos Soc. 2023;98(1):316–51. Epub 20221018. doi: 10.1111/brv.12908 . [DOI] [PubMed] [Google Scholar]
- 3.Cameron CB, Garey JR, Swalla BJ. Evolution of the chordate body plan: new insights from phylogenetic analyses of deuterostome phyla. Proc Natl Acad Sci U S A. 2000;97(9):4469–74. doi: 10.1073/pnas.97.9.4469 ; PubMed Central PMCID: PMC18258. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 4.Satoh N. Chordate Origins and Evolution: The Molecular Evolutionary Road to Vertebrates. Chordate Origins and Evolution: The Molecular Evolutionary Road to Vertebrates. 2016:1–206. WOS:000404599900015. [Google Scholar]
- 5.McClay DR. Evolutionary crossroads in developmental biology: sea urchins. Development. 2011;138(13):2639–48. doi: 10.1242/dev.048967 ; PubMed Central PMCID: PMC3109595. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 6.Rottinger E, Lowe CJ. Evolutionary crossroads in developmental biology: hemichordates. Development. 2012;139(14):2463–75. doi: 10.1242/dev.066712 . [DOI] [PubMed] [Google Scholar]
- 7.Cannon JT, Kocot KM, Waits DS, Weese DA, Swalla BJ, Santos SR, et al. Phylogenomic resolution of the hemichordate and echinoderm clade. Curr Biol. 2014;24(23):2827–32. Epub 20141106. doi: 10.1016/j.cub.2014.10.016 . [DOI] [PubMed] [Google Scholar]
- 8.Dunn CW, Giribet G, Edgecombe GD, Hejnol A. Animal Phylogeny and Its Evolutionary Implications. Annu Rev Ecol Evol Syst. 2014;45(1):371–95. doi: 10.1146/annurev-ecolsys-120213-091627 [DOI] [Google Scholar]
- 9.Kapli P, Natsidis P, Leite DJ, Fursman M, Jeffrie N, Rahman IA, et al. Lack of support for Deuterostomia prompts reinterpretation of the first Bilateria. Sci Adv. 2021;7(12). Epub 2021/03/21. doi: 10.1126/sciadv.abe2741 ; PubMed Central PMCID: PMC7978419. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 10.Marletaz F. Zoology: Worming into the Origin of Bilaterians. Curr Biol. 2019;29(12):R577–R9. doi: 10.1016/j.cub.2019.05.006 WOS:000471783100012. [DOI] [PubMed] [Google Scholar]
- 11.Mulhair PO, McCarthy CGP, Siu-Ting K, Creevey CJ, O’Connell MJ. Filtering artifactual signal increases support for Xenacoelomorpha and Ambulacraria sister relationship in the animal tree of life. Curr Biol. 2022;32(23):5180–+. doi: 10.1016/j.cub.2022.10.036 WOS:000901508800012. [DOI] [PubMed] [Google Scholar]
- 12.Philippe H, Poustka AJ, Chiodin M, Hoff KJ, Dessimoz C, Tomiczek B, et al. Mitigating Anticipated Effects of Systematic Errors Supports Sister-Group Relationship between Xenacoelomorpha and Ambulacraria. Curr Biol. 2019;29(11):1818–+. doi: 10.1016/j.cub.2019.04.009 WOS:000470902000041. [DOI] [PubMed] [Google Scholar]
- 13. Putnam NH, Butts T, Ferrier DE, Furlong RF, Hellsten U, Kawashima T, et al. The amphioxus genome and the evolution of the chordate karyotype. Nature. 2008;453(7198):1064–71. doi: 10.1038/nature06967.
- 14. Putnam NH, Srivastava M, Hellsten U, Dirks B, Chapman J, Salamov A, et al. Sea anemone genome reveals ancestral eumetazoan gene repertoire and genomic organization. Science. 2007;317(5834):86–94. doi: 10.1126/science.1139158.
- 15. Simakov O, Kawashima T, Marletaz F, Jenkins J, Koyanagi R, Mitros T, et al. Hemichordate genomes and deuterostome origins. Nature. 2015;527(7579):459–65. Epub 20151118. doi: 10.1038/nature16150; PubMed Central PMCID: PMC4729200.
- 16. Warner JF, Lord JW, Schreiter SA, Nesbit KT, Hamdoun A, Lyons DC. Chromosomal-Level Genome Assembly of the Painted Sea Urchin Lytechinus pictus: A Genetically Enabled Model System for Cell Biology and Embryonic Development. Genome Biol Evol. 2021;13(4). doi: 10.1093/gbe/evab061; PubMed Central PMCID: PMC8085125.
- 17. Martin-Duran JM, Vellutini BC, Marletaz F, Cetrangolo V, Cvetesic N, Thiel D, et al. Conservative route to genome compaction in a miniature annelid. Nat Ecol Evol. 2021;5(2):231–42. Epub 20201116. doi: 10.1038/s41559-020-01327-6; PubMed Central PMCID: PMC7854359.
- 18. Schultz DT, Haddock SHD, Bredeson JV, Green RE, Simakov O, Rokhsar DS. Ancient gene linkages support ctenophores as sister to other animals. Nature. 2023;618(7963):110–7. Epub 20230517. doi: 10.1038/s41586-023-05936-6; PubMed Central PMCID: PMC10232365.
- 19. Simakov O, Bredeson J, Berkoff K, Marletaz F, Mitros T, Schultz DT, et al. Deeply conserved synteny and the evolution of metazoan chromosomes. Sci Adv. 2022;8(5):eabi5884. Epub 20220202. doi: 10.1126/sciadv.abi5884; PubMed Central PMCID: PMC8809688.
- 20. Simakov O, Marletaz F, Yue JX, O’Connell B, Jenkins J, Brandt A, et al. Deeply conserved synteny resolves early events in vertebrate evolution. Nat Ecol Evol. 2020;4(6):820–30. Epub 20200420. doi: 10.1038/s41559-020-1156-z; PubMed Central PMCID: PMC7269912.
- 21. Technau U, Robb S, Genikhovich G, Montenegro J, Fropf W, Weinguny L, et al. Sea anemone genomes reveal ancestral metazoan chromosomal macrosynteny. Research Square. 2021. doi: 10.21203/rs.3.rs-796229/v1.
- 22. Muffato M, Louis A, Nguyen NTT, Lucas J, Berthelot C, Roest Crollius H. Reconstruction of hundreds of reference ancestral genomes across the eukaryotic kingdom. Nat Ecol Evol. 2023;7(3):355–66. Epub 20230116. doi: 10.1038/s41559-022-01956-z; PubMed Central PMCID: PMC9998269.
- 23. Sacerdot C, Louis A, Bon C, Berthelot C, Roest Crollius H. Chromosome evolution at the origin of the ancestral vertebrate genome. Genome Biol. 2018;19(1):166. Epub 20181017. doi: 10.1186/s13059-018-1559-1; PubMed Central PMCID: PMC6193309.
- 24. Tagawa K. Hemichordate models. Curr Opin Genet Dev. 2016;39:71–8. Epub 20160618. doi: 10.1016/j.gde.2016.05.023.
- 25. Wang S, Zhang J, Jiao W, Li J, Xun X, Sun Y, et al. Scallop genome provides insights into evolution of bilaterian karyotype and development. Nat Ecol Evol. 2017;1(5):120. Epub 20170403. doi: 10.1038/s41559-017-0120.
- 26. Marletaz F, Couloux A, Poulain J, Labadie K, Da Silva C, Mangenot S, et al. Analysis of the P. lividus sea urchin genome highlights contrasting trends of genomic and regulatory evolution in deuterostomes. Cell Genom. 2023;3(4):100295. Epub 20230405. doi: 10.1016/j.xgen.2023.100295; PubMed Central PMCID: PMC10112332.
- 27. Arshinoff BI, Cary GA, Karimi K, Foley S, Agalakov S, Delgado F, et al. Echinobase: leveraging an extant model organism database to build a knowledgebase supporting research on the genomics and biology of echinoderms. Nucleic Acids Res. 2022;50(D1):D970–D9. doi: 10.1093/nar/gkab1005; PubMed Central PMCID: PMC8728261.
- 28. Cameron RA, Kudtarkar P, Gordon SM, Worley KC, Gibbs RA. Do echinoderm genomes measure up? Mar Genomics. 2015;22:1–9. Epub 20150217. doi: 10.1016/j.margen.2015.02.004; PubMed Central PMCID: PMC4489978.
- 29. Yoshida K, Rodelsperger C, Roseler W, Riebesell M, Sun S, Kikuchi T, et al. Chromosome fusions repatterned recombination rate and facilitated reproductive isolation during Pristionchus nematode speciation. Nat Ecol Evol. 2023;7(3):424–39. Epub 20230130. doi: 10.1038/s41559-022-01980-z; PubMed Central PMCID: PMC9998273.
- 30. Auclair W. The Chromosomes of Sea Urchins, Especially Arbacia punctulata; A Method for Studying Unsectioned Eggs at First Cleavage. Biol Bull. 1965;128:169–76.
- 31. Colombera D, Vitturi R, Zanirato L. Chromosome number of Cidaris cidaris (Cidaridae, Echinoidea). Acta Zool. 1977;58(4):185–6. doi: 10.1111/j.1463-6395.1977.tb00254.x.
- 32. Thompson JR, Petsios E, Davidson EH, Erkenbrack EM, Gao F, Bottjer DJ. Reorganization of sea urchin gene regulatory networks at least 268 million years ago as revealed by oldest fossil cidaroid echinoid. Sci Rep. 2015;5:15541. doi: 10.1038/srep15541.
- 33. Kroh A, Smith AB. The phylogeny and classification of post-Palaeozoic echinoids. J Syst Palaeontol. 2010;8(2):147–212.
- 34. Ran Z, Li Z, Yan X, Liao K, Kong F, Zhang L, et al. Chromosome-level genome assembly of the razor clam Sinonovacula constricta (Lamarck, 1818). Mol Ecol Resour. 2019;19(6):1647–58. doi: 10.1111/1755-0998.13086.
- 35. Sun Y, Sun J, Yang Y, Lan Y, Ip JC, Wong WC, et al. Genomic Signatures Supporting the Symbiosis and Formation of Chitinous Tube in the Deep-Sea Tubeworm Paraescarpia echinospica. Mol Biol Evol. 2021;38(10):4116–34. doi: 10.1093/molbev/msab203; PubMed Central PMCID: PMC8476170.
- 36. Yan X, Nie H, Huo Z, Ding J, Li Z, Yan L, et al. Clam Genome Sequence Clarifies the Molecular Basis of Its Benthic Adaptation and Extraordinary Shell Color Diversity. iScience. 2019;19:1225–37. Epub 20190830. doi: 10.1016/j.isci.2019.08.049; PubMed Central PMCID: PMC6831834.
- 37. Zakas C, Harry ND, Scholl EH, Rockman MV. The Genome of the Poecilogonous Annelid Streblospio benedicti. Genome Biol Evol. 2022;14(2). doi: 10.1093/gbe/evac008; PubMed Central PMCID: PMC8872972.
- 38. Martin-Zamora FM, Liang Y, Guynes K, Carrillo-Baltodano AM, Davies BE, Donnellan RD, et al. Annelid functional genomics reveal the origins of bilaterian life cycles. Nature. 2023. Epub 20230125. doi: 10.1038/s41586-022-05636-7.
- 39. Cannon JT, Vellutini BC, Smith J, Ronquist F, Jondelius U, Hejnol A. Xenacoelomorpha is the sister group to Nephrozoa. Nature. 2016;530(7588):89–93. doi: 10.1038/nature16520.
- 40. Rouse GW, Wilson NG, Carvajal JI, Vrijenhoek RC. New deep-sea species of Xenoturbella and the position of Xenacoelomorpha. Nature. 2016;530(7588):94–7. doi: 10.1038/nature16545.
- 41. Schiffer PH, Natsidis P, Leite DJ, Robertson HE, Lapraz F, Marlétaz F, et al. The slowly evolving genome of the xenacoelomorph worm Xenoturbella bocki. bioRxiv. 2023. doi: 10.1101/2022.06.24.497508.
- 42. Perez-Posada A, Lin C-Y, Lin C-Y, Chen Y-C, Gómez Skarmeta JL, Yu J-K, et al. Insights into deuterostome evolution from the biphasic transcriptional programmes of hemichordates. bioRxiv. 2022. doi: 10.1101/2022.06.10.495707.
- 43. Duboule D. The (unusual) heuristic value of Hox gene clusters; a matter of time? Dev Biol. 2022;484:75–87. doi: 10.1016/j.ydbio.2022.02.007.
- 44. Stock DW, Ellies DL, Zhao ZY, Ekker M, Ruddle FH, Weiss KM. The evolution of the vertebrate Dlx gene family. Proc Natl Acad Sci U S A. 1996;93(20):10858–63. doi: 10.1073/pnas.93.20.10858.
- 45. Bourque G, Burns KH, Gehring M, Gorbunova V, Seluanov A, Hammell M, et al. Ten things you should know about transposable elements. Genome Biol. 2018;19:199. doi: 10.1186/s13059-018-1577-z.
- 46. Klein SJ, O’Neill RJ. Transposable elements: genome innovation, chromosome diversity, and centromere conflict. Chromosome Res. 2018;26(1–2):5–23. Epub 20180113. doi: 10.1007/s10577-017-9569-5; PubMed Central PMCID: PMC5857280.
- 47. Amemiya CT, Prohaska SJ, Hill-Force A, Cook A, Wasserscheid J, Ferrier DE, et al. The amphioxus Hox cluster: characterization, comparative genomics, and evolution. J Exp Zool B Mol Dev Evol. 2008;310(5):465–77. doi: 10.1002/jez.b.21213.
- 48. Fried C, Prohaska SJ, Stadler PF. Exclusion of repetitive DNA elements from gnathostome Hox clusters. J Exp Zool B Mol Dev Evol. 2004;302(2):165–73. doi: 10.1002/jez.b.20007.
- 49. Holland LZ, Albalat R, Azumi K, Benito-Gutierrez E, Blow MJ, Bronner-Fraser M, et al. The amphioxus genome illuminates vertebrate origins and cephalochordate biology. Genome Res. 2008;18(7):1100–11. Epub 20080618. doi: 10.1101/gr.073676.107; PubMed Central PMCID: PMC2493399.
- 50. Pascual-Anaya J, Adachi N, Alvarez S, Kuratani S, D’Aniello S, Garcia-Fernandez J. Broken colinearity of the amphioxus Hox cluster. Evodevo. 2012;3(1):28. Epub 20121203. doi: 10.1186/2041-9139-3-28; PubMed Central PMCID: PMC3534614.
- 51. Ferrier DE, Minguillon C, Holland PW, Garcia-Fernandez J. The amphioxus Hox cluster: deuterostome posterior flexibility and Hox14. Evol Dev. 2000;2(5):284–93. doi: 10.1046/j.1525-142x.2000.00070.x.
- 52. Zhang XJ, Sun LN, Yuan JB, Sun YM, Gao Y, Zhang LB, et al. The sea cucumber genome provides insights into morphological evolution and visceral regeneration. PLoS Biol. 2017;15(10):e2003790. doi: 10.1371/journal.pbio.2003790.
- 53. Stefanik DJ, Wolenski FS, Friedman LE, Gilmore TD, Finnerty JR. Isolation of DNA, RNA and protein from the starlet sea anemone Nematostella vectensis. Nat Protoc. 2013;8(5):892–9. Epub 20130411. doi: 10.1038/nprot.2012.151.
- 54. Nowoshilow S, Schloissnig S, Fei JF, Dahl A, Pang AWC, Pippel M, et al. The axolotl genome and the evolution of key tissue formation regulators. Nature. 2018;554(7690):50–5. Epub 20180124. doi: 10.1038/nature25458.
- 55. Roach MJ, Schmidt SA, Borneman AR. Purge Haplotigs: allelic contig reassignment for third-gen diploid genome assemblies. BMC Bioinformatics. 2018;19(1):460. Epub 20181129. doi: 10.1186/s12859-018-2485-7; PubMed Central PMCID: PMC6267036.
- 56. Walker BJ, Abeel T, Shea T, Priest M, Abouelliel A, Sakthikumar S, et al. Pilon: an integrated tool for comprehensive microbial variant detection and genome assembly improvement. PLoS ONE. 2014;9(11):e112963. Epub 20141119. doi: 10.1371/journal.pone.0112963; PubMed Central PMCID: PMC4237348.
- 57. Koren S, Walenz BP, Berlin K, Miller JR, Bergman NH, Phillippy AM. Canu: scalable and accurate long-read assembly via adaptive k-mer weighting and repeat separation. Genome Res. 2017;27(5):722–36. Epub 20170315. doi: 10.1101/gr.215087.116; PubMed Central PMCID: PMC5411767.
- 58. Chin CS, Alexander DH, Marks P, Klammer AA, Drake J, Heiner C, et al. Nonhybrid, finished microbial genome assemblies from long-read SMRT sequencing data. Nat Methods. 2013;10(6):563–9. Epub 20130505. doi: 10.1038/nmeth.2474.
- 59. Manni M, Berkeley MR, Seppey M, Zdobnov EM. BUSCO: Assessing Genomic Data Quality and Beyond. Curr Protoc. 2021;1(12):e323. doi: 10.1002/cpz1.323.
- 60. Holt C, Yandell M. MAKER2: an annotation pipeline and genome-database management tool for second-generation genome projects. BMC Bioinformatics. 2011;12:491. Epub 20111222. doi: 10.1186/1471-2105-12-491; PubMed Central PMCID: PMC3280279.
- 61. Keilwagen J, Hartung F, Paulini M, Twardziok SO, Grau J. Combining RNA-seq data and homology-based gene prediction for plants, animals and fungi. BMC Bioinformatics. 2018;19(1):189. Epub 20180530. doi: 10.1186/s12859-018-2203-5; PubMed Central PMCID: PMC5975413.
- 62. Dobin A, Davis CA, Schlesinger F, Drenkow J, Zaleski C, Jha S, et al. STAR: ultrafast universal RNA-seq aligner. Bioinformatics. 2013;29(1):15–21. Epub 20121025. doi: 10.1093/bioinformatics/bts635; PubMed Central PMCID: PMC3530905.
- 63. Pertea M, Pertea GM, Antonescu CM, Chang TC, Mendell JT, Salzberg SL. StringTie enables improved reconstruction of a transcriptome from RNA-seq reads. Nat Biotechnol. 2015;33(3):290–5. Epub 20150218. doi: 10.1038/nbt.3122; PubMed Central PMCID: PMC4643835.
- 64. Song L, Sabunciyan S, Florea L. CLASS2: accurate and efficient splice variant annotation from RNA-seq reads. Nucleic Acids Res. 2016;44(10):e98. Epub 20160314. doi: 10.1093/nar/gkw158; PubMed Central PMCID: PMC4889935.
- 65. Grabherr MG, Haas BJ, Yassour M, Levin JZ, Thompson DA, Amit I, et al. Full-length transcriptome assembly from RNA-Seq data without a reference genome. Nat Biotechnol. 2011;29(7):644–52. Epub 20110515. doi: 10.1038/nbt.1883; PubMed Central PMCID: PMC3571712.
- 66. Li H. Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics. 2018;34(18):3094–100. doi: 10.1093/bioinformatics/bty191; PubMed Central PMCID: PMC6137996.
- 67. Salmela L, Rivals E. LoRDEC: accurate and efficient long read error correction. Bioinformatics. 2014;30(24):3506–14. Epub 20140826. doi: 10.1093/bioinformatics/btu538; PubMed Central PMCID: PMC4253826.
- 68. Haas BJ, Salzberg SL, Zhu W, Pertea M, Allen JE, Orvis J, et al. Automated eukaryotic gene structure annotation using EVidenceModeler and the Program to Assemble Spliced Alignments. Genome Biol. 2008;9(1):R7. Epub 20080111. doi: 10.1186/gb-2008-9-1-r7; PubMed Central PMCID: PMC2395244.
- 69. Haas BJ, Delcher AL, Mount SM, Wortman JR, Smith RK Jr., Hannick LI, et al. Improving the Arabidopsis genome annotation using maximal transcript alignment assemblies. Nucleic Acids Res. 2003;31(19):5654–66. doi: 10.1093/nar/gkg770; PubMed Central PMCID: PMC206470.
- 70. Conesa A, Gotz S. Blast2GO: A comprehensive suite for functional analysis in plant genomics. Int J Plant Genomics. 2008;2008:619832. doi: 10.1155/2008/619832; PubMed Central PMCID: PMC2375974.
- 71. Cantalapiedra CP, Hernandez-Plaza A, Letunic I, Bork P, Huerta-Cepas J. eggNOG-mapper v2: Functional Annotation, Orthology Assignments, and Domain Prediction at the Metagenomic Scale. Mol Biol Evol. 2021;38(12):5825–9. doi: 10.1093/molbev/msab293; PubMed Central PMCID: PMC8662613.
- 72. Huerta-Cepas J, Szklarczyk D, Heller D, Hernandez-Plaza A, Forslund SK, Cook H, et al. eggNOG 5.0: a hierarchical, functionally and phylogenetically annotated orthology resource based on 5090 organisms and 2502 viruses. Nucleic Acids Res. 2019;47(D1):D309–D14. doi: 10.1093/nar/gky1085; PubMed Central PMCID: PMC6324079.
- 73. Stanke M, Keller O, Gunduz I, Hayes A, Waack S, Morgenstern B. AUGUSTUS: ab initio prediction of alternative transcripts. Nucleic Acids Res. 2006;34(Web Server issue):W435–9. doi: 10.1093/nar/gkl200; PubMed Central PMCID: PMC1538822.
- 74. Slater GS, Birney E. Automated generation of heuristics for biological sequence comparison. BMC Bioinformatics. 2005;6:31. Epub 20050215. doi: 10.1186/1471-2105-6-31; PubMed Central PMCID: PMC553969.
- 75. Bruna T, Hoff KJ, Lomsadze A, Stanke M, Borodovsky M. BRAKER2: automatic eukaryotic genome annotation with GeneMark-EP+ and AUGUSTUS supported by a protein database. NAR Genom Bioinform. 2021;3(1):lqaa108. Epub 20210106. doi: 10.1093/nargab/lqaa108; PubMed Central PMCID: PMC7787252.
- 76. Buchfink B, Xie C, Huson DH. Fast and sensitive protein alignment using DIAMOND. Nat Methods. 2015;12(1):59–60. Epub 20141117. doi: 10.1038/nmeth.3176.
- 77. Gotoh O. A space-efficient and accurate method for mapping and aligning cDNA sequences onto genomic sequence. Nucleic Acids Res. 2008;36(8):2630–8. Epub 20080315. doi: 10.1093/nar/gkn105; PubMed Central PMCID: PMC2377433.
- 78. Hoff KJ, Lange S, Lomsadze A, Borodovsky M, Stanke M. BRAKER1: Unsupervised RNA-Seq-Based Genome Annotation with GeneMark-ET and AUGUSTUS. Bioinformatics. 2016;32(5):767–9. Epub 20151111. doi: 10.1093/bioinformatics/btv661; PubMed Central PMCID: PMC6078167.
- 79. Iwata H, Gotoh O. Benchmarking spliced alignment programs including Spaln2, an extended version of Spaln that incorporates additional species-specific features. Nucleic Acids Res. 2012;40(20):e161. Epub 20120730. doi: 10.1093/nar/gks708; PubMed Central PMCID: PMC3488211.
- 80. Lomsadze A, Ter-Hovhannisyan V, Chernoff YO, Borodovsky M. Gene identification in novel eukaryotic genomes by self-training algorithm. Nucleic Acids Res. 2005;33(20):6494–506. Epub 20051128. doi: 10.1093/nar/gki937; PubMed Central PMCID: PMC1298918.
- 81. Stanke M, Diekhans M, Baertsch R, Haussler D. Using native and syntenically mapped cDNA alignments to improve de novo gene finding. Bioinformatics. 2008;24(5):637–44. Epub 20080124. doi: 10.1093/bioinformatics/btn013.
- 82. Bruna T, Lomsadze A, Borodovsky M. GeneMark-EP+: eukaryotic gene prediction with self-training in the space of genes and proteins. NAR Genom Bioinform. 2020;2(2):lqaa026. Epub 20200513. doi: 10.1093/nargab/lqaa026; PubMed Central PMCID: PMC7222226.
- 83. Tang H, Bowers JE, Wang X, Ming R, Alam M, Paterson AH. Synteny and collinearity in plant genomes. Science. 2008;320(5875):486–8. doi: 10.1126/science.1153917.
- 84. Wang Y, Tang H, Debarry JD, Tan X, Li J, Wang X, et al. MCScanX: a toolkit for detection and evolutionary analysis of gene synteny and collinearity. Nucleic Acids Res. 2012;40(7):e49. Epub 20120104. doi: 10.1093/nar/gkr1293; PubMed Central PMCID: PMC3326336.
- 85. Suchard MA, Lemey P, Baele G, Ayres DL, Drummond AJ, Rambaut A. Bayesian phylogenetic and phylodynamic data integration using BEAST 1.10. Virus Evol. 2018;4(1):vey016. Epub 20180608. doi: 10.1093/ve/vey016; PubMed Central PMCID: PMC6007674.
- 86. Supek F, Bosnjak M, Skunca N, Smuc T. REVIGO summarizes and visualizes long lists of gene ontology terms. PLoS ONE. 2011;6(7):e21800. Epub 20110718. doi: 10.1371/journal.pone.0021800; PubMed Central PMCID: PMC3138752.
- 87. Flynn JM, Hubley R, Goubert C, Rosen J, Clark AG, Feschotte C, et al. RepeatModeler2 for automated genomic discovery of transposable element families. Proc Natl Acad Sci U S A. 2020;117(17):9451–7. Epub 20200416. doi: 10.1073/pnas.1921046117; PubMed Central PMCID: PMC7196820.
- 88. Quinlan AR, Hall IM. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics. 2010;26(6):841–2. Epub 20100128. doi: 10.1093/bioinformatics/btq033; PubMed Central PMCID: PMC2832824.
- 89. Ramirez F, Dundar F, Diehl S, Gruning BA, Manke T. deepTools: a flexible platform for exploring deep-sequencing data. Nucleic Acids Res. 2014;42(Web Server issue):W187–91. Epub 20140505. doi: 10.1093/nar/gku365; PubMed Central PMCID: PMC4086134.
- 90. Buels R, Yao E, Diesh CM, Hayes RD, Munoz-Torres M, Helt G, et al. JBrowse: a dynamic web platform for genome visualization and analysis. Genome Biol. 2016;17:66. doi: 10.1186/s13059-016-0924-1.
- 91. Servant N, Varoquaux N, Lajoie BR, Viara E, Chen CJ, Vert JP, et al. HiC-Pro: an optimized and flexible pipeline for Hi-C data processing. Genome Biol. 2015;16:259. doi: 10.1186/s13059-015-0831-x.