The Hydractinia Genome Project Portal: multi-omic annotation and visualization of Hydractinia genomic datasets

R Travis Moreland; Christine E Schnitzler; Suiyuan Zhang; Sumeeta Singh; Tyra G Wolfsberg; Andreas D Baxevanis

doi:10.1093/bioadv/vbaf215

Bioinformatics Advances. 2025 Oct 15;5(1):vbaf215.



Editor: Michael DeGiorgio

PMCID: PMC12624445 PMID: 41262968

Abstract

Motivation

The colonial hydroid Hydractinia exhibits several unique biological properties, including a remarkable regenerative capacity and the ability to distinguish self from non-self, characteristics that make it a valuable model for studying human disease and aging. Well-annotated multi-omic data, together with tools to visualize these data, are essential for advancing the use of this model organism to deepen our understanding of the relationship between genomic and morphological complexity, the evolution of multicellularity, and the emergence of novel cell types.

Results

We present the Hydractinia Genome Project Portal, a comprehensive resource providing genomic, transcriptomic, and proteomic datasets for two widely studied Hydractinia species. The portal provides extensive sequence, structure, and functional annotation resources that are not available elsewhere, including genome browsers, a single-cell gene expression atlas, a protein structure viewer, and a custom BLAST implementation. We demonstrate the portal’s utility for biological discovery by using a subset of Hydractinia-specific stem cell gene markers to explore known gaps in annotation transfer methods, illustrating how structure-based deep learning methods such as DeepFRI can substantially improve the functional annotation of previously unannotated i-cell markers.

Availability and implementation

The Hydractinia Genome Project Portal is freely available at https://research.nhgri.nih.gov/hydractinia

1 Introduction

The colonial marine hydroid Hydractinia is one of a very small number of organisms that can regenerate its entire body throughout its lifetime. This remarkable capacity for self-renewal is made possible by adult migratory stem cells known as i-cells. Recent studies have demonstrated that i-cells and early progenitor cells use a toolkit of genes shared with all animals, positioning Hydractinia as a promising model organism for the study of stem cell biology and regenerative medicine (Schnitzler et al. 2024). Moreover, Hydractinia colonies are capable of allorecognition solely through cell-cell contact (Nicotra 2019), providing valuable insights into the evolution of adaptive immunity in bilaterians. Importantly, Hydractinia has been found to encode more orthologs of human disease-associated genes than do other traditional invertebrate models, making it a particularly valuable model for studying various types of human disease (Maxwell et al. 2014). Given this potential for addressing key questions in regeneration, allorecognition, stem cell biology, and aging, we recently reported on the sequencing and analysis of the genomes of two Hydractinia species, H. symbiolongicarpus and H. echinata (Schnitzler et al. 2024).

With the increased emphasis on developing new animal models for the study of human health, it is vital that genomic data generated from emerging research organisms such as Hydractinia be disseminated to the biomedical research community in an accessible fashion. Although high-quality whole-genome sequence data from Hydractinia are freely available through GenBank as BioProjects PRJNA807936 (Schnitzler et al. 2024) and PRJNA935535 (Kon-Nanjo et al. 2023), GenBank’s standardized annotation approach can result in the loss of unique biological insights, because its automated high-throughput annotation pipelines are typically optimized for more established model organisms. This can lead to inaccurate gene predictions that overlook unique genomic features such as novel genes and species-specific adaptations (Steinegger et al. 2019), and it can also cause annotation errors to propagate (Salzberg 2019). While GenBank remains a valuable resource to the research community, annotations for non-model organisms often require additional tools and some degree of manual curation to produce more accurate and informative results.

To address these concerns, and to benefit investigators seeking to understand the molecular innovations that drove the surge of diversity in early animal evolution, we developed the Hydractinia Genome Project Portal (HGP Portal), a resource providing genomic, transcriptomic, and proteomic datasets for researchers addressing questions in comparative and evolutionary biology. Its suite of data visualization tools includes browsers for both Hydractinia genome assemblies, an interactive interface for exploring the H. symbiolongicarpus single-cell gene expression atlas, a protein structure viewer for examining proteome-scale structure predictions for both species, and a custom BLAST implementation. Additionally, the portal features Wiki-like gene pages that provide sequence and functional annotation data for each gene model, including protein domain composition, developmental expression data, associated OMIM human disease genes, and orthology predictions.

2 Multi-omic annotation and visualization

Hydractinia sequences and functional annotations are stored in a MariaDB (v.10.5.22) database; users can browse and download them through an HTML interface served by Perl CGI (v.5.16.3) and PHP (v.5.4.16) scripts on an Apache web server. The HGP Portal features several interactive visualization tools for exploring these multi-omic datasets, as described below.

2.1 Sequence data and annotation

All datasets in the HGP Portal are accessible through navigation links in the left sidebar. Users can retrieve a full or partial genomic scaffold sequence for either Hydractinia species by entering a scaffold identifier (e.g. HyS0001) in the search box on the “Fetch a Scaffold” page, optionally specifying start and end scaffold coordinates. Sequences can also be returned as the reverse complement or as a six-frame amino acid translation. Full genome assembly sequences for both H. symbiolongicarpus (4840 scaffolds) and H. echinata (7767 scaffolds) can be downloaded using the “Download Sequences” sidebar link.
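The scaffold retrieval options described above (a coordinate range, reverse complement, and six-frame translation) can be sketched in a few lines of Python. This is an illustrative sketch, not the portal’s code: it assumes 1-based, inclusive coordinates and the standard genetic code (NCBI translation table 1), and the function names are hypothetical.

```python
from itertools import product
from typing import Dict, Optional

# Standard genetic code (NCBI translation table 1), listed in TCAG codon order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def fetch_region(scaffold: str, start: Optional[int] = None, end: Optional[int] = None) -> str:
    """Return scaffold[start..end], assuming 1-based, inclusive coordinates (hypothetical helper)."""
    start = 1 if start is None else start
    end = len(scaffold) if end is None else end
    if not 1 <= start <= end <= len(scaffold):
        raise ValueError(f"invalid range {start}..{end} for a scaffold of length {len(scaffold)}")
    return scaffold[start - 1:end]


def reverse_complement(seq: str) -> str:
    """Complement each base, then reverse the string."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq: str) -> str:
    """Translate complete codons only; codons containing N or other symbols become 'X'."""
    seq = seq.upper()
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3))


def six_frame(seq: str) -> Dict[str, str]:
    """Frames +1..+3 start at offsets 0..2 of the forward strand; -1..-3 of its reverse complement."""
    rc = reverse_complement(seq)
    frames: Dict[str, str] = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(seq[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames
```

For example, six_frame(fetch_region(scaffold_seq, 1001, 2000)) would translate a 1 kb window in all six frames, with ambiguous codons shown as X.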

Predicted gene models for each species (H. symbiolongicarpus, 22 022 genes; H. echinata, 28 825 genes) are described on individual Wiki-like gene pages containing comprehensive annotations. These pages include nucleotide and protein sequences; genomic coordinates of coding exons; pre-computed BLAST results showing the top hits for each protein in both UniProt and NCBI’s nr database; InterProScan (Jones et al. 2014) and Pfam (Punta et al. 2012) domain annotations; functional annotations [Gene Ontology terms derived from Argot2.5 (Falda et al. 2012) and PANNZER2 (Törönen et al. 2018)]; developmental expression plots; related human disease genes from the Online Mendelian Inheritance in Man (OMIM) database (Amberger et al. 2019); and ortholog clusters formed using OrthoFinder2, a phylogenetically informed clustering method (Emms and Kelly 2015, 2019). Users can search for a gene by entering a Hydractinia gene identifier (e.g. HyS0001.1) in the text box on the “View a Gene Page” page. Complete datasets of predicted gene models, protein models, and functional annotations can be retrieved in bulk from the corresponding download links in the left sidebar.

Pfam domains were identified in the proteins of both Hydractinia genomes using hmmscan from the HMMER suite against the Pfam-A database (version 31). These data, derived from both AUGUSTUS protein models and six-frame translations of genomic scaffolds, can be searched by entering a Pfam-A domain name or accession number. Search results are displayed in tabular format, indicating the number of times the domain of interest occurs in each protein. From there, users can download either the retrieved domain sequences or the full-length proteins containing them.
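The per-protein domain counts shown in the Pfam search results can be approximated from hmmscan output. The sketch below assumes the table was written with hmmscan’s --domtblout option (one line per domain hit, with 22 fixed whitespace-separated columns before the free-text description) and applies an independent E-value cut-off of 1e-5 that is our own assumption; the thresholds used by the portal are not stated here, and the function name is hypothetical.

```python
from collections import Counter
from typing import Iterable


def count_domain_hits(domtbl_lines: Iterable[str], query: str,
                      max_i_evalue: float = 1e-5) -> Counter:
    """Count occurrences of one Pfam domain (by name or accession) in each protein.

    Assumes hmmscan --domtblout layout: field 0 = domain name, 1 = Pfam accession,
    3 = protein (query) ID, 12 = independent E-value; field 22 holds the description.
    The default E-value cut-off is a hypothetical choice.
    """
    query_acc = query.split(".")[0]  # ignore a version suffix such as PF00505.18
    counts: Counter = Counter()
    for line in domtbl_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        fields = line.split(maxsplit=22)
        domain_name, domain_acc, protein_id = fields[0], fields[1], fields[3]
        if float(fields[12]) > max_i_evalue:
            continue
        if query in (domain_name, domain_acc) or domain_acc.split(".")[0] == query_acc:
            counts[protein_id] += 1  # each line is one domain occurrence
    return counts
```

Counting lines rather than reading the “#” and “of” columns keeps the E-value filter and the count consistent with each other.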

Several other types of datasets are available from the “Download Sequences” links in the left sidebar, including assembled transcripts from strand-specific transcriptome assemblies, RNA-seq data from bulk-sorted cell populations, full mitochondrial genomes, gene and protein datasets, predicted miRNAs for H. echinata, and 6mA (N6-methyladenine) methylation SMRT-seq data from H. symbiolongicarpus. Compressed files containing scripts and additional data are also provided here.

2.2 BLAST

SequenceServer 2.0.0 (Priyam et al. 2019) was used to implement a local version of the BLAST tool, allowing users to upload query FASTA sequences and perform searches against several local H. echinata and H. symbiolongicarpus sequence databases not available through GenBank. At the nucleotide level, searches can be conducted against the complete genomes for both species, assembled transcripts, gene models, mitochondrial genomes, and various transcriptomes. At the protein level, searches can be performed against protein models, including mitochondrial protein models.

All BLAST results can be visualized as standard pairwise alignment graphs, Circos-style overview plots, and length distribution histograms. The Links column on each search result page redirects users to corresponding entries within the portal’s Gene Pages and Genome Browser, providing direct access to additional information on each gene. Results can be exported as standard tabular reports, full tabular reports, or full XML reports. These local sequence databases can also be downloaded in their entirety from the “Download Sequences” section of the website for local use.
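As an illustration of working with an exported tabular report, the sketch below keeps the best-scoring hit for each query. It assumes the file uses the 12 default BLAST+ tabular columns (as produced by -outfmt 6), which we take to correspond to the standard tabular report; check the column layout of your own download. The E-value cut-off and function names are hypothetical.

```python
import csv
from typing import Dict, Iterable, NamedTuple

# Default column order of BLAST+ tabular output (-outfmt 6); assumed to match the export.
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


class Hit(NamedTuple):
    subject: str
    pident: float
    evalue: float
    bitscore: float


def best_hits(lines: Iterable[str], max_evalue: float = 1e-5) -> Dict[str, Hit]:
    """Return the highest-bitscore hit per query that passes a (hypothetical) E-value cut-off."""
    best: Dict[str, Hit] = {}
    for row in csv.reader(lines, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        rec = dict(zip(FIELDS, row))
        hit = Hit(rec["sseqid"], float(rec["pident"]),
                  float(rec["evalue"]), float(rec["bitscore"]))
        if hit.evalue > max_evalue:
            continue
        current = best.get(rec["qseqid"])
        if current is None or hit.bitscore > current.bitscore:
            best[rec["qseqid"]] = hit
    return best
```

Ranking by bit score rather than E-value avoids ties when many hits report an E-value of 0.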

2.3 Genome browser

The HGP Portal’s genome browser was developed using JBrowse (v.1.15.3; Buels et al. 2016) as the primary tool for visualizing annotations for Hydractinia species. The browser, which is accessible from the left sidebar on the Portal’s home page, is organized and searchable by genomic scaffold (e.g. HyS0001) and gene identifier (e.g. HyS0001.1). Customized JBrowse data tracks are available for viewing, including aligned RNA-seq data, assembled transcripts, functional annotations, gene models, non-coding RNA data, predicted functional domains, results from protein-based BLAST searches, reference sequences, identified repetitive elements (RepeatMasker; Smit et al. 2015), and both masked and un-masked scaffolds. Hydractinia gene models (AUGUSTUS) are presented as default tracks. Descriptions of all genome browser tracks can be found by following the “Track Descriptions” link located above the left sidebar.
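Because the browser is built on JBrowse 1, a specific region can in principle be opened from a link using JBrowse’s loc parameter (written as ref:start..end) and an optional comma-separated tracks parameter. In the sketch below, the base URL and track names are placeholders, not the portal’s actual addresses, and the function name is hypothetical.

```python
from typing import Optional, Sequence
from urllib.parse import urlencode

# Hypothetical placeholder: substitute the address of the genome browser page.
JBROWSE_BASE = "https://example.org/jbrowse/index.html"


def jbrowse_link(ref: str, start: Optional[int] = None, end: Optional[int] = None,
                 tracks: Sequence[str] = ()) -> str:
    """Build a JBrowse 1 link with `loc` ("ref:start..end" or a bare name) and `tracks`."""
    loc = ref if start is None or end is None else f"{ref}:{start}..{end}"
    params = {"loc": loc}
    if tracks:
        params["tracks"] = ",".join(tracks)  # track labels here are hypothetical
    # Keep ':' and ',' readable in the URL instead of percent-encoding them.
    return f"{JBROWSE_BASE}?{urlencode(params, safe=':,')}"
```

For example, jbrowse_link("HyS0001", 1000, 5000) points at bases 1000 to 5000 of scaffold HyS0001; passing a gene identifier alone relies on the browser’s name index to resolve it.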

2.4 Developmental expression

Bulk RNA-seq samples were previously collected in triplicate at four different time points during H. symbiolongicarpus development: at the 4-cell, 16-cell, and 64-cell stages, as well as from 24-hour larvae. Each replicate consisted of pools of embryos or larvae from a single spawning event resulting from a cross of male (291–10) and female (295–10) Hydractinia strains. RNA-seq libraries were constructed for each sample yielding a total of 12 libraries. These libraries were then sequenced on the Illumina HiSeq 4000 platform using indexed 75-base paired-end reads. Sequence reads were processed and aligned to the H. symbiolongicarpus gene models, counted, and normalized. Read count values can be viewed as time-course expression plots by entering a gene identifier (e.g. HyS0001.1) in the search box on the “Developmental Expression” page. Developmental expression plots can be viewed as box plots, violin plots, or various other plot types, with or without trend lines.
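The design above (four stages with three replicates each, giving 12 libraries) lends itself to a simple per-stage summary. The text does not say which normalization the portal applies, so the sketch below uses counts per million (CPM) purely as a stand-in; the library naming scheme (“<stage>_rep<k>”) and the function names are hypothetical.

```python
from statistics import mean, stdev
from typing import Dict, List, Tuple

STAGES = ["4-cell", "16-cell", "64-cell", "24h larva"]


def cpm(counts: Dict[str, int]) -> Dict[str, float]:
    """Counts per million within one library (a stand-in, not the portal's method)."""
    total = sum(counts.values())
    return {gene: 1e6 * n / total for gene, n in counts.items()}


def stage_summary(libraries: Dict[str, Dict[str, int]],
                  gene: str) -> Dict[str, Tuple[float, float]]:
    """Mean and sample SD of one gene's CPM per stage.

    Library names are assumed (hypothetically) to look like '<stage>_rep<k>'.
    """
    per_stage: Dict[str, List[float]] = {s: [] for s in STAGES}
    for lib_name, counts in libraries.items():
        stage = lib_name.rsplit("_rep", 1)[0]
        per_stage[stage].append(cpm(counts)[gene])
    return {s: (mean(v), stdev(v)) for s, v in per_stage.items() if len(v) >= 2}
```

The mean and standard deviation per stage are the quantities a trend line or box plot over the four time points would summarize.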

2.5 Single-cell browser

A single-cell transcriptomic browser was developed to visualize specific gene markers in well-annotated H. symbiolongicarpus cell-type clusters (Schnitzler et al. 2024). The browser, implemented in ShinyCell (v.2.1; Ouyang et al. 2021) and using the R package Seurat (v.4.3.0), creates UMAP graphs to visualize gene expression by cell type. The single-cell browser is accessible by clicking the link in the left sidebar, which then launches a ShinyCell application window. Users can choose from various visual representations of the RNA-seq data, specifying cell information and gene expression data to be displayed. Scripts, data files, and detailed descriptions of the analyses are available under the “Scripts and Data” link in the left sidebar.

2.6 Predicted protein structures

Using the NIH’s Biowulf supercomputing resource, proteome-scale structures were predicted for H. symbiolongicarpus (21 930 proteins) and H. echinata (28 729 proteins) with ColabFold (v.1.5.5; Mirdita et al. 2022) batch scripts running AlphaFold2 (v.2.3.2; Jumper et al. 2021), a process that consumed over 220 000 GPU hours. The “Predicted Protein Structures” page, accessible via the left sidebar, allows users to search by protein identifier (e.g. HyS0009.240) (Fig. 1A). Each protein structure can be visualized using the HGP Portal’s implementation of Mol* Viewer (v.3.37.1; Sehnal et al. 2021), an advanced web-based tool that supports interactive manipulation of 3D models, provides extensive customization options, and includes detailed annotation and analysis tools. Structures are color-coded by their per-residue model confidence score (pLDDT). Full sets of Hydractinia PDB files can be downloaded from the “Download Protein Structures” link in the left sidebar.
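AlphaFold2 and ColabFold write each residue’s pLDDT into the B-factor field of the PDB files they produce, which is what allows viewers such as Mol* to color structures by confidence. The sketch below reads that field for Cα atoms of a downloaded PDB file and bins the values using AlphaFold’s conventional cut-offs (above 90 very high, 70 to 90 confident, 50 to 70 low, below 50 very low). The function names are hypothetical.

```python
from statistics import mean
from typing import Dict, Iterable, Tuple


def per_residue_plddt(pdb_lines: Iterable[str]) -> Dict[Tuple[str, int], float]:
    """Map (chain, residue number) to the B-factor of its CA atom (PDB columns 61-66)."""
    scores: Dict[Tuple[str, int], float] = {}
    for line in pdb_lines:
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            scores[(line[21], int(line[22:26]))] = float(line[60:66])
    return scores


def confidence_band(plddt: float) -> str:
    """Bin a pLDDT value using AlphaFold's conventional cut-offs (90, 70, 50)."""
    if plddt > 90:
        return "very high"
    if plddt > 70:
        return "confident"
    if plddt > 50:
        return "low"
    return "very low"


def summarize(pdb_lines: Iterable[str]) -> Tuple[float, Dict[str, int]]:
    """Return the mean CA pLDDT and the number of residues in each confidence band."""
    scores = per_residue_plddt(pdb_lines)
    bands: Dict[str, int] = {}
    for value in scores.values():
        band = confidence_band(value)
        bands[band] = bands.get(band, 0) + 1
    return mean(scores.values()), bands
```

For example, opening a downloaded structure file and passing the file handle to summarize() gives its mean pLDDT and band counts; the file names depend on how the downloaded archive is organized.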

Figure 1. Protein structure visualization and functional annotation prediction. (A) The “Predicted Protein Structures” page, accessible via the left sidebar, allows users to search for a structure by protein identifier. Each protein structure can be visualized using our implementation of Mol* Viewer. Depicted here is the protein structure of the i-cell marker gene HyS0009.240. (B) Heatmap showing the presence (color) or absence (white) of GO annotations (Biological Process) for 15 Hydractinia-specific i-cell marker genes (H. symbiolongicarpus) as predicted by DeepFRI. The plot distinguishes between sequence-based (yellow), structure-based (red), and overlapping (orange; both sequence-based and structure-based) annotations for each prediction. All 15 gene markers now have predicted functional annotations, with structure-based annotations the most common (43.45%), followed by overlapping annotations (37.50%) and sequence-based annotations (19.05%).

3 Example use case: structure-based functional prediction of Hydractinia i-cell markers

Hydractinia genomes appear to contain an abundance of evolutionarily novel genes. Determining the functional roles of these newly discovered proteins is challenging, especially for lineage-specific proteins. Computational biologists have traditionally annotated newly sequenced genomes using homology-based sequence alignment methods. Although these inference methods perform with relatively high accuracy, incorporating 3D structural information can further improve annotation accuracy, because structure tends to be more conserved than sequence, particularly at large evolutionary distances (Illergård et al. 2009). To support structure-based comparisons and experimental design, a section of the HGP Portal is devoted to proteome-scale predicted protein structures not currently available elsewhere. This large body of structural data should enable more accurate characterization of these proteins in the absence of experimentally determined structures, as well as their subsequent functional annotation.

As a case study, we focus here on a subset of 15 Hydractinia-specific i-cell gene markers identified via OrthoFinder2 as unique to this lineage (Schnitzler et al. 2024) to explore known gaps in annotation transfer by homology-based sequence alignment methods (e.g. BLAST). Here, we used DeepFRI (Gligorijević et al. 2021), a method that, unlike homology-based techniques, uses a two-stage deep learning approach: a long short-term memory language model (LSTM-LM) extracts residue-level features from the protein sequence, and a graph convolutional network (GCN) then combines these features with a contact map derived from amino acid interactions within the structure (Boadu et al. 2025). DeepFRI has recently been used to predict enriched GO terms in clustered structures across the entire AlphaFold database (Barrio-Hernandez et al. 2023), to predict functional annotations of representative protein domains from the Genomic Encyclopedia of Bacteria and Archaea (GEBA1003) reference genome database (Koehler Leman et al. 2023), and to predict the function of ∼1500 Drosophila de novo proteins, that is, proteins arising from regions of genomic DNA previously believed to be noncoding (Middendorf et al. 2024).
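Lineage-specific genes like these can be flagged from OrthoFinder2 results by keeping orthogroups whose members all come from the species of interest. The sketch below assumes OrthoFinder’s Orthogroups.tsv layout (an Orthogroup column followed by one column per species, with gene IDs separated by “, ”); the species column names and the function name are hypothetical. Genes that OrthoFinder does not place in any orthogroup are reported separately and are not covered here.

```python
import csv
from typing import Dict, Iterable, List, Set


def lineage_specific(rows: Iterable[str], lineage: Set[str]) -> Dict[str, List[str]]:
    """Return orthogroups whose member genes all come from species in `lineage`.

    Assumes the Orthogroups.tsv layout: an 'Orthogroup' column, then one column per
    species, each cell listing gene IDs separated by ', ' (empty if absent).
    """
    result: Dict[str, List[str]] = {}
    for row in csv.DictReader(rows, delimiter="\t"):
        og = row.pop("Orthogroup")
        # Species that contribute at least one gene to this orthogroup.
        present = {sp for sp, genes in row.items() if sp and genes and genes.strip()}
        if present and present <= lineage:
            result[og] = [g for sp in sorted(present) for g in row[sp].split(", ")]
    return result
```

Restricting the lineage set to the two Hydractinia species yields candidate Hydractinia-specific orthogroups, which could then be intersected with a list of i-cell markers.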

First, to verify the current absence of homology-based, sequence-alignment-predicted functional annotations for the 15 Hydractinia-specific i-cell marker genes identified in Schnitzler et al. (2024), we checked the “Pfam-A domains” and “Functional Annotation” sections of each gene’s page, accessed through the HGP Portal’s “View a Gene Page” search. With two exceptions, namely HyS0048.103, which encodes a protein containing both HMG-box and Sox domains, and HyS0021.199, which encodes Alba2, a protein involved in chromatin organization (Laurens et al. 2012), the remaining 13 proteins lack both Pfam-A domains and other functional annotations (e.g. from Argot2.5 and PANNZER2) entirely. Next, to investigate the availability of structure-based annotations, we downloaded the PDB files for the 15 AlphaFold-predicted protein structures (pLDDT range: 28.93–68.68) via the HGP Portal’s “Predicted Protein Structures” link. We then used Foldseek (van Kempen et al. 2024) to query these 15 structures in both 3Di/AA and TM-align modes against all available databases. As expected, given the absence of experimentally solved Hydractinia protein structures in the PDB and the limited number of computationally predicted Hydractinia protein structures in the AlphaFold database, Foldseek did not return any significant full-length structural hits with RMSD ≤ 2 Å or TM-score ≥ 0.5 (data not shown).
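The significance criterion above can be expressed as a simple filter over a Foldseek hit table. This sketch assumes the table was exported with TM-score, RMSD, and query-coverage columns in the order given in COLUMNS (Foldseek’s --format-output option can request such fields, but verify the exact names for your version), and it uses a hypothetical query-coverage cut-off of 0.8 to stand in for “full-length”, which the text does not define.

```python
import csv
from typing import Iterable, List, Tuple

# Assumed column order of the exported hit table; check the names against your
# Foldseek version before relying on them.
COLUMNS = ("query", "target", "alntmscore", "rmsd", "qcov")


def significant_hits(lines: Iterable[str], max_rmsd: float = 2.0, min_tm: float = 0.5,
                     min_qcov: float = 0.8) -> List[Tuple[str, str, float, float]]:
    """Keep hits with RMSD <= max_rmsd (Å) or TM-score >= min_tm that cover most of the query.

    min_qcov is a hypothetical stand-in for "full-length"; no coverage cut-off is given.
    """
    kept = []
    for row in csv.reader(lines, delimiter="\t"):
        if not row:
            continue
        rec = dict(zip(COLUMNS, row))
        tm, rmsd, qcov = float(rec["alntmscore"]), float(rec["rmsd"]), float(rec["qcov"])
        if qcov >= min_qcov and (rmsd <= max_rmsd or tm >= min_tm):
            kept.append((rec["query"], rec["target"], tm, rmsd))
    return kept
```

For the 15 i-cell markers, an empty result from a filter like this would correspond to the absence of significant hits reported above.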

Lastly, we compiled a single .zip file containing the 15 PDB-formatted structures downloaded from the HGP Portal and used DeepFRI (https://github.com/flatironinstitute/DeepFRI) to predict functional annotations (deepfri_score ≥ 0.50) for the Hydractinia-specific i-cell marker dataset using both structure-based (GCN) and sequence-based (LSTM-LM) features. The concatenated output of predicted functional annotations (Table 1, available as supplementary data at Bioinformatics Advances online) includes columns for protein identifier, GO_Domain (i.e. Biological Process, Cellular Component, Enzyme Commission number, or Molecular Function), GO_ID, GO_Name, significance score (≥ 0.50), and annotation type (sequence- or structure-feature derived). For brevity, we focus here on the predicted Biological Process (BP) GO annotations. Figure 1B is a heatmap showing the presence (color) or absence (white) of BP annotations for each H. symbiolongicarpus i-cell gene marker, distinguishing between sequence-based (yellow), structure-based (red), and overlapping (both sequence-based and structure-based; orange) features for each DeepFRI prediction. DeepFRI predicted functional annotations for all 15 i-cell gene markers, most of which were previously unannotated, with structure-based annotations being the most prevalent (43.45%), followed by overlapping annotations (37.50%) and sequence-based annotations (19.05%).
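The three categories in Fig. 1B can be derived by comparing the sequence-based and structure-based predictions that pass the 0.50 score threshold. The sketch below treats each (protein, GO term) pair as one annotation and reports each category as a percentage of all distinct pairs; this counting scheme is our assumption, although it is consistent with the reported percentages summing to 100% (43.45 + 37.50 + 19.05). The input tuples are a hypothetical in-memory form, not DeepFRI’s output file format, and the function names are hypothetical.

```python
from typing import Dict, Iterable, Set, Tuple

# Hypothetical record form: (protein ID, GO ID, DeepFRI score).
Prediction = Tuple[str, str, float]


def confident_pairs(preds: Iterable[Prediction], threshold: float = 0.50) -> Set[Tuple[str, str]]:
    """Keep (protein, GO term) pairs whose score meets the threshold."""
    return {(prot, go) for prot, go, score in preds if score >= threshold}


def annotation_breakdown(seq_preds: Iterable[Prediction], struct_preds: Iterable[Prediction],
                         threshold: float = 0.50) -> Dict[str, float]:
    """Percentage of distinct confident pairs that are structure-only, shared, or sequence-only."""
    seq = confident_pairs(seq_preds, threshold)
    struct = confident_pairs(struct_preds, threshold)
    categories = {
        "structure-based": struct - seq,
        "overlapping": struct & seq,
        "sequence-based": seq - struct,
    }
    total = len(seq | struct)
    return {name: round(100 * len(pairs) / total, 2) if total else 0.0
            for name, pairs in categories.items()}
```

Because the three sets partition the union of confident pairs, the percentages always sum to 100 (up to rounding).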

While DeepFRI represents a significant advance toward accurately predicting protein function, cautions that apply to all predictive methods should be kept in mind when examining these structure-based functional predictions. The method’s performance depends heavily on the quality and diversity of its training data, and biases or gaps in those data can lead to inaccuracies for underrepresented or novel protein families (Gligorijević et al. 2021). The method may also overlook critical factors that influence protein structure and function, such as post-translational modifications, protein-protein interactions, and environmental conditions. Additionally, accurately modeling the 3D conformation and dynamics of proteins is generally challenging, and even small structural variations can significantly affect function (Jumper et al. 2021). This caution is particularly relevant here, as the pLDDT scores of the 15 input structures (28.93–68.68) all fall below 70, in AlphaFold’s low or very low confidence range. These considerations highlight the need to validate all such predictions experimentally to gain a more accurate understanding of a given protein’s function.

4 Discussion

Here, we describe the development of the HGP Portal, a comprehensive resource featuring multi-omic datasets for two species of Hydractinia, a promising emerging model organism for the study of stem cell biology, regeneration, and allorecognition. We provide an overview of the available genomic, transcriptomic, and proteomic data, along with functional annotations and interactive visualization tools. We also demonstrate the utility of the HGP Portal by highlighting the potential of these datasets for biological discovery, a potential enhanced by ongoing curation that uses novel deep learning methodologies and pipelines to improve upon existing or deficient functional annotations. Despite the small size of the test set, we observe a clear improvement in functional annotation coverage using deep learning structure-based methods compared with traditional homology-based sequence alignment methods: DeepFRI assigned predicted annotations to all 15 i-cell markers, whereas only 2 of the 15 had any sequence-based functional annotation.

As with all such online resources, the value of genomic portals largely depends on the continued inclusion of additional data generated by investigators in the field. We therefore anticipate that future versions of the HGP Portal will expand in scope to include large-scale genomic data relevant to understanding the natural history, geographic distribution, and population genetics of Hydractinia populations. In addition, given the rapid evolution of the field of protein structure prediction, we will continue to use NIH’s Biowulf supercomputing facility to improve the structural predictions provided through this portal. These large-scale structural datasets will continue to be made freely available, with the goal of advancing our understanding of complex biological processes and protein interactions. Improved functional annotations resulting from these predicted structures may also offer crucial insights into the roles of these proteins and their possible relevance to health and disease, potentially contributing to human health by elucidating the molecular underpinnings of disease and guiding the development of targeted therapies.

Supplementary Material

vbaf215_Supplementary_Data

vbaf215_supplementary_data.xlsx (22.5 KB, xlsx)

Acknowledgements

This work utilized the computational resources of the NIH Biowulf high-performance computing cluster (https://hpc.nih.gov).

Contributor Information

R Travis Moreland, Center for Genomics and Data Science Research, Division of Intramural Research, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD 20892, United States.

Christine E Schnitzler, Whitney Laboratory for Marine Bioscience, University of Florida, St. Augustine, FL 32080, United States; Department of Biology, University of Florida, Gainesville, FL 32611, United States.

Suiyuan Zhang, Center for Genomics and Data Science Research, Division of Intramural Research, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD 20892, United States.

Sumeeta Singh, Center for Genomics and Data Science Research, Division of Intramural Research, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD 20892, United States.

Tyra G Wolfsberg, Center for Genomics and Data Science Research, Division of Intramural Research, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD 20892, United States.

Andreas D Baxevanis, Center for Genomics and Data Science Research, Division of Intramural Research, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD 20892, United States.

Supplementary data

Supplementary data are available at Bioinformatics Advances online.

Conflict of interest

None declared.

Funding

This work was supported by the Intramural Research Program of the National Institutes of Health [ZIA HG000140 to A.D.B.]. The contributions of the NIH authors are considered Works of the United States Government. The findings and conclusions presented in this paper are those of the authors and do not necessarily reflect the views of the NIH or the U.S. Department of Health and Human Services.

Data availability

The data underlying this article are freely available within the Hydractinia Genome Project Portal (https://research.nhgri.nih.gov/hydractinia).

References

Amberger JS, Bocchini CA, Scott AF et al. OMIM.org: leveraging knowledge across phenotype-gene relationships. Nucleic Acids Res 2019;47:D1038–43.
Barrio-Hernandez I, Yeo J, Jänes J et al. Clustering predicted structures at the scale of the known protein universe. Nature 2023;622:637–45.
Boadu F, Lee A, Cheng J. Deep learning methods for protein function prediction. Proteomics 2025;25:e2300471.
Buels R, Yao E, Diesh CM et al. JBrowse: a dynamic web platform for genome visualization and analysis. Genome Biol 2016;17:66.
Emms DM, Kelly S. OrthoFinder: solving fundamental biases in whole genome comparisons dramatically improves orthogroup inference accuracy. Genome Biol 2015;16:157.
Emms DM, Kelly S. OrthoFinder: phylogenetic orthology inference for comparative genomics. Genome Biol 2019;20:238.
Falda M, Toppo S, Pescarolo A et al. Argot2: a large-scale function prediction tool relying on semantic similarity of weighted Gene Ontology terms. BMC Bioinformatics 2012;13:S14.
Gligorijević V, Renfrew PD, Kosciolek T et al. Structure-based protein function prediction using graph convolutional networks. Nat Commun 2021;12:3168.
Illergård K, Ardell DH, Elofsson A. Structure is three to ten times more conserved than sequence—a study of structural response in protein cores. Proteins 2009;77:499–508.
Jones P, Binns D, Chang HY et al. InterProScan 5: genome-scale protein function classification. Bioinformatics 2014;30:1236–40.
Jumper J, Evans R, Pritzel A et al. Highly accurate protein structure prediction with AlphaFold. Nature 2021;596:583–9.
Koehler Leman J, Szczerbiak P, Renfrew PD et al. Sequence-structure-function relationships in the microbial protein universe. Nat Commun 2023;14:2351.
Kon-Nanjo K, Kon T, Horkan HR et al. Chromosome-level genome assembly of Hydractinia symbiolongicarpus. G3 Genes Genomes Genet 2023;13:jkad107.
Laurens N, Driessen RP, Heller I et al. Alba shapes the archaeal genome using a delicate balance of bridging and stiffening the DNA. Nat Commun 2012;3:1328.
Maxwell EK, Schnitzler CE, Havlak P et al. Evolutionary profiling reveals the heterogeneous origins of classes of human disease genes: implications for modeling disease genetics in animals. BMC Evol Biol 2014;14:212.
Middendorf L, Ravi Iyengar B, Eicholt LA. Sequence, structure, and functional space of Drosophila de novo proteins. Genome Biol Evol 2024;16:evae176.
Mirdita M, Schütze K, Moriwaki Y et al. ColabFold: making protein folding accessible to all. Nat Methods 2022;19:679–82.
Nicotra ML. Invertebrate allorecognition. Curr Biol 2019;29:R463–7.
Ouyang JF, Kamaraj US, Cao EY et al. ShinyCell: simple and sharable visualization of single-cell gene expression data. Bioinformatics 2021;37:3374–6.
Priyam A, Woodcroft BJ, Rai V et al. SequenceServer: a modern graphical user interface for custom BLAST databases. Mol Biol Evol 2019;36:2922–4.
Punta M, Coggill PC, Eberhardt RY et al. The Pfam protein families database. Nucleic Acids Res 2012;40:D290–301.
Salzberg SL. Next-generation genome annotation: we still struggle to get it right. Genome Biol 2019;20:92.
Schnitzler CE, Chang ES, Waletich J et al. The genome of the colonial hydroid Hydractinia reveals that their stem cells use a toolkit of evolutionarily shared genes with all animals. Genome Res 2024;34:498–513.
Sehnal D, Bittrich S, Deshpande M et al. Mol* Viewer: modern web app for 3D visualization and analysis of large biomolecular structures. Nucleic Acids Res 2021;49:W431–7.
Smit AFA, Hubley R, Green P. RepeatMasker Open-4.0. 2013–2015. http://www.repeatmasker.org
Steinegger M, Mirdita M, Söding J. Protein-level assembly increases protein sequence recovery from metagenomic samples manyfold. Nat Methods 2019;16:603–6.
Törönen P, Medlar A, Holm L. PANNZER2: a rapid functional annotation web server. Nucleic Acids Res 2018;46:W84–8.
van Kempen M, Kim SS, Tumescheit C et al. Fast and accurate protein structure search with Foldseek. Nat Biotechnol 2024;42:243–6.

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Supplementary Materials

vbaf215_Supplementary_Data

vbaf215_supplementary_data.xlsx^{(22.5KB, xlsx)}

Data Availability Statement

The data underlying this article are freely available within the Hydractinia Genome Project Portal (https://research.nhgri.nih.gov/hydractinia).
