Abstract
Background
Coral reefs house about 25% of marine biodiversity and are critical for the livelihood of many communities by providing food, tourism revenue, and protection from wave surge. These magnificent ecosystems are under existential threat from anthropogenic climate change. Whereas extensive ecological and physiological studies have addressed coral response to environmental stress, high-quality reference genome data are lacking for many of these species. The latter issue hinders efforts to understand the genetic basis of stress resistance and to design informed coral conservation strategies.
Results
We report genome assemblies from 4 key Hawaiian coral species, Montipora capitata, Pocillopora acuta, Pocillopora meandrina, and Porites compressa. These species, or members of these genera, are distributed worldwide and therefore of broad scientific and ecological importance. For M. capitata, an initial assembly was generated from short-read Illumina and long-read PacBio data, which was then scaffolded into 14 putative chromosomes using Omni-C sequencing. For P. acuta, P. meandrina, and P. compressa, high-quality assemblies were generated using short-read Illumina and long-read PacBio data. The P. acuta assembly is from a triploid individual, making it the first reference genome of a nondiploid coral animal.
Conclusions
These assemblies are significant improvements over available data and provide invaluable resources for supporting multiomics studies into coral biology, not just in Hawaiʻi but also in other regions, where related species exist. The P. acuta assembly provides a platform for studying polyploidy in corals and its role in genome evolution and stress adaptation in these organisms.
Keywords: coral, Scleractinia, Montipora capitata, Pocillopora acuta, Pocillopora meandrina, Porites compressa, chromosome-level genome assembly, ploidy, triploid
Background
Montipora capitata (NCBI:txid46704, marinespecies.org:taxname:287697), Pocillopora acuta (NCBI:txid1491507, marinespecies.org:taxname:759099), Pocillopora meandrina (NCBI:txid46732, marinespecies.org:taxname:206964), and Porites compressa (NCBI:txid46720, marinespecies.org:taxname:207236) are species of scleractinian corals that are widespread in the Hawaiian Islands, with M. capitata and P. compressa being dominant reef builders. These species are members of cosmopolitan genera, with closely related taxa inhabiting reefs across the Great Barrier Reef and the Coral Triangle [1–3], as well as other regions, such as Pocillopora in Panama [4]. In recent years, due to their critical importance to Hawaiian reef ecosystems and the growing risks posed by climate change, these 4 species have become the subject of many stress (including thermal [5–7] and acidification [8, 9]), microbiome [10, 11], and population genomic [12–15] studies (among many others). Given this heightened interest, there is a pressing need to generate high-quality reference genome data from Hawaiian species to empower future research.
A genome assembly for M. capitata was published in 2019 by our group [16] using Pacific Biosciences (PacBio) RSII data. This assembly was significantly larger (886 Mbp) than other coral genomes available at that time (ca. 300–500 Mbp) and is larger than any Montipora species genome [17, 18] that has since been published. This initial assembly contains a high number (>18% [19]) of duplicated BUSCO genes, suggesting the presence of haplotigs (i.e., sequences derived from different homologous chromosomes) that were not removed during the assembly process. There are currently published genomes for 3 Pocillopora [4, 20, 21] species, none of which are from Hawaiʻi. One of these is a P. acuta isolate collected from Lombok, Indonesia [22], that was generated using Illumina short-read data. This genome assembly is highly fragmented, consisting of 168,465 scaffolds, and although it does have a scaffold N50 of 147 Kbp, the contig N50 is only 9,649 bp. The completeness of the genes predicted in this genome is not high, with only 56% of the core eukaryotic genes [20] identified in the reported “ab initio” predicted gene set. A second set of predicted genes inferred using RNA sequencing (RNA-seq) evidence (termed the “experimental” set) contains 93% of core eukaryotic genes, but this set does not have predicted open reading frames (i.e., it includes both coding and noncoding genes), making it difficult to make a direct comparison with other published genomes. There are currently 3 Porites species with published genomes [23–25] that are of high completeness and reasonable contiguity, but none are from Hawaiʻi.
As the cost of genome sequencing, particularly long-read methods, continues to decrease, opportunities arise to generate genome data from understudied species or species that have genomes of lower quality that would benefit from the improvement gained from newer technologies. Furthermore, methods such as Dovetail Omni-C, which provides long-range linkage information, enable the generation of genome assemblies that are at (or near) chromosomal-level resolution. In this study, we generated an improved reference genome assembly for our previously published Hawaiian M. capitata using long-read PacBio, short-read Illumina, and newly generated Omni-C data that is of chromosome-level resolution. The 14 largest scaffolds resulting from this assembly likely represent the 14 chromosomes predicted in Montipora species [26]. We also generated, using PacBio HiFi data (i.e., circular consensus corrected PacBio reads), high-quality genome assemblies for 2 Pocillopora and 1 Porites species. The P. acuta isolate is a triploid, making it the first nondiploid coral genome to be sequenced.
Data Description
Sample collection and processing
The 4 coral species targeted in this study were collected from Kāneʻohe Bay, Hawaiʻi. For M. capitata, the initial PacBio and Illumina-based assembly was generated using sperm DNA (see [16]). Input DNA for the Dovetail Genomics, Scotts Valley, California approach, using the Omni-C assay and workflow, was a bleached nubbin (a ∼5 × 5-cm fragment) from a colony that was greatly reduced in algal symbionts (GPS coordinates: 21.474465, −157.834468; SRA BioSample: SAMN21845729). This fragment was collected under Hawaiʻi Department of Aquatic Resources Special Activity Permit 2019–60, snap frozen in liquid nitrogen, and stored at −80°C before it was shipped on dry ice to Dovetail Genomics for processing using their Omni-C assay and workflow.
For P. meandrina, 1 nubbin (a ∼5 × 5-cm fragment) was collected from an adult colony from Reef 13 (GPS coordinates: 21.450803, −157.794692) on 5 September 2020 (SRA BioSample: SAMN21845732, SAMN21845733, and SAMN21845734) under DAR-2021–33, Amendment No. 1 to Hawaiʻi Institute of Marine Biology (HIMB). This nubbin was selected for DNA extraction as it was bleached and would have a greatly reduced algal symbiont density. High molecular weight DNA was extracted using the QIAGEN, Hilden, Germany Genomic-tip 100/G (cat. 10223), the QIAGEN Genomic DNA Buffer Set (cat. 19060), QIAGEN RNase A (100-mg/mL concentration: cat. 19101), QIAGEN Proteinase K (cat. 19131), and DNA lo-bind tubes (Eppendorf, Hamburg, Germany, cat. 022431021). Briefly, a clipping of the coral fragment was placed in a cleaned and sterilized mortar and pestle and ground to powder on liquid nitrogen. High molecular weight DNA was then extracted according to the manufacturer's instructions for preparation of tissue samples in the QIAGEN Genomic DNA Handbook (version 06/2015).
For P. acuta, 1 nubbin was collected from an adult colony from a reef next to the Hawaiʻi Institute of Marine Biology (GPS coordinates: 21.436056, −157.786861) on 5 September 2018 (SRA BioSample: SAMN22898959) under Special Activity Permit 2019–60. This nubbin was selected for DNA extraction because it was bleached and would have a greatly reduced algal symbiont density. High molecular weight DNA was extracted using the QIAGEN Genomic-tip 100/G approach outlined for P. meandrina above. High molecular weight DNA from P. meandrina and P. acuta was sent to the DNA Link Sequencing Lab for sequencing on their PacBio, Menlo Park, CA Sequel 2 (PacBio Sequel II System, RRID:SCR_017990) and Illumina (San Diego, CA) NovaSeq 6000 platforms (Illumina NovaSeq 6000 Sequencing System, RRID:SCR_020150).
For P. compressa, DNA was extracted from sperm released at 11 p.m. on 9 June 2017 from a single colony in Kāneʻohe Bay, Oʻahu. Total genomic DNA was extracted using the CTAB protocol and the DNeasy Blood and Tissue Kit (Qiagen) with subsequent clean-up steps. Genomic data were generated using the PacBio RS II platform (PacBio RS II Sequencing System, RRID:SCR_017988). To increase the sequence quality of the assembly, a polishing step was done using the Arrow consensus caller. To this end, we generated a total of 20 Gbp of high-throughput sequencing data (Illumina HiSeq2000; 100-bp paired-end library) as follows. The whole-genome sequencing library of P. compressa was prepared using the Truseq Nano DNA Prep Kit (550 bp) protocol following the manufacturer's instructions. Randomly sheared genomic DNA was ligated with index adapters and purified. The ligated products were size-selected for 300 to 400 bp and amplified using the adapter-specific primers. Library quality was checked using a 2100 BioAnalyzer (Agilent Technologies, Santa Clara, CA, USA).
RNA extractions
RNA was extracted by clipping a small piece of coral using clippers sterilized in 10% bleach, deionized water, isopropanol, and RNAse free water and then placed in a 2-mL Fisherbrand™ Pre-Filled Bead Mill microcentrifuge tube containing 0.5 mm glass beads (Fisher Scientific, Waltham, MA; cat. 15-340-152) with 1000 μL Zymo (Irvine, CA) DNA/RNA shield. A 2-step extraction protocol was used to extract RNA and DNA, with the first step as a “soft” homogenization to reduce shearing of RNA or DNA. Tubes were vortexed at high speed for 1 and 2 minutes for P. acuta and M. capitata fragments, respectively. The supernatant was removed and designated as the “soft extraction.” Second, an additional 500 μL Zymo DNA/RNA shield was added to the bead tubes and placed in a QIAGEN TissueLyser for 1 minute at 20 Hz. The supernatant was removed and designated as the “hard extraction.” Subsequently, 300 μL of the sample from both soft and hard homogenate was extracted with the Zymo Quick-DNA/RNA Miniprep Plus Kit (cat. D7003) Protocol with the following modifications. RNA quantity (ng/μL) was measured with a ThermoFisher (Waltham, MA) Qubit Fluorometer, DNA quality was assessed using gel electrophoresis, and RNA quality was measured with an Agilent TapeStation System.
Haploid genome assembly of Hawaiian coral species
A diagram depicting the genome assembly, gene prediction, and functional annotation workflow used for each of the Hawaiian coral species is presented in Fig. 1. The long-read genome sequencing data (PacBio) from the Hawaiian coral species were initially assembled using CANU (RRID:SCR_015880) (v2.2; default options) [27]. The PacBio reads from M. capitata (78.3 Gbp; Supplementary Table S1) and P. compressa (63.3 Gbp) were generated using the PacBio RSII platform (giving the “-pacbio” parameter to the CANU assembler). The PacBio reads for P. meandrina (311.8 Gbp; Supplementary Table S1) and P. acuta (239.1 Gbp) were generated using the PacBio HiFi platform (giving the “-pacbio-hifi” parameter to the CANU assembler). An error correction step (nucleotide correction of assembly) using the initial assemblies of M. capitata (1.2 Gbp; Supplementary Table S2), P. compressa (1.0 Gbp), P. meandrina (0.7 Gbp), and P. acuta (1.1 Gbp) was done using bowtie2 (RRID:SCR_016368) v2.4.2 [28] and the Pilon program (RRID:SCR_014731) v1.23 [29] with the Illumina short-read sequencing data (27.4 Gbp for M. capitata, 20.9 Gbp for P. compressa, 27.2 Gbp for P. meandrina, and 23.0 Gbp for P. acuta; Supplementary Table S1). Before using the Illumina data, quality trimming and adapter clipping of the raw reads were done using Trimmomatic (RRID:SCR_011848) v0.39 [30]. To remove potential contaminant sequences, assembly results were analyzed using BLASTn (RRID:SCR_001598) (e-value cutoff = 1e−10) analysis with the nr database (downloaded: February 2019). To estimate genome size and ploidy of the Hawaiian coral species, k-mer analysis was done using Jellyfish (21-mer) [31] with the Illumina short-read data.
An additional step was performed to identify any scaffolds in the coral genome assemblies that are putatively derived from the algal (Symbiodiniaceae) symbionts. Each of the 4 assemblies was compared against a custom database of all published Symbiodiniaceae genomes [23, 28, 32–35] (Supplementary Table S3) using BLASTn (v2.10.1; -max_target_seqs 2000). The resulting BLAST hits were filtered, retaining only those with an e-value <1e−20 and a bitscore >1,000. Hits to the Cladocopium sp. C15 genome [23] were also removed because this assembly is from a holobiont sequencing project (i.e., assembled from a metagenome sample) and is, therefore, more likely to be contaminated with coral sequences than the other Symbiodiniaceae data that were derived from unialgal cultures. Overlapping filtered BLAST hits were merged and their coverage of each coral scaffold was calculated using bedtools (v2.29.2) [36]. The regions covered by merged BLAST hits on scaffolds with >10% and >1% of their bases covered by BLASTn hits were extracted and compared against the NCBI nt database using the online BLASTn tool (default settings; accessed 21 July 2022). All of the regions on scaffolds with >10% and >1% hit coverage had similarity to coral ribosomal RNA sequences in the NCBI nt database (Supplementary Table S4), suggesting that their similarity to Symbiodiniaceae genomes does not represent contamination. Therefore, no additional scaffolds were removed from the coral genome assemblies.
To reconstruct haploid genomes using the initial assemblies of the Hawaiian coral species, we used the following protocol. First, we predicted repetitive DNA sequences in the initial assemblies and constructed soft-masked assemblies. Repetitive DNA elements were identified using the RepeatModeler pipeline (RRID:SCR_015027) v2.0. [37–39], which includes RECON (RRID:SCR_021170) v1.08 and RepeatScout (RRID:SCR_014653) v1.0.6 as de novo repeat finding programs. We used the default options for l-mer size and removed low-complexity and tandem repeats. To classify repeat content, the libraries were constructed from giri repbase (RRID:SCR_021169). The consensus sequences of repeat families were used to analyze corresponding repeat regions with RepeatMasker (RRID:SCR_012954) v4.1.1. The second step in the protocol was to infer assemblies as haploid genomes using the HaploMerger2 (HM2) program (the latest release, 20180603) [40] and the soft-masked assemblies. The third step was validation of duplicated eukaryotic core genes in the haploid genome assemblies using the BUSCO (RRID:SCR_015008) program (v4.1.4; genome-based analysis with eukaryota_odb10 dataset) [41]. The final step was to repeat the HM2 analysis until the number of duplicated eukaryotic core genes decreased to under 1%, or the value could not be decreased any further in the haploid assemblies (Supplementary Table S2). The purged assembly of M. capitata was sent to Dovetail Genomics along with an additional coral fragment (see above) that was used for high molecular weight DNA extraction for analysis using their Omni-C assay and HiRise v2.2.0 assembly workflow. A total of 56.5 million read-pairs of Dovetail Genomics Omni-C sequencing data (Supplementary Table S1) were generated and used for scaffolding. This step produced a final genome assembly that was at putative chromosome level resolution for M. capitata.
Gene prediction and functional annotation
Quality trimming and adapter removal from the RNA-seq data in the Hawaiian coral species (77.5 Gbp for M. capitata, 76.5 Gbp for P. compressa, 656.7 Gbp for P. acuta, and 10.6 Gbp for P. meandrina; Supplementary Table S1) were done using Trimmomatic (v0.39; default options) [30]. These data were assembled using Trinity (RRID:SCR_013048) v2.11 with the default option of de novo transcriptome assembly [42, 43]. The trimmed RNA-seq raw reads and the assembled transcriptomes were aligned to the haploid genome assemblies using the STAR (RRID:SCR_004463) aligner (v2.6.0c; default options for the raw reads) and the STARlong aligner (v2.6.0c; –runMode alignReads –alignIntronMin 10 –seedPerReadNmax 100 000 –seedPerWindowNmax 1000 –alignTranscriptsPerReadNmax 100000 –alignTranscriptsPerWindowNmax 10000), respectively [44]. Based on each alignment (i.e., bam file), gene predictions were done using the BRAKER2 pipeline v2.1.5 [45], which includes GeneMark-ET [46] and AUGUSTUS (RRID:SCR_008417) [47] with default (automatically optimized) options. When the gene models predicted in the same region of the genome by the 2 gene prediction approaches (i.e., RNA-seq and assembled transcript-based BRAKER gene models) differed, the best (e.g., longest nonchimeric) gene model was manually selected, based on the results of a web-BLASTp search (e-value cutoff = 1.e−5 cutoff). Functional annotation of gene models was done using the NCBI Conserved Domain Search (CD-Search) [48], the eggNOG-mapper [49], and the KEGG Automatic Annotation Server (KAAS) [50].
Genomes of corals used for comparative analysis
The genome assemblies and predicted genes from the 4 Montipora (M. cactus [17], M. capitata from the Hawaiian Waiopae tide pools [18], M. efflorescens [17], and the previous version of the Hawaiian M. capitata isolate [16] that we assembled in this study), 3 Pocillopora (P. damicornis [4], P. acuta [from Indonesia] [22], and P. verrucosa [21]), and 4 Porites (P. astreoides [25], P. australiensis [24], P. lutea [23], and P. rus [51]) species were retrieved from their respective repositories (Supplementary Table S5) and used for comparative analysis with the assemblies generated in this study. The M. cactus and M. efflorescens genome assemblies [17] were filtered, retaining only scaffolds identified by Yuki et al. [19] as not being haplotigs. The updated gene models from Yuki et al. [19] were used in place of those available with the original assemblies. For species where just the gene modes were provided (in gff format), gffread v0.11.6 (-S -x cdsfile -y pepfile) [52] was used to infer the protein and CDS sequences. Open reading frames (ORFs) were predicted in the RNA-seq–based “experimental” genes predicted in the Indonesian P. acuta isolate [22], using TransDecoder (RRID:SCR_017647) v5.5.0. HMMER (RRID:SCR_005305) v3.1b2 was used to query the candidate ORFs against the Pfam (RRID:SCR_004726) database (release 33.1; i-Evalue <0.001) and BLASTp (RRID:SCR_001010) (v2.10.1; -max_target_seqs 1 -evalue 1e-5) was used to search candidate ORFs against the SwissProt database (release 2020_05), with the resulting homology information used by TransDecoder (RRID:SCR_017647) to guide ORF prediction. Only the longest transcript per gene had ORFs predicted, and single-exon genes without strand information were assumed to be from the forward/positive strand (TransDecoder will change the strand of single exon genes if required, based on the results of ORF prediction).
Genome size estimation
The genome size and ploidy of the new (this study) and published Montipora, Pocillopora, and Porites species (except the Indonesian P. acuta, which does not have read data available to download; P. rus, which only had reads from the holobiont [i.e., reads from the coral, algal symbiont, and associated bacteria] available; and P. astreoides, which only had PacBio long reads available) were estimated using the GenomeScope2 and Smudgeplot tools [53]. For each species, the available short-read genome sequencing data were retrieved from NCBI SRA (Supplementary Table S5), trimmed using cutadapt (RRID:SCR_011841) v3.5 [54] (-q 20 –minimum-length 25 -a AGATCGGAAGAGCACACGTCTGAACTCCAGTCA -A AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT), and decomposed into k-mers using Jellyfish [31] (v2.3.0; k = 21). The k-mer frequency histogram produced by Jellyfish (using the “jellyfish histo” command) was imported into GenomeScope2 with a theoretical diploid model fitted with the data (Fig. 2C, D, F and Supplementary Fig. S1); a theoretical triploid model was fitted with the Hawaiian P. acuta data (Fig. 2E and Supplementary Fig. S1F) because it was found to be a triploid after initial analysis using Smudgeplot and GenomeScope2. Smudgeplot was run using the k-mers extracted by Jellyfish (RRID:SCR_005491), with thresholds for the lower k-mer coverage cutoff (just after the minimum between the initial error peak and the first major peak) and upper k-mer coverage cutoff (8.5 times the coverage of the first major coverage peak) chosen for each species using the GenomeScope2 profile shown in Supplementary Figure S1. The “smudge plots” shown in Supplementary Figure S1 were generated using the haploid coverage values estimated by GenomeScope2. The cutoffs used when running Smudgeplot for each species are shown in Supplementary Table S5.
Confirmation of sample ploidy
The program nQuire [55] (retrieved 7 July 2021), which uses the frequency distribution of biallelic variant sites inferred from aligned reads to model the ploidy of a sample, was used to verify the ploidy of the 4 genomes sequenced in this study. Briefly, bowtie2 (RRID:SCR_016368) v2.4.4 (“–very-sensitive –no-unal”) was used to align the trimmed (by cutadapt; described previously) Illumina short-reads against their respective genome assemblies; aligned reads were coordinate sorted using samtools (RRID:SCR_002105) v1.11 [56]. The aligned and sorted BAM files were converted into “BIN” files using nQuire (“nQuire create -q 20 -c 20 -x”), filtering for reads with a minimum mapping quality of 20 and sites with a minimum coverage of 20. Denoised BIN files were created using the “nQuire denoise” command run on the initial BIN files. The delta log-likelihood values for each ploidy model (diploid, triploid, and tetraploid) were calculated by the “nQuire lrdmodel” command for each of the initial and denoised BIN files. The lower the delta log-likelihood value of a given model, the better fit it is for the frequency distribution of the biallelic variant sites extracted from the aligned reads; the ploidy of the sample is therefore assumed to be the ploidy model with the lowest delta log-likelihood value. The nQuire results are shown in Supplementary Table S6.
Assessment of completeness using BUSCO
The “completeness” of the genome assemblies and predicted genes (published in this study and from previous studies; Supplementary Table S7) was assessed using BUSCO v5.0.0 (“–mode genome” and “–mode protein,” respectively) with the eukaryota_odb10 (release 10 September 2020) and metazoa_odb10 datasets (release 24 February 2021) [57].
Analysis of extra-chromosomal scaffolds
The proteins predicted on the extra-chromosomal scaffolds (i.e., the scaffolds that do not comprise the 14 putative chromosomes) in the M. capitata assembly were compared against the proteins from the chromosomal scaffolds using BLASTp v1.10.1 [58]; the resulting hits were filtered using an e-value cutoff <1 × 10−5. Additional filtering steps were applied to produce 2 sets of hits: for the first (lenient) set, hits were retained if they had a query coverage of >75% and an identity >75%, with the single best (e-value based) top hit kept for each query sequence; for the second (stringent) set, hits were retained if they had a query coverage of >95% and an identity >95%, with the single best (e-value based) top hit kept for each query sequence. The lenient filtered top hits were used to determine if the extra-chromosomal scaffolds tend to encode genes that have similarity to a single, or multiple, chromosomes. For this analysis, only proteins with top hits to the chromosomal scaffolds (i.e., proteins with hits that have an e-value <1 × 10−5, query coverage >75%, and an identity >75%) were considered, and only scaffolds with multiple proteins with top hits were considered.
Data Validation and Quality Control
Montipora capitata genome assemblies
The M. capitata assembly generated in the study (assembly version V3.0; hereinafter the “new” Hawaiian M. capitata genome assembly) has fewer assembled bases (781 vs. 886 Mbp) and scaffolds (1,699 vs. 3,043), as well as a vastly improved N50 (47.7 vs. 0.54 Mbp; Supplementary Table S7), compared to the assembly of the same Hawaiian M. capitata isolate (hereinafter the “old” Hawaiian M. capitata genome assembly) that was previously published by our group [16]. The 14 largest scaffolds in the new assembly, ranging in size from ∼22 to ∼69 Mbp, likely represent the 14 chromosomes predicted in other Montipora species (Fig. 2A, B) [26]. These putative chromosomes total 680 Mbp of assembled sequence, which is only slightly larger than the estimated genome size of 644 Mbp (Fig. 2C; estimated by GenomeScope2 [53] using k-mers of size 21 bp). The estimated genome size of the other published Montipora species is ∼700 Mbp, whereas the estimated genome size of the new Hawaiian M. capitata genome is 644 Mbp (although the assembly is a little larger; see discussion below). This suggests that species in the genus Montipora have genomes that are marginally smaller than 700 Mbp in size.
The M. capitata isolate that was sequenced appears to be a diploid, with a good fit between its k-mer frequency histogram and the theoretical diploid model implemented in GenomeScope2 (black line in Fig. 2C and Supplementary Fig. S1A), and a clear “smudge” (bright yellow region in Supplementary Fig. S1A) of k-mer pairs with a coverage of 2n and a normalized coverage of 1/2, all of which suggests that the sample is diploid. nQuire also predicted that the M. capitata sample was a diploid (i.e., the diploid model had the lowest delta log-likelihood value; Supplementary Table S6), supporting the results of GenomeScope2 and Smudegeplot.
Compared with the old assembly, the new M. capitata assembly has a slightly higher BUSCO completeness for both the Metazoa (from 95.2% to 95.7%, respectively) and Eukaryota (from 97.7% to 99.2%, respectively) datasets but a significantly reduced number of duplicated BUSCO genes for both the Metazoa (from 21.2% to 1.6%, respectively) and Eukaryota (from 22.0% to 1.2%, respectively) datasets (Fig. 3A, B; Supplementary Table S7). The high number of duplicated BUSCO genes in the old assembly is likely a result of haplotigs that were not removed during the assembly process; this problem appears to have been resolved in the new assembly. Compared with the other published Montipora genomes, the new M. capitata assembly is the most contiguous and complete to date, with a significantly higher N50 (47.7 Mbp compared to the next best of 1.2 Mbp in M. efflorescens) and BUSCO completeness (e.g., 99.2% Eukaryota dataset completeness compared to the next best of 92.1% in M. cactus). Because the same PacBio and Illumina libraries were used to construct the new and old assemblies, the significant improvement observed in the new assembly is attributed to the use of a different hybrid assembly approach, combined with the Dovetail Omni-C library preparation and scaffolding with the HiRise (v2.2.0) software.
Pocillopora genome assemblies
The P. acuta genome assembly generated in this study (hereinafter the “Hawaiian P. acuta”) is larger (408 Mbp) than P. acuta from Indonesia (352 Mbp) [22] (Supplementary Table S7) and its estimated genome size of 353 Mbp (Fig. 2E). The size of the P. meandrina genome assembly generated in this study (377 Mbp) is comparable to that in the published Indonesian P. acuta (352 Mbp) [22] and P. verrucosa (381 Mbp) [21] species but is larger than in P. damicornis (234 Mbp) [4] (Supplementary Table S7), although the latter is likely underassembled given its smaller size relative to the estimated genome size for that species. Moreover, the estimated genome sizes for these species appears to be around 330 to 350 Mbp, with the assemblies being 350 to 380 Mbp in size (excluding the Hawaiian P. acuta [see discussion below]). This suggests that species in the genus Pocillopora have genomes that are ∼350 Mbp in size.
The Hawaiian P. acuta isolate that was sequenced is a triploid; the presence of 3 major peaks in the k-mer frequency histogram (at ∼17x, ∼35, and ∼51x), which fit the triploid model implemented by GenomeScope2 (black line Fig. 2E and Supplementary Fig. S1F), and the clear “smudge” (bright yellow region in Supplementary Fig. S1F) of k-mer pairs with a coverage of ∼3n and a normalized coverage of 1/3 suggest that the sample is triploid. nQuire also predicts that the P. acuta is a triploid (Supplementary Table S6), supporting the results of GenomeScope2 and Smudegeplot. For P. meandrina, GenomeScope2 (Fig. 2D), Smudgeplot (Supplementary Fig. S1E), and nQuire (Supplementary Table S6) all predict that the isolate sequenced is a diploid.
The BUSCO completeness of the Hawaiian P. acuta genome is improved for both the Metazoa (96.1%), and Eukaryota (98.5%) datasets compared to the Indonesian P. acuta assembly (89.4% and 91.4%, respectively) and the other Pocillopora assemblies (∼91–95% and 91–98%, respectively; Supplementary Table S7 and Fig. 3A, B). However, the Hawaiian assembly does have a slightly higher proportion of duplicated BUSCO genes (2.5% and 2.0% in the Metazoa and Eukaryota datasets) compared with some (the Indonesian P. acuta and P. damicornis genomes, which have <1% in both datasets) but not all (the P. verrucosa genome, which has 2.9% and 5.5%, respectively) of the published genomes. This is likely a result of the Hawaiian P. acuta being a triploid; haplotig removal programs (i.e., HaploMerger2 [40]) are generally designed for use with diploid species; therefore, it is unsurprising that they were unable to fully resolve the assembly given the added complexity associated with resolving assemblies of higher ploidy genomes. Regardless, the Hawaiian P. acuta assembly is more contiguous (i.e., higher N50 and fewer scaffolds) than the other Pocillopora genomes and is the first assembly generated from a nondiploid coral. The P. meandrina genome has a BUSCO completeness (96.1% for the Metazoa and 98.8% for the Eukaryota datasets) that is just as high as the Hawaiian P. acuta genome but with fewer duplicated BUSCO genes (1.2% and 0.4%, respectively), suggesting that this assembly has minimal retained haplotigs (Supplementary Table S7 and Fig. 3A, B).
Porites compressagenome assembly
The size of the P. compressa genome assembly generated in this study (593 Mbp) is similar to the published P. australiensis (576 Mbp) [24] and P. lutea (552 Mbp) [23] genomes and a little smaller than P. astreoides (677 Mbp). The estimated genome sizes for these species appear to be around 525 to 550 Mbp (excluding P. astreoides, P. lutea, and P. rus), with the assemblies coming in at around 550 to 600 Mbp. The high number of duplicated BUSCO genes in the P. astreoides assembly (11.5% and 14.9% for the Metazoa and Eukaryota datasets, respectively; Supplementary Table S7 and Fig. 3A, B) suggests that its larger assembly size (compared with the other Porites species) is likely explained by retained haplotigs. The genome assembly (470 Mbp) and estimated genome size (405 Mbp) of P. rus are smaller than the other Porites isolates, but these data were generated from holobiont samples (i.e., samples with coral, algal symbiont, and associated bacteria DNA present) using a metagenomic binning strategy. The difference in this approach compared with how the other Porites genomes were processed likely explains the difference between the sizes. P. lutea has an estimated genome size of 694 Mbp, which is significantly larger than the other Porites species and its assembled genome. Although this suggests that the P. lutea genome is underassembled (comprising only ∼80% of the estimated genome), its relatively high completeness (95.3% and 98.5% for the Metazoa and Eukaryota datasets, respectively) suggests that the genome size has been overestimated, possibly driven by sequencing error or other factors associated with sample preparation or collection from the field. These results indicate that species in the genus Porites have genomes that are just under 600 Mbp in size. For P. compressa, GenomeScope2 (Fig. 2F), Smudgeplot (Supplementary Fig. S1I), and nQuire (Supplementary Table S6) all predict that the isolate sequenced is a diploid.
The BUSCO completeness of the P. compressa assembly is slightly higher (95.5% for the Metazoa and 99.2% for the Eukaryota datasets) compared to the P. astreoides (93.2% and 98.0%, respectively), P. australiensis (91.6% and 94.9%, respectively), P. lutea (95.3% and 98.5%, respectively), and P. rus (69.6% and 67.1%, respectively) assemblies (Supplementary Table S7 and Fig. 3A, B) but has a much higher N50 (4 Mbp) compared to the published species (0.41, 0.55, 0.66, and 0.14 Mbp, respectively) and fewer scaffolds (608 vs. 3,051, 4,983, 2,975, and 14,982, respectively). The published genome assemblies also have many more gaps (∼0–29% of assembled bases are “N” characters) compared to P. compressa (0%), demonstrating that the new assembly is of equally high completeness compared to the published species but with a much higher contiguity.
Predicted protein-coding genes
For M. capitata, 54,384 protein-coding genes were predicted in the new assembly compared with 63,227 predicted in the old version (Supplementary Table S7). In the new assembly, 56.68% of the predicted protein-coding genes were assigned putative functions using CD-Search, 44.26% using eggNOG-mapper, and 21.20% using KAAS (Supplementary Table S8). The reduction in the number of predicted genes in the new M. capitata assembly, compared with the published version, is likely driven by its reduced assembly size, with many of the missing genes likely arising from haplotigs retained in the old assembly that were removed in the new version. The BUSCO completeness of the predicted genes is improved in the new assembly (95.2% of the Metazoa and 96.5% for the Eukaryota BUSCO datasets; Fig. 3C, D) compared with the old assembly (94.0% and 93.3%, respectively), and the number of duplicated BUSCO genes is reduced in the new assembly (2.3% and 1.2%, respectively) compared to the published (18.2% and 18.8%, respectively). The predicted gene set from the new Hawaiian M. capitata assembly also has >4.2% and >3.5% more complete BUSCO genes (from the Metazoa and Eukaryota datasets, respectively) recovered compared to the other published isolates, demonstrating that the gene models predicted in the new assembly are also highly complete. Although an increase in the number of genes predicted in the new Hawaiian M. capitata genome, compared with the published species, could be attributed to differences in the workflows used to predicted the genes in these species [28], it is also likely driven by the higher completeness and contiguity of the new genome assembly.
There are 33,730 predicted protein-coding genes in the Hawaiian P. acuta and 31,840 in the P. meandrina genome assemblies, which is ∼4,000 to 8,000 more than predicted in other Pocillopora species (Supplementary Table S7). In P. acuta, 67.76% of the predicted protein-coding genes were assigned putative functions using CD-Search, 49.76% using eggNOG-mapper, and 32.35% using KAAS, and in P. meandrina, 69.44% of the predicted protein-coding genes were assigned putative functions using CD-Search, 51.76% using eggNOG-mapper, and 33.66% using KAAS (Supplementary Table S8). The number of complete BUSCO genes from the Metazoa and Eukaryota BUSCO datasets is >6% higher in the new Hawaiian P. acuta and P. meandrina species than in the other Pocillopora species; the Hawaiian P. acuta also has 29.6% and 31.3% (respectively) more complete BUSCO genes recovered than the Indonesian P. acuta (Supplementary Table S7; Fig. 3C, D). The number of duplicated BUSCO genes is >0.7% and >2.3% (respectively) higher in the Hawaiian P. acuta gene set compared with the published Pocillopora species, but this was expected given the increased size of the genome assembly. The proportion of fragmented BUSCO genes is >0.9% and >2% lower (Metazoa and Eukaryota BUSCO datasets, respectively) in the Hawaiian Pocillopora species compared with the published species. The average transcript length and the number of CDSs per transcript of the Hawaiian Pocillopora genes (∼1,350 bp and ∼6.6, respectively) are congruent with the predicted genes of the published Pocillopora species (∼1,100–1,900 bp and ∼5.5–7.5, respectively). This suggests that the higher number of predicted genes in the Hawaiian Pocillopora species is not caused by the presence of haplotigs in the genome assembly, although this likely contributes to the slightly higher number of duplicated BUSCO genes in the Hawaiian P. acuta, or by the presence of fragmented genes models, because the number of fragmented BUSCO genes and the gene statistics suggest that the majority are full length. Therefore, the higher number of predicted genes in this species can be (at least partially) attributed to the more complete and contiguous genome assemblies of the Hawaiian Pocillopora species relative to published species.
There are 44,130 predicted protein-coding genes in the Hawaiian P. compressa genome assembly (Supplementary Table S7), which is >8,000 more genes than predicted in the P. australiensis (35,910) and P. lutea (31,126) genomes, 4,677 more than in the P. rus (39,453) genome, and 20,506 less than in the P. astreoides (64,636) genome. In P. compressa, 63.91% of the predicted protein-coding genes were assigned putative functions using CD-Search, 46.22% using eggNOG-mapper, and 27.48% using KAAS (Supplementary Table S8). The number of complete BUSCO genes from the Metazoa and Eukaryota BUSCO datasets is >4% higher in P. compressa than in the published Porites species (Supplementary Table S7; Fig. 3C, D). The number of duplicated BUSCO genes in P. compressa is similar to P. lutea and P. rus but lower than in P. astreoides and P. australiensis, and the number of fragmented BUSCO genes in P. compressa is much lower (>1.9% and >5.1%, respectively) than in the published species. As with the previous Hawaiian genomes, we attribute the higher number of predicted genes in this species to a more complete and contiguous assembly, relative to the published data.
Similarity between Montipora capitata chromosomal and extra-chromosomal scaffolds
There are 1,685 scaffolds (totaling ∼101 Mbp) in the new M. capitata assembly that were not placed into the 14 putative chromosomes by the scaffolding software. Given that the size of the 14 chromosomal sequences totals ∼680 Mbp, which is close to the estimated genome size of 644 Mbp, it is possible that the extra-chromosomal sequences represent retained haplotigs. To explore this issue, we compared the predicted genes in the extra-chromosomal (6,545 protein-coding genes) and chromosomal (47,839) scaffolds to determine how similar the protein content is between the 2 sets of scaffolds and to see if the extra-chromosomal proteins tend to be contained within a single chromosome, suggesting that they are likely to be retained haplotigs. Out of the 6,546 proteins encoded in the extra-chromosomal scaffolds, 3,896 (59.53%) have hits to chromosomal proteins with >75% query coverage and >75% identity, and 1,623 (24.80%) have hits to chromosomal proteins with >95% query coverage and >95% identity. This suggests that whereas the 2 sets of scaffolds encode many similar (although not identical) proteins, the protein inventory of the extra-chromosomal scaffolds only partially overlaps with the gene inventory of the chromosomal scaffolds (we would expect them to have a high level of overlap if they were haplotigs). Furthermore, the extra-chromosomal scaffolds encode 12% of the total predicted genes but, when analyzed separately using BUSCO, have only 1.9% of the Metazoa and 1.6% of the Eukaryota BUSCO genes recovered. This conflict between the number of predicted genes in the scaffolds and the number of BUSCO genes suggests that these scaffolds cannot be easily explained as unresolved haplotigs. Finally, of the 3,896 proteins with top hits in the leniently filtered dataset (hit with >75% query coverage and >75% identity), 2,748 (70.53%) were on scaffolds with other proteins with top hits to different chromosomes. This suggests that the extra-chromosomal scaffolds have significant structural differences when compared to the chromosomes. These results suggest that the extra-chromosomal scaffolds do not comprise retained haplotigs, but given their significant size, which increases the assembly size well above the estimated size, additional analyses will need to be done to determine the placement of these sequences in the chromosomes and the genes they encode.
Potential Implications
The substantial improvement in the contiguity and completeness of the assemblies and predicted genes from the Hawaiian M. capitata, P. meandrina, P. acuta, and P. compressa species will enable many follow-up studies. The chromosome-level assembly of the M. capitata isolate will not only serve as a key reference genome for future population studies focusing on this species in Hawaii but also enable more detailed studies on genome content (such as repeats), gene content, and gene synteny with other species from reefs across the world. The P. acuta genome, although not at chromosome-level resolution, is the most complete available for this genus and will be a valuable model for not only comparative analysis but also analysis of ploidy in corals. As the first assembly ever generated from a nondiploid coral, these data will open up new questions surrounding the role of ploidy in coral evolution and adaptation and how this phenomenon is involved in the life cycle of this species and potentially other Pocillopora species, in both Hawaiʻi and other reefs across the world. These questions are critical, because an understanding of how changes in ploidy evolve in these corals, particularly in response to stress, will help us model the response of these ecosystems to anthropogenic climate change and may even provide a new avenue of research for the development of stress-resistant “super” corals.
Data Availability
The SRA Run IDs of the Omni-C data generated from the Hawaiian M. capitata, the PacBio and Illumina genome data used for genome assembly, and the RNA-seq data used for gene prediction are listed in Supplementary Table S1 for each species. The genome assemblies, predicted genes, and functional annotations are available at Rutgers's website for the Hawaiian M. capitata [59], P. acuta [60], P. meandrina [61], and P. compressa [62]. The data from the other Montipora, Pocillopora, and Porites species used in this study are available from their respective repositories listed in Supplementary Table S5. Supporting data and materials are available in the GigaDB database [63], with individual datasets for M. capitata [64], P. acuta [65], P. meandrina [66], and P. compressa [67].
Additional Files
Supplementary Fig. S1. GenomeScope2 (left) and Smudgeplot (right) profiles for (A) Hawaiian M. capitata (this study), (B) Waiopae tide pools M. capitata, (C) M. cactus, (D) M. efflorescens, (E) P. meandrina (this study), (F) Hawaiian P. acuta (this study), (G) Indonesian P. acuta, (H) P. verrucose, (I) P. compressa (this study), (J) P. australiensis, and (K) P. lutea. The profiles were computed for each species using 21-mers generated from the trimmed short-read data listed in Supplementary Table S5.
Supplementary Table S1. Summary of read data used for genome assembly and gene prediction.
Supplementary Table S2. Summary of coral assemblies before and after haplotype merging.
Supplementary Table S3. List of Symbiodiniaceae genomes used to assess symbiont contamination in the coral genome assemblies.
Supplementary Table S4. Top 10 BLASTn hits against the NCBI's nt database for regions of coral scaffolds with greater than a given coverage of hits to Symbiodiniaceae assembled genomes.
Supplementary Table S5. Metadata for the genome and gene models downloaded for the coral species used for comparative analysis.
Supplementary Table S6. Results from nQuire lrdmodel ploidy estimation for the Hawaiian coral genomes analyzed in this study.
Supplementary Table S7. Comparison between the published Montipora, Pocillopora, and Porites genomes and those generated in this study. All statistics were calculated in this study using the available genome and gene models.
Supplementary Table S8. Number of predicted protein-coding genes in each of the new Hawaiian coral genomes with functional annotations.
Abbreviations
bp: base pairs; BUSCO: Benchmarking Universal Single-Copy Orthologs; Gbp: gigabase pairs; HM2: HaploMerger2; KAAS: KEGG Automatic Annotation Server; Kbp: kilobase pairs; Mbp: megabase pairs; NCBI: National Center for Biotechnology Information; PacBio: Pacific BioSciences; RNA-seq: RNA sequencing; SRA: Sequencing Read Archive.
Funding
This work was supported by the National Science Foundation grant NSF-OCE 1756616, the Catalyst Science Fund grant 2020–008, the National Institute of Food and Agriculture and United States Department of Agriculture (USDA) Hatch grant NJ01180, and the National Aeronautics and Space Administration grant 80NSSC19K0462 awarded to D.B. D.B. and H.S.Y. were also supported by the Collaborative Genome Program of the Korea Institute of Marine Science and Technology Promotion (KIMST) funded by the Ministry of Oceans and Fisheries (MOF) (20180430). J.M.L. was supported by research grants from the National Research Foundation of Korea (NRF, 2020R1C1C1010193) and “The Project to Make Multi-Ministerial National Biological Research Resources More Advanced” program through the Korea Environment Industry & Technology Institute (KEITI) funded by the Korea Ministry of Environment[MOE (2021003420004)]. H.M.P. was supported by the USDA National Institute of Food and Agriculture, Hatch Formula project accession number 1017848 and the National Science Foundation grant NSF-OCE 1756623. E.M. was supported by the Paul G. Allen Family Foundation.
Authors' Contributions
D.B. conceived the project with H.M.P. and J.M.L. T.G.S., J.M.L., and Y.J.J. did the bioinformatic analyses. H.S.Y. provided sequencing resources, and H.M.P. led the coral sample collection and processing with E.M. T.G.S. wrote the manuscript draft with J.M.L., and all authors commented on and approved the submitted version.
Competing Interests
The authors declare that they have no other competing interests.
Supplementary Material
ACKNOWLEDGEMENTS
We acknowledge support from the research and administrative staff at the Hawaiʻi Institute of Marine Biology, where some of this work was done.
Contributor Information
Timothy G Stephens, Department of Biochemistry and Microbiology, Rutgers University, New Brunswick, NJ 08901, USA.
JunMo Lee, Department of Oceanography, Kyungpook National University, Daegu, Buk-gu 41566, Korea.
YuJin Jeong, Department of Oceanography, Kyungpook National University, Daegu, Buk-gu 41566, Korea.
Hwan Su Yoon, Department of Biological Sciences, Sungkyunkwan University, Suwon 16419, Korea.
Hollie M Putnam, Department of Biological Sciences, University of Rhode Island, Kingston, RI 02881, USA.
Eva Majerová, Hawaiʻi Institute of Marine Biology, Kāneʻohe, HI 96744, USA.
Debashish Bhattacharya, Department of Biochemistry and Microbiology, Rutgers University, New Brunswick, NJ 08901, USA.
References
- 1. van Oppen MJH, Koolmees EM, Veron JEN. Patterns of evolution in the scleractinian coral genus Montipora (Acroporidae). Mar Biol. 2004;144(1):9–18. [Google Scholar]
- 2. Forsman ZH, Concepcion GT, Haverkort RD, et al. Ecomorph or endangered coral? DNA and microstructure reveal Hawaiian species complexes: Montipora dilatata/flabellata/turgescens & M. patula/verrilli. PLoS One. 2010;5(12):e15021. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 3. Schmidt-Roach S, Miller KJ, Lundgren P, et al. With eyes wide open: a revision of species within and closely related to the Pocillopora damicornis species complex (Scleractinia; Pocilloporidae) using morphology and genetics. Zool J Linn Soc. 2014;170(1):1–33. [Google Scholar]
- 4. Cunning R, Bay RA, Gillette P, et al. Comparative analysis of the Pocillopora damicornis genome highlights role of immune system in coral evolution. Sci Rep. 2018;8(1):16134. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 5. Williams A, Pathmanathan JS, Stephens TG, et al. Multi-omic characterization of the thermal stress phenome in the stony coral Montipora capitata. PeerJ. 2021;9:e12335. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 6. Mayfield AB, Chen YJ, Lu CY, et al. The proteomic response of the reef coral Pocillopora acuta to experimentally elevated temperatures. PLoS One. 2018;13(1):e0192001. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 7. Henley EM, Quinn M, Bouwmeester J, et al. Reproductive plasticity of Hawaiian Montipora corals following thermal stress. Sci Rep. 2021;11(1):12525. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 8. Putnam HM, Davidson JM, Gates RD. Ocean acidification influences host DNA methylation and phenotypic plasticity in environmentally susceptible corals. Evol Appl. 2016;9(9):1165–78. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 9. Jury CP, Delano MN, Toonen RJ. High heritability of coral calcification rates and evolutionary potential under ocean acidification. Sci Rep. 2019;9(1):20419. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 10. Padilla-Gamino JL, Pochon X, Bird C, et al. From parent to gamete: vertical transmission of Symbiodinium (Dinophyceae) ITS2 sequence assemblages in the reef building coral Montipora capitata. PLoS One. 2012;7(6):e38440. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 11. Damjanovic K, Menendez P, Blackall LL, et al. Mixed-mode bacterial transmission in the common brooding coral Pocillopora acuta. Environ Microbiol. 2020;22(1):397–412. [DOI] [PubMed] [Google Scholar]
- 12. Cunha RL, Forsman ZH, Belderok R, et al. Rare coral under the genomic microscope: timing and relationships among Hawaiian Montipora. BMC Evol Biol. 2019;19(1):153. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 13. Johnston EC, Forsman ZH, Flot JF, et al. A genomic glance through the fog of plasticity and diversification in Pocillopora. Sci Rep. 2017;7(1):5991. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 14. Aurelle D, Pratlong M, Oury N, et al. Population genomics of Pocillopora corals: insights from RAD-sequencing. bioRxiv. 2021. doi: 10.1101/2021.05.04.442552. [DOI] [Google Scholar]
- 15. Caruso C, de Souza MR, Ruiz-Jones L, et al. Genetic patterns in Montipora capitata across an environmental mosaic in Kāne'ohe Bay, O'ahu, Hawai'i. Molecular Ecology. 2022;31(20):5201–5213. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 16. Shumaker A, Putnam HM, Qiu H, et al. Genome analysis of the rice coral Montipora capitata. Sci Rep. 2019;9(1):2571. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 17. Shinzato C, Khalturin K, Inoue J, et al. Eighteen coral genomes reveal the evolutionary origin of Acropora strategies to accommodate environmental changes. Mol Biol Evol. 2021;38(1):16–30. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 18. Helmkampf M, Bellinger MR, Geib S, et al. Draft genome of the rice coral Montipora capitata obtained from linked-read sequencing. Genome Biol Evol. 2019;11(7):2045–54. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 19. Yuki Y, Go S, Yuna Z, et al. Comparative genomics highlight the importance of lineage-specific gene families in evolutionary divergence of the coral genus, Montipora. BMC Ecol Evol. 2021;22(1):71. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 20. Parra G, Bradnam K, Korf I. CEGMA: a pipeline to accurately annotate core genes in eukaryotic genomes. Bioinformatics. 2007;23(9):1061–7. [DOI] [PubMed] [Google Scholar]
- 21. Buitrago-Lopez C, Mariappan KG, Cardenas A, et al. The genome of the cauliflower coral Pocillopora verrucosa. Genome Biol Evol. 2020;12(10):1911–7. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 22. Vidal-Dupiol J, Chaparro C, Pratlong M, et al. Sequencing, de novo assembly and annotation of the genome of the scleractinian coral, Pocillopora acuta. bioRxiv. 2020. 698688. doi: 10.1101/698688. [DOI] [Google Scholar]
- 23. Robbins SJ, Singleton CM, Chan CX, et al. A genomic view of the reef-building coral Porites lutea and its microbial symbionts. Nat Microbiol. 2019;4(12):2090–100. [DOI] [PubMed] [Google Scholar]
- 24. Shinzato C, Takeuchi T, Yoshioka Y, et al. Whole-genome sequencing highlights conservative genomic strategies of a stress-tolerant, long-lived scleractinian coral, Porites australiensis Vaughan, 1918. Genome Biol Evol. 2021;13(2):evab270. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 25. Wong KH, Putnam HM. The genome of the mustard hill coral, Porites astreoides. GIGAbyte. 2022;2022:1–12. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 26. Kenyon JC. Models of reticulate evolution in the coral genus Acropora based on chromosome numbers: Parallels with plants. Evolution. 1997;51(3):756–67. [DOI] [PubMed] [Google Scholar]
- 27. Nurk S, Walenz BP, Rhie A, et al. HiCanu: accurate assembly of segmental duplications, satellites, and allelic variants from high-fidelity long reads. Genome Res. 2020;30(9):1291–305. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 28. Chen YB, Gonzalez-Pech RA, Stephens TG, et al. Evidence that inconsistent gene prediction can mislead analysis of dinoflagellate genomes. J Phycol. 2020;56(1):6–10. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 29. Walker BJ, Abeel T, Shea T, et al. Pilon: an integrated tool for comprehensive microbial variant detection and genome assembly improvement. PLoS One. 2014;9(11):e112963. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 30. Bolger AM, Lohse M, Usadel B. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics. 2014;30(15):2114–20. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 31. Marcais G, Kingsford C. A fast, lock-free approach for efficient parallel counting of occurrences of k-mers. Bioinformatics. 2011;27(6):764–70. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 32. Dougan KE, Bellantuono AJ, Kahlke T, et al. Whole-genome duplication in an algal symbiont serendipitously confers thermal tolerance to corals. bioRxiv. 2022. doi: 10.1101/2022.04.10.487810. [DOI] [Google Scholar]
- 33. Li T, Yu L, Song B, et al. Genome improvement and core gene set refinement of Fugacium kawagutii. Microorganisms. 2020;8(1):102. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 34. González-Pech RA, Stephens TG, Chen Y, et al. Comparison of 15 dinoflagellate genomes reveals extensive sequence and structural divergence in family Symbiodiniaceae and genus Symbiodinium. BMC Biol. 2021;19(1):73. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 35. Nand A, Zhan Y, Salazar OR, et al. Genetic and spatial organization of the unusual chromosomes of the dinoflagellate Symbiodinium microadriaticum. Nat Genet. 2021;53(5):618–29. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 36. Quinlan AR, Hall IM. BEDTools: A flexible suite of utilities for comparing genomic features. Bioinformatics. 2010;26(6):841–2. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 37. Benson G. Tandem repeats finder: a program to analyze DNA sequences. Nucleic Acids Res. 1999;27(2):573–80. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 38. Bao Z, Eddy SR. Automated de novo identification of repeat sequence families in sequenced genomes. Genome Res. 2002;12(8):1269–76. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 39. Price AL, Jones NC, Pevzner PA. De novo identification of repeat families in large genomes. Bioinformatics. 2005;21(Suppl 1):i351–8. [DOI] [PubMed] [Google Scholar]
- 40. Huang S, Kang M, Xu A. HaploMerger2: rebuilding both haploid sub-assemblies from high-heterozygosity diploid genome assembly. Bioinformatics. 2017;33(16):2577–9. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 41. Simao FA, Waterhouse RM, Ioannidis P, et al. BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics. 2015;31(19):3210–2. [DOI] [PubMed] [Google Scholar]
- 42. Haas BJ, Papanicolaou A, Yassour M, et al. De novo transcript sequence reconstruction from RNA-seq using the Trinity platform for reference generation and analysis. Nat Protoc. 2013;8(8):1494–512. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 43. Grabherr MG, Haas BJ, Yassour M, et al. Full-length transcriptome assembly from RNA-seq data without a reference genome. Nat Biotechnol. 2011;29(7):644–52. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 44. Dobin A, Davis CA, Schlesinger F, et al. STAR: Ultrafast universal RNA-seq aligner. Bioinformatics. 2013;29(1):15–21. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 45. Bruna T, Hoff KJ, Lomsadze A, et al. BRAKER2: Automatic eukaryotic genome annotation with GeneMark-EP+ and AUGUSTUS supported by a protein database. NAR Genom Bioinform. 2021;3(1):lqaa108. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 46. Lomsadze A, Burns PD, Borodovsky M. Integration of mapped RNA-Seq reads into automatic training of eukaryotic gene finding algorithm. Nucleic Acids Res. 2014;42(15):e119. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 47. Stanke M, Keller O, Gunduz I, et al. AUGUSTUS: ab initio prediction of alternative transcripts. Nucleic Acids Research. 2006;34(Web Server):W435–9. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 48. Marchler-Bauer A, Bo Y, Han L, et al. CDD/SPARCLE: functional classification of proteins via subfamily domain architectures. Nucleic Acids Res. 2017;45(D1):D200–3. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 49. Huerta-Cepas J, Forslund K, Coelho LP, et al. Fast genome-wide functional annotation through orthology assignment by eggNOG-Mapper. Mol Biol Evol. 2017;34(8):2115–22. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 50. Moriya Y, Itoh M, Okuda S, et al. KAAS: an automatic genome annotation and pathway reconstruction server. Nucleic Acids Res. 2007;35(Web Server):W182–5. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 51. Celis JS, Wibberg D, Ramirez-Portilla C, et al. Binning enables efficient host genome reconstruction in cnidarian holobionts. Gigascience. 2018;7(7):giy075. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 52. Pertea G, Pertea M. GFF Utilities: GffRead and GffCompare [version 1; peer review: 2 approved]. F1000Research. 2020;9:304. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 53. Ranallo-Benavidez TR, Jaron KS, Schatz MC. GenomeScope 2.0 and Smudgeplot for reference-free profiling of polyploid genomes. Nat Commun. 2020;11(1):1432. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 54. Martin M. Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnetjournal. 2011;17(1):3. [Google Scholar]
- 55. Weiss CL, Pais M, Cano LM, et al. nQuire: A statistical framework for ploidy estimation using next generation sequencing. BMC Bioinf. 2018;19(1):122. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 56. Danecek P, Bonfield JK, Liddle J, et al. Twelve years of SAMtools and BCFtools. Gigascience. 2021;10(2):giab008. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 57. Manni M, Berkeley MR, Seppey M, et al. BUSCO update: novel and streamlined workflows along with broader and deeper phylogenetic coverage for scoring of eukaryotic, prokaryotic, and viral genomes. Mol Biol Evol. 2021;38(10):4647–54. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 58. Camacho C, Coulouris G, Avagyan V, et al. BLAST+: Architecture and applications. BMC Bioinf. 2009;10(1):421. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 59. Genome data for Montipora capitata. http://cyanophora.rutgers.edu/montipora/ (Version 3). Accessed July 18 2022. [Google Scholar]
- 60. Genome data for Pocillopora acuta. http://cyanophora.rutgers.edu/Pocillopora_acuta/ (Version 2). Accessed July 18 2022. [Google Scholar]
- 61. Genome data for Pocillopora meandrina. http://cyanophora.rutgers.edu/Pocillopora_meandrina/ (Version 1). Accessed July 18 2022. [Google Scholar]
- 62. Genome data for Porites compressa. http://cyanophora.rutgers.edu/porites_compressa/ (Version 1). Accessed July 18 2022. [Google Scholar]
- 63. Stephens TG, Lee J, Jeong Y, et al. Supporting data for “High-quality genome assemblies from key Hawaiian coral species” GigaScience Database. 2022. 10.5524/102259. [DOI] [PMC free article] [PubMed]
- 64. Stephens TG, Lee J, Jeong Y, et al. Chromosome-level genome assembly of Montipora capitata. GigaScience Database. 2022. 10.5524/102268 [DOI]
- 65. Stephens TG, Lee J, Jeong Y, et al. Genome assembly of a triploid Pocillopora acuta. GigaScience Database. 2022. 10.5524/102269 [DOI]
- 66. Stephens TG, Lee J, Jeong Y, et al. Genome assembly of Pocillopora meandrina. GigaScience Database. 2022. 10.5524/102270 [DOI]
- 67. Stephens TG, Lee J, Jeong Y, et al. Genome assembly of Porites compressa. GigaScience Database. 2022. 10.5524/102271 [DOI]
Associated Data
This section collects any data citations, data availability statements, or supplementary materials included in this article.
Data Citations
- Stephens TG, Lee J, Jeong Y, et al. Supporting data for “High-quality genome assemblies from key Hawaiian coral species” GigaScience Database. 2022. 10.5524/102259. [DOI] [PMC free article] [PubMed]
- Stephens TG, Lee J, Jeong Y, et al. Chromosome-level genome assembly of Montipora capitata. GigaScience Database. 2022. 10.5524/102268 [DOI]
- Stephens TG, Lee J, Jeong Y, et al. Genome assembly of a triploid Pocillopora acuta. GigaScience Database. 2022. 10.5524/102269 [DOI]
- Stephens TG, Lee J, Jeong Y, et al. Genome assembly of Pocillopora meandrina. GigaScience Database. 2022. 10.5524/102270 [DOI]
- Stephens TG, Lee J, Jeong Y, et al. Genome assembly of Porites compressa. GigaScience Database. 2022. 10.5524/102271 [DOI]
Supplementary Materials
Data Availability Statement
The SRA Run IDs of the Omni-C data generated from the Hawaiian M. capitata, the PacBio and Illumina genome data used for genome assembly, and the RNA-seq data used for gene prediction are listed in Supplementary Table S1 for each species. The genome assemblies, predicted genes, and functional annotations are available at Rutgers's website for the Hawaiian M. capitata [59], P. acuta [60], P. meandrina [61], and P. compressa [62]. The data from the other Montipora, Pocillopora, and Porites species used in this study are available from their respective repositories listed in Supplementary Table S5. Supporting data and materials are available in the GigaDB database [63], with individual datasets for M. capitata [64], P. acuta [65], P. meandrina [66], and P. compressa [67].