Journal of Heredity. 2022 Sep 8;113(6):699–705. doi: 10.1093/jhered/esac050

Reference genome assembly of the sunburst anemone, Anthopleura sola

Brendan H Cornwell 1, Eric Beraut 2, Colin Fairbairn 3, Oanh Nguyen 4, Mohan P A Marimuthu 5, Merly Escalona 6, Erin Toffelmier 7,8
Editor: Kenneth Olsen
PMCID: PMC9709963  PMID: 36074002

Abstract

The sunburst anemone Anthopleura sola is an abundant species inhabiting the intertidal zone of coastal California. Historically, this species has extended from Baja California, Mexico to as far north as Monterey Bay, CA. However, recently the geographic range of this species has expanded to Bodega Bay, CA, possibly as far north as Salt Point, CA. This species also forms symbiotic partnerships with the dinoflagellate Breviolum muscatinei, a member of the family Symbiodiniaceae. These partnerships are analogous to those formed between tropical corals and dinoflagellate symbionts, making A. sola an excellent model system to explore how hosts will (co)evolve with novel symbiont populations they encounter as they expand northward. This assembly will serve as the foundation for identifying the population genomic patterns associated with range expansions, and will facilitate future work investigating how hosts and their symbiont partners will evolve to interact with one another as geographic ranges shift due to climate change.

Keywords: Anthopleura sola, California Conservation Genomics Project, CCGP, range expansion, symbiosis

Introduction

The sunburst anemone, Anthopleura sola, is a large, solitary anemone inhabiting the intertidal zone of the Pacific coast from Baja California, Mexico to central California (Fig. 1). Within the past half century, the geographic range of this species has expanded northward (Denny and Gaines 2007), likely ending between Bodega Bay, CA and Salt Point (Mendocino County), CA (BHC, pers. obs.). Northward expansions of species historically restricted to more equatorial latitudes along the California coast have been documented during temporary periods of increased temperatures near geographic range edges (Sanford et al. 2019), and have prompted researchers to begin assessing how these populations will evolve to match the novel geographic locations they encounter during longer-term range expansions. A particularly important feature of the A. sola expansion is that the anemones are likely encountering novel symbiont populations that historically have only interacted with 2 other symbiotic species of the genus Anthopleura, A. xanthogrammica and A. elegantissima, whose geographic ranges extend to Alaska. Previous work has shown that these symbionts are shared among these 3 species, with the exception of the southernmost populations of A. sola and A. elegantissima, where symbionts are partitioned by host species (Cornwell and Hernández 2021). As A. sola continues to move northward, interactions between newly arriving hosts and naive symbiont populations will become more common, which will allow researchers to identify patterns of molecular and physiological coevolution in both partners as geographic ranges shift with climate change. Because this symbiosis is analogous to the partnership between tropical corals and dinoflagellates, characterizing how novel symbiotic partnerships evolve along the California coast will have global implications.

Fig. 1.

Anthopleura sola polyp in sandy habitat (image credit: B. Cornwell).

Well-assembled genomes are an important tool for identifying genomic patterns associated with range expansions and coevolution with symbiont partners. A. sola exhibits little population structure across its geographic range with no evidence for historical population bottlenecks, which likely contributes to its having the highest average level of heterozygosity of the 3 symbiotic species of Anthopleura on the Pacific coast of North America (π = 0.0095; Cornwell and Hernández 2021). A draft assembly of A. sola has already been generated without long reads, resulting in a contig N50 of 5,224 bp and a scaffold N50 of 16,096 bp with a total estimated genome size of 434 Mb (Cornwell 2020). Here, we present a new assembly for A. sola that substantially improves on previously published versions and creates a new resource for scientists studying how marine populations will evolve as their geographic ranges shift with warming conditions.

Methods

Biological materials

DNA was extracted from a single A. sola polyp collected in Pacific Grove, CA by Brendan Cornwell (36.621707, −121.904580). All tissue for sequencing was preserved by snap freezing in liquid nitrogen and shipped on dry ice to the University of California, Davis and the University of California, Santa Cruz; the remaining tissue was preserved in ethanol (Fig. 1).

Nucleic acid library preparation and sequencing

High molecular weight (HMW) genomic DNA (gDNA) was extracted from 60 mg of body wall tissue using the Nanobind Tissue Big DNA kit as per the manufacturer’s instructions (Pacific Biosciences [PacBio], Menlo Park, CA) with the following modifications. After the second resuspension step, we pelleted the tissue homogenate by centrifuging at 16,000 × g (4 °C for 5 min) to remove the residual wash buffer and performed the lysis step with 1.5× reaction volume. The DNA purity was estimated using absorbance ratios (260/280 = 1.84 and 260/230 = 2.28) on the NanoDrop ND-1000 spectrophotometer. The final DNA yield (334 ng/µL; 32 µg) was quantified using the Quantus Fluorometer (QuantiFluor ONE dsDNA Dye assay, Promega, Madison, WI). The size distribution of the HMW DNA was estimated using the Femto Pulse system (Agilent, Santa Clara, CA), which showed that 70% of the fragments were 120 kb or longer.

We generated long reads for the assembly using the SMRTbell Express Template Prep Kit v2.0 (PacBio, Cat. #100-938-900). Briefly, HMW DNA was sheared to 15 to 20 kb, cleaned and end-repaired, and finally size selected using the BluePippin system (Sage Science, Beverly, MA; Cat. #BLF7510) to generate a library of fragments greater than 9 kb. The 15 to 20 kb average HiFi SMRTbell library was sequenced at the UC Davis DNA Technologies Core (Davis, CA) using three 8M SMRT cells, Sequel II sequencing chemistry 2.0, and 30-h movies each on a PacBio Sequel II sequencer.

We prepared Omni-C libraries using the Dovetail Omni-C Kit (Dovetail Genomics, Scotts Valley, CA). After grinding tissue with a mortar and pestle under liquid nitrogen, chromatin was fixed in the nucleus and strained through 100 and 40 µm cell strainers. We digested the chromatin using DNaseI to generate Illumina-compatible length distributions, purified the DNA, and generated an NGS library using an NEB Ultra II DNA Library Prep kit (New England Biolabs, Ipswich, MA). The library was sequenced at the Vincent J. Coates Genomics Sequencing Lab (Berkeley, CA) on an Illumina NovaSeq platform (Illumina, San Diego, CA) to generate approximately 100 million 2 × 150 bp read pairs per Gb of genome size.
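The sequencing target above scales linearly with genome size. As a rough illustration only, the sketch below converts the per-Gb target into a read-pair count for a given genome size, using the GenomeScope 2.0 estimate of 240 Mb reported in the Results; the function name is hypothetical and not part of the CCGP protocol.

```python
def target_read_pairs(genome_size_bp, pairs_per_gb=100_000_000):
    """Scale a per-Gb read-pair target to a genome of the given size (integer math)."""
    return (genome_size_bp * pairs_per_gb) // 1_000_000_000

# 240 Mb is the GenomeScope 2.0 genome size estimate reported in the Results.
estimated_size_bp = 240_000_000
print(target_read_pairs(estimated_size_bp))
```

Under this reading, the 61.1 million Omni-C read pairs reported in the Results exceed the nominal target for a genome of this size.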

Nuclear genome assembly

We assembled the genome of A. sola following the CCGP assembly protocol Version 4.0, which produces a high-quality and highly contiguous assembly using PacBio HiFi reads and Omni-C data while minimizing manual curation (outlined in Table 1). Briefly, we removed remnant adapter sequences from the PacBio HiFi reads using HiFiAdapterFilt (Sim et al. 2022) and generated the initial dual or partially phased diploid assembly (http://lh3.github.io/2021/10/10/introducing-dual-assembly) with the filtered PacBio reads and the Omni-C data using HiFiasm (Cheng et al. 2022). We tagged output haplotype 1 as the primary assembly, and output haplotype 2 as the alternate assembly. Next, we scaffolded both assemblies using Omni-C data with SALSA (Ghurye et al. 2017, 2019).

Table 1. Assembly pipeline and software used.

| Assembly step | Software and options (a) | Version |
| --- | --- | --- |
| Filtering PacBio HiFi adapters | HiFiAdapterFilt | Commit 64d1c7b |
| K-mer counting | Meryl (k = 21) | 1 |
| Estimation of genome size and heterozygosity | GenomeScope | 2 |
| De novo assembly (contiging) | HiFiasm (Hi-C Mode, --primary, output p_ctg.hap1, p_ctg.hap2) | 0.16.1-r375 |
| Scaffolding | | |
| Omni-C scaffolding | SALSA (-DNASE, -i 20, -p yes) | 2 |
| Gap closing | YAGCloser (-mins 2 -f 20 -mcc 2 -prt 0.25 -eft 0.2 -pld 0.2) | Commit 0e34c3b |
| Omni-C contact map generation | | |
| Short-read alignment | BWA-MEM (-5SP) | 0.7.17-r1188 |
| SAM/BAM processing | samtools | 1.11 |
| SAM/BAM filtering | pairtools | 0.3.0 |
| Pairs indexing | pairix | 0.3.7 |
| Matrix generation | cooler | 0.8.10 |
| Matrix balancing | hicExplorer (hicCorrectMatrix correct --filterThreshold -2 4) | 3.6 |
| Contact map visualization | HiGlass | 2.1.11 |
| | PretextMap | 0.1.4 |
| | PretextView | 0.1.5 |
| | PretextSnapshot | 0.0.3 |
| Organelle assembly | | |
| Mitogenome assembly | MitoHiFi (-r, -p 50, -o 1) | 2 commit c06ed3e |
| Genome quality assessment | | |
| Basic assembly metrics | QUAST (--est-ref-size) | 5.0.2 |
| Assembly completeness | BUSCO (-m geno, -l metazoa) | 5.0.0 |
| | Merqury | 2020-01-29 |
| Contamination screening | | |
| Local alignment tool | BLAST+ | 2.1 |
| General contamination screening | BlobToolKit | 2.3.3 |

Software citations are listed in the text.

(a) Options detailed for nondefault parameters.

We generated the Omni-C contact maps for both assemblies by aligning the Omni-C data against the corresponding assemblies with BWA-MEM (Li 2013), identified ligation junctions, and generated Omni-C pairs using pairtools (Goloborodko et al. 2018). We generated a multiresolution Omni-C matrix with cooler (Abdennur and Mirny 2020) and balanced it with hicExplorer (Wolff et al. 2018). We used HiGlass (Kerpedjiev et al. 2018) and the PretextSuite (https://github.com/wtsi-hpag/PretextView; https://github.com/wtsi-hpag/PretextMap; https://github.com/wtsi-hpag/PretextSnapshot) to visualize the contact map. We analyzed the contact maps for major misassemblies, cutting scaffolds at the joins (gaps) where misassemblies were identified. No further joins were made after this step. Using the PacBio HiFi reads and YAGCloser (https://github.com/merlyescalona/yagcloser), we closed some of the remaining gaps generated during scaffolding. We then checked for contamination using the BlobToolKit Framework (Challis et al. 2020). Finally, we trimmed remnants of sequence adaptors and mitochondrial contamination identified during the contamination screening performed by NCBI.

Mitochondrial genome assembly

We assembled the mitochondrial genome of A. sola from the PacBio HiFi reads using the reference-guided pipeline MitoHiFi (https://github.com/marcelauliano/MitoHiFi; Allio et al. 2020). The mitochondrial sequence of Anthopleura midori (NCBI:NC_030274.1) was used as the starting sequence. After completion of the nuclear genome, we searched for matches of the resulting mitochondrial assembly sequence in the nuclear genome assembly using BLAST+ (Camacho et al. 2009) and filtered out contigs and scaffolds from the nuclear genome with a percentage of sequence identity >99% and size smaller than the mitochondrial assembly sequence.
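The filtering rule in the last sentence can be sketched as a simple predicate over BLAST hits. This is an illustration under stated assumptions, not the CCGP script: the `Hit` record, the function name, and all example values are hypothetical, and only the two criteria (identity above 99% and length below that of the mitochondrial assembly) come from the text.

```python
from dataclasses import dataclass

@dataclass
class Hit:
    scaffold: str      # name of the nuclear contig or scaffold (hypothetical)
    scaffold_len: int  # length of that sequence in bp
    pident: float      # percentage identity of the alignment to the mitogenome

def mito_like_scaffolds(hits, mito_len, min_identity=99.0):
    """Names of sequences with identity > min_identity that are shorter than the mitogenome."""
    flagged = []
    for h in hits:
        if h.pident > min_identity and h.scaffold_len < mito_len:
            if h.scaffold not in flagged:  # report each sequence once
                flagged.append(h.scaffold)
    return flagged

# Made-up hits and a made-up mitogenome length, for illustration only.
hits = [
    Hit("scaffold_A", 12_000, 99.8),     # short and near-identical: flagged
    Hit("scaffold_B", 5_000_000, 99.9),  # near-identical but far too long: kept
    Hit("scaffold_C", 9_000, 97.5),      # short but below the identity cutoff: kept
]
print(mito_like_scaffolds(hits, mito_len=20_000))
```

Requiring both conditions keeps large nuclear scaffolds that merely contain a mitochondrial-like insertion, while removing short sequences that are essentially copies of the mitogenome.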

Genome size estimation and quality assessment

We generated k-mer counts from the PacBio HiFi reads using meryl (https://github.com/marbl/meryl). The k-mer database was then used in GenomeScope 2.0 (Ranallo-Benavidez et al. 2020) to estimate genome features including genome size, heterozygosity, and repeat content. To obtain general contiguity metrics, we ran QUAST (Gurevich et al. 2013). To evaluate genome quality and completeness we used BUSCO (Manni et al. 2021) with the metazoa ortholog database (metazoa_odb10) which contains 954 genes. Assessment of base level accuracy (QV) and k-mer completeness was performed using the previously generated meryl database and merqury (Rhie et al. 2020). We further estimated genome assembly accuracy via BUSCO gene set frameshift analysis using the pipeline described in Korlach et al. (2017).
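For readers less familiar with QV, the value is on the Phred scale, so it converts to a per-base error probability as QV = −10·log10(p). The sketch below applies that standard conversion to the base pair QV values listed in Table 2; it does not reproduce how merqury estimates the error probability from k-mers, and the function names are illustrative.

```python
import math

def qv_to_error_rate(qv):
    """Per-base error probability implied by a Phred-scaled quality value."""
    return 10 ** (-qv / 10)

def error_rate_to_qv(p):
    """Phred-scaled quality value for a per-base error probability p."""
    return -10 * math.log10(p)

# Base pair QV values as listed in Table 2.
for label, qv in [("primary", 57.77), ("alternate", 57.3793)]:
    p = qv_to_error_rate(qv)
    print(f"{label}: QV {qv} -> {p:.2e} errors per base (about 1 per {1 / p:,.0f} bp)")
```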

Measurements of the size of the phased blocks are based on the size of the contigs generated by HiFiasm in Hi-C mode. We follow the quality metric nomenclature established by Rhie et al. (2021), with the genome quality code x·y·P·Q·C, where x = log10[contig NG50]; y = log10[scaffold NG50]; P = log10[phased block NG50]; Q = Phred base accuracy QV (quality value); C = % genome represented by the first “n” scaffolds, where n = 28. As there is no karyotype information available for A. sola, and the karyotype of ancestral taxa varies (Genome On A Tree; https://goat.genomehubs.org/; search: “tax_name(anthopleura sola)”), we use an estimated “n” (number of chromosomes) based on scaffold size and visual inspection of the contact maps. For consistency with nomenclature and literature, we keep the quality code as is. Quality metrics for the notation were calculated on the primary assembly.
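To make the notation concrete, the sketch below computes N50/NG50 from a list of sequence lengths and formats a quality code from its components. It assumes each log10 term is rounded down to an integer, which is consistent with the code reported in Table 2 but is not stated in the text; the Q and C components are passed in exactly as they appear in the reported code rather than derived. Function names and the toy lengths are hypothetical.

```python
import math

def nx50(lengths, genome_size=None):
    """N50 when genome_size is None; NG50 when an assumed genome size is given."""
    total = genome_size if genome_size is not None else sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= total / 2:
            return length
    return 0  # NG50 is undefined if the assembly covers less than half the genome size

def quality_code(contig_ng50, scaffold_ng50, phased_ng50, q, c):
    """Format x.y.P.Q.C, flooring each log10 term (an assumption, see lead-in)."""
    x = math.floor(math.log10(contig_ng50))
    y = math.floor(math.log10(scaffold_ng50))
    p = math.floor(math.log10(phased_ng50))
    return f"{x}.{y}.P{p}.Q{q}.C{c}"

# Toy lengths: N50 uses the assembly total (30), NG50 an assumed genome size (50).
toy = [10, 8, 6, 4, 2]
print(nx50(toy), nx50(toy, genome_size=50))

# Primary-assembly NG50 values from Table 2; Q and C taken from the reported code.
print(quality_code(3_338_572, 11_525_122, 3_125_244, q=50, c=86))
```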

Results

The Omni-C and PacBio HiFi sequencing libraries generated 61.1 million read pairs and 3.02 million reads, respectively. The latter yielded 131.8-fold coverage (N50 read length 15,671 bp; minimum read length 47 bp; mean read length 15,656 bp; and maximum read length 53,294 bp) based on the GenomeScope 2.0 genome size estimation of 240 Mb. Based on PacBio HiFi reads, we estimated a 0.274% sequencing error rate and a 2.7% heterozygosity rate. The k-mer spectrum based on PacBio HiFi reads shows a bimodal distribution with 2 major peaks at ~62- and ~133-fold coverage, which correspond to the heterozygous and homozygous states of a diploid species, respectively (Fig. 2A). This k-mer spectrum is consistent with a high-heterozygosity profile.
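The coverage figure is total sequenced bases divided by genome size. As a rough check, the sketch below uses the rounded base count from Table 2 ("31.7G bases") with the 240 Mb GenomeScope estimate, so it only approximates the reported 131.8-fold value; it also computes the ratio of the two k-mer peaks, which for a diploid genome is expected to be close to 2. Function and variable names are illustrative.

```python
def fold_coverage(total_bases, genome_size_bp):
    """Average sequencing depth: bases sequenced per base of genome."""
    return total_bases / genome_size_bp

hifi_bases = 31.7e9   # rounded HiFi yield listed in Table 2
genome_size = 240e6   # GenomeScope 2.0 genome size estimate

coverage = fold_coverage(hifi_bases, genome_size)
print(f"approximate HiFi coverage: {coverage:.1f}x")

# Positions of the two major k-mer peaks reported in the text (approximate values).
peak_ratio = 133 / 62
print(f"ratio of k-mer peaks: {peak_ratio:.2f}")
```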

Fig. 2.

(A) K-mer spectra. (B) BlobToolKit Snailplot showing N50 metrics for A. sola assembly jaAntSola1 and BUSCO scores for the metazoa_odb10 set of orthologs. (C) Contact map of Primary/Alternate assembly. [This will be regenerated once NCBI has given the final approval of a genome].

The final assembly (jaAntSola1) consists of 2 pseudohaplotypes, primary and alternate. The primary assembly has a total length of 288,960,535 bp with contig and scaffold N50 of 2,720,395 and 10,852,815 bp, respectively. The alternate assembly is similar in total length (299,680,816 bp) but has lower contig and scaffold N50s of 2,204,677 and 8,187,693 bp, respectively. The primary assembly contains 368 contigs assembled into 270 scaffolds; the alternate assembly contains about twice as many contigs and scaffolds: 666 and 556, respectively. The average GC content for both the primary and alternate assemblies is 38%. For the primary assembly, the longest contig is 6,637,320 bp and the largest scaffold is 23,766,007 bp. The longest contig in the alternate assembly is similarly sized (6,654,740 bp), although its largest scaffold is ca. 4 Mb longer (27,475,960 bp). On average there are 339 gaps per Gb in the primary assembly. BUSCO scores indicate a high level of completeness for both the primary and alternate assemblies (95.60%), while duplicated (0.70%), fragmented (1.90%), and missing (2.50%) orthologs were uncommon. For full assembly statistics for the primary and alternate assemblies, see Table 2.
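The gap density follows directly from the gap counts in Table 2 and the assembly lengths given above. A minimal sketch of that arithmetic, with an illustrative function name, which also checks that the gap counts equal the number of contigs minus the number of scaffolds (as expected when every gap separates two contigs within a scaffold):

```python
def gaps_per_gb(n_gaps, assembly_len_bp):
    """Number of gaps normalized to one gigabase of assembled sequence."""
    return n_gaps / assembly_len_bp * 1e9

# Gap counts from Table 2; assembly lengths as given in the text.
primary = gaps_per_gb(98, 288_960_535)
alternate = gaps_per_gb(110, 299_680_816)
print(round(primary), round(alternate))

# Consistency check: contigs minus scaffolds should equal the number of gaps.
print(368 - 270, 666 - 556)
```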

Table 2. Sequencing and assembly statistics, and accession numbers.

BioProjects and vouchers

| Item | Value |
| --- | --- |
| CCGP NCBI BioProject | PRJNA720569 |
| Genera NCBI BioProject | PRJNA766262 |
| Species NCBI BioProject | PRJNA794142 |
| NCBI BioSample | SAMN24505220 |
| Specimen identification | CCGP_MDBC_AS_20200803 |

NCBI Genome accessions

| Item | Primary | Alternate |
| --- | --- | --- |
| Assembly accession | JALHLI000000000 | JALHLJ000000000 |
| Genome sequences | GCA_023349425.1 | GCA_023349385.1 |

Genome sequence

| Library | Item | Value |
| --- | --- | --- |
| PacBio HiFi reads | Run | 1 PACBIO_SMRT (Sequel II) run: 2M spots, 31.7G bases, 23.1 Gb |
| PacBio HiFi reads | Accession | SRX15312497 |
| Omni-C Illumina reads | Run | 2 ILLUMINA (Illumina NovaSeq 6000) runs: 61.2M spots, 18.6G bases, 6.2 Gb |
| Omni-C Illumina reads | Accession | SRX15312498, SRX15312499 |

Genome Assembly Quality Metrics

| Item | Value |
| --- | --- |
| Assembly identifier (quality code (a)) | jaAntSola1 (6.7.P6.Q50.C86) |
| HiFi read coverage (b) | 131× |

| Metric | Primary | Alternate | Full assembly |
| --- | --- | --- | --- |
| Number of contigs | 368 | 666 | |
| Contig N50 (bp) | 2,720,395 | 2,204,677 | |
| Contig NG50 (b) | 3,338,572 | 2,768,812 | |
| Longest contigs | 6,637,320 | 6,654,740 | |
| Number of scaffolds | 270 | 556 | |
| Scaffold N50 | 10,852,815 | 8,187,693 | |
| Scaffold NG50 (b) | 11,525,122 | 11,129,546 | |
| Largest scaffold | 23,766,007 | 27,475,960 | |
| Size of final assembly (bp) | 288,970,335 | 299,680,816 | |
| Phased block NG50 (b) | 3,125,244 | 2,594,170 | |
| Gaps per Gbp (#Gaps) | 339 (98) | 367 (110) | |
| Indel QV (frameshift) | 50.11015651 | 49.75891136 | |
| Base pair QV | 57.77 | 57.3793 | 57.5667 |
| K-mer completeness | 73.7607 | 74.1146 | 99.0702 |

BUSCO completeness (metazoa), n = 954

| Assembly | C | S | D | F | M |
| --- | --- | --- | --- | --- | --- |
| P (c) | 95.60% | 94.90% | 0.70% | 1.90% | 2.50% |
| A (c) | 95.60% | 94.90% | 0.70% | 1.90% | 2.50% |

Organelles

| Item | Accession |
| --- | --- |
| 1 partial mitochondrial sequence | JALHLI010000270.1 |

(a) Assembly quality code x·y·P·Q·C, where x = log10[contig NG50]; y = log10[scaffold NG50]; P = log10[phased block NG50]; Q = Phred base accuracy QV (quality value); C = % genome represented by the first “n” scaffolds, following a karyotype 2n = 28. BUSCO scores: (C)omplete and (S)ingle; (C)omplete and (D)uplicated; (F)ragmented and (M)issing BUSCO genes. n, number of BUSCO genes in the set/database.

(b) Read coverage and NGx statistics have been calculated based on a genome size of 288 Mb.

(c) (P)rimary and (A)lternate assembly values.

Discussion

A. sola is one of several species whose geographic range is shifting poleward as ocean temperatures warm, making this assembly integral to future studies aimed at detecting the genomic basis of local adaptation, identifying gene flow across the former, current, and future geographic range of A. sola, and understanding the genetic basis of partner compatibility with the endosymbiont Breviolum muscatinei (a partnership analogous to that of their tropical coral cousins). Recent work suggests that interactions between host and symbiont genomes play a role in structuring some genetic loci between the 2 partners (Cornwell and Hernández 2021), and this improved assembly will allow those genetic interactions to be resolved at much higher resolution, in terms of both marker density and location in the genome.

Within the genus, a long-read assembly using Oxford Nanopore chemistry yielded 243 Mb in 5,359 contigs for the sister species A. elegantissima (Dimond et al. 2021), which largely agrees with the size of this assembly (ca. 289 Mb). Both of these assemblies are much smaller than the previous estimate for A. sola of 434 Mb (Cornwell 2020), suggesting that assemblies using only short reads may not properly resolve repetitive regions or could erroneously divide contigs or scaffolds that represent the same location in the genome. One reason for this might be the high levels of heterozygosity that are characteristic not just of Anthopleura spp., but of many marine invertebrates, which highlights the value of long reads in constructing genome assemblies for highly heterozygous organisms that inhabit ocean environments. Recent assemblies for other cnidarian species (Acropora) recover 14 chromosome-level scaffolds with a larger estimated genome size of 450 to 475 Mb (Fuller et al. 2020; López-Nandam et al. 2021). This assembly is contained within 270 scaffolds, but no clear threshold in the scaffold sizes of this assembly suggests a chromosome number (the amount each scaffold adds to the overall assembly size appears to reach an inflection point at N = 28, but this is far from definitive).

Acknowledgments

PacBio Sequel II library prep and sequencing was carried out at the DNA Technologies and Expression Analysis Cores at the UC Davis Genome Center, supported by National Institutes of Health Shared Instrumentation Grant 1S10OD010786-01. Deep sequencing of Omni-C libraries used the Novaseq S4 sequencing platforms at the Vincent J. Coates Genomics Sequencing Laboratory at UC Berkeley, supported by NIH S10 OD018174 Instrumentation Grant. We thank the staff at the UC Davis DNA Technologies and Expression Analysis Cores and the UC Santa Cruz Paleogenomics Laboratory for their diligence and dedication to generating high-quality sequence data.

Contributor Information

Brendan H Cornwell, Department of Biology, Hopkins Marine Station of Stanford University, Pacific Grove, CA, United States.

Eric Beraut, Department of Ecology and Evolutionary Biology, University of California, Santa Cruz, Santa Cruz, CA, United States.

Colin Fairbairn, Department of Ecology and Evolutionary Biology, University of California, Santa Cruz, Santa Cruz, CA, United States.

Oanh Nguyen, DNA Technologies and Expression Analysis Core Laboratory, Genome Center, University of California, Davis, CA, United States.

Mohan P A Marimuthu, DNA Technologies and Expression Analysis Core Laboratory, Genome Center, University of California, Davis, CA, United States.

Merly Escalona, Department of Biomolecular Engineering, University of California Santa Cruz, Santa Cruz, CA, United States.

Erin Toffelmier, Department of Ecology & Evolutionary Biology, University of California, Los Angeles, CA, United States; La Kretz Center for California Conservation Science, Institute of the Environment and Sustainability, University of California, Los Angeles, CA, United States.

Funding

This study is a contribution of the Marine Networks Consortium (PIs: Michael N. Dawson, Rachael A. Bay) as part of the California Conservation Genomics Project (PI: H. Bradley Shaffer), with funding provided to the University of California by the State of California, State Budget Act of 2019 [UC Award ID RSI-19-690224].

Data availability

Data generated for this study are available under NCBI BioProject PRJNA794142. Raw sequencing data for samples CCGP_MDBC_AS_20200803 (NCBI BioSample SAMN24505220) are deposited in the NCBI Short Read Archive (SRA) under SRX15312497 for PacBio HiFi sequencing data, and SRX15312498 and SRX15312499 for the Omni-C Illumina sequencing data. GenBank accessions for both primary and alternate assemblies are GCA_023349425.1 and GCA_023349385.1; and for genome sequences JALHLI000000000 and JALHLJ000000000. The GenBank organelle genome assembly for the mitochondrial genome is JALHLI010000270.1. Assembly scripts and other data for the analyses presented can be found at the following GitHub repository: www.github.com/ccgproject/ccgp_assembly.

References

  1. Abdennur N, Mirny LA. Cooler: scalable storage for Hi-C data and other genomically labeled arrays. Bioinformatics. 2020;36(1):311–316.
  2. Allio R, Schomaker-Bastos A, Romiguier J, Prosdocimi F, Nabholz B, Delsuc F. MitoFinder: efficient automated large-scale extraction of mitogenomic data in target enrichment phylogenomics. Mol Ecol Resour. 2020;20(4):892–905.
  3. Camacho C, Coulouris G, Avagyan V, Ma N, Papadopoulos J, Bealer K, Madden TL. BLAST+: architecture and applications. BMC Bioinf. 2009;10:421. doi: 10.1186/1471-2105-10-421.
  4. Challis R, Richards E, Rajan J, Cochrane G, Blaxter M. BlobToolKit: interactive quality assessment of genome assemblies. G3 Genes|Genomes|Genetics. 2020;10:1361–1374. doi: 10.1534/g3.119.400908.
  5. Cheng H, Jarvis ED, Fedrigo O, Koepfli K-P, Urban L, Gemmell NJ, et al. Haplotype-resolved assembly of diploid individuals without parental data. Nat Biotechnol. 2022;40:1332–1335. doi: 10.1038/s41587-022-01261-x.
  6. Cornwell BH. Gene flow in the anemone Anthopleura elegantissima limits signatures of local adaptation across an extensive geographic range. Mol Ecol. 2020;29:2550–2566.
  7. Cornwell BH, Hernández L. Genetic structure in the endosymbiont Breviolum ‘muscatinei’ is correlated with geographical location, environment and host species. Proc R Soc B. 2021;288:20202896.
  8. Denny M, Gaines S. Encyclopedia of tidepools and rocky shores. Berkeley (CA): University of California Press; 2007.
  9. Dimond JL, Nguyen N, Roberts SB. DNA methylation profiling of a cnidarian-algal symbiosis using nanopore sequencing. G3. 2021;11(7):jkab148.
  10. Fuller ZL, Mocellin VJL, Morris LA, Cantin N, Shepherd J, Sarre L, Peng J, Liao Y, Pickrell J, Andolfatto P. Population genetics of the coral Acropora millepora: toward genomic prediction of bleaching. Science. 2020;369(6501):eaba4674.
  11. Ghurye J, Pop M, Koren S, Bickhart D, Chin C-S. Scaffolding of long read assemblies using long range contact information. BMC Genomics. 2017;18(1):527. doi: 10.1186/s12864-017-3879-z.
  12. Ghurye J, Rhie A, Walenz BP, Schmitt A, Selvaraj S, Pop M, Phillippy AM, Koren S. Integrating Hi-C links with assembly graphs for chromosome-scale assembly. PLoS Comput Biol. 2019:e1007273. doi: 10.1371/journal.pcbi.1007273.
  13. Goloborodko A, Abdennur N, Venev S, hbbrandao, gfudenberg. mirnylab/pairtools: v0.2.0. 2018. doi: 10.5281/zenodo.1490831.
  14. Gurevich A, Saveliev V, Vyahhi N, Tesler G. QUAST: quality assessment tool for genome assemblies. Bioinformatics. 2013;29(8):1072–1075.
  15. Kerpedjiev P, Abdennur N, Lekschas F, McCallum C, Dinkla K, Strobelt H, Luber JM, Ouellette SB, Azhir A, Kumar N, et al. HiGlass: web-based visual exploration and analysis of genome interaction maps. Genome Biol. 2018;19(1):125. doi: 10.1186/s13059-018-1486-1.
  16. Korlach J, Gedman G, Kingan SB, Chin C-S, Howard JT, Audet J-N, Jarvis ED. De novo PacBio long-read and phased avian genome assemblies correct and add to reference genes generated with intermediate and short reads. GigaScience. 2017;6(10):1–16.
  17. Li H. Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. arXiv. 2013;arXiv:1303.3997, preprint: not peer reviewed. doi: 10.48550/arXiv.1303.3997.
  18. López-Nandam EH, Albright R, Hanson EA, Sheets EA, Palumbi SR. Mutations in coral soma and sperm imply lifelong stem cell renewal and cell lineage selection. bioRxiv. 2021;453148, preprint: not peer reviewed. doi: 10.1101/2021.07.20.453148.
  19. Manni M, Berkeley MR, Seppey M, Simao FA, Zdobnov EM. BUSCO update: novel and streamlined workflows along with broader and deeper phylogenetic coverage for scoring of eukaryotic, prokaryotic, and viral genomes. Mol Biol Evol. 2021;38:4647–4654. doi: 10.1093/molbev/msab199.
  20. Ranallo-Benavidez TR, Jaron KS, Schatz MC. GenomeScope 2.0 and Smudgeplot for reference-free profiling of polyploid genomes. Nat Commun. 2020;11(1):1432. doi: 10.1038/s41467-020-14998-3.
  21. Rhie A, McCarthy SA, Fedrigo O, Damas J, Formenti G, Koren S, Uliano-Silva M, Chow W, Fungtammasan A, Kim J, et al. Towards complete and error-free genome assemblies of all vertebrate species. Nature. 2021;592:737–746. doi: 10.1038/s41586-021-03451-0.
  22. Rhie A, Walenz BP, Koren S, Phillippy AM. Merqury: reference-free quality, completeness, and phasing assessment for genome assemblies. Genome Biol. 2020;21(1):245. doi: 10.1186/s13059-020-02134-9.
  23. Sanford E, Sones JL, García-Reyes M, et al. Widespread shifts in the coastal biota of northern California during the 2014–2016 marine heatwaves. Sci Rep. 2019;9:4216. doi: 10.1038/s41598-019-40784-3.
  24. Sim SB, Corpuz RL, Simmonds TJ, Geib SM. HiFiAdapterFilt, a memory efficient read processing pipeline, prevents occurrence of adapter sequence in PacBio HiFi reads and their negative impacts on genome assembly. BMC Genomics. 2022;23:157. doi: 10.1186/s12864-022-08375-1.
  25. Wolff J, Bhardwaj V, Nothjunge S, Richard G, Renschler G, Gilsbach R, Manke T, Backofen R, Ramírez F, Grüning BA. Galaxy HiCExplorer: a web server for reproducible Hi-C data analysis, quality control and visualization. Nucleic Acids Res. 2018;46:W11–W16. doi: 10.1093/nar/gky504.

Articles from Journal of Heredity are provided here courtesy of Oxford University Press
