Journal of Heredity. 2022 Nov 30;113(6):673–680. doi: 10.1093/jhered/esac047

A draft reference genome of the red abalone, Haliotis rufescens, for conservation genomics

Joanna S Griffiths, Ruta M Sahasrabudhe, Mohan P A Marimuthu, Noravit Chumchim, Oanh H Nguyen, Eric Beraut, Merly Escalona, Andrew Whitehead
Editor: Jose Lopez
PMCID: PMC9709998  PMID: 36190478

Abstract

Red abalone, Haliotis rufescens, are herbivorous marine gastropods that primarily feed on kelp. They are the largest and longest-lived abalone species, with a range that extends in North America from central Oregon, United States, to Baja California, MEX. Recently, red abalone have declined as a consequence of overharvesting, disease, and climate change, resulting in the closure of the commercial fishery in the 1990s and the recreational fishery in 2018. Protecting this ecologically and economically important species requires an understanding of their current population dynamics and connectivity. Here, we present a new red abalone reference genome as part of the California Conservation Genomics Project (CCGP). Following the CCGP genome strategy, we used Pacific Biosciences HiFi long reads and Dovetail Omni-C data to generate a scaffold-level assembly. The assembly comprises 616 scaffolds with a total size of 1.3 Gb, a scaffold N50 of 45.7 Mb, and a BUSCO complete score of 97.3%. This genome represents a significant improvement over a previous assembly and will serve as a powerful tool for investigating seascape genomic diversity and local adaptation to temperature and ocean acidification, and for informing management strategies.

Keywords: California Conservation Genomics Project, CCGP, conservation genomics, red abalone

Introduction

Red abalone, Haliotis rufescens, are an invaluable resource that have been used by Native coastal Californians (Rick et al. 2019) and have been commercially and recreationally harvested along the West Coast of the United States (Fig. 1). Their ability to grow quickly and reach large sizes has supported a fledgling aquaculture industry (Cook 2016). Red abalone have the broadest geographic range of the 7 Haliotis species native to California, extending from central Oregon, United States, to Baja California, MEX (Cox 1962). Recently, however, red abalone have been impacted by overharvesting, climate change, and disease, which has resulted in population declines and the closure of the recreational fishery in 2018 (Rogers-Bennett and Catton 2019). Warming ocean temperatures have increased the prevalence of the bacterial disease “withering syndrome,” further contributing to abalone declines. Their main food source, bull kelp, has also declined because of sea urchin population explosions and warming temperatures (Rogers-Bennett and Catton 2019). Robust genomic resources for this species would provide an important foundation for genomics-enabled research that may enhance aquaculture production and sustainability and improve conservation efforts in the wild. For example, wild red abalone populations show varying sensitivities to ocean acidification (Swezey et al. 2020), suggesting that red abalone harbor genetic variation that may provide resilience to changing oceans. Identifying adaptively important genetic variation could help guide sustainable aquaculture and wild conservation.

Fig. 1.


(A) Adult red abalone, Haliotis rufescens. Photo taken by Jackson Gross. (B) Red abalone larvae 74 d postfertilization at 1.6× magnification. Photo taken by Sara Boles.

Here, we report on the genome assembly of red abalone as part of the California Conservation Genomics Project (CCGP), the goal of which is to use a “community genomics” approach to describe patterns of genetic variation across 230 species native to California (Shaffer et al. 2022). Our goal was to generate a reference genome that improves on a previous assembly (Masonbrink et al. 2019) and is of quality comparable to that of other species included in the CCGP. Using the newly generated reference genomes and individuals resequenced from across the state, the CCGP is developing tools to identify important hotspots of genetic diversity for multiple species and to provide a framework for informed conservation decisions and management plans. The CCGP has already assembled the genomes of several species, including the black abalone, northwestern pond turtle, and big berry manzanita (Huang et al. 2022; Orland et al. 2022; Todd et al. 2022). With support from the CCGP, we generated a scaffold-level assembly using a hybrid de novo approach that combines Hi-C chromatin-proximity and PacBio HiFi long-read sequencing data.

Methods

Biological materials

Adult red abalone from Van Damme State Park, CA (39.269 N, 123.798 W), were collected in 2016 and housed at Bodega Marine Laboratory (University of California Davis). Foot and epipodial clippings (100 to 200 mg) from 2 abalone individuals were collected in July of 2020 and 2021 for DNA extraction and generation of the reference genome.

Nucleic acid library preparation and sequencing

High molecular weight DNA extraction

High molecular weight (HMW) genomic DNA (gDNA) was extracted from 36 mg of the epipodial tissue collected in 2021 using the Nanobind Tissue Big DNA kit (Pacific Biosciences [PacBio], CA) according to the manufacturer’s instructions, with minor modifications: we performed an additional wash of the tissue homogenate with CT buffer and pelleted it by centrifugation at 18,000 × g for 5 min at 4 °C to remove residual buffer before proceeding with the lysis step. The extracted HMW DNA was further purified using the phenol–chloroform extraction method (PacBio). We assessed DNA purity using absorbance ratios (260/280 = 1.83 and 260/230 = 2.09) on a NanoDrop ND-1000 spectrophotometer. The DNA yield (148 ng/µL; 11.1 µg total) was quantified using the QuantiFluor ONE dsDNA Dye assay (Promega, WI). We determined the size distribution of the HMW DNA using the Femto Pulse system (Agilent, CA) and found that 83% of the fragments were 100 kb or longer.

HiFi library preparation and sequencing

The HiFi SMRTbell library was constructed using the SMRTbell Express Template Prep Kit v2.0 (PacBio, Cat. #100-938-900) according to the manufacturer’s instructions. HMW gDNA was sheared to a target DNA size distribution between 15 and 18 kb. The sheared gDNA was concentrated using 0.45× AMPure PB beads (PacBio, Cat. #100-265-900) and then treated to remove single-strand overhangs at 37 °C for 15 min, followed by further enzymatic steps: DNA damage repair at 37 °C for 30 min, end repair and A-tailing at 20 °C for 10 min and 65 °C for 30 min, and ligation of overhang adapter v3 at 20 °C for 60 min. The SMRTbell library was purified and concentrated with 1× AMPure PB beads (PacBio, Cat. #100-265-900), treated with nuclease at 37 °C for 30 min, and size-selected using the BluePippin/PippinHT system (Sage Science, MA; Cat. #BLF7510/HPE7510) to collect fragments longer than 7 to 9 kb. The HiFi SMRTbell library, with an average size of 15 to 20 kb, was sequenced at the University of California Davis DNA Technologies Core (Davis, CA) on a PacBio Sequel II sequencer using two 8M SMRT cells, Sequel II sequencing chemistry 2.0, and a 30-h movie per cell.

Omni-C library preparation and sequencing

The Omni-C library was prepared using the Dovetail Omni-C Kit (Dovetail Genomics, CA) according to the manufacturer’s protocol with slight modifications. First, the foot tissue collected in 2020 was thoroughly ground with a mortar and pestle while cooled with liquid nitrogen. Chromatin was then fixed in place in the nucleus, and the suspended chromatin solution was passed through 100 and 40 µm cell strainers to remove large debris. Fixed chromatin was digested with DNase I under various conditions until a suitable fragment length distribution of DNA molecules was obtained. Chromatin ends were repaired and ligated to a biotinylated bridge adapter, followed by proximity ligation of adapter-containing ends. After proximity ligation, crosslinks were reversed and the DNA was purified from proteins. Purified DNA was treated to remove biotin that was not internal to ligated fragments. An NGS library was generated using an NEB Ultra II DNA Library Prep kit (New England Biolabs, MA) with an Illumina-compatible Y-adaptor. Biotin-containing fragments were then captured using streptavidin beads. The postcapture product was split into 2 replicates prior to PCR enrichment to preserve library complexity, and each replicate received unique dual indices. The libraries were sequenced at the Vincent J. Coates Genomics Sequencing Lab (Berkeley, CA) on an Illumina NovaSeq platform (Illumina, CA) to generate approximately 100 million 2 × 150 bp read pairs per Gb of genome size.
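The Omni-C sequencing target scales linearly with genome size. As a back-of-the-envelope sketch, assuming the GenomeScope2.0 estimate of 1.28 Gb reported in the Results and counting both 150 bp mates of each pair, the target can be computed as below; the function names are hypothetical and not part of any CCGP tooling.

```python
# Back-of-the-envelope Omni-C sequencing target (illustrative sketch).
READ_PAIRS_PER_GB = 100_000_000  # protocol target: ~100 million read pairs per Gb of genome
MATE_LENGTH_BP = 150             # 2 x 150 bp paired-end reads


def omnic_target_pairs(genome_size_gb: float) -> int:
    """Approximate number of Omni-C read pairs to aim for, given a genome size in Gb."""
    return round(READ_PAIRS_PER_GB * genome_size_gb)


def omnic_target_bases(genome_size_gb: float) -> int:
    """Sequenced bases implied by the read-pair target (two mates per pair)."""
    return omnic_target_pairs(genome_size_gb) * 2 * MATE_LENGTH_BP


if __name__ == "__main__":
    genome_size_gb = 1.28  # assumption: GenomeScope2.0 estimate from the Results
    print(f"target read pairs: {omnic_target_pairs(genome_size_gb):,}")
    print(f"target bases:      {omnic_target_bases(genome_size_gb):,}")
```

For a 1.28 Gb genome this works out to 128 million read pairs, or 38.4 Gb of sequence; the yield actually obtained is reported in the Results and Table 2.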

Genome assembly

Nuclear genome assembly

We assembled the red abalone genome following the CCGP assembly protocol Version 4.0, as outlined in Table 1. As with other CCGP assemblies, our goal is to produce a high-quality, highly contiguous assembly using PacBio HiFi reads and Omni-C data while minimizing manual curation. Briefly, we removed remnant adapter sequences from the PacBio HiFi dataset using HiFiAdapterFilt (Sim et al. 2022) and obtained the initial dual, or partially phased, diploid assembly (http://lh3.github.io/2021/10/10/introducing-dual-assembly) with HiFiasm using the filtered HiFi reads and the Omni-C reads (Cheng et al. 2021). We tagged output haplotype 1 as the primary assembly and output haplotype 2 as the alternate assembly. We scaffolded both assemblies using the Omni-C data with SALSA (Ghurye et al. 2017, 2019).

Table 1.

Assembly pipeline and software used.

Assembly | Software and options[a] | Version
Filtering PacBio HiFi adapters | HiFiAdapterFilt | Commit 64d1c7b
K-mer counting | Meryl (k = 21) | 1
Estimation of genome size and heterozygosity | GenomeScope | 2
De novo assembly (contiging) | HiFiasm (Hi-C mode, --primary, output p_ctg.hap1, p_ctg.hap2) | 0.16.1-r375
Scaffolding
 Omni-C scaffolding | SALSA (-DNASE, -i 20, -p yes) | 2
 Gap closing | YAGCloser (-mins 2 -f 20 -mcc 2 -prt 0.25 -eft 0.2 -pld 0.2) | Commit 0e34c3b
Omni-C contact map generation
 Short-read alignment | BWA-MEM (-5SP) | 0.7.17-r1188
 SAM/BAM processing | samtools | 1.11
 SAM/BAM filtering | pairtools | 0.3.0
 Pairs indexing | pairix | 0.3.7
 Matrix generation | cooler | 0.8.10
 Matrix balancing | hicExplorer (hicCorrectMatrix correct --filterThreshold -2 4) | 3.6
 Contact map visualization | HiGlass | 2.1.11
  | PretextMap | 0.1.4
  | PretextView | 0.1.5
  | PretextSnapshot | 0.0.3
Organelle assembly
 Mitogenome assembly | MitoHiFi (-r, -p 50, -o 1) | 2 Commit c06ed3e
Genome quality assessment
 Basic assembly metrics | QUAST (--est-ref-size) | 5.0.2
 Assembly completeness | BUSCO (-m geno, -l metazoa) | 5.0.0
  | Merqury | 1
Contamination screening
 Local alignment tool | BLAST+ | 2.10
 General contamination screening | BlobToolKit | 2.3.3

Software citations are listed in the text.

[a] Options detailed for non-default runs.

We generated Omni-C contact maps for both assemblies by aligning the Omni-C data against the corresponding assembly with BWA-MEM (Li 2013), identifying ligation junctions, and generating Omni-C pairs with pairtools (Goloborodko et al. 2018). We generated a multiresolution Omni-C matrix with cooler (Abdennur and Mirny 2020) and balanced it with hicExplorer [Version 3.6] (Ramírez et al. 2018). We used HiGlass (Kerpedjiev et al. 2018) and the PretextSuite (https://github.com/wtsi-hpag/PretextView; https://github.com/wtsi-hpag/PretextMap; https://github.com/wtsi-hpag/PretextSnapshot) to visualize the contact maps. We checked the contact maps for major misassemblies; where misassemblies were found, we broke the assemblies at the corresponding gaps. No further joins were made after this step. Using the PacBio HiFi reads and YAGCloser (https://github.com/merlyescalona/yagcloser), we closed some of the remaining gaps generated during scaffolding. We then checked for contamination using the BlobToolKit Framework (Challis et al. 2020). Finally, we trimmed remnant adapter sequences and mitochondrial contamination. The genome was then annotated automatically with NCBI’s Eukaryotic Genome Annotation Pipeline using RNA-Seq reads from Masonbrink et al. (2019).
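Balancing equalizes the total coverage of each genomic bin so that contact maps are not dominated by bins that happened to be sequenced more deeply. The function below is a minimal NumPy sketch of iterative correction (ICE), one of the correction methods available in hicExplorer's hicCorrectMatrix; the text does not state which method was used. The sketch omits the removal of low- and high-coverage bins that the --filterThreshold option performs, and the function name is hypothetical.

```python
import numpy as np


def ice_balance(matrix, max_iter=500, tol=1e-6):
    """Iteratively correct a symmetric contact matrix (ICE-style sketch).

    Returns the balanced matrix and a per-bin bias vector such that
    original == balanced * np.outer(bias, bias).
    """
    m = np.array(matrix, dtype=float)
    bias = np.ones(m.shape[0])
    for _ in range(max_iter):
        coverage = m.sum(axis=1)
        covered = coverage > 0
        # Scale coverage of non-empty bins to mean 1; empty bins are left untouched.
        coverage[covered] /= coverage[covered].mean()
        coverage[~covered] = 1.0
        # Divide each contact by the relative coverage of both bins it links.
        m /= np.outer(coverage, coverage)
        bias *= coverage
        if np.abs(coverage[covered] - 1.0).max() < tol:
            break  # every bin now has (nearly) the same total coverage
    return m, bias


if __name__ == "__main__":
    # Toy 4-bin matrix with a strong diagonal, loosely like a Hi-C map.
    raw = np.array([[10, 2, 1, 0.5], [2, 8, 3, 1], [1, 3, 6, 2], [0.5, 1, 2, 4]])
    balanced, bias = ice_balance(raw)
    print(balanced.sum(axis=1))  # row sums are approximately equal after balancing
```

Keeping the bias vector makes the correction reversible, which is convenient when checking that balancing has not distorted raw counts.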

Mitochondrial genome assembly

We assembled the mitochondrial genome of the red abalone from the PacBio HiFi reads using the reference-guided pipeline MitoHiFi (https://github.com/marcelauliano/MitoHiFi; Allio et al. 2020), with the mitochondrial sequence of Haliotis ovina (NCBI:MZ147805.1) as the starting reference sequence. After completion of the nuclear genome, we searched for matches to the resulting mitochondrial assembly sequence in the nuclear genome assembly using BLAST+ (Camacho et al. 2009) and removed nuclear contigs and scaffolds that matched it with >99% sequence identity and were shorter than the mitochondrial assembly sequence.
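The filtering rule above can be expressed as a short function over BLAST+ tabular output. This sketch assumes that the mitochondrial assembly was the query and the nuclear assembly the database, that hits were written with -outfmt 6 (whose second and third columns are the subject ID and percent identity), and that nuclear sequence lengths are available in a dictionary; the function and sequence names are hypothetical.

```python
from typing import Dict, Iterable, Set


def mito_like_sequences(blast_tsv_lines: Iterable[str],
                        nuclear_lengths: Dict[str, int],
                        mito_length: int,
                        min_identity: float = 99.0) -> Set[str]:
    """Return nuclear sequences to drop: identity to the mitogenome above
    min_identity percent AND shorter than the mitochondrial assembly."""
    flagged = set()
    for line in blast_tsv_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and comment lines
        fields = line.rstrip("\n").split("\t")
        subject_id, pident = fields[1], float(fields[2])  # outfmt 6: sseqid, pident
        if pident > min_identity and nuclear_lengths[subject_id] < mito_length:
            flagged.add(subject_id)
    return flagged


if __name__ == "__main__":
    # Hypothetical hits of a 17,141 bp mitogenome against three nuclear sequences.
    hits = [
        "mito\tscaffold_1\t99.50\t5000\t0\t0\t1\t5000\t100\t5099\t0.0\t9000",
        "mito\tscaffold_2\t99.80\t15000\t0\t0\t1\t15000\t1\t15000\t0.0\t27000",
        "mito\tscaffold_3\t97.00\t8000\t0\t0\t1\t8000\t1\t8000\t0.0\t13000",
    ]
    lengths = {"scaffold_1": 45_000_000, "scaffold_2": 16_000, "scaffold_3": 12_000}
    print(mito_like_sequences(hits, lengths, mito_length=17_141))
```

Requiring both conditions keeps large nuclear scaffolds that merely contain a short mitochondrial-like insertion.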

Genome size estimation and quality assessment

We generated k-mer counts (k = 21) from the PacBio HiFi reads using meryl (https://github.com/marbl/meryl). The k-mer database was then used in GenomeScope2.0 (Ranallo-Benavidez et al. 2020) to estimate genome features including genome size, heterozygosity, and repeat content. To obtain general contiguity metrics, we ran QUAST (Gurevich et al. 2013). To evaluate genome quality and completeness, we used BUSCO (Manni et al. 2021) with the metazoa ortholog database (metazoa_odb10), which contains 954 genes. We assessed base-level accuracy (QV) and k-mer completeness with merqury (Rhie et al. 2020), using the previously generated meryl database. We further estimated genome assembly accuracy via BUSCO gene set frameshift analysis using the pipeline described in Korlach et al. (2017). Phased block sizes are based on the sizes of the contigs generated by HiFiasm in Hi-C mode (the initial diploid assembly). Following the data availability and quality metrics established by Rhie et al. (2021), we use the derived genome quality notation x·y·P·Q·C, where x = log10[contig NG50]; y = log10[scaffold NG50]; P = log10[phased block NG50]; Q = Phred base accuracy QV (quality value); and C = % of the genome represented by the first “n” scaffolds, following a known karyotype of 2n = 36 (Gallardo-Escarate et al. 2007). Quality metrics for the notation were calculated on the primary assembly.
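Merqury's reference-free QV treats k-mers that occur in the assembly but not in the reads as evidence of base errors. Below is a minimal sketch of that estimate as described by Rhie et al. (2020): a base is judged correct with probability (1 − K_asm/K_total)^(1/k), where K_asm is the number of assembly-only k-mers and K_total the number of k-mers in the assembly. It also includes the k-mer completeness ratio. The function names are hypothetical, and the default k = 21 matches the meryl database described above.

```python
import math


def merqury_qv(asm_only_kmers: int, total_asm_kmers: int, k: int = 21) -> float:
    """Phred-scaled consensus QV from assembly k-mers that are absent from the read set."""
    if asm_only_kmers == 0:
        return math.inf  # no unsupported k-mers: estimated error rate is zero
    p_base_correct = (1.0 - asm_only_kmers / total_asm_kmers) ** (1.0 / k)
    error_rate = 1.0 - p_base_correct
    return -10.0 * math.log10(error_rate)


def kmer_completeness(solid_read_kmers: set, asm_kmers: set) -> float:
    """Percentage of reliable (solid) read k-mers that are also present in the assembly."""
    return 100.0 * len(solid_read_kmers & asm_kmers) / len(solid_read_kmers)


if __name__ == "__main__":
    # Hypothetical counts: 1 unsupported k-mer among 1,000,000 assembly k-mers.
    print(round(merqury_qv(1, 1_000_000), 2))
    print(kmer_completeness({"ACG", "CGT", "GTA", "TAC"}, {"ACG", "CGT", "TTT"}))
```

On the Phred scale, each 10-point increase in QV corresponds to a tenfold lower estimated per-base error rate.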

Results

The Omni-C and PacBio HiFi sequencing libraries generated 52.2 million read pairs and 4.3 million reads, respectively. The HiFi reads yielded 50.8-fold coverage (N50 read length 15,755 bp; minimum read length 242 bp; mean read length 15,032 bp; maximum read length 58,582 bp) based on the GenomeScope2.0 genome size estimate of 1.28 Gb. Based on the PacBio HiFi reads, we estimated a sequencing error rate of 0.139% and a heterozygosity rate of 1.37%. The k-mer spectrum of the PacBio HiFi reads (Fig. 2A) shows a bimodal distribution with 2 major peaks at ~24- and ~49-fold coverage, corresponding to the heterozygous and homozygous states, respectively, of a diploid species. This k-mer spectrum is consistent with a highly heterozygous genome.
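GenomeScope2.0 fits a mixture model to the k-mer histogram. A much cruder estimate, shown here only to illustrate the idea and not GenomeScope's method, divides the total count of non-error k-mers by the coverage of the homozygous peak; in a diploid spectrum such as Fig. 2A, that is the higher-coverage peak, at roughly twice the heterozygous peak. The sketch counts canonical k-mers so that a sequence and its reverse complement are tallied together. The function names and the default error cutoff of 5 are assumptions.

```python
from collections import Counter
from typing import Dict, Iterable

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer: str) -> str:
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    reverse_complement = kmer.translate(COMPLEMENT)[::-1]
    return min(kmer, reverse_complement)


def kmer_histogram(reads: Iterable[str], k: int = 21) -> Dict[int, int]:
    """Count canonical k-mers and return {multiplicity: number of distinct k-mers}."""
    counts = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if "N" in kmer:
                continue  # skip k-mers spanning ambiguous bases
            counts[canonical(kmer)] += 1
    return dict(Counter(counts.values()))


def estimate_genome_size(histogram: Dict[int, int], homozygous_peak: float,
                         min_multiplicity: int = 5) -> float:
    """Crude size estimate: k-mers above an error cutoff divided by the homozygous peak."""
    total = sum(m * n for m, n in histogram.items() if m >= min_multiplicity)
    return total / homozygous_peak


if __name__ == "__main__":
    reads = ["ACGTACGTAC", "GTACGTACGT"]  # toy reads
    hist = kmer_histogram(reads, k=5)
    print(hist)
    print(estimate_genome_size(hist, homozygous_peak=2, min_multiplicity=1))
```

The low-multiplicity cutoff matters in practice, because sequencing errors create many k-mers seen only once or twice.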

Fig. 2.


Visual overview of genome assembly metrics. (A) K-mer spectrum generated from adapter-filtered PacBio HiFi data using GenomeScope2.0. The bimodal pattern observed corresponds to a diploid genome. K-mers at lower coverage and high frequency correspond to differences between haplotypes, whereas k-mers at higher coverage and slightly lower frequency correspond to similarities between haplotypes. (B) BlobToolKit Snail plot showing a graphical representation of the quality metrics presented in Table 2 for the Haliotis rufescens primary assembly (xgHalRufe1). The plot circle represents the full size of the assembly. From the inside out, the central plot covers length-related metrics. The red line represents the size of the longest scaffold; all other scaffolds are arranged in size order moving clockwise around the plot and drawn in gray starting from the outside of the central plot. Dark and light orange arcs show the scaffold N50 and scaffold N90 values. The central light gray spiral shows the cumulative scaffold count, with a white line at each order of magnitude. White regions in this area reflect the proportion of Ns in the assembly. The dark versus light blue area around it shows mean, maximum, and minimum GC versus AT content at 0.1% intervals (Challis et al. 2020). Omni-C contact maps for the primary (C) and alternate (D) genome assemblies generated with PretextSnapshot. Hi-C contact maps translate the proximity of genomic regions in 3D space into a contiguous linear organization. Each cell in the contact map corresponds to sequencing data supporting the linkage (or join) between 2 such regions.

The final assembly (xgHalRufe1) consists of 2 pseudo-haplotypes, primary and alternate, both of which are similar in size to the GenomeScope2.0 estimate (Fig. 2A). The primary assembly consists of 616 scaffolds spanning 1.33 Gb, with a contig N50 of 8.9 Mb, scaffold N50 of 45.7 Mb, largest contig of 38.6 Mb, and largest scaffold of 94.2 Mb. The alternate assembly consists of 362 scaffolds spanning 1.38 Gb, with a contig N50 of 7.9 Mb, scaffold N50 of 44.1 Mb, largest contig of 35.6 Mb, and largest scaffold of 78.7 Mb. Assembly statistics are reported in Table 2 and, for the primary assembly, shown graphically in Fig. 2B (see Supplementary Fig. 1 for the alternate assembly).
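N50 is the length of the sequence at which the length-sorted assembly first accumulates half of its total size; NG50 uses half of an estimated genome size instead, which for Table 2 is the GenomeScope2.0 estimate of 1.28 Gb. Here is a minimal sketch with a hypothetical function name.

```python
from typing import Iterable, Optional


def nx(lengths: Iterable[int], genome_size: Optional[int] = None,
       fraction: float = 0.5) -> Optional[int]:
    """N50 (genome_size=None) or NG50 (genome_size given) of a set of sequence lengths.

    Returns None when the sequences cannot cover the requested fraction of the genome.
    """
    ordered = sorted(lengths, reverse=True)
    target = fraction * (sum(ordered) if genome_size is None else genome_size)
    running = 0
    for length in ordered:
        running += length
        if running >= target:
            return length
    return None


if __name__ == "__main__":
    toy = [80, 70, 50, 40, 30, 20]    # hypothetical lengths; total 290
    print(nx(toy))                    # N50
    print(nx(toy, genome_size=400))   # NG50 against a larger estimated genome
```

The primary assembly (1.33 Gb) is larger than the 1.28 Gb estimate, so the NG50 threshold is reached earlier in the sorted list. This is why the primary contig NG50 in Table 2 (9,411,628 bp) exceeds the contig N50 (8,868,657 bp).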

Table 2.

Sequencing and assembly statistics, and accession numbers.

BioProjects and vouchers
 CCGP NCBI BioProject | PRJNA720569
 Genera NCBI BioProject | PRJNA765838
 Species NCBI BioProject | PRJNA777175
 NCBI BioSample | SAMN26275698, SAMN26275699
 Specimen identification | Individual from Van Damme State Park (USA: California)
NCBI Genome accessions | Primary | Alternate
 Assembly accession | JALGQA000000000 | JALGQB000000000
 Genome sequences | GCA_023055435.1 | GCA_023055495.1
Genome sequence
 PacBio HiFi reads | Run: 1 PACBIO_SMRT (Sequel II) run: 4.3M spots, 65.4G bases, 39 Gb | Accession: SRX15312148
 Omni-C Illumina reads | Run: 2 ILLUMINA (Illumina NovaSeq 6000) runs: 52.3M spots, 14.7G bases, 5 Gb | Accession: SRX15312149, SRX15312150
Genome assembly quality metrics
 Assembly identifier (quality code[a]) | xgHalRufe1 (6.7.P6.Q99.C71)
 HiFi read coverage[b] | 50.8×
 Metric | Primary[c] | Alternate[c]
 Number of contigs | 770 | 494
 Contig N50 (bp) | 8,868,657 | 7,900,433
 Contig NG50 (bp)[b] | 9,411,628 | 8,542,416
 Longest contig (bp) | 38,619,994 | 35,604,284
 Number of scaffolds | 616 | 362
 Scaffold N50 (bp) | 45,695,856 | 44,111,220
 Scaffold NG50 (bp)[b] | 45,695,856 | 49,836,399
 Largest scaffold (bp) | 94,228,061 | 78,740,588
 Size of final assembly (bp) | 1,334,471,355 | 1,379,205,445
 Phased block NG50 (bp)[b] | 9,411,628 | 9,783,964
 Gaps per Gbp (#Gaps) | 115 (154) | 96 (132)
 Indel QV (frameshift) | 50.48001329 | 50.19467804
 Base pair QV | 64.9337 | 65.6339
  Full assembly | 65.2755
 K-mer completeness | 81.3176 | 82.7031
  Full assembly | 99.2998
 BUSCO completeness (metazoa), n = 954 | C | S | D | F | M
  P[c] | 97.30% | 96.90% | 0.40% | 1.70% | 1.00%
  A[c] | 97.40% | 96.20% | 1.20% | 1.50% | 1.10%
Organelles | 1 partial mitochondrial sequence | JALGQA010000616.1

[a] Assembly quality code x·y·P·Q·C derived notation, from Rhie et al. (2021). x = log10[contig NG50]; y = log10[scaffold NG50]; P = log10[phased block NG50]; Q = Phred base accuracy QV (quality value); C = % genome represented by the first “n” scaffolds, following a known karyotype of 2n = 36 (Gallardo-Escarate et al. 2007). The quality code for the full assembly is given for the primary assembly (xgHalRufe1.0.p). BUSCO scores: (C)omplete, comprising complete (S)ingle-copy and complete (D)uplicated; (F)ragmented; and (M)issing BUSCO genes. n, number of BUSCO genes in the set/database.

[b] Read coverage and NGx statistics have been calculated based on the estimated genome size of 1.28 Gb.

[c] P(rimary) and (A)lternate assembly values.
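The x·y·P·Q·C code packs several metrics into one string. The sketch below builds it from its inputs, assuming that the log10 terms are rounded down, as the values in Table 2 suggest (log10 of 9,411,628 is about 6.97 and is reported as 6). Q and C are passed in as already-computed values. The helper that estimates C from the n longest scaffolds (n = 18, the haploid number for 2n = 36) and a supplied genome size is a hypothetical illustration, as are both function names.

```python
import math
from typing import Iterable


def quality_code(contig_ng50: int, scaffold_ng50: int, phased_block_ng50: int,
                 qv: float, pct_in_first_n: float) -> str:
    """Format an x.y.P.Q.C assembly quality code; log10 terms are rounded down."""
    x = math.floor(math.log10(contig_ng50))
    y = math.floor(math.log10(scaffold_ng50))
    p = math.floor(math.log10(phased_block_ng50))
    return f"{x}.{y}.P{p}.Q{math.floor(qv)}.C{math.floor(pct_in_first_n)}"


def percent_in_first_n(scaffold_lengths: Iterable[int], n: int, genome_size: int) -> float:
    """Percent of the genome covered by the n longest scaffolds (hypothetical helper)."""
    longest = sorted(scaffold_lengths, reverse=True)[:n]
    return 100.0 * sum(longest) / genome_size


if __name__ == "__main__":
    # Primary-assembly NG50 values from Table 2; Q and C are those printed in its code.
    print(quality_code(9_411_628, 45_695_856, 9_411_628, qv=99, pct_in_first_n=71))
```

With the primary-assembly NG50 values from Table 2, the x, y, and P digits come out as 6, 7, and P6, matching the reported code.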

We identified a total of 16 misassemblies generated during the scaffolding step, 7 on the primary and 9 on the alternate assembly, and broke the assemblies at the corresponding joins. We closed a total of 24 gaps, 9 on the primary assembly and 15 on the alternate assembly. We filtered 2 contigs from the primary assembly that matched nematode and brachiopod contaminants, and 1 contig from the alternate assembly that matched an arthropod contaminant. Finally, we filtered out a single contig from the alternate assembly corresponding to mitochondrial contamination. The primary assembly has a BUSCO completeness score of 97.3% using the metazoa gene set, a per-base quality (QV) of 64.9, a k-mer completeness of 81.3%, and a frameshift indel QV of 50.5. The alternate assembly has a BUSCO completeness score of 97.4% using the metazoa gene set, a per-base quality (QV) of 65.6, a k-mer completeness of 82.7%, and a frameshift indel QV of 50.2. The Omni-C contact maps show that both assemblies are highly contiguous (Fig. 2C and D). We have deposited scaffolds corresponding to both primary and alternate haplotypes on NCBI (see Table 2 and Data availability for details).
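The gap counts in Table 2 follow directly from scaffolding: each gap separates 2 contigs within a scaffold, so the number of gaps equals contigs minus scaffolds (770 − 616 = 154 for the primary assembly and 494 − 362 = 132 for the alternate). Below is a minimal sketch that counts gaps as runs of N and normalizes them per Gbp. Treating every run of N as a gap is an assumption, since a real pipeline might apply a minimum gap length, and the function names are hypothetical.

```python
import re
from typing import Iterable, Tuple

GAP_RUN = re.compile(r"[Nn]+")  # one gap = one maximal run of N characters


def count_gaps(sequences: Iterable[str]) -> Tuple[int, int]:
    """Return (number of N runs, total length in bp) across scaffold sequences."""
    gaps = 0
    total_bp = 0
    for seq in sequences:
        gaps += len(GAP_RUN.findall(seq))
        total_bp += len(seq)
    return gaps, total_bp


def gaps_per_gbp(n_gaps: int, assembly_bp: int) -> float:
    """Gap density normalized to one billion base pairs."""
    return n_gaps / (assembly_bp / 1e9)


if __name__ == "__main__":
    print(count_gaps(["ACGTNNNNACGTNNAC", "ACGT"]))  # toy scaffolds
    print(round(gaps_per_gbp(154, 1_334_471_355)))   # primary assembly, Table 2
    print(round(gaps_per_gbp(132, 1_379_205_445)))   # alternate assembly, Table 2
```

Applied to the gap counts and assembly sizes in Table 2, this reproduces the reported densities of 115 and 96 gaps per Gbp.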

The mitochondrial genome assembled with MitoHiFi is 17,141 bp long. Its base composition is A = 25.44%, C = 13.62%, G = 25.45%, and T = 35.49%, and it contains 22 unique transfer RNA genes and 13 protein-coding genes.
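Base composition percentages like those above are simple counts over the assembled sequence. Below is a minimal sketch that reports the percentage of each nucleotide and the GC content, assuming that N and other non-ACGT symbols are excluded from the denominator. The function names are hypothetical.

```python
from collections import Counter
from typing import Dict


def base_composition(seq: str) -> Dict[str, float]:
    """Percent A, C, G and T, ignoring N and other non-ACGT symbols."""
    counts = Counter(seq.upper())
    acgt_total = sum(counts[base] for base in "ACGT")
    return {base: 100.0 * counts[base] / acgt_total for base in "ACGT"}


def gc_percent(seq: str) -> float:
    """GC content as a percentage of ACGT bases."""
    composition = base_composition(seq)
    return composition["G"] + composition["C"]


if __name__ == "__main__":
    print(base_composition("AACGTTTT"))  # toy sequence
    print(gc_percent("AACGTTTT"))
```

Applied to the composition reported above, the mitogenome's GC content is 13.62% + 25.45% = 39.07%.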

Discussion

The previously assembled genome for red abalone, generated using PacBio long reads and Illumina short reads, comprised 8,371 scaffolds and had a BUSCO complete score of 95.1%, suggesting that the assembly was of high quality (Masonbrink et al. 2019). To be consistent with the CCGP data analysis pipeline and to ensure that our dataset is comparable with those of other species in the CCGP “community genomics” approach, we generated a new reference genome using a hybrid assembly that combines PacBio long-read data with Omni-C chromatin conformation data for scaffolding. Compared with the previous assembly, the new reference genome is composed of far fewer scaffolds (616) and has a higher BUSCO complete score (97.3%).

The assembled genome size of H. rufescens (1.33 Gb) is similar to the previous estimate for this species (1.49 Gb; Masonbrink et al. 2019) and to those of other Haliotis species, such as H. rubra (blacklip abalone, 1.24 to 1.31 Gb; Gan et al. 2019), H. laevigata (greenlip abalone, 1.76 Gb; Botwright et al. 2019), H. cracherodii (black abalone, 1.1 Gb; Orland et al. 2022), and H. discus hannai (Pacific abalone, 1.86 Gb; Nam et al. 2017). The GC content of our new genome was 40.9%, similar to that of the genome previously assembled by Masonbrink et al. (2019) (40.2%).

This reference genome will serve as an important resource for ongoing investigations using whole-genome resequencing efforts for populations spanning southern Oregon, United States, to Baja California, MEX, as part of the CCGP goals (Shaffer et al. 2022). We plan to address important questions that will inform conservation and management practices: 1) What is the genetic structure of red abalone populations and the degree of population connectivity? We are particularly interested in investigating the level of gene flow among northern California and southern Oregon populations where connectivity among populations is virtually unknown (Gruenthal et al. 2007). 2) What is the genetic basis of adaptation to temperature and ocean acidification resilience? The red abalone’s distribution spans a broad geographic range, exposing them to a variety of environmental gradients, which may promote local adaptation. 3) Using historical and modern samples, what is the effective population size for populations in recent decline? Given the difficulty of performing scuba transect surveys and the low density of individuals in their habitat, genomic data may be able to fill this knowledge gap. The outcomes of research studies that address these questions will provide important insights to help guide conservation, management, and sustainable aquaculture for this iconic species.

Supplementary Material

esac047_suppl_Supplementary_Figure_S1

Acknowledgments

We thank Blythe Marshman and Dr Sara Boles for collecting the epipodial tissue used to generate the reference genome. PacBio Sequel II library prep and sequencing were carried out at the DNA Technologies and Expression Analysis Cores at the University of California Davis Genome Center, supported by NIH Shared Instrumentation Grant 1S10OD010786-01. Deep sequencing of Omni-C libraries used the NovaSeq S4 sequencing platform at the Vincent J. Coates Genomics Sequencing Laboratory at the University of California Berkeley, supported by NIH S10 OD018174 Instrumentation Grant. We thank the staff at the University of California Davis DNA Technologies and Expression Analysis Cores and the University of California Santa Cruz Paleogenomics Laboratory for their diligence and dedication to generating high-quality sequence data. We thank the CCGP leadership team (Brad Shaffer, Victoria Sork, and Erin Toffelmier) for securing funding and for project organization and coordination.

Contributor Information

Joanna S Griffiths, Department of Environmental Toxicology, University of California Davis, Davis, CA, United States.

Ruta M Sahasrabudhe, DNA Technologies and Expression Analysis Core Laboratory, Genome Center, University of California Davis, Davis, CA, 95616, United States.

Mohan P A Marimuthu, DNA Technologies and Expression Analysis Core Laboratory, Genome Center, University of California Davis, Davis, CA, 95616, United States.

Noravit Chumchim, DNA Technologies and Expression Analysis Core Laboratory, Genome Center, University of California Davis, Davis, CA, 95616, United States.

Oanh H Nguyen, DNA Technologies and Expression Analysis Core Laboratory, Genome Center, University of California Davis, Davis, CA, 95616, United States.

Eric Beraut, Department of Ecology and Evolutionary Biology, University of California Santa Cruz, Santa Cruz, CA, United States.

Merly Escalona, Department of Biomolecular Engineering, University of California Santa Cruz, Santa Cruz, CA, United States.

Andrew Whitehead, Department of Environmental Toxicology, University of California Davis, Davis, CA, United States.

Funding

This work was supported by the California Conservation Genomics Project, with funding provided to the University of California by the State of California, State Budget Act of 2019 [UC Award ID RSI-19-690224 to AW].

Data availability

Data generated for this study are available under NCBI BioProject PRJNA777175. Raw sequencing data for VD_foot and VD_epi2 (NCBI BioSample SAMN26275698, SAMN26275699) are deposited in the NCBI Sequence Read Archive (SRA) under SRX15312148 for the PacBio HiFi sequencing data and SRX15312149 and SRX15312150 for the Omni-C sequencing data. GenBank accessions for the primary and alternate assemblies are GCA_023055435.1 and GCA_023055495.1, respectively, and the corresponding genome sequences are JALGQA000000000 and JALGQB000000000. The GenBank accession for the mitochondrial genome is JALGQA010000616.1. Assembly scripts and other data for the analyses presented can be found at the following GitHub repository: www.github.com/ccgproject/ccgp_assembly. RNA-Seq read data used to annotate the genome are located in the NCBI Sequence Read Archive (BioProject accession: PRJNA488641).

References

  1. Abdennur N, Mirny LA.. Cooler: scalable storage for Hi-C data and other genomically labeled arrays. Bioinformatics. 2020;36(1):311–316. [DOI] [PMC free article] [PubMed] [Google Scholar]
  2. Allio R, Schomaker-Bastos A, Romiguier J, Prosdocimi F, Nabholz B, Delsuc F.. MitoFinder: efficient automated large-scale extraction of mitogenomic data in target enrichment phylogenomics. Mol Ecol Resour. 2020;20(4):892–905. [DOI] [PMC free article] [PubMed] [Google Scholar]
  3. Botwright NA, Zhao M, Wang T, McWilliam S, Colgrave ML, Hlinka O, Li S, Suwansa-ard S, Subramanian S, McPherso L, et al. Greenlip abalone genome and protein analysis provides insights into maturation and spawning. G3. 2019;9(10):3067–3078. [DOI] [PMC free article] [PubMed] [Google Scholar]
  4. Camacho C, Coulouris G, Avagyan V, Ma N, Papadopoulos J, Bealer K, Madden TL.. BLAST+: architecture and applications. BMC Bioinformatics. 2009;10(421). [DOI] [PMC free article] [PubMed] [Google Scholar]
  5. Challis R, Richards E, Rajan J, Cochrane G, Blaxter M.. BlobToolKit—interactive quality assessment of genome assemblies. G3. 2020;10(4):1361–1374. [DOI] [PMC free article] [PubMed] [Google Scholar]
  6. Cheng H, Jarvis ED, Fedrigo O, Koepfli K-P, Urban L, Gemmell NJ, Li H.. Robust haplotype-resolved assembly of diploid individuals without parental data, arXiv [q-bio.GN], arXiv:2109.04785, 2021, preprint: not peer reviewed. [DOI] [PMC free article] [PubMed]
  7. Cook PA. Recent trends in worldwide abalone production. J Shellfish Res. 2016;35(3):581–583. [Google Scholar]
  8. Cox KW. Fish Bulletin No. 118. California Abalones, Family Haliotidae. 1962. https://escholarship.org/content/qt5c46p19z/qt5c46p19z.pdf?t=krnkos [accessed 2022 Jan 5].
  9. Gallardo-Escarate C, del Rio-Portilla MA.. Karyotype composition in three California abalones and their relationship with genome size. J Shellfish Res. 2007;26:825–832. [Google Scholar]
  10. Gan HM, Hua Tan G, Austin CM, Sherman CDH, Ting Wong Y, Strugnell J, Gervis M, McPherson L, Miller A.. Best foot forward: nanopore long reads, hybrid meta-assembly, and haplotig purging optimizes the first genome assembly for the Southern Hemisphere Blacklip abalone. Front Genet. 2019;10:889. [DOI] [PMC free article] [PubMed] [Google Scholar]
  11. Ghurye J, Pop M, Koren S, Bickhart D, Chin C-S.. Scaffolding of long read assemblies using long range contact information. BMC Genomics. 2017;18(1):527. [DOI] [PMC free article] [PubMed] [Google Scholar]
  12. Ghurye J, Rhie A, Walenz BP, Schmitt A, Selvaraj S, Pop M, Koren S.. Integrating Hi-C links with assembly graphs for chromosome-scale assembly. PLoS Comput Biol. 2019;15(8):e1007273. [DOI] [PMC free article] [PubMed] [Google Scholar]
  13. Goloborodko A, Abdennur N, Venev S; hbbrandao & gfudenberg. mirnylab/pairtools: v0.2.0. 2018. [Google Scholar]
  14. Gruenthal KM, Acheson LK, Burton RS.. Genetic structure of natural populations of California red abalone (Haliotis rufescens) using multiple genetic markers. Mar Biol. 2007;152(6):1237–1248. [Google Scholar]
  15. Gurevich A, Saveliev V, Vyahhi N, Tesler G.. QUAST: quality assessment tool for genome assemblies. Bioinformatics. 2013;29(8):1072–1075. [DOI] [PMC free article] [PubMed] [Google Scholar]
  16. Huang Y, Escalona M, Morrison G, Marimuthu MPA, Nguyen O, Toffelmier E, Bradley Shaffer H, Litt A.. Reference genome assembly of the big berry manzanita (Arctostaphylos glauca). J Hered. 2022;113(2):188–196. [DOI] [PMC free article] [PubMed] [Google Scholar]
  17. Kerpedjiev P, Abdennur N, Lekschas F, McCallum C, Dinkla K, Strobelt H, Gehlenborg N.. HiGlass: web-based visual exploration and analysis of genome interaction maps. Genome Biol. 2018;19(1):1–12. [DOI] [PMC free article] [PubMed] [Google Scholar]
  18. Korlach J, Gedman G, Kingan SB, Chin C-S, Howard JT, Audet J-N, Jarvis ED.. De novo PacBio long-read and phased avian genome assemblies correct and add to reference genes generated with intermediate and short reads. GigaScience. 2017;6(10):1–16. [DOI] [PMC free article] [PubMed] [Google Scholar]
  19. Li H. Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM, arXiv, arXiv:1303.3997, 2013, preprint: not peer reviewed.
  20. Manni M, Berkeley MR, Seppey M, Simao FA, Zdobnov EM.. BUSCO update: novel and streamlined workflows along with broader and deeper phylogenetic coverage for scoring of eukaryotic, prokaryotic, and viral genomes, arXiv [q-bio], arXiv:2106.11799, 2021, preprint: not peer reviewed. [DOI] [PMC free article] [PubMed]
  21. Masonbrink RE, Purcell CM, Boles SE, Whitehead A, Hyde JR, Seetharam AS, Severin AJ.. An annotated genome for Haliotis rufescens (red aaalone) and resequenced green, pink, pinto, black, and white abalone species. Genome Biol Evol. 2019;11(2):431–438. [DOI] [PMC free article] [PubMed] [Google Scholar]
  22. Nam B-H, Kwak W, Kim Y-O, Kim D-G, Kong H-J, Kim W-J, Kang J-H, Park J-Y, An C-M, Moon J-Y, et al. Genome sequence of Pacific abalone (Haliotis discus hannai): the first draft genome in family Haliotidae. GigaScience. 2017;6(5):1–8.
  23. Orland C, Escalona M, Sahasrabudhe R, Marimuthu MPA, Nguyen O, Beraut E, Marshman B, Moore J, Raimondi P, Shapiro B. A draft reference genome assembly of the critically endangered black abalone, Haliotis cracherodii. J Hered. 2022;113:665–672.
  24. Ramírez F, Bhardwaj V, Arrigoni L, Lam KC, Grüning BA, Villaveces J, Habermann B, Akhtar A, Manke T. High-resolution TADs reveal DNA sequences underlying genome organization in flies. Nat Commun. 2018;9:189.
  25. Ranallo-Benavidez TR, Jaron KS, Schatz MC. GenomeScope 2.0 and Smudgeplot for reference-free profiling of polyploid genomes. Nat Commun. 2020;11(1):1–10.
  26. Rhie A, McCarthy SA, Fedrigo O, Damas J, Formenti G, Koren S, Uliano-Silva M, Chow W, Fungtammasan A, Kim J, et al. Towards complete and error-free genome assemblies of all vertebrate species. Nature. 2021;592(7856):737–746.
  27. Rhie A, Walenz BP, Koren S, Phillippy AM. Merqury: reference-free quality, completeness, and phasing assessment for genome assemblies. Genome Biol. 2020;21(1):1–27.
  28. Rick TC, Braje TJ, Erlandson JM. Early red abalone shell middens, human subsistence, and environmental change on California’s Northern Channel Islands. J Ethnobiol. 2019;39(2):204–222.
  29. Rogers-Bennett L, Catton CA. Marine heat wave and multiple stressors tip bull kelp forest to sea urchin barrens. Sci Rep. 2019;9(1):1–9.
  30. Shaffer HB, Toffelmier E, Corbett-Detig RB, Escalona M, Erickson B, Fiedler P, Gold M, Harrigan RJ, Hodges S, Luckau TK, et al. Landscape genomics to enable conservation actions: the California Conservation Genomics Project. J Hered. 2022;113:577–588.
  31. Sim SB, Corpuz RL, Simmonds TJ, Geib SM. HiFiAdapterFilt, a memory efficient read processing pipeline, prevents occurrence of adapter sequence in PacBio HiFi reads and their negative impacts on genome assembly. BMC Genomics. 2022;23(1):1–7.
  32. Swezey DS, Boles SE, Aquilino KM, Stott HK, Bush D, Whitehead A, Rogers-Bennett L, Hill TM, Sanford E. Evolved differences in energy metabolism and growth dictate the impacts of ocean acidification on abalone aquaculture. Proc Natl Acad Sci USA. 2020;117(42):26513–26519.
  33. Todd BD, Jenkinson TS, Escalona M, Beraut E, Nguyen O, Sahasrabudhe R, Scott PA, Toffelmier E, Wang IJ, Shaffer HB. Reference genome of the Northwestern Pond Turtle, Actinemys marmorata. J Hered. 2022;113:624–631.

Supplementary Materials

Supplementary Figure S1 (file esac047_suppl_Supplementary_Figure_S1)

Data Availability Statement

Data generated for this study are available under NCBI BioProject PRJNA777175. Raw sequencing data for VD_foot and VD_epi2 (NCBI BioSamples SAMN26275698 and SAMN26275699) are deposited in the NCBI Sequence Read Archive (SRA) under SRX15312148 for the PacBio HiFi data and under SRX15312149 and SRX15312150 for the Omni-C data. The GenBank accessions for the primary and alternate assemblies are GCA_023055435.1 and GCA_023055495.1, and the corresponding genome sequence accessions are JALGQA000000000 and JALGQB000000000. The GenBank accession for the mitochondrial genome is JALGQA010000616.1. Assembly scripts and other data for the analyses presented can be found in the GitHub repository www.github.com/ccgproject/ccgp_assembly. RNA-Seq read data used to annotate the genome are available in the NCBI SRA under BioProject accession PRJNA488641.
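For readers who want to retrieve the deposited sequences programmatically, the following Python sketch gathers the accessions listed in the statement above and builds a request to NCBI's public E-utilities efetch endpoint for a single nucleotide record, such as the mitochondrial genome. It is an illustration only, not part of the authors' pipeline (their scripts are in the GitHub repository named above). The dictionary keys and function names are hypothetical, and the assumption that efetch serves JALGQA010000616.1 as FASTA text under `db=nuccore` should be checked against NCBI's current E-utilities documentation. Whole assemblies (GCA_ accessions) and raw reads (SRX accessions) are generally better retrieved with dedicated NCBI tools such as NCBI Datasets and the SRA Toolkit.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Accessions copied from the Data Availability Statement.
# The key names are illustrative labels, not NCBI terminology.
ACCESSIONS = {
    "bioproject": "PRJNA777175",
    "biosamples": ["SAMN26275698", "SAMN26275699"],
    "hifi_sra_experiment": "SRX15312148",
    "omnic_sra_experiments": ["SRX15312149", "SRX15312150"],
    "primary_assembly": "GCA_023055435.1",
    "alternate_assembly": "GCA_023055495.1",
    "mitochondrial_genome": "JALGQA010000616.1",
    "rnaseq_bioproject": "PRJNA488641",
}

# Public E-utilities efetch endpoint.
EFETCH_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_fasta_url(accession: str, db: str = "nuccore") -> str:
    """Build an efetch URL requesting one record as plain-text FASTA.

    Assumption: the accession resolves in the given Entrez database
    (nuccore by default) and supports rettype=fasta.
    """
    params = {"db": db, "id": accession, "rettype": "fasta", "retmode": "text"}
    return f"{EFETCH_BASE}?{urlencode(params)}"


def fetch_fasta(accession: str, timeout: float = 60.0) -> str:
    """Download the FASTA record over the network and return it as text."""
    with urlopen(efetch_fasta_url(accession), timeout=timeout) as response:
        return response.read().decode("utf-8")
```

Calling `fetch_fasta(ACCESSIONS["mitochondrial_genome"])` should return a string that begins with a `>` header line, provided network access is available. Building the URL separately from the download keeps the request inspectable; NCBI also asks unauthenticated clients to keep request rates low (about three per second), so loops over many accessions should pause between calls.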


Articles from Journal of Heredity are provided here courtesy of Oxford University Press
