A haplotype-resolved genome assembly of the bocaccio rockfish, Sebastes paucispinis

Rishi De-Kayne; Stacy Li; Merly Escalona; Runyang Nicolas Lou; Juan Manuel Vazquez; Gregory L Owens; Sree Rohit Raj Kolora; Conner Jainese; Katelin Seeto; Merit McCrea; Oanh Nguyen; Noravit Chumchim; Ruta Sahasrabudhe; Colin W Fairbairn; Richard E Green; William E Seligmann; Milton Love; Peter H Sudmant


J Hered. 2025 May 5;116(6):826–834. doi: 10.1093/jhered/esaf026


Rishi De-Kayne ¹,ᵃ,✉, Stacy Li ²,ᵃ, Merly Escalona ³,ᵃ, Runyang Nicolas Lou ⁴, Juan Manuel Vazquez ⁵, Gregory L Owens ⁶, Sree Rohit Raj Kolora ⁷, Conner Jainese ⁸, Katelin Seeto ⁹, Merit McCrea ¹⁰, Oanh Nguyen ¹¹, Noravit Chumchim ¹², Ruta Sahasrabudhe ¹³, Colin W Fairbairn ¹⁴, Richard E Green ¹⁵, William E Seligmann ¹⁶, Milton Love ¹⁷, Peter H Sudmant ¹⁸,✉

Editor: Elizabeth Alter

¹ Department of Integrative Biology, University of California, Berkeley, Berkeley, CA, USA

² Department of Integrative Biology, University of California, Berkeley, Berkeley, CA, USA

³ Department of Biomolecular Engineering, University of California, Santa Cruz, Santa Cruz, CA, USA

⁴ Department of Integrative Biology, University of California, Berkeley, Berkeley, CA, USA

⁵ Department of Integrative Biology, University of California, Berkeley, Berkeley, CA, USA

⁶ Department of Biology, University of Victoria, Victoria, BC, Canada

⁷ Department of Integrative Biology, University of California, Berkeley, Berkeley, CA, USA

⁸ Marine Science Institute, University of California, Santa Barbara, Santa Barbara, CA, USA

⁹ Marine Science Institute, University of California, Santa Barbara, Santa Barbara, CA, USA

¹⁰ Marine Science Institute, University of California, Santa Barbara, Santa Barbara, CA, USA

¹¹ DNA Technologies and Expression Analysis Core Laboratory, Genome Center, University of California, Davis, Davis, CA, USA

¹² DNA Technologies and Expression Analysis Core Laboratory, Genome Center, University of California, Davis, Davis, CA, USA

¹³ DNA Technologies and Expression Analysis Core Laboratory, Genome Center, University of California, Davis, Davis, CA, USA

¹⁴ Department of Ecology and Evolutionary Biology, University of California, Santa Cruz, Santa Cruz, CA 95064, USA

¹⁵ Department of Biomolecular Engineering, University of California, Santa Cruz, Santa Cruz, CA, USA

¹⁶ Department of Ecology and Evolutionary Biology, University of California, Santa Cruz, Santa Cruz, CA 95064, USA

¹⁷ Marine Science Institute, University of California, Santa Barbara, Santa Barbara, CA, USA

¹⁸ Department of Integrative Biology, University of California, Berkeley, Berkeley, CA, USA

ᵃ Rishi De-Kayne, Stacy Li, and Merly Escalona are co-first authors, ordered alphabetically by last name.

✉ Corresponding authors: Department of Integrative Biology, University of California, Berkeley, Berkeley, CA, United States. Email: rdekayne@berkeley.edu (RD); psudmant@berkeley.edu (PHS)

Roles

Rishi De-Kayne: Data curation, Formal analysis, Investigation, Resources, Software, Validation, Visualization, Writing - original draft, Writing - review & editing

Stacy Li: Conceptualization, Data curation, Formal analysis, Project administration, Resources, Software, Writing - original draft, Writing - review & editing

Merly Escalona: Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Resources, Software, Validation, Visualization, Writing - original draft, Writing - review & editing

Juan Manuel Vazquez: Data curation, Methodology, Resources

Sree Rohit Raj Kolora: Conceptualization, Data curation, Investigation, Methodology, Resources, Writing - review & editing

Conner Jainese: Investigation, Methodology, Resources

Katelin Seeto: Data curation, Investigation, Methodology

Merit McCrea: Data curation, Methodology, Resources

Oanh Nguyen: Data curation, Investigation, Methodology, Resources

Noravit Chumchim: Data curation, Investigation, Methodology, Resources

Colin W Fairbairn: Investigation, Methodology, Resources

Milton Love: Data curation, Funding acquisition, Methodology, Project administration, Supervision, Writing - review & editing

Elizabeth Alter: Corresponding Editor

PMCID: PMC12584591 PMID: 40323688

Abstract

Rockfishes (genus Sebastes) are one of the most diverse clades amongst teleosts (ray-finned fishes). The genus includes more than 110 species which are distributed broadly across the North Pacific Ocean, North and South Atlantic Ocean, and Southeastern Pacific Ocean. Rockfishes exhibit particularly high diversity along the western coast of the United States, where their abundance plays a critical role in local marine ecosystems and fisheries. Sebastes paucispinis (“bocaccio”) is a rockfish species most commonly found off the coast of California. In 2005, bocaccio were federally declared overfished following massive depletion by commercial and recreational fisheries from the 1980s to early 2000s. Implementation of significant restrictions has bolstered recovery of critical rockfish populations along the California and Oregon coasts, but the impact of anthropogenic stressors on bocaccio, and other Sebastes species, has yet to be fully evaluated. Here, we present the first de novo reference-quality genome assembly of Sebastes paucispinis, as part of the California Conservation Genomics Project.

Keywords: bocaccio, California Conservation Genomics Project, CCGP, rockfish, Sebastes

Introduction

Sebastes paucispinis, also known as the bocaccio rockfish, is a critically endangered (IUCN v.2.3) member of Sebastes, a diverse genus of rockfishes comprising more than 110 species. Bocaccio are named for both their near absence of head spines (paucispinis meaning “few spines” in Latin) and their large mouth (bocaccio meaning “big mouth” or “ugly mouth” in Italian; Love et al. 2002). Bocaccio have been described in a variety of colors, ranging from olive-brown or silvery grey to red, pink, and even orange (Orr et al. 1998; Love et al. 2002; Fig. 1). In addition to this variable main body coloration, bocaccio may have patches of white as well as dark (cancerous) melanistic blotches that typically appear in older, larger individuals (Orr et al. 1998; Love et al. 2002). Bocaccio are a long-lived species, with an estimated maximum lifespan of 70 years (Department of Fisheries and Oceans Canada, pers. comm. to M.L.), and are geographically distributed along the northeastern Pacific Ocean, spanning the western reaches of North America from Alaska to central Baja California (Love et al. 2021). Juveniles are found near the surface and in inshore waters, whereas adults are found at depths between 20 and 475 m. Bocaccio are found primarily over complex structures such as rocky reefs and oil platforms (where they form substantial aggregations; Love et al. 2002, 2005, 2009). They are most abundant off British Columbia and from northern California to at least northern Baja California (Love and Passarelli 2020), where they play a critical ecological role, representing a key species in marine food webs due to their broadly piscivorous diet.

Fig. 1. — A) images of *Sebastes paucispinis* (image credit SWFSC ROV Team) illustrating bocaccio color variation, and B) the read length histogram of HiFi reads used for the assembly, where the red line represents the mean read length of 13,283 bp.

Bocaccio are widely enjoyed for human consumption and as such represent an economically important species for both recreational and commercial fisheries (He and Field 2017). However, overfishing has driven a significant modern decline in historically robust populations along the Pacific coast, especially in the Puget Sound/Georgia Basin regions where bocaccio are now protected under the Endangered Species Act (Williams et al. 2010; https://www.fisheries.noaa.gov/species/bocaccio-protected). Populations were estimated to have declined by ~96% to 98% in 2005 compared to historic levels and were federally declared overfished, though the population has increased in recent years and the fishery reopened in 2018 (He and Field 2017). Several life history traits, namely, late-onset maturity and relative longevity, have made bocaccio particularly vulnerable to overfishing (Love et al. 2002). Although a number of genetic studies have aimed to assess the impacts, consequences, and future outlook for bocaccio populations, the lack of species-specific high-quality genetic resources has hindered the use of state-of-the-art conservation genomics approaches (Matala et al. 2004; Drake et al. 2010; Buonaccorsi et al. 2012). As a result, the genetic consequences of overfishing in the years preceding and following federal restrictions on fishing activities have yet to be comprehensively quantified.

As part of the California Conservation Genomics Project (CCGP) consortium (Shaffer et al. 2022), we present a complete Sebastes paucispinis reference-quality genome to aid in bocaccio conservation efforts, supporting the CCGP’s broader efforts in creating high-quality genomic resources for studying and monitoring species diversity.

Methods

Biological materials

A single Sebastes paucispinis specimen was collected in May 2019 from the Footprint State Marine Reserve, situated between the Anacapa and Santa Cruz islands west of the southern California coast (in accordance with NOAA permit LOA-3-2019). The specimen was euthanized, immediately dissected on ice, and the liver tissue was removed and flash-frozen.

High molecular weight gDNA isolation

High molecular weight (HMW) genomic DNA (gDNA) was extracted from 40 mg of frozen liver tissue (voucher number SEB-726) using the Nanobind Tissue Big DNA kit (Pacific Biosciences [PacBio], Menlo Park, CA) following the manufacturer’s instructions. Purity of the gDNA was assessed using a NanoDrop ND-1000 spectrophotometer, where a 260/280 ratio of 1.83 and a 260/230 ratio of 2.11 were observed. DNA yield was 16.6 µg as quantified by a Qubit 2.0 Fluorometer (Thermo Fisher Scientific, Waltham, MA). Integrity of the HMW gDNA was verified on a Femto Pulse system (Agilent Technologies, Santa Clara, CA), where 85% of the DNA was observed in fragments above 100 kb.

HiFi library preparation and sequencing

We constructed a HiFi SMRTbell library using the SMRTbell Express Template Prep Kit v2.0 (PacBio, Cat. #100-938-900) following the manufacturer’s instructions. HMW gDNA was sheared to a target DNA size distribution between 15 kb and 18 kb using Diagenode’s Megaruptor 3 system (Diagenode, Belgium; cat. B06010003). The sheared gDNA was concentrated using 0.45× AMPure PB beads (PacBio, Cat. #100-265-900) for the removal of single-strand overhangs at 37 °C for 15 min, followed by further enzymatic steps of DNA damage repair at 37 °C for 30 min, end repair and A-tailing at 20 °C for 10 min and 65 °C for 30 min, and ligation of overhang adapters v3 at 20 °C for 60 min. The SMRTbell library was purified and concentrated with 1× AMPure PB beads for nuclease treatment at 37 °C for 30 min and size selection using the PippinHT system (Sage Science, Beverly, MA; Cat #HPE7510) to collect fragments greater than 7 kb to 9 kb. The 15 kb to 20 kb average HiFi SMRTbell library was sequenced at UC Davis DNA Technologies Core (Davis, CA) using two 8M SMRT cells on the Sequel IIe with sequencing chemistry 2.0 and 30-h movies.

Omni-C library preparation and sequencing

We prepared an Omni-C library using a Dovetail Omni-C Kit (Dovetail Genomics; Scotts Valley, CA) according to the manufacturer’s protocol with slight modifications. First, a second sample of S. paucispinis frozen liver tissue from the same individual was thoroughly ground with a mortar and pestle under liquid nitrogen. Subsequently, chromatin was fixed in place in the nucleus. The suspended chromatin solution was then passed through 100 and 40 μm cell strainers to remove large debris. Fixed chromatin was digested under various conditions of DNase I until a suitable fragment length distribution of DNA molecules was obtained. Chromatin ends were repaired and ligated to a biotinylated bridge adapter followed by proximity ligation of adapter-containing ends. After proximity ligation, crosslinks were reversed and the DNA was purified from proteins. Purified DNA was treated to remove biotin that was not internal to ligated fragments. An NGS library was generated using an NEB Ultra II DNA Library Prep kit (New England Biolabs [NEB], Ipswich, MA) with an Illumina-compatible y-adaptor. Biotin-containing fragments were then captured using streptavidin beads. The post-capture product was split into two replicates prior to PCR enrichment to preserve library complexity, with each replicate receiving unique dual indices. The library was sequenced at the Vincent J. Coates Genomics Sequencing Lab (Berkeley, CA) on an Illumina NovaSeq 6000 platform (Illumina, San Diego, CA) to generate approximately 100 million 2 × 150 bp paired-end reads per gigabase of genome size.

Nuclear genome assembly

We assembled the genome of a Sebastes paucispinis individual following the CCGP assembly pipeline version 5.0 (www.github.com/ccgproject/ccgp_assembly); Table 1 lists the tools and non-default parameters used in the assembly process. We removed remnant adapter sequences from the PacBio HiFi dataset using HiFiAdapterFilt (Sim et al. 2022) and generated an initial diploid phased assembly using HiFiasm (Cheng et al. 2021) in Hi-C mode with the filtered PacBio HiFi reads and the Omni-C short reads, a process that generates two assemblies, one per haplotype. We then aligned the Omni-C data to both assemblies following the Arima Genomics Mapping Pipeline (https://github.com/ArimaGenomics/mapping_pipeline) and scaffolded both assemblies with SALSA (Ghurye et al. 2017, 2019).

Table 1.

Assembly pipeline and software used. All software is cited in the text.

Assembly step	Software and any non-default options	Version	Reference
Initial assembly
Filtering PacBio HiFi adapters	HiFiAdapterFilt	Commit 64d1c7b	Sim et al. 2022
k-mer counting	Meryl (k = 21)	1	https://github.com/marbl/meryl
Estimation of genome size and heterozygosity	GenomeScope (-l 50)	2	Ranallo-Benavidez et al. 2020
De novo assembly (contigging)	HiFiasm (Hi-C Mode, --primary, output hic.hap1.p_ctg, hic.hap2.p_ctg)	0.19.4-r575	Cheng et al. 2022
Scaffolding
Omni-C data alignment	Arima Genomics Mapping Pipeline	Commit 2e74ea4	https://github.com/ArimaGenomics/mapping_pipeline
Arima Genomics Mapping Pipeline (AGMP)	BWA-MEM	0.7.17-r1188	Li 2013
	samtools	1.11	Danecek et al. 2021
	filter_five_end.pl (AGMP)	Commit 2e74ea4	https://github.com/ArimaGenomics/mapping_pipeline
	two_read_bam_combiner.pl (AGMP)	Commit 2e74ea4	https://github.com/ArimaGenomics/mapping_pipeline
	picard	2.27.5	https://broadinstitute.github.io/picard/
Omni-C Scaffolding	SALSA (-DNASE, -i 20, -p yes)	2	Ghurye et al. 2017, Ghurye et al. 2019
Omni-C contact map generation
Short-read alignment	BWA-MEM (-5SP)	0.7.17-r1188	Li 2013
SAM/BAM processing	samtools	1.11	Danecek et al. 2021
SAM/BAM filtering	pairtools	0.3.0	Open2C et al. 2024
Pairs indexing	pairix	0.3.7	Lee et al. 2022
Matrix generation	cooler	0.8.10	Abdennur and Mirny 2020
Matrix balancing	HiCExplorer (hicCorrectMatrix correct --filterThreshold -2 4)	3.6	Ramírez et al. 2018
Contact map visualization	HiGlass	2.1.11	Kerpedjiev et al. 2018
	PretextMap	0.1.4	https://github.com/wtsi-hpag/PretextMap
	PretextView	0.1.5	https://github.com/wtsi-hpag/PretextView
	PretextSnapshot	0.0.3	https://github.com/wtsi-hpag/PretextSnapshot
Manual curation tools	Rapid curation pipeline (Wellcome Trust Sanger Institute, Genome Reference Informatics Team)	Commit 7acf220c	https://gitlab.com/wtsi-grit/rapid-curation
Genome quality assessment
Basic assembly metrics	QUAST (--est-ref-size)	5.0.2	Gurevich et al. 2013
Assembly completeness	BUSCO (-m geno, -l actinopterygii_odb10)	5.8.2	Manni et al. 2021
Assembly completeness	Merqury	2020-01-29	Rhie et al. 2020
Repeat content and transposable element diversity	RepeatMasker	4.1.2-p1	Smit et al. 2015
	Dfam	3.8	Storer et al. 2021
	CD-HIT	4.8.1	Fu et al. 2012
Contamination screening
Local alignment tool	BLAST+ (-db nt, -outfmt "6 qseqid staxids bitscore std", -max_target_seqs 1, -max_hsps 1, -evalue 1e-25)	2.15	Camacho et al. 2009
General contamination screening	BlobToolKit (HiFi coverage, BUSCO = actinopterygii, NCBI Taxa ID = 72093)	2.3.3	Challis et al. 2020


The assemblies for both haplotypes were manually curated by iteratively generating and analyzing their corresponding Omni-C contact maps. Briefly, to generate the contact maps we aligned the Omni-C data with BWA-MEM (Li 2013), identified ligation junctions, and generated Omni-C pairs (Lee et al. 2022) using pairtools (Open2C et al. 2024). Then, we generated multi-resolution Omni-C matrices with cooler (Abdennur and Mirny 2020) and balanced them with hicExplorer (Ramírez et al. 2018). We used HiGlass (Kerpedjiev et al. 2018) and the PretextSuite (https://github.com/wtsi-hpag/PretextView; https://github.com/wtsi-hpag/PretextMap; https://github.com/wtsi-hpag/PretextSnapshot) to visualize the contact maps. We identified misassemblies and misjoins in these contact maps, and modified the assemblies using the Rapid Curation pipeline from the Wellcome Trust Sanger Institute, Genome Reference Informatics Team (https://gitlab.com/wtsi-grit/rapid-curation). Some of the remaining gaps (joins generated during scaffolding and/or curation) were closed using the PacBio HiFi reads and YAGCloser (https://github.com/merlyescalona/yagcloser). We checked for contamination using the BlobToolKit Framework (Challis et al. 2020).

Genome quality assessment

We generated k-mer counts from the PacBio HiFi reads using meryl (https://github.com/marbl/meryl). The k-mer counts were then used in GenomeScope2.0 (Ranallo-Benavidez et al. 2020) to estimate genome features including genome size, heterozygosity, and repeat content. To obtain general contiguity metrics, we ran QUAST (Gurevich et al. 2013). To evaluate genome quality and functional completeness, we used Benchmarking Universal Single-Copy Orthologs (BUSCO; Manni et al. 2021) with the Actinopterygii ortholog database (actinopterygii_odb10), which contains 3,640 genes. Assessment of base-level accuracy (QV) and k-mer completeness was performed using the previously generated meryl database and Merqury (Rhie et al. 2020). We further estimated genome assembly accuracy via BUSCO gene set frameshift analysis using the pipeline described in Korlach et al. (2017). Measurements of the size of the phased blocks are based on the size of the contigs generated by HiFiasm in Hi-C mode. We follow the quality metric nomenclature established by Rhie et al. (2021), with the genome quality code x.y.P.Q.C, where x = log10[contig NG50]; y = log10[scaffold NG50]; P = log10[phased block NG50]; Q = Phred base accuracy QV (quality value); and C = % genome represented by the first ‘n’ scaffolds, following a karyotype of 2n = 48 for this species, estimated as a mode from ancestral species number of chromosomes (Genome on a Tree, GoaT; tax_name(Sebastes paucispinis); Challis et al. 2023). Quality metrics for the notation were calculated on the haplotype one assembly.
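As a rough illustration of how an x.y.P.Q.C code is put together, the following Python sketch derives each field from NG50 values, a base-pair QV, and a scaffold percentage. It assumes that each log10 term, the QV, and the percentage are truncated to integers, which is one reading of the notation rather than something stated in the text; the function name `quality_code` and the example inputs are hypothetical.

```python
import math


def quality_code(contig_ng50, scaffold_ng50, phased_block_ng50, base_qv, pct_in_scaffolds):
    """Format an x.y.P.Q.C style assembly quality code.

    Assumption (not taken from the text): every field is truncated
    to an integer before being written into the code.
    """
    x = math.floor(math.log10(contig_ng50))        # contig NG50 order of magnitude
    y = math.floor(math.log10(scaffold_ng50))      # scaffold NG50 order of magnitude
    p = math.floor(math.log10(phased_block_ng50))  # phased block NG50 order of magnitude
    q = math.floor(base_qv)                        # Phred-scaled base accuracy
    c = math.floor(pct_in_scaffolds)               # % of genome in the first n scaffolds
    return f"{x}.{y}.P{p}.Q{q}.C{c}"


# Hypothetical inputs chosen only to show the formatting.
example_code = quality_code(23_000_000, 38_000_000, 23_000_000, 64.5, 99.2)
print(example_code)
```

Because the first three fields are orders of magnitude, two assemblies with NG50 values of 12 Mb and 90 Mb receive the same digit, so the code is best read as a coarse summary alongside the full metrics.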

Assembly validation and comparison

To confirm the quality of the assembly, we compared its BUSCO genome completeness assessment (Manni et al. 2021) with other Sebastes rockfish assemblies. Specifically, we compared it against the widow rockfish Sebastes entomelas (NCBI: GCA_045837885.1 and NCBI: GCA_045838235.1), the Acadian redfish Sebastes fasciatus (NCBI: GCA_043250625.1 and GCA_043250585.1), and the honeycomb rockfish Sebastes umbrosus (NCBI: GCF_015220745.1 and GCA_015220095.1).

Repeat content and transposable element diversity

To determine the diversity of repeat elements along the S. paucispinis genome, we produced a species-specific repeat library using RepeatModeler (Smit and Hubley 2008). We combined this species-specific library with ancestral repeats for S. paucispinis from the Dfam database (Storer et al. 2021) and reduced redundancy with cd-hit-est (Fu et al. 2012). We then masked repeats in the assembly using RepeatMasker (Smit et al. 2015), which provided a summary of transposable elements (TEs) and other repetitive elements along the genome.

Results

Sequencing data

The PacBio HiFi library generated 6.01 million reads and the Omni-C library generated 216.72 million read pairs. The PacBio HiFi sequences yielded ~79× genome coverage and had an N50 read length of 13,699 bp; a minimum read length of 96 bp; a mean read length of 13,283 bp; and a maximum read length of 58,255 bp (Fig. 1B). Based on the PacBio HiFi data, GenomeScope 2.0 estimated a genome size of 767.83 Mb, a 0.153% sequencing error rate, and 0.211% heterozygosity. The k-mer spectrum shows a bimodal distribution with a major coverage peak at ~100-fold coverage and a minor coverage peak at ~50-fold coverage (Fig. 2A).

Fig. 2. — A) A k-mer-based analysis of genome size and duplication level run using GenomeScope, B) a BlobToolKit Snail plot of the *S. paucispinis* assembly fSebPau1.0.hap1 illustrating assembly summary statistics. The full outer circle represents the entirety of the assembly. From the center of the snail plot, the red line indicates the length of the longest scaffold in the assembly, scaffolds are represented in grey and arranged in ascending length order clockwise around the plot. Light and dark orange sections of the plot represent the N90 and N50 values of the assembly, respectively, and light and dark blue regions on the outer edge of the plot represent AT and GC content, respectively. Dovetail Omni-C contact maps for C) haplotype 1 and D) haplotype 2.

Nuclear genome assembly

The final genome assembly (fSebPau1) consists of two phased haplotypes. Both haplotype assemblies are similar in size, and both are close to, but somewhat larger than, the genome size estimated by GenomeScope 2.0. The haplotype one assembly (fSebPau1.0.hap1) consists of 211 scaffolds spanning 806.86 Mb with a contig N50 of 14.26 Mb, a scaffold N50 of 34.62 Mb, a largest contig of 31.4 Mb, and a largest scaffold of 43.67 Mb. The haplotype two assembly (fSebPau1.0.hap2) consists of 115 scaffolds spanning 802.85 Mb with a contig N50 of 18.44 Mb, a scaffold N50 of 34.48 Mb, a largest contig of 35.2 Mb, and a largest scaffold of 43.52 Mb (Table 2 and Fig. 2B).
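For readers less familiar with the contiguity statistics quoted above, the short Python sketch below shows the standard definitions of N50 and NG50: sequences are sorted from longest to shortest, and the statistic is the length of the sequence at which the running total first reaches half of the assembly size (N50) or half of an estimated genome size (NG50). The function name `nx` and the toy lengths are hypothetical and are not taken from the assembly.

```python
def nx(lengths, fraction=0.5, genome_size=None):
    """Return Nx of a set of sequence lengths, or NGx if genome_size is given."""
    # N50 uses the assembly's own total length; NG50 uses an external estimate.
    reference = genome_size if genome_size is not None else sum(lengths)
    target = fraction * reference
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= target:
            return length
    # Only reachable for NGx, when the assembly is shorter than the target.
    return None


# Toy contig lengths (arbitrary units), for illustration only.
toy_lengths = [50, 40, 30, 20, 10]
toy_n50 = nx(toy_lengths)                      # half of 150 is reached at the 40-unit contig
toy_ng50 = nx(toy_lengths, genome_size=200)    # half of 200 is reached at the 30-unit contig
print(toy_n50, toy_ng50)
```

When an assembly is larger than the estimated genome size, as is the case for both haplotypes here, the NG50 target is reached earlier in the sorted list, so NG50 is at least as large as N50.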

Table 2.

Summary of assembly statistics for the Sebastes paucispinis genome assembly.

Bio projects and vouchers	CCGP NCBI BioProject			PRJNA720569
	Genera NCBI BioProject			PRJNA765858
	Species NCBI BioProject			PRJNA777218
	NCBI BioSample			SAMN36697770
	Specimen identification			SEB-726
	NCBI Genome accessions			Haplotype 1		Haplotype 2
	Assembly accession			JAUPFX000000000		JAUPFY000000000
	Genome sequences			GCA_036937225.1		GCA_036937175.1
Genome sequence	PacBio HiFi reads		Run	1 PACBIO_SMRT (Sequel IIe) run: 6M spots, 80G bases, 48.9Gb
			Accession	SRX25151838
	Omni-C Illumina reads		Run	2 ILLUMINA (Illumina NovaSeq 6000) runs: 216.7M spots, 65.4G bases, 21.5Gb
			Accession	SRX25151838-9
Genome assembly quality metrics	Assembly identifier (Quality code)^*			fSebPau1(7.7.P7.Q64.C99)
	HiFi Read coverage ^§			79.95X
				Haplotype 1		Haplotype 2
	Number of contigs			378		278
	Contig N50 (bp)			14,264,469		18,449,034
	Contig NG50 ^§			23,814,533		27,712,622
	Longest Contigs			31,405,881		35,207,844
	Number of scaffolds			211		115
	Scaffold N50			34,620,701		34,476,039
	Scaffold NG50 ^§			38,903,856		38,870,289
	Largest scaffold			43,679,056		43,528,066
	Size of final assembly			806,866,550		802,857,081
	Phased block NG50 ^§			23,644,358		27,712,622
	Gaps per Gbp (# Gaps)			207(167)		203(163)
	Indel QV (Frame shift)			48.27606757		48.51125506
	Base pair QV			64.5533		64.5533
				Full assembly = 64.2817
	k-mer completeness			97.3282		97.3208
				Full assembly = 99.7538
	BUSCO completeness (actinopterygii_odb10) n = 3,640		C^**	S	D	F	M
		H1^‡	99.3%	98.8%	0.5%	0.6%	0.1%
		H2^‡	99.3%	98.9%	0.4%	0.6%	0.1%


^*Assembly quality code x.y.P.Q.C derived notation, from (Rhie et al. 2021). x = log10[contig NG50]; y = log10[scaffold NG50]; P = log10 [phased block NG50]; Q = Phred base accuracy QV (Quality value); C = % genome represented by the first “n” scaffolds, following a karyotype of 2n = 48 for this species, estimated as a mode from ancestral species number of chromosomes (Genome on a Tree—GoaT; tax_name(Sebastes paucispinis); Challis et al. 2023). Quality metrics for the notation were calculated on the haplotype one assembly.

^§Read coverage and NGx statistics have been calculated based on the estimated genome size of 767.83 Mb.

^‡(H1) Haplotype 1 and (H2) Haplotype 2 assembly values.

^**BUSCO Scores. Complete BUSCOs (C). Complete and single-copy BUSCOs (S). Complete and duplicated BUSCOs (D). Fragmented BUSCOs (F). Missing BUSCOs (M).

During manual curation, we made a total of 91 joins (47 on haplotype one and 44 on haplotype two) and 6 breaks (4 on haplotype one and 2 on haplotype two) based on the Omni-C contact map signal, and were able to close a total of 11 gaps (5 on haplotype one and 6 on haplotype two). No further contigs were modified or removed. The Omni-C contact maps show highly contiguous assemblies, with chromosome-length scaffolds (Fig. 2C and D). Assembly statistics are reported in Table 2 and represented graphically in Fig. 2B. We have deposited the genome assembly on NCBI GenBank (see Table 2 and “Data availability” for details).

Assembly validation and comparison

The haplotype one assembly has a BUSCO completeness score for the Actinopterygii gene set of 99.3%, a base pair quality value (QV) of 64.02, a k-mer completeness of 97.32%, and a frameshift indel QV of 48.27. The haplotype two assembly has a BUSCO completeness score for the Actinopterygii gene set of 99.3%, a base pair QV of 64.55, a k-mer completeness of 97.32%, and a frameshift indel QV of 48.51. This is in keeping with other high-quality Sebastes rockfish assemblies, whose haplotype 1 assemblies each had between 98.9% and 99.3% of BUSCOs complete (Table 3). The completeness of the alternate haplotype (haplotype 2) of S. paucispinis exceeded that of other rockfish assemblies (including those with comparable haplotype 1 completeness), with a BUSCO missingness of only 0.1%, the same as haplotype 1, compared to a range of 0.2% to 3.3% missing across other assemblies.
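To put the QV values above on a more intuitive scale, the sketch below applies the standard Phred relationship, QV = -10·log10(P), where P is the per-base error probability. This is a generic conversion and not a reproduction of how Merqury derives its k-mer-based estimate; the function names are hypothetical.

```python
import math


def qv_to_error_rate(qv):
    """Convert a Phred-scaled quality value to a per-base error probability."""
    return 10 ** (-qv / 10)


def error_rate_to_qv(error_rate):
    """Convert a per-base error probability back to a Phred-scaled quality value."""
    return -10 * math.log10(error_rate)


# QV 30 is one error per thousand bases and QV 60 is one per million;
# a QV in the mid-60s therefore implies fewer than one error per million bases.
for qv in (30, 48.27, 60, 64.55):
    rate = qv_to_error_rate(qv)
    print(f"QV {qv}: ~1 error per {1 / rate:,.0f} bases")
```

On this scale, the gap between the frameshift indel QV (about 48) and the base pair QV (about 64) corresponds to well over an order of magnitude in estimated error rate, which is why the two metrics are reported separately.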

Table 3.

Summary of BUSCOs across recent haplotype-resolved rockfish assemblies.

Common name	Latin name	Genome assembly initiative	Haplotype—NCBI accession	BUSCO results C = Complete, S = Single copy, D = Duplicated, F = Fragmented, M = Missing, n = number of BUSCOs
Bocaccio rockfish	*Sebastes paucispinis*	CCGP	hap 1 - GCA_036937225.1	C:99.3%[S:98.8%,D:0.5%],F:0.6%,M:0.1%,n:3,640
Bocaccio rockfish	*Sebastes paucispinis*	CCGP	hap 2 - GCA_036937175.1	C:99.3%[S:98.9%,D:0.4%],F:0.6%,M:0.1%,n:3,640
Widow rockfish	Sebastes entomelas	CCGP	hap 1 - GCA_045837885.1	C:99.3%[S:98.7%,D:0.7%],F:0.5%,M:0.1%,n:3,640
Widow rockfish	Sebastes entomelas	CCGP	hap 2 - GCA_045838235.1	C:99.2%[S:98.5%,D:0.6%],F:0.5%,M:0.3%,n:3,640
Acadian redfish	Sebastes fasciatus	VGP	hap 1 - GCA_043250625.1	C:99.3%[S:98.7%,D:0.6%],F:0.5%,M:0.2%,n:3,640
Acadian redfish	Sebastes fasciatus	VGP	hap 2 - GCA_043250585.1	C:92.1%[S:88.9%,D:3.2%],F:3.7%,M:4.1%,n:3,640
Honeycomb rockfish	Sebastes umbrosus	VGP	hap 1 - GCF_015220745.1	C:98.9%[S:98.4%,D:0.5%],F:0.5%,M:0.6%,n:3,640
Honeycomb rockfish	Sebastes umbrosus	VGP	hap 2 - GCA_015220095.1	C:95.3%[S:94.3%,D:1.0%],F:1.4%,M:3.3%,n:3,640
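The BUSCO summary strings in Table 3 follow a fixed pattern, so comparisons across assemblies can be scripted. The Python sketch below parses one such string and checks that the categories are internally consistent (S + D ≈ C and C + F + M ≈ 100%). The regular expression, the helper names, and the 0.15 percentage-point rounding tolerance are assumptions made for illustration and are not part of BUSCO itself.

```python
import re

# Pattern for strings such as "C:99.3%[S:98.8%,D:0.5%],F:0.6%,M:0.1%,n:3,640".
BUSCO_PATTERN = re.compile(
    r"C:(?P<C>[\d.]+)%\[S:(?P<S>[\d.]+)%,D:(?P<D>[\d.]+)%\],"
    r"F:(?P<F>[\d.]+)%,M:(?P<M>[\d.]+)%,n:(?P<n>[\d,]+)"
)


def parse_busco(summary):
    """Parse a one-line BUSCO summary into a dict of percentages and n."""
    match = BUSCO_PATTERN.search(summary)
    if match is None:
        raise ValueError(f"Unrecognized BUSCO summary: {summary!r}")
    fields = {key: float(match.group(key)) for key in "CSDFM"}
    fields["n"] = int(match.group("n").replace(",", ""))  # drop thousands separator
    return fields


def is_consistent(fields, tol=0.15):
    """Check S + D == C and C + F + M == 100, allowing for rounding (assumed tol)."""
    complete_ok = abs(fields["S"] + fields["D"] - fields["C"]) <= tol
    total_ok = abs(fields["C"] + fields["F"] + fields["M"] - 100.0) <= tol
    return complete_ok and total_ok


hap1 = parse_busco("C:99.3%[S:98.8%,D:0.5%],F:0.6%,M:0.1%,n:3,640")
print(hap1, is_consistent(hap1))
```

A check of this kind is a quick way to catch transcription slips when BUSCO results are copied between tables.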


Repeat content and transposable element diversity

TEs and repeats spanned 45.66% of the genome in total (Fig. 3A). The most abundant category was Class I TEs (retrotransposons), which span 23.26% of the genome, followed by unclassified repeats (11.97%), Class II TEs (DNA transposons; 6.49%), and simple repeats (2.23%; Fig. 3A). Our analysis of repeat element lengths shows that the median repeat length is around 100 bp, while the longest detected repeat spans over 500 kb (Fig. 3B). The length distribution varies slightly across repeat classes, with simple and low-complexity repeats being the shortest classes and satellites the longest.

Discussion

Here we present a haplotype-resolved chromosome-scale assembly of the bocaccio rockfish Sebastes paucispinis. The assembly comprises 211 scaffolds and spans 806.9 Mb, with a scaffold N50 of 34.6 Mb. This assembly is similar in quality to other recently assembled rockfish genomes, with 99.3% complete BUSCOs (and only 0.6% fragmented, and 0.1% missing), and represents a substantial step forward compared to rockfish assemblies produced with older sequencing and assembly technologies. The assembly highlights the high repeat content present in rockfish genomes, with over 45% of the genome being spanned by TEs and repeats.

Rockfish have become a focal clade for studying aging due to their broad variation in lifespan (Kolora et al. 2021). Previous comparative genomics investigations into the evolution of lifespan variation and longevity have been carried out using genomes produced with older, lower fidelity, sequencing technologies (Kolora et al. 2021), resulting in less contiguous/accurate reference assemblies. Recent advances in sequencing technologies and genome assembly tools such as HiFiasm (Cheng et al. 2021), used to produce this bocaccio genome assembly, represent the state of the art and pave the way for more comprehensive investigations into aging. In addition, haplotype-resolved PacBio HiFi genomes will bolster ongoing work into a host of other questions in the rockfish clade including studies into the basis of ecological speciation between rockfish species, instances of local adaptation, as well as more fundamental investigations into aspects of genomic variation, both in terms of nucleotide variation and structural variation (which relies heavily on having high-quality reference genomes).

Following their dramatic decline from 1960 to 1990, it is clear that bocaccio populations require careful management and monitoring to facilitate population/stock recovery (Love et al. 2002; He and Field 2017). As with other fish taxa, the designation and monitoring of new marine protected areas (MPAs), together with associated genetic/genomic monitoring and assessment, can be a powerful tool to aid in the recovery of wild populations. Genomic resources can help in a multitude of ways, including the production of genetic panels for genetic diversity assessments and the identification of subpopulation structure across species ranges. In the case of bocaccio, little is known about population structure across their vast range, and while pre-genomic microsatellite and simple sequence repeat markers revealed little evidence of significant population structure (Matala et al. 2004; Drake et al. 2010; Buonaccorsi et al. 2012), these approaches use only a tiny fraction of the genome (10s of markers). High-quality chromosome-scale genomes allow the identification of millions of markers, dramatically increasing the genomic resolution with which we can identify fine-scale population variation. These insights can then be translated into improved knowledge of population structure and ultimately guide recovery and population management plans (Bernatchez et al. 2017; Bernos et al. 2020). The interesting spatial distribution of bocaccio, particularly their abundance and preference for schooling near oil platforms, also provides a unique opportunity to study human-wildlife interactions and the impacts of anthropogenic structures on wild populations (Love et al. 2005).

This chromosome-scale bocaccio assembly was produced as part of the CCGP and fulfills the mission of producing genomic resources that will assist in the protection of California wildlife in the face of ever-increasing anthropogenic stressors. This high-quality bocaccio assembly will aid in bocaccio conservation and act as a resource that the broader scientific community can use to address fundamental biology questions using rockfish as a focal clade, including ongoing investigations into the evolution of lifespan variation.

Acknowledgments

PacBio Sequel II library preparation and sequencing were carried out at the DNA Technologies and Expression Analysis Cores at the UC Davis Genome Center, supported by NIH Shared Instrumentation Grant 1S10OD010786-01. Deep sequencing of Omni-C libraries used the NovaSeq S4 sequencing platforms at the Vincent J. Coates Genomics Sequencing Laboratory at UC Berkeley, supported by NIH S10 OD018174 Instrumentation Grant. We thank the staff at the UC Davis DNA Technologies and Expression Analysis Cores and the UC Santa Cruz Paleogenomics Laboratory for their diligence and dedication to generating high-quality sequence data.

Contributor Information

Rishi De-Kayne, Department of Integrative Biology, University of California, Berkeley, Berkeley, CA, USA.

Stacy Li, Department of Integrative Biology, University of California, Berkeley, Berkeley, CA, USA.

Merly Escalona, Department of Biomolecular Engineering, University of California, Santa Cruz, Santa Cruz, CA, USA.

Runyang Nicolas Lou, Department of Integrative Biology, University of California, Berkeley, Berkeley, CA, USA.

Juan Manuel Vazquez, Department of Integrative Biology, University of California, Berkeley, Berkeley, CA, USA.

Gregory L Owens, Department of Biology, University of Victoria, Victoria, BC, Canada.

Sree Rohit Raj Kolora, Department of Integrative Biology, University of California, Berkeley, Berkeley, CA, USA.

Conner Jainese, Marine Science Institute, University of California, Santa Barbara, Santa Barbara, CA, USA.

Katelin Seeto, Marine Science Institute, University of California, Santa Barbara, Santa Barbara, CA, USA.

Merit McCrea, Marine Science Institute, University of California, Santa Barbara, Santa Barbara, CA, USA.

Oanh Nguyen, DNA Technologies and Expression Analysis Core Laboratory, Genome Center, University of California, Davis, Davis, CA, USA.

Noravit Chumchim, DNA Technologies and Expression Analysis Core Laboratory, Genome Center, University of California, Davis, Davis, CA, USA.

Ruta Sahasrabudhe, DNA Technologies and Expression Analysis Core Laboratory, Genome Center, University of California, Davis, Davis, CA, USA.

Colin W Fairbairn, Department of Ecology and Evolutionary Biology, University of California, Santa Cruz, Santa Cruz, CA 95064, USA.

Richard E Green, Department of Biomolecular Engineering, University of California, Santa Cruz, Santa Cruz, CA, USA.

William E Seligmann, Department of Ecology and Evolutionary Biology, University of California, Santa Cruz, Santa Cruz, CA 95064, USA.

Milton Love, Marine Science Institute, University of California, Santa Barbara, Santa Barbara, CA, USA.

Peter H Sudmant, Department of Integrative Biology, University of California, Berkeley, Berkeley, CA, USA.

Author contributions

Rishi De-Kayne (Data curation, Formal analysis, Investigation, Resources, Software, Validation, Visualization, Writing—original draft, Writing—review & editing), Stacy Li (Conceptualization, Data curation, Formal analysis, Project administration, Resources, Software, Writing—original draft, Writing—review & editing), Merly Escalona (Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Resources, Software, Validation, Visualization, Writing—original draft, Writing—review & editing), Runyang Lou (Data curation, Formal analysis, Investigation, Methodology, Resources, Software, Validation, Visualization, Writing—original draft, Writing—review & editing), Gregory Owens (Validation, Visualization, Writing—review & editing), Sree Rohit Raj Kolora (Conceptualization, Data curation, Investigation, Methodology, Resources, Writing—review & editing), Conner Jainese (Investigation, Methodology, Resources), Katelin Seeto (Data curation, Investigation, Methodology), Merit McCrea (Data curation, Methodology, Resources), Oanh Nguyen (Data curation, Investigation, Methodology, Resources), Noravit Chumchim (Data curation, Investigation, Methodology, Resources), Ruta Sahasrabudhe (Data curation, Methodology, Resources), Colin W. Fairbairn (Investigation, Methodology, Resources), Richard Green (Investigation, Methodology, Project administration), William Seligmann (Investigation, Methodology, Resources), Milton Love (Data curation, Funding acquisition, Methodology, Project administration, Supervision, Writing—review & editing), and Peter Sudmant (Conceptualization, Data curation, Funding acquisition, Investigation, Methodology, Project administration, Resources, Supervision, Validation, Visualization, Writing—original draft, Writing—review & editing)

Funding

This work was supported by the California Conservation Genomics Project, with funding provided to the University of California by the State of California, State Budget Act of 2019 [UC Award ID RSI-19-690224] to PHS, NIH NIGMS award R35GM13798 to PHS, and a North Pacific Research Board Award (NPRB P2112).

Data availability

Data generated for this study are available under NCBI BioProject PRJNA720569. Raw sequencing data for sample SEB-726 (NCBI BioSample SAMN36697770) are deposited in the NCBI Sequence Read Archive (SRA) under SRX25151838 (PacBio HiFi) and under SRX25151839 and SRX25151840 (Dovetail Omni-C). Assembly scripts and other data for the analyses presented can be found at the following GitHub repository: www.github.com/ccgproject/ccgp_assembly

References

Abdennur N, Fudenberg G, Flyamer IM, Galitsyna AA, Goloborodko A, Imakaev M, Venev SV; Open2C. Pairtools: from sequencing data to chromosome contacts. PLoS Comput Biol. 2024:20:e1012164.
Abdennur N, Mirny LA. Cooler: scalable storage for Hi-C data and other genomically labeled arrays. Bioinformatics. 2020:36:311–316.
Bernatchez L, Wellenreuther M, Araneda C, Ashton DT, Barth JMI, Beacham TD, Maes GE, Martinsohn JT, Miller KM, Naish KA, et al. Harnessing the power of genomics to secure the future of seafood. Trends Ecol Evol. 2017:32:665–680.
Bernos TA, Jeffries KM, Mandrak NE. Linking genomics and fish conservation decision making: a review. Rev Fish Biol Fish. 2020:30:587–604.
Buonaccorsi VP, Kimbrell CA, Lynn EA, Hyde JR. Comparative population genetic analysis of bocaccio rockfish Sebastes paucispinis using anonymous and gene-associated simple sequence repeat loci. J Hered. 2012:103:391–399.
Challis R, Kumar S, Sotero-Caio C, Brown M, Blaxter M. Genomes on a Tree (GoaT): a versatile, scalable search engine for genomic and sequencing project metadata across the eukaryotic tree of life. Wellcome Open Res. 2023:8:24.
Challis R, Richards E, Rajan J, Cochrane G, Blaxter M. BlobToolKit – interactive quality assessment of genome assemblies. G3 (Bethesda). 2020:10:1361–1374.
Cheng H, Concepcion GT, Feng X, Zhang H, Li H. Haplotype-resolved de novo assembly using phased assembly graphs with hifiasm. Nat Methods. 2021:18:170–175.
Drake JS, Berntson EA, Gustafson RG, Holmes EE, Levin PS, Tolimieri N, et al. Status review of five rockfish species in Puget Sound, Washington: Bocaccio (Sebastes paucispinis), canary rockfish (S. pinniger), yelloweye rockfish (S. ruberrimus), greenstriped rockfish (S. elongatus), and redstripe rockfish (S. proriger). U.S. Dept Commer., NOAA Tech. Memo. NMFS-NWFSC-108; 2010. 234 p.
Fu L, Niu B, Zhu Z, Wu S, Li W. CD-HIT: accelerated for clustering the next-generation sequencing data. Bioinformatics. 2012:28:3150–3152.
Ghurye J, Pop M, Koren S, Bickhart D, Chin C-S. Scaffolding of long read assemblies using long range contact information. BMC Genomics. 2017:18:527.
Ghurye J, Rhie A, Walenz BP, Schmitt A, Selvaraj S, Pop M, Phillippy AM, Koren S. Integrating Hi-C links with assembly graphs for chromosome-scale assembly. PLoS Comput Biol. 2019:15:e1007273.
Gurevich A, Saveliev V, Vyahhi N, Tesler G. QUAST: quality assessment tool for genome assemblies. Bioinformatics. 2013:29:1072–1075.
He X, Field JC. Stock assessment update: status of bocaccio, Sebastes paucispinis, in the Conception, Monterey and Eureka INPFC areas for 2017. Pacific Fishery Management Council, Portland, Oregon; 2017.
Kerpedjiev P, Abdennur N, Lekschas F, McCallum C, Dinkla K, Strobelt H, Luber JM, Ouellette SB, Azhir A, Kumar N, et al. HiGlass: web-based visual exploration and analysis of genome interaction maps. Genome Biol. 2018:19:125.
Kolora SRR, Owens GL, Vazquez JM, Stubbs A, Chatla K, Jainese C, Seeto K, McCrea M, Sandel MW, Vianna JA, et al. Origins and evolution of extreme life span in Pacific Ocean rockfishes. Science. 2021:374:842–847.
Korlach J, Gedman G, Kingan SB, Chin C-S, Howard JT, Audet J-N, Cantin L, Jarvis ED. De novo PacBio long-read and phased avian genome assemblies correct and add to reference genes generated with intermediate and short reads. GigaScience. 2017:6:1–16.
Lee S, Bakker CR, Vitzthum C, Alver BH, Park PJ. Pairs and Pairix: a file format and a tool for efficient storage and retrieval for Hi-C read pairs. Bioinformatics. 2022:38:1729–1731.
Li H. Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. arXiv [q-bioGN]. 2013.
Love M, Schroeder D, Lenarz W. Distribution of bocaccio (Sebastes paucispinis) and cowcod (Sebastes levis) around oil platforms and natural outcrops off California with implications for larval production. Bull Mar Sci. 2005:77:397–408.
Love MS, Bizzarro JJ, Cornthwaite AM, Frable BW, Maslenikov KP. Checklist of marine and estuarine fishes from the Alaska–Yukon Border, Beaufort Sea, to Cabo San Lucas, Mexico. Zootaxa. 2021:5053:1–285.
Love MS, Passarelli JK. Miller and Lea’s guide to the coastal marine fishes of California, 2nd ed. Davis, California, USA: UCANR Publications; 2020.
Love MS, Yoklavich M, Schroeder DM. Demersal fish assemblages in the Southern California Bight based on visual surveys in deep water. Environ Biol Fishes. 2009:84:55–68.
Love MS, Yoklavich M, Thorsteinson LK. The Rockfishes of the Northeast Pacific. Berkeley and Los Angeles, California, USA: University of California Press; 2002.
Manni M, Berkeley MR, Seppey M, Simão FA, Zdobnov EM. BUSCO update: novel and streamlined workflows along with broader and deeper phylogenetic coverage for scoring of eukaryotic, prokaryotic, and viral genomes. Mol Biol Evol. 2021:38:4647–4654.
Matala AP, Gray AK, Gharrett AJ, Love MS. Microsatellite variation indicates population genetic structure of bocaccio. N Am J Fish Manag. 2004:24:1189–1202.
Orr JW, Brown MA, Baker DC, Alaska Fisheries Science Center (U.S.). Guide to rockfishes (Scorpaenidae) of the genera Sebastes, Sebastolobus, and Adelosebastes of the Northeast Pacific Ocean. NOAA technical memorandum NMFS-AFSC; 1998. https://repository.library.noaa.gov/view/noaa/26628
Ramírez F, Bhardwaj V, Arrigoni L, Lam KC, Grüning BA, Villaveces J, Habermann B, Akhtar A, Manke T. High-resolution TADs reveal DNA sequences underlying genome organization in flies. Nat Commun. 2018:9:189.
Ranallo-Benavidez TR, Jaron KS, Schatz MC. GenomeScope 2.0 and Smudgeplot for reference-free profiling of polyploid genomes. Nat Commun. 2020:11:1432.
Rhie A, McCarthy SA, Fedrigo O, Damas J, Formenti G, Koren S, Uliano-Silva M, Chow W, Fungtammasan A, Kim J, et al. Towards complete and error-free genome assemblies of all vertebrate species. Nature. 2021:592:737–746.
Rhie A, Walenz BP, Koren S, Phillippy AM. Merqury: reference-free quality, completeness, and phasing assessment for genome assemblies. Genome Biol. 2020:21:245.
Shaffer HB, Toffelmier E, Corbett-Detig RB, Escalona M, Erickson B, Fiedler P, Gold M, Harrigan RJ, Hodges S, Luckau TK, et al. Landscape genomics to enable conservation actions: The California Conservation Genomics Project. J Hered. 2022:113:577–588.
Sim SB, Corpuz RL, Simmonds TJ, Geib SM. HiFiAdapterFilt, a memory efficient read processing pipeline, prevents occurrence of adapter sequence in PacBio HiFi reads and their negative impacts on genome assembly. BMC Genomics. 2022:23:157.
Smit AFA, Hubley R. RepeatModeler Open-1.0, 2008. Available from http://www.repeatmasker.org
Smit AFA, Hubley R, Green P. RepeatMasker Open-4.0. 2013–2015, 2015.
Storer J, Hubley R, Rosen J, Wheeler TJ, Smit AF. The Dfam community resource of transposable element families, sequence models, and genome annotations. Mob DNA. 2021:12:2.
Williams GD, Levin PS, Palsson WA. Rockfish in Puget Sound: an ecological history of exploitation. Mar Policy. 2010:34:1010–1020.

