Journal of Heredity. 2023 Dec 13;115(2):203–211. doi: 10.1093/jhered/esad078

Reference genome of Townsend’s big-eared bat, Corynorhinus townsendii

Samantha L R Capel 1, Natalie M Hamilton 2, Devaughn Fraser 3, Merly Escalona 4, Oanh Nguyen 5, Samuel Sacco 6, Ruta Sahasrabudhe 7, William Seligmann 8, Juan M Vazquez 9, Peter H Sudmant 10, Michael L Morrison 11, Robert K Wayne 12,a, Michael R Buchalski 13
Editor: Shu-Jin Luo
PMCID: PMC10936552  PMID: 38092381

Abstract

Townsend’s big-eared bat, Corynorhinus townsendii, is a cave- and mine-roosting species found largely in western North America. Because the species is of conservation concern throughout much of its range, protection efforts would benefit greatly from an understanding of its patterns of population structure, genetic diversity, and local adaptation. To facilitate such research, we present the first de novo genome assembly of C. townsendii as part of the California Conservation Genomics Project (CCGP). Pacific Biosciences HiFi long reads and Omni-C chromatin-proximity sequencing technologies were used to produce a de novo genome assembly, consistent with the standard CCGP reference genome protocol. This assembly comprises 391 scaffolds spanning 2.1 Gb, represented by a scaffold N50 of 174.6 Mb, a contig N50 of 23.4 Mb, and a benchmarking universal single-copy ortholog (BUSCO) completeness score of 96.6%. This high-quality genome will be a key tool for informed conservation and management of this vulnerable species in California and across its range.

Keywords: California Conservation Genomics Project, CCGP, Chiroptera, long-read assembly, Vespertilionidae

Introduction

Bats (Chiroptera; Blumenbach, 1779) are a highly diverse order comprising nearly a quarter of all mammalian species (Wilson and Reeder 1993) and provide crucial ecosystem services including pollination, seed dispersal, and pest control (Ramírez-Fráncel et al. 2022). Although bats are among the most widely distributed terrestrial mammals, their elusive nature hampers efforts to delineate species’ ecological and evolutionary dynamics. Moreover, many bat species are facing declines resulting from anthropogenic threats (Frick et al. 2020), further emphasizing the necessity for reliable assessments of population abundance, genetic diversity, and connectivity.

Townsend’s big-eared bat (Corynorhinus townsendii; Cooper, 1837; Fig. 1A), a medium-sized (10–12 g) whispering bat belonging to the Vespertilionidae family (Handley 1959; Piaggio and Perkins 2005), is considered a species of concern due to apparent declines throughout much of its native range over the last several decades (Harris et al. 2019). Currently, five subspecies of C. townsendii are recognized, occupying much of western North America (Pierson and Rainey 1998). The two subspecies found exclusively east of the Rocky Mountains, C. t. virginianus and C. t. ingens, each occupy highly restricted ranges and are both federally listed under the Endangered Species Act (US Fish and Wildlife Service 1979). The two western subspecies, C. t. townsendii and C. t. pallescens, are more widely distributed, but are considered a species of greatest conservation need by the California Department of Fish and Wildlife (CDFW) (Gonzales and Hoshi 2015). Little is known about the status of the final subspecies, C. t. australis, which is found in the southern portion of the range extending into Mexico (Pierson and Rainey 1998).

Fig. 1.

Townsend’s big-eared bat genome assembly. A) A Townsend’s big-eared bat, Corynorhinus townsendii (photo credit: Devaughn Fraser). B) NGx plot comparing high-quality Vespertilionidae genome assemblies. Shown is the cumulative proportion of the genome assembly for each scaffold (x axis) plotted against scaffold size (y axis) for each assembly: C. townsendii (COTO), Myotis myotis (MYMY; greater mouse-eared bat), Pipistrellus kuhlii (PIKU; Kuhl’s pipistrelle), P. pipistrellus (PIPI; common pipistrelle), Antrozous pallidus (ANPA; pallid bat), and Eptesicus fuscus (EPFU; big brown bat). C) BUSCO scores for each assembly nucleotide sequence. Species codes match those of B and asterisks indicate chromosome-level assemblies.

Attempts to evaluate population structure, connectivity, and genetic diversity among C. townsendii subspecies have relied on traditional molecular markers (Weyandt et al. 2005; Smith et al. 2008; Piaggio et al. 2009; Lee et al. 2015; Anderson et al. 2018), which greatly restrict genomic inference because they lack resolution and the ability to resolve ecological function. Modern genome-wide studies of genetic variation could be accomplished by aligning sequence data to one of the 15 vespertilionid reference genomes currently available. However, Corynorhinus diverged from these species approximately 24 million years ago (Lack and Van Den Bussche 2010), which could result in erroneous alignments and/or large-scale read dropout (Bohling 2020).

Here, we report the first high-quality, near-chromosome-level assembly for C. townsendii as part of the California Conservation Genomics Project (CCGP), a collaborative effort among state and federal agencies, non-governmental organizations, and universities to generate a comprehensive, statewide genomics database to aid conservation efforts (Shaffer et al. 2022). We also compare benchmarking universal single-copy ortholog (BUSCO) completeness scores with other high-quality Vespertilionidae reference genomes. This reference genome will serve as a critical resource for the high-resolution characterization of genomic variation in this species and, ultimately, aid future management in California and across its range.

Methods

Biological materials

Sample collection

In September 2020, we captured an adult male C. townsendii from a decommissioned mine located near Big Pine, Inyo County, California. The specimen was collected by CDFW staff under the department’s jurisdiction as the trustee for wildlife management in the state of California, CA Fish & Game Code § 1802 (2015). We transported the animal to a CDFW laboratory facility where it was humanely euthanized via a combination of isoflurane and cervical dislocation. We immediately dissected the carcass and collected tissues, including aliquots of bone, eye, heart, skeletal muscle, skin, spleen, and testis. We washed tissue aliquots sequentially in molecular-grade water, ethanol, and water again before flash freezing them in liquid nitrogen. We reserved one aliquot of each tissue type for generating primary cell cultures.

Cell culturing for DNA extraction

We generated primary cell cultures from the donor bat following Yohe et al. (2019) with modifications (see Supplementary Materials, Section 1 for details). Following this protocol, we established cell lines from bone, eye, heart, skeletal muscle, skin, spleen, and testis and banked them in liquid nitrogen using a freezing medium consisting of 90% FBS, 10% DMSO, and 0.2% Primocin. We generated triplicate aliquots of 10 million cells for DNA extraction by seeding three T175 flasks with 2 million primary eye fibroblasts that were passaged as described after four days. We resuspended the cell pellets in 1 ml DPBS and transferred them to 1.5 ml microcentrifuge tubes, washed them once with 1 ml DPBS, and then flash froze the pellets in liquid nitrogen after removing the supernatant.

DNA library preparation and sequencing

HiFi library preparation and sequencing

We isolated high molecular weight genomic DNA from 10 million washed and frozen cells from culture by adding 2 ml of lysis buffer containing 100 mM NaCl, 10 mM Tris–HCl pH 8.0, 25 mM EDTA, 0.5% (w/v) SDS, and 100 µg/ml Proteinase K to the frozen cell pellet. We incubated the reaction at room temperature for several hours until the solution was homogenous, treated the lysate with 20 µg/ml RNase A at 37 °C for 30 min, and cleaned with equal volumes of phenol/chloroform using phase lock gels (Quantabio, Beverly, Massachusetts; Cat. # 2302830). We precipitated the DNA by adding 0.4× volume of 5 M ammonium acetate and 3× volume of ice-cold ethanol, then washed the DNA pellet twice with 70% ethanol and resuspended in elution buffer (10 mM Tris, pH 8.0). We assessed DNA purity using a NanoDrop ND-1000 spectrophotometer (260/280 ratio = 1.9, 260/230 ratio = 2.2) and concentration with a Qubit 2.0 Fluorometer (Thermo Fisher Scientific, Waltham, Massachusetts), isolating a final mass of 20 µg. We verified the integrity of the purified DNA on a Femto pulse system (Agilent Technologies, Santa Clara, California) where 65% of the DNA was in fragments >50 kb.

We constructed the HiFi SMRTbell library using the SMRTbell Express Template Prep Kit v2.0 (Pacific Biosciences [PacBio], Menlo Park, California; Cat. #100-938-900) according to the manufacturer’s instructions. We sheared the DNA to a target size distribution between 15 and 20 kb using the Diagenode Megaruptor 3 system (Diagenode, Belgium; Cat. B06010003). We then concentrated the sheared DNA using 0.45× AMPure PB beads (PacBio Cat. #100-265-900). Library construction proceeded through removal of single-strand overhangs at 37 °C for 15 min, DNA damage repair at 37 °C for 30 min, end repair and A-tailing at 20 °C for 10 min and 65 °C for 30 min, ligation of overhang adapters v3 at 20 °C for 60 min followed by 65 °C for 10 min to inactivate the ligase, and nuclease treatment at 37 °C for 1 h. We purified and concentrated the SMRTbell library with 0.45× AMPure PB beads and used the BluePippin/PippinHT system (Sage Science, Beverly, Massachusetts; Cat. #BLF7510/HPE7510) for size selection of fragments greater than 7–9 kb. We sequenced the final 15–20 kb average HiFi SMRTbell library at the UC Davis DNA Technologies Core (Davis, California) using four SMRT Cell 8M Trays (PacBio Cat. #101-389-001), Sequel II sequencing chemistry 2.0, and 30-h movies each on a PacBio Sequel II sequencer.

Omni-C library preparation and sequencing

We prepared the Omni-C library using the Dovetail Omni-C Kit (Dovetail Genomics, California) according to the manufacturer’s protocol with slight modifications. Briefly, we fixed chromatin in place in the nucleus, then digested the fixed chromatin with DNase I and extracted it. We repaired the chromatin ends and ligated them to a biotinylated bridge adapter, followed by proximity ligation of adapter-containing ends. After proximity ligation, we reversed the crosslinks, purified the DNA from proteins, and treated it to remove biotin that was not internal to ligated fragments. We generated the sequencing library using the NEB Ultra II DNA Library Prep kit (NEB, Ipswich, Massachusetts) with an Illumina-compatible y-adaptor. We then captured biotin-containing fragments using streptavidin beads and split the post-capture product into two replicates prior to PCR enrichment to preserve library complexity, with each replicate receiving unique dual indices. We sequenced the library at Vincent J. Coates Genomics Sequencing Laboratory (Berkeley, California) on an Illumina NovaSeq 6000 platform (Illumina, California) to generate approximately 100 million 2 × 150 bp read pairs per Gb of genome size.

Genome assembly

Nuclear genome

We assembled the C. townsendii genome following the CCGP assembly pipeline Version 5.0; the tools, versions, and non-default parameters used in the assembly process are listed in Table 1. The pipeline uses PacBio HiFi reads and Omni-C data to produce high-quality, highly contiguous genome assemblies. First, we removed remnant adapter sequences from the PacBio HiFi dataset using HiFiAdapterFilt (Sim et al. 2022) and generated the initial dual or partially phased diploid assembly (http://lh3.github.io/2021/10/10/introducing-dual-assembly) using HiFiasm (Cheng et al. 2022) in Hi-C mode, with the filtered PacBio HiFi reads and the Omni-C dataset. We then aligned the Omni-C data to both assemblies following the Arima Genomics Mapping Pipeline (https://github.com/ArimaGenomics/mapping_pipeline) and scaffolded both assemblies with SALSA (Ghurye et al. 2017, 2019).

Table 1.

Assembly pipeline and software used. Software citations are listed in the text.

Assembly | Software and any non-default options | Version
 Filtering PacBio HiFi adapters | HiFiAdapterFilt | Commit 64d1c7b
 K-mer counting | Meryl (k = 21) | 1
 Estimation of genome size and heterozygosity | GenomeScope | 2
 De novo assembly (contiging) | HiFiasm (Hi-C Mode, --primary, output p_ctg.hap1, p_ctg.hap2) | 0.16.1-r375
Scaffolding
 Omni-C data alignment | Arima Genomics Mapping Pipeline | Commit 2e74ea4
 Omni-C Scaffolding | SALSA (-DNASE, -i 20, -p yes) | 2
 Gap closing | YAGCloser (-mins 2 -f 20 -mcc 2 -prt 0.25 -eft 0.2 -pld 0.2) | Commit 0e34c3b
Omni-C contact map generation
 Short-read alignment | BWA-MEM (-5SP) | 0.7.17-r1188
 SAM/BAM processing | samtools | 1.11
 SAM/BAM filtering | pairtools | 0.3.0
 Pairs indexing | pairix | 0.3.7
 Matrix generation | cooler | 0.8.10
 Matrix balancing | hicExplorer (hicCorrectMatrix correct --filterThreshold -2 4) | 3.6
 Contact map visualization | HiGlass | 2.1.11
  | PretextMap | 0.1.4
  | PretextView | 0.1.5
  | PretextSnapshot | 0.0.3
 Manual curation tools | Rapid curation pipeline (Wellcome Trust Sanger Institute, Genome Reference Informatics Team) | Commit 4ddca450
Genome quality assessment
 Basic assembly metrics | QUAST (--est-ref-size) | 5.0.2
  | BUSCO (-m geno, -l mammalia) | 5.0.0
 Assembly completeness | Merqury | 43,859
Contamination screening
 Local alignment tool | BLAST+ (-db nt, -outfmt "6 qseqid staxids bitscore std", -max_target_seqs 1, -max_hsps 1, -evalue 1e-25) | 2.1
 General contamination screening | BlobToolKit | 2.3.3
Mitochondrial assembly
 Mitochondrial genome assembly | MitoHiFi (-r, -p 50, -o 1) | 2.2

The genome assemblies were manually curated by iteratively generating and analyzing their corresponding Omni-C contact maps. To generate the contact maps, we aligned the Omni-C data with BWA-MEM (Li 2013), then identified ligation junctions and generated Omni-C pairs using pairtools (Goloborodko et al. 2018). We generated multi-resolution Omni-C matrices with cooler (Abdennur and Mirny 2020) and balanced them with hicExplorer (Ramírez et al. 2018). We used HiGlass (Kerpedjiev et al. 2018) and the PretextSuite (https://github.com/wtsi-hpag/PretextView; https://github.com/wtsi-hpag/PretextMap; https://github.com/wtsi-hpag/PretextSnapshot) to visualize the contact maps, in which we identified mis-assemblies and mis-joins, and finally modified the assemblies using the Rapid Curation pipeline from the Wellcome Trust Sanger Institute, Genome Reference Informatics Team (https://gitlab.com/wtsi-grit/rapid-curation). Some of the remaining gaps (joins generated during scaffolding and curation) were closed using the PacBio HiFi reads and YAGCloser (https://github.com/merlyescalona/yagcloser). Finally, we checked for contamination using the BlobToolKit Framework (Challis et al. 2020).

Mitochondrial genome

We assembled the mitochondrial genome of C. townsendii from the PacBio HiFi reads using the reference-guided pipeline MitoHiFi (Allio et al. 2020; Uliano-Silva et al. 2022). We used a closely related mitochondrial sequence archived as Plecotus rafinesquii (NCBI:NC_016872.1; Meganathan et al. 2012) as the starting sequence; P. rafinesquii has subsequently been reclassified as belonging to the genus Corynorhinus. After completion of the nuclear genome, we searched for matches of the resulting mitochondrial assembly sequence in the nuclear genome assembly using BLAST+ (Camacho et al. 2009) and filtered out contigs and scaffolds from the nuclear genome with a percentage of sequence identity >99% and size smaller than the mitochondrial assembly sequence.
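The filtering rule in the last sentence can be sketched as follows. This is only an illustration of the stated criteria (identity >99% and length below that of the mitochondrial assembly), not the pipeline's own code; the `Hit` record, the function names, and the example scaffolds are hypothetical, and the mitochondrial length is the 16,447 bp reported in the Results.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    """Hypothetical summary of one nuclear sequence's best match to the mitogenome."""
    seq_id: str       # nuclear contig or scaffold name
    seq_length: int   # length of that nuclear sequence, in bp
    pident: float     # percent identity of the match


def is_mito_derived(hit: Hit, mito_length: int, min_identity: float = 99.0) -> bool:
    """True when the sequence is >min_identity% identical and shorter than the mitogenome."""
    return hit.pident > min_identity and hit.seq_length < mito_length


def sequences_to_remove(hits: list[Hit], mito_length: int) -> list[str]:
    """IDs of sequences that would be dropped from the nuclear assembly."""
    return [h.seq_id for h in hits if is_mito_derived(h, mito_length)]


MITO_LENGTH = 16_447  # size of the final mitochondrial sequence, in bp

# Made-up hits, for illustration only.
example_hits = [
    Hit("scaffold_A", 12_000, 99.8),     # short and near-identical: removed
    Hit("scaffold_B", 12_000, 97.5),     # short but too divergent: kept
    Hit("scaffold_C", 5_000_000, 99.9),  # long nuclear scaffold: kept
]
to_remove = sequences_to_remove(example_hits, MITO_LENGTH)
print(to_remove)
```

Both conditions must hold, so a long nuclear scaffold that merely contains a mitochondrial-like insertion is retained.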

Evaluating the assembly

We generated k-mer counts from the PacBio HiFi reads using meryl (https://github.com/marbl/meryl). The k-mer counts were then used in GenomeScope2.0 (Ranallo-Benavidez et al. 2020) to estimate genome features including genome size, heterozygosity, and repeat content. To obtain general contiguity metrics, we ran QUAST (Gurevich et al. 2013). To evaluate genome quality and functional completeness, we used BUSCO (Manni et al. 2021) with the mammalian ortholog database (mammalia_odb10), which contains 9,226 genes. We assessed base-level accuracy (QV) and k-mer completeness using the previously generated meryl database and merqury (Rhie et al. 2020). We further estimated genome assembly accuracy via BUSCO gene set frameshift analysis using the pipeline described in Korlach et al. (2017). Measurements of the size of the phased blocks are based on the size of the contigs generated by HiFiasm in Hi-C mode. We followed the quality metric nomenclature established by Rhie et al. (2021), with the genome quality code x·y·P·Q·C, where x = log10 [contig NG50]; y = log10 [scaffold NG50]; P = log10 [phased block NG50]; Q = Phred base accuracy QV (quality value); C = % genome represented by the first “n” scaffolds, following a karyotype of 2n = 32 (Baker and Patton 1967). Quality metrics for the notation were calculated on the assembly for Haplotype 1.
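As a rough illustration of the nomenclature above, the sketch below computes an NGx statistic and assembles the x·y·P·Q·C string. It assumes that each log10 term and the Q and C values are truncated to whole numbers, which is consistent with the 7, 8, Q64, and C98 fields reported in Table 2; the function names are hypothetical. Table 2 prints the third field as the letter P, whereas the sketch fills it in from the phased block NG50.

```python
import math


def ngx(lengths, genome_size, x=50):
    """Length L such that sequences of length >= L cover x% of the estimated genome size.

    Returns None if the assembly never reaches that fraction of the genome.
    """
    target = genome_size * x / 100
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= target:
            return length
    return None


def quality_code(contig_ng50, scaffold_ng50, phased_ng50, qv, pct_in_first_n):
    """Build the x.y.P.Q.C string; truncation to integers is an assumption."""
    def digit(n):
        return int(math.floor(math.log10(n)))

    return (f"{digit(contig_ng50)}.{digit(scaffold_ng50)}.{digit(phased_ng50)}"
            f".Q{int(qv)}.C{int(pct_in_first_n)}")


# Toy contig lengths against a toy genome size of 100.
print(ngx([40, 30, 20, 10], genome_size=100))

# Haplotype 1 values from Table 2 and the Results (98.37% in the first 16 scaffolds).
print(quality_code(24_508_096, 178_686_506, 24_508_096, 64.7466, 98.37))
```

NGx differs from Nx only in the denominator: it uses the estimated genome size rather than the assembly length, so assemblies of different total size remain comparable.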

Assembly comparisons

To evaluate how this assembly compares to other high-quality bat genomes, we compared BUSCO completeness scores with other Vespertilionidae reference assemblies that utilized long-read sequencing technologies. This comprised five species including Eptesicus fuscus (GCA_027574615.1; Paulat et al. 2023), Pipistrellus pipistrellus (GCA_903992545.1; Vine et al. 2021), Pipistrellus kuhlii (GCA_014108245.1; Jebb et al. 2020), Antrozous pallidus (GCA_027563665.1; Paulat et al. 2023), and Myotis myotis (GCA_014108235.1; Jebb et al. 2020; all accessed 12 April 2023). To standardize BUSCO scores across assemblies, we ran BUSCO on the nucleotide sequence of each genome assembly against mammalia_odb10.

Results

Nuclear genome assembly

The Omni-C and PacBio HiFi sequencing libraries generated 133.43 million read pairs and 3.98 million reads, respectively. The latter yielded ~32-fold coverage (N50 read length 16,382 bp, minimum read length 44 bp, mean read length 16,196 bp, and maximum read length of 52,242 bp) based on the GenomeScope 2.0 genome size estimate of 1.997 Gb. Based on PacBio HiFi reads, we estimated a 0.157% sequencing error rate and a 0.492% nucleotide heterozygosity rate. The k-mer spectrum based on PacBio HiFi reads shows a bimodal distribution with two major peaks at ~16- and ~32-fold coverage (Fig. 2A), corresponding to the heterozygous and homozygous states of the diploid genome, respectively.
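The coverage figure can be checked with simple arithmetic. The sketch below uses the rounded read count and mean read length quoted above together with the 1.997 Gb genome size estimate given in the Table 2 footnote; because the inputs are rounded, the result only approximates the 32.31X reported in Table 2.

```python
# Rounded values quoted in the text.
n_reads = 3.98e6            # PacBio HiFi reads
mean_read_length = 16_196   # bp
genome_size = 1.997e9       # bp, estimated genome size (Table 2, footnote b)

total_bases = n_reads * mean_read_length   # roughly 64.5 Gb of sequence
coverage = total_bases / genome_size       # mean fold coverage

print(f"total bases: {total_bases / 1e9:.1f} Gb")
print(f"coverage:    {coverage:.1f}x")
print(f"expected heterozygous k-mer peak near {coverage / 2:.0f}x")
```

Halving the mean coverage gives the approximate position of the lower peak in the k-mer spectrum, since k-mers present on only one haplotype are sampled at half the depth.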

Fig. 2.

Visualization of genome assembly metrics. A) K-mer spectra generated from PacBio HiFi sequencing data using GenomeScope2.0. B) BlobToolKit Snail plot representing quality metrics for the primary Corynorhinus townsendii assembly (mCorTow1). The plot circle represents the full length of the assembly. The central red arc and corresponding line indicate the length of the longest scaffold. All other scaffold lengths are shown in dark gray ordered from largest to smallest moving clockwise with lengths indicated by the vertical axis located at 12 o’clock. The central light gray circle shows the cumulative scaffold count using log10 scale. The dark and light orange arcs indicate the scaffold N50 and scaffold N90 values, respectively. The dark to light blue ratios around the outside of the circle represent the proportion of AT to GC content at 0.1% length intervals. C) The Omni-C contact map for genome assemblies Haplotype 1 and D) Haplotype 2. Contact maps translate the proximity of sequenced regions in 3D space to linear order of sequences. Scaffolds are separated by vertical and horizontal lines.

The final assembly (mCorTow1) consists of two partially phased haplotype assemblies that vary slightly in size compared with the estimated value from GenomeScope2.0 (Fig. 2A), as has been observed in other taxa (e.g. Pflug et al. 2020). Haplotype 1 consists of 391 scaffolds spanning 2.1 Gb with a contig N50 of 23.38 Mb, scaffold N50 of 174.69 Mb, largest contig of 70.93 Mb, and largest scaffold of 233.46 Mb. The Haplotype 2 assembly consists of 182 scaffolds spanning 1.96 Gb with a contig N50 of 22.15 Mb, scaffold N50 of 177.75 Mb, largest contig of 77.65 Mb, and largest scaffold of 237.41 Mb. Assembly statistics are reported in Table 2 and a graphical representation of the Haplotype 1 assembly is shown in Fig. 2B (see Supplementary Fig. S1 for the Haplotype 2 graphical representation).

Table 2.

Sequencing and assembly statistics, and accession numbers.

Bio projects & vouchers
 CCGP NCBI BioProject | PRJNA720569
 Genera NCBI BioProject | PRJNA765806
 Species NCBI BioProject | PRJNA777157
 NCBI BioSample | SAMN31536067
 Specimen identification | COTO_CA2020_CCGP
 NCBI Genome accessions | Primary | Alternate
 Assembly accession | JAPDVT000000000 | JAPDVU000000000
 Genome sequences | GCA_026230055.1 | GCA_026230045.1
Genome sequence
 PacBio HiFi reads | Run | 1 PACBIO_SMRT (Sequel II) run: 4M spots, 64.5G bases, 44.4Gb
  | Accession | SRR23445762
 Omni-C Illumina reads | Run | 2 ILLUMINA (Illumina NovaSeq 6000) run, 133.4M spots, 40.3G bases, 13.4Gb
  | Accession | SRR23445761, SRR23445763
Genome assembly quality metrics
 Assembly identifier (Quality code (a)) | mCorTow1(7.8.P.Q64.C98)
 HiFi read coverage (b) | 32.31X
  | Haplotype 1 | Haplotype 2
 Number of contigs | 610 | 399
 Contig N50 (bp) | 23,382,908 | 22,150,609
 Contig NG50 (b) | 24,508,096 | 22,150,609
 Longest contig | 70,937,382 | 77,651,888
 Number of scaffolds | 391 | 182
 Scaffold N50 | 174,690,156 | 177,756,282
 Scaffold NG50 (b) | 178,686,506 | 177,756,282
 Largest scaffold | 233,461,832 | 237,418,211
 Size of final assembly | 2,104,912,948 | 1,961,562,149
 Phased block NG50 (b) | 24,508,096 | 22,150,609
 Gaps per Gbp (# Gaps) | 104 (219) | 111 (217)
 Indel QV (Frame shift) | 40.23 | 38.6
 Base pair QV | 64.7466 | 64.6825 | Full assembly = 64.7155
 k-mer completeness | 94.6054 | 89.9054 | Full assembly = 99.5751
 BUSCO completeness (n = 9,226) | C | S | D | F | M
 H1 (c) | 96.60% | 93.80% | 2.80% | 0.60% | 2.80%
 H2 (c) | 94.70% | 92.00% | 2.70% | 0.60% | 4.70%
Organelles | 1 Complete mitochondrial sequence | CM047939

(a) Assembly quality code x·y·P·Q·C derived notation, from Rhie et al. (2021). x = log10 [contig NG50]; y = log10 [scaffold NG50]; P = log10 [phased block NG50]; Q = Phred base accuracy QV (Quality value); C = % genome represented by the first ‘n’ scaffolds, following a known karyotype for C. townsendii of 2n = 32 (Baker and Patton, 1967). Quality code for all the assembly denoted by primary assembly (mCorTown1.0.hap1).

(b) Read coverage and NGx statistics have been calculated based on the estimated genome size of 1.997 Gb.

(c) (H1) Haplotype 1 and (H2) Haplotype 2 values.

During manual curation, we generated a total of 33 breaks and 222 joins where 14 breaks and 122 joins were made on Haplotype 1 and 19 breaks and 100 joins were made on Haplotype 2. We were able to close a total of 25 gaps, 12 on Haplotype 1 and 13 on Haplotype 2. Finally, we filtered out 3 contigs from Haplotype 1 corresponding to Arthropod (2) and Platyhelminthes (1) contamination.

Mitochondrial genome assembly

The final mitochondrial sequence has a size of 16,447 bp, base composition of A = 32.05%, C = 26.18%, G = 14.99%, T = 26.78%, and consists of 22 unique transfer RNAs and 13 protein-coding genes.

Genome quality assessment

Haplotype 1 has a BUSCO completeness score of 96.6% using the Mammalian gene set, a per base quality (QV) of 64.74, a k-mer completeness of 94.60%, and a frameshift indel QV of 38.6. Haplotype 2 has a BUSCO completeness score of 94.7% using the same gene set, a per base quality (QV) of 64.68, a k-mer completeness of 89.90%, and a frameshift indel QV of 40.23. The Omni-C contact maps show that both assemblies are highly contiguous with some chromosome-length scaffolds (Fig. 2C and D). We have deposited scaffolds corresponding to both haplotypes on NCBI (see Table 2 and Data availability for details).
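For readers less familiar with Phred-scaled quality values, the following sketch converts the reported QV figures into per-base error probabilities using the standard definition QV = -10·log10(p). The function names are illustrative, and the input values are those listed in Table 2.

```python
import math


def qv_to_error_rate(qv: float) -> float:
    """Per-base error probability implied by a Phred-scaled quality value."""
    return 10 ** (-qv / 10)


def error_rate_to_qv(p: float) -> float:
    """Phred-scaled quality value for a per-base error probability p."""
    return -10 * math.log10(p)


for label, qv in [("Haplotype 1 base pair QV", 64.7466),
                  ("Haplotype 2 base pair QV", 64.6825),
                  ("Full assembly base pair QV", 64.7155)]:
    p = qv_to_error_rate(qv)
    print(f"{label}: QV {qv} -> about 1 error per {1 / p:,.0f} bases")
```

A QV near 65 therefore corresponds to roughly one expected base error in every three million bases, well beyond the QV 40 level (one error in 10,000 bases).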

Assembly comparisons

Overall genome assembly quality is comparable to other highly contiguous long-read Vespertilionidae assemblies. The first 16 scaffolds comprise 98.37% of the cumulative assembly coverage (Fig. 1B). Given a known karyotype of 2n = 32 (Baker and Patton 1967), these scaffolds are highly concordant with chromosomes. Mammalia BUSCO completeness scores (Fig. 1C) for Haplotype 1 fall within the ranges of those achieved by chromosome-level Vespertilionidae assemblies.

Discussion

We present a high-quality reference genome for C. townsendii. The contiguity and completeness of this assembly are comparable to those of a chromosome-level assembly. Given a contig N50 of 22 Mb, scaffold N50 of 175 Mb, and a base pair QV of 65, this assembly surpasses the CCGP minimum quality targets (Shaffer et al. 2022) based on standards outlined by the Vertebrate Genomes Project (Rhie et al. 2021; Shaffer et al. 2022).

This reference genome is among the highest-quality Vespertilionidae assemblies to date. Of the six reference genomes compared in Fig. 1, our assembly has the lowest total gapped length (0.02 Mb; 0.03 Mb for E. fuscus, 0.5 Mb for P. pipistrellus, 6 Mb for A. pallidus, 12 Mb for P. kuhlii, and 29 Mb for M. myotis). Furthermore, our assembly had the third largest contig N50 length (22 Mb) among the considered genomes (49 Mb for E. fuscus, 42 Mb for A. pallidus, 13 Mb for M. myotis, 11 Mb for P. kuhlii, and 4 Mb for P. pipistrellus). We also achieved the highest BUSCO completeness score (96.6%) compared with other high-quality Vespertilionidae assemblies (96.3% for E. fuscus, 96.0% for P. kuhlii, 95.8% for A. pallidus, 94.0% for M. myotis, and 89.6% for P. pipistrellus).
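The rankings stated in this paragraph can be reproduced from the figures it quotes. The sketch below is only a consistency check on those rounded values; the dictionary layout and the `rank_of` helper are hypothetical.

```python
# (contig N50 in Mb, BUSCO completeness in %, total gapped length in Mb),
# as quoted in the paragraph above.
assemblies = {
    "C. townsendii":   (22, 96.6, 0.02),
    "E. fuscus":       (49, 96.3, 0.03),
    "A. pallidus":     (42, 95.8, 6),
    "M. myotis":       (13, 94.0, 29),
    "P. kuhlii":       (11, 96.0, 12),
    "P. pipistrellus": (4, 89.6, 0.5),
}
CONTIG_N50, BUSCO, GAPPED = 0, 1, 2


def rank_of(species: str, metric: int, higher_is_better: bool = True) -> int:
    """1-based rank of a species for one metric among the six assemblies."""
    ordered = sorted(assemblies, key=lambda s: assemblies[s][metric],
                     reverse=higher_is_better)
    return ordered.index(species) + 1


print("contig N50 rank:", rank_of("C. townsendii", CONTIG_N50))
print("BUSCO rank:", rank_of("C. townsendii", BUSCO))
print("gapped-length rank:", rank_of("C. townsendii", GAPPED, higher_is_better=False))
```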

There has been much ambiguity surrounding attempts to resolve the Vespertilionidae phylogeny, as well as the order Chiroptera more broadly, largely due to bats being an exceedingly diverse group occupying a wide range of habitats, making their evolutionary histories challenging to delineate (Jones et al. 2002; Lack and Van Den Bussche 2010). Thus far, broad-scale phylogenetic studies of bats have relied heavily on analyses of fewer than 10 loci (Jones et al. 2002; Stadelmann et al. 2007; Lack and Van Den Bussche 2010; Agnarsson et al. 2011). This reference genome is the first of its genus and provides an important resource for studies seeking to resolve evolutionary relationships utilizing genome-wide loci, comparisons of genomic architecture, and other phylogenomic techniques.

The genome assembly described here will serve as a vital resource for conservation management. C. townsendii is designated as a Species of Special Concern by the CDFW throughout its statewide range (California Natural Diversity Database 2023). This reference genome will aid ongoing CCGP whole-genome resequencing projects assessing genetic structure, population connectivity, and local adaptation for the purpose of establishing Management Units. Furthermore, given that virtually all C. townsendii subspecies are subjects of conservation concern (US Fish and Wildlife Service 1979; Pierson and Rainey 1998; Harris et al. 2019), this assembly will facilitate future work aimed at applying genomic techniques to conservation management across the remainder of its range.

This highly contiguous, accurate, and complete reference genome is part of the California Conservation Genomics Project initiative to create a comprehensive genomic dataset to assist regional biodiversity conservation management. This assembly also exemplifies the progress the field of wildlife management has made toward utilizing high-quality genomic resources in biodiversity conservation. We hope that our reference assembly, and future work that utilizes this or any other genomic resource(s) generated by the CCGP, inspires other state wildlife management organizations to employ high-quality genomic resources to facilitate regional and national biodiversity conservation.

Supplementary Material

esad078_suppl_Supplementary_Figures_S1

Acknowledgments

PacBio Sequel II library prep and sequencing was carried out at the DNA Technologies and Expression Analysis Cores at the UC Davis Genome Center, supported by NIH Shared Instrumentation Grant 1S10OD010786-01. Deep sequencing of Omni-C libraries used the Novaseq S4 sequencing platforms at the Vincent J. Coates Genomics Sequencing Laboratory at UC Berkeley, supported by NIH S10 OD018174 Instrumentation Grant. We thank the staff at the UC Davis DNA Technologies and Expression Analysis Cores and the UC Santa Cruz Paleogenomics Laboratory for their diligence and dedication to generating high-quality sequence data. We also thank Dr. Courtney Miller for project coordination. This work used computational and storage services associated with the Hoffman2 Shared Cluster provided by UCLA Institute for Digital Research and Education’s Research Technology Group.

Contributor Information

Samantha L R Capel, Wildlife Genetics Research Unit, Wildlife Health Laboratory, California Department of Fish and Wildlife, Sacramento, CA, United States.

Natalie M Hamilton, Department of Rangeland Wildlife and Fisheries Management, Texas A&M University, College Station, TX, United States.

Devaughn Fraser, Connecticut Department of Energy and Environmental Protection, Hartford, CT, United States.

Merly Escalona, Department of Biomolecular Engineering, University of California Santa Cruz, Santa Cruz, CA, United States.

Oanh Nguyen, DNA Technologies and Expression Analysis Core Laboratory, Genome Center, University of California Davis, Davis, CA, United States.

Samuel Sacco, Department of Ecology and Evolutionary Biology, University of California Santa Cruz, Santa Cruz, CA, United States.

Ruta Sahasrabudhe, DNA Technologies and Expression Analysis Core Laboratory, Genome Center, University of California Davis, Davis, CA, United States.

William Seligmann, Department of Ecology and Evolutionary Biology, University of California Santa Cruz, Santa Cruz, CA, United States.

Juan M Vazquez, Department of Integrative Biology, University of California Berkeley, Berkeley, CA, United States.

Peter H Sudmant, Department of Integrative Biology, University of California Berkeley, Berkeley, CA, United States.

Michael L Morrison, Department of Rangeland Wildlife and Fisheries Management, Texas A&M University, College Station, TX, United States.

Robert K Wayne, Department of Ecology and Evolution, University of California Los Angeles, Los Angeles, CA, United States.

Michael R Buchalski, Wildlife Genetics Research Unit, Wildlife Health Laboratory, California Department of Fish and Wildlife, Sacramento, CA, United States.

Funding

This work was supported by the California Conservation Genomics Project, with funding provided to the University of California by the State of California, State Budget Act of 2019 [UC Award ID RSI-19-690224].

Conflict of interest statement. None declared.

Data availability

Data generated for this study are available under NCBI BioProject PRJNA896196. Raw sequencing data for sample COTO_CA2020_CCGP (NCBI BioSample SAMN31536067) are deposited in the NCBI Sequence Read Archive (SRA) under SRX19355142 for the PacBio HiFi sequencing data, and SRX19355143 and SRX19355144 for the Omni-C Illumina sequencing data. GenBank accessions for both Haplotype 1 and Haplotype 2 genome sequences are GCA_026230045.1 and GCA_026230055.1 and assembly accessions are JAPDVU000000000 and JAPDVT000000000, respectively. The mitochondrial genome assembly GenBank accession is CM047939.1. Assembly scripts and other data for the analyses presented can be found at the following GitHub repository: www.github.com/ccgproject/ccgp_assembly.

Supplementary Materials

esad078_suppl_Supplementary_Figures_S1

Articles from Journal of Heredity are provided here courtesy of Oxford University Press
