DNA Research: An International Journal for Rapid Publication of Reports on Genes and Genomes. 2024 Jun 7;31(4):dsae018. doi: 10.1093/dnares/dsae018

A haplotype-resolved reference genome of a long-distance migratory bat, Pipistrellus nathusii (Keyserling & Blasius, 1839)

Maximilian Driller 1,2,#, Thomas Brown 3,4,#, Shannon E Currie 5,6, Michael Hiller 7,8,9, Sylke Winkler 10, Martin Pippel 11, Christian C Voigt 12, Jörns Fickel 13,14, Camila J Mazzoni 15,16
PMCID: PMC11215541  PMID: 38847751

Abstract

We present a complete, chromosome-scale reference genome for the long-distance migratory bat Pipistrellus nathusii. The genome encompasses both haplotypic sets of autosomes and the separation of both sex chromosomes by utilizing highly accurate long-reads and preserving long-range phasing information through the use of three-dimensional chromatin conformation capture sequencing (Hi-C). This genome, accompanied by a comprehensive protein-coding sequence annotation, provides a valuable genomic resource for future investigations into the genomic bases of long-distance migratory flight in bats as well as uncovering the genetic architecture, population structure and evolutionary history of Pipistrellus nathusii. The reference-quality genome presented here gives a fundamental resource to further our understanding of bat genetics and evolution, adding to the growing number of high-quality genetic resources in this field. Here, we demonstrate its use in the phylogenetic reconstruction of the order Chiroptera, and in particular, we present the resources to allow detailed investigations into the genetic drivers and adaptations related to long-distance migration.

Keywords: genome assembly, genome annotation, PacBio, Hi-C, Pipistrellus nathusii

1. Introduction

Bats (Chiroptera) are the second largest order within the class Mammalia, after rodents (Rodentia), and represent more than 20% of the living diversity of mammals with over 1,400 different species. This species richness is reflected in unique adaptations that allow bats to exist in all ecosystems worldwide except polar regions, covering a plethora of ecological niches.1 The most obvious unique trait of bats among mammals is their ability for true flapping flight. Bats also show exceptional longevity (given their body size and weight),2 vocal communication and learning,3 echolocation4 and, importantly, an enhanced tolerance towards viruses enabled by a unique set of immune responses.5 Powered flight enables bats to cover long distances within a short period, which facilitates migration in this taxon.6 Migration is described as seasonal bidirectional movements (to-and-fro migration) of animals over long distances.7

Within the order Chiroptera, Nathusius’ pipistrelle (Pipistrellus nathusii), weighing 8 g, is an extreme migratory bat in Europe.6 With daily and seasonal migration distances of greater than 100 km and over 2,000 km, respectively, P. nathusii holds the record for longest distance migration among bats of that size.8,9 Nathusius’ pipistrelles inhabit predominantly lowland areas in eastern, north-eastern and central Europe during summer. In autumn, populations from eastern and central European breeding areas migrate within a few months to hibernation areas in western, central and southern Europe.10,11 In spring, post-hibernating bats travel back to their summer range within a few weeks.12 The capacity for such a small bat to cover such long distances with limited fat reserves requires adaptive strategies for optimal fuel use,13 efficient flight speed,14 energy conversion15 and likely also the use of torpor during stopover periods.16 Recently, Nathusius’ pipistrelles have also been established as a model organism to study mammalian navigation, specifically when and how the Earth’s magnetic field is sensed.17,18 Although categorised as Least Concern by the IUCN,19 the increasing numbers of fatalities of this species at wind turbines in Europe11,20 may eventually cause population declines, as is expected for other migratory bats suffering from the same anthropogenic threat.21 Given these threats and the interesting biology of this species, information on the genome of Nathusius’ pipistrelles is urgently needed to advance our knowledge.

Here, we present the high-quality genome assembly of an adult male specimen of Pipistrellus nathusii (Keyserling & Blasius 1839). The genome has been assembled into two sets of phased chromosomes, with both sex chromosomes identified and included in one pseudo-haplotype. This genome assembly will offer a reference for researchers to investigate the mechanisms that enable the migratory behaviour of P. nathusii, the evolution of sex chromosomes among bats, as well as to further disentangle the genetic mechanisms that make bats so unique and enable them to adapt to different ecological niches around the world.

2. Methods

The sample for this genome was derived from the liver tissue of an adult male specimen collected at the Ornithological Station of the Latvian University in Pape, Latvia (21 August 2017). This work was carried out in accordance with the guidelines and regulations of the animal care and ethics committee of the University of Latvia and under licenses issued by the Food and Veterinary Service of the Republic of Latvia (Nr. 025564) and the Latvian Nature Conservation Agency (Nr. 33/2017-E).

2.1. Extraction of high-molecular weight genomic DNA

High-molecular weight (HMW) genomic DNA (gDNA) was extracted from snap-frozen liver tissue with the Circulomics Nanobind Tissue Big DNA kit (part number NB-900-701-01, protocol version Nanobind Tissue Big DNA Kit Handbook v1.0 (11/19)) according to the manufacturer’s instructions. In brief, snap-frozen liver tissue was minced into small pieces with a scalpel and homogenized with the Tissue Ruptor II device (Qiagen) on ice in a chaotropic buffer; tissue lysis took place by adding Proteinase K. Cell debris was removed by centrifugation. The released gDNA was bound to the Circulomics Nanobind disk upon the addition of salting buffer and isopropanol. After several washing steps, the gDNA was eluted from the Nanobind disk. Pulsed-field gel electrophoresis using the Pippin Pulse™ device (SAGE Science) revealed HMW DNA molecule length of < 200 kb.

2.2. PacBio HiFi library preparation and sequencing

Two PacBio HiFi libraries of Circulomics-extracted genomic DNA (HMW gDNA) of Pipistrellus nathusii were prepared as recommended by Pacific Biosciences according to the ‘Guidelines for preparing HiFi SMRTbell libraries using the SMRTbell Express Template Prep Kit 2.0’. In summary, HMW gDNA was sheared with the setting 25 kb on the MegaRuptor™ device (Diagenode) to 14–16 kb fragments. For library preparation, 8.5 and 13.5 µg of sheared gDNA were used. Both PacBio SMRTbell™ libraries were size-selected for fragments larger than 7 or 7.5 kb with the BluePippin™ device according to the manufacturer’s instructions. The size-selected libraries were loaded at an on-plate concentration of 60–85 pM. The libraries were run on four Sequel II SMRT cells (8M) with the SEQUEL II sequencing kit 2.0 for 30 h on the SEQUEL II machine of the DRESDEN concept Genome Center (DcGC), Germany.

2.3. Hi-C chromatin conformation capture

Chromatin conformation capture was performed with the ARIMA-HiC 2.0 kit, following the user guide for animal tissues (ARIMA Document, Part Number: A160162 v00). In brief, 65 mg flash-frozen powdered liver tissue was cross-linked chemically. Circa 3.2 µg cross-linked genomic DNA was digested with the ARIMA-HiC 2.0 restriction enzyme cocktail consisting of four restriction enzymes. The 5ʹ-overhangs were filled in and labelled with biotin. Spatially proximal digested DNA ends were ligated, and finally, the ligated biotin-containing fragments were enriched and used for Illumina library preparation, which followed the ‘ARIMA user guide for Library preparation using the Kapa® Hyper Prep kit’ (ARIMA Document Part Number A160139 v00). The barcoded Hi-C library was run on an S4 flow cell of a NovaSeq6000 with 2 × 150 cycles.

2.4. Genome assembly

Circular Consensus (HiFi) reads were called from the raw subreads using the CCS module of PacBio, requiring a minimum of three passes to generate a CCS read. The reads were then refined using DeepConsensus22 by first mapping the subreads to the CCS reads and running DeepConsensus on the resulting BAM files. The resulting HiFi reads were trimmed for remaining SMRTbell adapters using cutadapt.23 Hi-C reads were quality controlled and trimmed for a minimum average Phred quality of 20 using TrimGalore.24

The genome assembly was performed using an adapted version of the Vertebrate Genomes Project pipeline25 (Fig. 1). The PacBio HiFi reads were assembled into two haplotype assemblies of contigs using the Hi-C reads and the software hifiasm26 (options: -l 3 --h1 --h2). To remove any remaining haplotigs, purge_dups27 was used. Hi-C reads were aligned to the output hap1 and hap2 assemblies using the Arima mapping pipeline (https://github.com/ArimaGenomics/mapping_pipeline); the resulting alignments were subsequently used for Hi-C scaffolding with YaHS.28

Figure 1.

Pipeline used to assemble the Pipistrellus nathusii genome.

The contigs and scaffolds were checked for contaminants using the NCBI Foreign Contamination Screen (FCS),29 but no contigs were identified as contaminated with either adapter sequences, vectors or sequences from foreign species. Manual curation was performed using the RAPID curation pipeline (https://gitlab.com/wtsi-grit/rapid-curation) by realigning the sequencing data and visualizing the Hi-C heatmap using HiGlass.30 The identified Y chromosome in the hap2 assembly was moved to hap1 to create one ‘primary’ haplotype with one copy of each autosome and both sex chromosomes (X & Y). For assembly QC, we utilized Merqury to evaluate k-mer completeness and assembly correctness, and BUSCO to evaluate gene completeness, using the mammalia_odb10 database.

The Hi-C contact maps of the three-dimensional contacts in the genome show the successful scaffolding of both haplotypes into chromosome-scale assemblies (Fig. 2; Supplementary Fig. 2).

Figure 2.

(Top) Hi-C heatmaps of the final assembly visualized using HiGlass. The left panel shows the 3D chromatin contacts for the primary assembly, including the X chromosome (22nd scaffold) and the Y chromosome (23rd scaffold), and the right panel shows the alternate assembly. Signal is coloured by the number of contacts between two regions of the genome, with white indicating no contacts and red the maximum number. The diagonal shows the self-interaction. (Bottom) Nx plots for contigs (left) and scaffolds (right) display the contiguity of all 58 ‘reference’ bat genomes published on NCBI in grey, overlaid with the 2 Pipistrellus nathusii haplotype assemblies (black). For each percentage x of the assembled genome (x-axis), the plots show the sequence length Nx (y-axis) such that sequences of at least this length make up x% of the assembly. Lines are coloured by sequencing type: Illumina (red), Sanger (purple), PacBio (dark blue) and Oxford Nanopore (light blue).

To generate a mitochondrial assembly, we applied oatk31 with parameters -k 1001 -c 30 -m v20230921/mammalia_mito.fam using all PacBio HiFi reads as input. The resulting mitochondrial assembly was used to filter fragmented mitochondrial sequences from the haplotype assemblies generated above.

2.5. Genome annotation

To generate the repeat models and mask the repetitive elements of the genome, RepeatModeler32 was run on both haplotypes with argument -LTRStruct, and the generated repeat families were combined with the existing models in the curated Dfam database33 for ancestors of Pipistrellus nathusii. The combined repeat libraries were then used to mask the genome using RepeatMasker34 with arguments -a -s -gccalc -xsmall.
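Because RepeatMasker's -xsmall option returns repeats in lowercase rather than replacing them with N, the repeat fraction of a soft-masked assembly can be estimated by counting lowercase bases. The following is a minimal sketch of that idea, not part of the published pipeline; the function name and the toy FASTA records are hypothetical, and ambiguous bases (N/n) are excluded from the denominator as an assumption.

```python
def softmasked_fraction(fasta_lines):
    """Return the fraction of A/C/G/T bases that are lowercase (soft-masked).

    fasta_lines: any iterable of FASTA lines (e.g. an open file handle).
    Header lines and ambiguous bases such as N are ignored (an assumption).
    """
    masked = 0
    total = 0
    for line in fasta_lines:
        if line.startswith(">"):
            continue  # skip sequence headers
        for base in line.strip():
            if base in "acgt":
                masked += 1
                total += 1
            elif base in "ACGT":
                total += 1
    return masked / total if total else 0.0


# Toy example: 4 masked bases out of 12 unambiguous bases.
toy_fasta = [">scaffold_1", "ACGTacgt", "NNNNACGT"]
print(f"{softmasked_fraction(toy_fasta):.3f}")
```

For a real assembly, the same function could be called on an open file handle, which avoids loading the whole genome into memory.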

To generate a genome annotation of functional elements of the genome, we combined evidence from synteny-based projections from the human reference genome hg38 using TOGA,35 de-novo transcripts assembled using stringtie36 and protein- and RNA-seq-guided gene predictions created using Braker337 (Fig. 3). For TOGA, a chain file was generated by mapping the masked P. nathusii genomes to the masked hg38 reference genome using lastz,38 following the make_lastz_chains pipeline recommended by TOGA (https://github.com/hillerlab/make_lastz_chains). These chains were then used by TOGA to find orthologous mappings between the human reference and the P. nathusii genome, classifying each gene as having an intact reading frame, containing inactivating mutations, or containing missing exons because of assembly incompleteness or fragmentation.

Figure 3.

Pipeline used to create protein-coding annotation.

For Braker gene prediction, all available RNA-seq data for the genus Pipistrellus were downloaded from SRA (see accession list in file RNAseq_accessions.txt) and mapped to the P. nathusii genome using hisat239 with argument --dta. These alignments as well as vertebrate protein sequences from orthodb40 were used as hints for Braker3, which was run with argument --AUGUSTUS_ab_initio on the soft-masked genome. Evidence from TOGA, stringtie and Braker3 was combined using EvidenceModeler41 with the following weights as previously defined for bat genome annotations42: toga 8, transdecoder 12, AUGUSTUS 1, gmst 4, GeneMark.hmm3 4. Protein sequences from the resulting gff3 file were assigned functional annotation using diamond blastp43 against the Swissprot database (version 2023-09-13) and by running InterProScan44 with arguments: -dp -pa -appl Pfam,SuperFamily --goterms --iprlookup. Resulting functional annotations were combined using agat_sp_manage_functional_annotations.pl.45

2.6. Species phylogeny

To reconstruct a phylogenetic tree of bat species, including the newly assembled P. nathusii genome, we downloaded all reference Chiroptera genomes with annotations available on GenBank, with the human genome hg38 serving as outgroup (accession numbers available in the Supplementary File phylogeny_accessions.txt). The longest isoform of each protein sequence was extracted, and single-copy orthologs were identified using diamond and OrthoFinder.46 The 2,614 identified orthologues were then aligned against each other using MAFFT47 and the output alignments were trimmed using trimAL48 and concatenated into a supermatrix using geneStitcher. An initial phylogeny was then constructed using IQ-TREE,49 and branching time-points were estimated using MCMCTree from the PAML suite,50 using initial boundary estimates for each branching point taken from https://timetree.org. The resulting phylogeny was then displayed using the MCMCTreeR package.
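The concatenation step can be illustrated with a short sketch. This is not geneStitcher's implementation; it is a simplified illustration that assumes each trimmed per-gene alignment is already available as a mapping from taxon name to aligned sequence. The function name, taxon labels and sequences are hypothetical, and padding absent taxa with gaps is a generalization that single-copy orthologues present in all species would not need.

```python
def concatenate_alignments(alignments, gap="-"):
    """Concatenate per-gene alignments into one supermatrix.

    alignments: list of dicts mapping taxon name -> aligned sequence.
    Returns (supermatrix, partitions), where supermatrix maps each taxon to
    its concatenated sequence and partitions lists (name, start, end) with
    1-based inclusive column coordinates for every gene.
    """
    taxa = sorted({taxon for aln in alignments for taxon in aln})
    pieces = {taxon: [] for taxon in taxa}
    partitions = []
    start = 1
    for index, aln in enumerate(alignments, start=1):
        widths = {len(seq) for seq in aln.values()}
        if len(widths) != 1:
            raise ValueError(f"alignment {index} has unequal sequence lengths")
        width = widths.pop()
        for taxon in taxa:
            # A taxon absent from this gene is padded with gap characters.
            pieces[taxon].append(aln.get(taxon, gap * width))
        partitions.append((f"gene{index}", start, start + width - 1))
        start += width
    supermatrix = {taxon: "".join(parts) for taxon, parts in pieces.items()}
    return supermatrix, partitions


# Toy input with hypothetical taxon labels and sequences.
gene_a = {"P_nathusii": "MKT-A", "P_kuhlii": "MKTLA", "H_sapiens": "MRTLA"}
gene_b = {"P_nathusii": "GGW", "H_sapiens": "GAW"}
matrix, parts = concatenate_alignments([gene_a, gene_b])
for taxon, seq in matrix.items():
    print(taxon, seq)
print(parts)
```

Recording the partition boundaries alongside the supermatrix keeps the option of fitting a separate substitution model per gene in the downstream tree inference.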

2.7. Sequence synteny and orthology

To identify the degree of overlap between annotated sequences, proteins of 4 Pipistrellus species classified as either Intact or Partially Intact by TOGA were submitted to OrthoVenn3,51 together with the available protein sequences annotated by RefSeq for Pipistrellus kuhlii, by Ensembl Rapid Release for Pipistrellus pipistrellus and in this study for Pipistrellus nathusii. Chromosomal assignment of the final scaffolds was based on size and on synteny to the chromosome-level assembly of the sister species P. pipistrellus (Sample Identifier: mPipPip1, Accession number: PRJEB39564),52 assessed using D-Genies,53 which allowed us to identify the autosomes as well as the X chromosome.

3. Results

3.1. Genome assembly

Our assembled Pipistrellus nathusii genome has a high degree of synteny to the published Pipistrellus pipistrellus genome assembly (Supplementary Fig. 5), with the added benefit of containing an assembled Y chromosome as well as haplotype-phased sequences. The contiguity of the two assemblies is also in line with other state-of-the-art reference genome assemblies for Chiroptera species (Fig. 2). Compared with the 58 other published ‘reference’ genome assemblies for bats on GenBank, both haplotype assemblies are among the top 25% regarding contiguity of both contigs (N50 > 17 Mb, N90 > 3 Mb) and scaffolds (N50 > 86 Mb, N90 > 48 Mb) (Fig. 2, Table 1, Supplementary Fig. 3), and the BUSCO completeness54 of our primary assembly containing both X and Y chromosomes ranks in the top 10 of reference bat genomes (Complete: 96.2%, Single-copy 94.9%, Multi-copy 1.3%, Fragmented: 0.6%, Missing: 3.2%, mammalia_odb10 database) (Fig. 4). The alternative haplotype has a lower BUSCO score (Complete: 93.7%, Single: 92.6%, Duplicated: 1.1%, Fragmented: 0.7%, Missing: 5.6%), which is likely due to the assembly only containing autosomes. Indeed, when only considering the 21 autosomes assembled in the primary assembly, we obtained similar BUSCO scores (Complete: 94.0%, Single: 92.9%, Duplicated: 1.1%, Fragmented: 0.7%, Missing 5.4%). Of particular note is the assembly of the Y chromosome of our individual, a rare commodity among bat genomes. To date, only 15 of the 58 reference genomes noted above are labelled as being chromosome scale, and only 4 of those are listed as containing a Y chromosome.

Table 1.

General assembly metrics of the final primary and alternate assemblies, with both sex chromosomes present in the primary haplotype

Metric | Primary haplotype | Alternate haplotype | Combined haplotypes
Number of scaffolds | 153 | 155 |
Total (summed) length, bp | 1,804,478,338 | 1,688,812,661 |
Average scaffold length, bp | 11,793,976.07 | 10,895,565.55 |
Largest scaffold length, bp | 209,197,390 | 208,371,658 |
N50, bp (L50) | 97,242,187 (6) | 86,738,551 (6) |
N60, bp (L60) | 84,086,715 (9) | 83,939,643 (8) |
N70, bp (L70) | 72,181,568 (11) | 72,033,404 (10) |
N80, bp (L80) | 54,391,558 (14) | 53,999,940 (13) |
N90, bp (L90) | 48,644,773 (17) | 48,639,179 (16) |
N100, bp (L100) | 1,000 (153) | 12,061 (155) |
% of genome in chromosomes | 99.4 | 99.5 |
Cumulative gap length, bp | 50,800 | 52,840 |
Number of gaps | 254 | 263 |
Merqury QV (HiFi k-mers) | 60.84 | 60.86 | 60.85
Merqury Completeness (%) | 87.66 | 82.86 | 99.39

Merqury statistics are shown for hap1, hap2 and the combined haplotypes. bp = base pairs, N50 = Length of the shortest sequence at which 50% of the assembly is contained in the cumulative sequences of at least this length, QV = Quality Value.
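The Nx and Lx definitions used in Table 1 can be made concrete with a small sketch. The function name and the toy length list below are hypothetical; only the last line uses values taken from Table 1 (total length and scaffold count of the primary haplotype).

```python
def nx_lx(lengths, x=50):
    """Return (Nx, Lx) for a list of sequence lengths.

    Nx is the length of the shortest sequence in the smallest set of longest
    sequences that together cover at least x% of the total length; Lx is the
    number of sequences in that set.
    """
    if not lengths:
        raise ValueError("no sequence lengths given")
    ordered = sorted(lengths, reverse=True)
    threshold = sum(ordered) * x / 100
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= threshold:
            return length, count


# Toy example with five scaffolds (total length 150).
toy_lengths = [50, 40, 30, 20, 10]
print(nx_lx(toy_lengths, 50))   # (40, 2)
print(nx_lx(toy_lengths, 90))   # (20, 4)
print(nx_lx(toy_lengths, 100))  # (10, 5): N100 is the shortest sequence

# Average scaffold length of the primary haplotype, from the Table 1 totals.
print(round(1_804_478_338 / 153, 2))  # 11793976.07
```

Evaluating the same function for every x from 1 to 100 yields the curve drawn in an Nx plot.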

Figure 4.

BUSCO scores and TOGA projections for all 58 ‘reference’ bat genomes published in NCBI and the two haplotypes presented here for Pipistrellus nathusii. Scores on the left are sorted by Complete BUSCOs found (Single + Multi Copy) and on the right by the number of intact projections identified by TOGA from hg38 reference out of 18,430 ancestral placental mammalian genes. Species names are coloured by the sequencing technology used for the assembly.

The assembly was found to be highly accurate (QV = 60.84, or approximately 8 errors per 10 Mb) and complete (k-mer completeness 99.4%) when considering k-mers found in both the assembly and the PacBio HiFi reads dataset via Merqury55 (Table 1, Supplementary Fig. 4). We thus believe this presents a useful resource for researchers interested in the evolution of sex chromosomes.
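The conversion between the consensus quality value and the expected error count follows from the Phred scale, on which the per-base error probability is 10^(-QV/10). The sketch below, with hypothetical function names, reproduces the figure of roughly 8 errors per 10 Mb from the reported QV of 60.84.

```python
import math


def qv_to_error_rate(qv):
    """Per-base error probability implied by a Phred-scaled quality value."""
    return 10 ** (-qv / 10)


def error_rate_to_qv(rate):
    """Phred-scaled quality value implied by a per-base error probability."""
    return -10 * math.log10(rate)


rate = qv_to_error_rate(60.84)
# Expected number of erroneous bases in 10 Mb (10,000,000 bases): about 8.
print(f"{rate * 10_000_000:.2f} expected errors per 10 Mb")
```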

3.2. Genome annotation

We identified around a third of the genome constituting either repeat sequences or transposable elements. Of this, the majority were identified as LINE and SINE elements (Table 2). Of the 18,430 ancestral placental genes annotated in hg38, TOGA was able to classify 17,059 (92.6%) as intact in our primary assembly, 1,167 (6.3%) as containing inactivating mutations and 204 (1.1%) as missing (Fig. 4, right panel). These statistics are similar to the best bat genomes generated so far. Similar to the BUSCO scores above, the alternate assembly gave lower scores of 16,327 (88.6%) genes intact, 1,172 (6.4%) containing inactivating mutations and 931 (5.1%) as missing, likely due to the lack of sex chromosomes in the assembly. Considering the merged set of projections against both haplotypes, TOGA classified 17,180 (93.2%) genes as intact, 1,078 (5.8%) as containing inactivating mutations and 172 (0.9%) genes as missing. The protein-coding genome annotation created by combining de-novo predictions, transcript assemblies from Pipistrellus RNA-seq data and liftover predictions from the human genome resulted in 19,038 annotated genes, 17,630 of which were assigned gene names from the SwissProt database (Table 3). The final protein-coding annotation identified 94.4% of the 9,226 mammalian BUSCO genes and 96.03% of the 13,212 mammalian OMA genes as complete. Furthermore, 94.26% of the annotated protein sequences were identified as consistently placed within the mammalian lineage, with only 0.91% of the 19,038 annotated genes marked as inconsistently placed by OMArk56 (Supplementary Fig. 7).
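As a quick consistency check, the TOGA percentages quoted above can be recomputed from the gene counts. The counts and the total of 18,430 genes are taken from the text; the dictionary layout and function name are illustrative only.

```python
TOTAL_GENES = 18_430  # ancestral placental genes annotated in hg38

# Gene counts per TOGA class as reported in the text.
toga_counts = {
    "primary": {"intact": 17_059, "inactivating": 1_167, "missing": 204},
    "alternate": {"intact": 16_327, "inactivating": 1_172, "missing": 931},
    "merged": {"intact": 17_180, "inactivating": 1_078, "missing": 172},
}


def percentages(counts, total=TOTAL_GENES):
    """Convert class counts to percentages, checking they sum to the total."""
    if sum(counts.values()) != total:
        raise ValueError("class counts do not sum to the total gene number")
    return {name: round(100 * n / total, 1) for name, n in counts.items()}


for assembly, counts in toga_counts.items():
    print(assembly, percentages(counts))
```

Each set of counts sums to 18,430, and the rounded percentages match those given in the text.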

Table 2.

Repeat content for each haplotype of the Pipistrellus nathusii genome

Repeat family | Primary haplotype | Alternate haplotype
SINE | 140 Mb (7.75%) | 134 Mb (7.95%)
LINE | 200 Mb (11.1%) | 167 Mb (9.92%)
LTR | 58.8 Mb (3.26%) | 52.9 Mb (3.13%)
DNA | 54.0 Mb (2.99%) | 50.3 Mb (2.98%)
Simple | 91.1 Mb (5.05%) | 76.5 Mb (4.53%)
Other | 116 Mb (6.43%) | 104 Mb (6.13%)
Total | 621 Mb (34.4%) | 556 Mb (32.9%)

‘Simple’ repeats are defined as those labelled as Simple_repeat, Low_complexity and tandem repeats by RepeatModeler, and ‘Other’ repeats are those labelled as tRNA, RC, Unknown, PLE, scRNA, Retroposon and snRNA. SINE/LINE = Short/Long Interspersed Nuclear Elements, LTR = Long Terminal Repeats, DNA = DNA transposons.

Table 3.

Statistics from protein-coding genome annotation methods

Protein-coding genome annotation statistic | Value
No. genes | 19,038
No. genes with Functional Annotation | 17,630
No. Single-Exon Genes | 3,272
Mean no. Exons per Gene | 9.2
Mean Gene Length | 35,621 bp
Mean Intron Length | 4,131 bp
Mean Coding Sequence Length | 1,596 bp
BUSCO: Complete | 94.4%
BUSCO: Fragmented | 1.0%
BUSCO: Missing | 4.6%
OMArk: Complete | 96.03%
OMArk: Missing | 3.97%
OMArk: Consistent | 94.26%
OMArk: Inconsistent | 0.91%
OMArk: Contaminants | 0.00%
OMArk: Unknown | 4.83%

BUSCO scores based on mammalia_odb10 database. OMArk results are based on the OMAmer mammalia database. (OMA = Orthologous Matrix).

3.3. Comparative genomic analysis

The phylogenetic reconstruction of published annotated reference genomes, together with the genome assembled and annotated here, placed P. nathusii within the Vespertilionidae family and the genus Pipistrellus as expected (Fig. 5). Furthermore, we observed a high degree of synteny when comparing the assembled chromosomes of the P. nathusii genome with those of the published P. pipistrellus genome (Supplementary Fig. 5) and a high degree of overlap between identified protein orthologs among all published Pipistrellus genomes (Supplementary Fig. 7). In particular, OrthoVenn3 results indicate that the number of shared orthologs missing in the P. nathusii genome is not above an expected level; however, a higher than expected number of orthologs were found missing from the previously published P. pipistrellus genome.

Figure 5.

Phylogeny of reference bat genomes with published annotations on GenBank. Species are coloured by family and branching points with 95% confidence intervals are indicated on the tree. Scale is Million Years Ago and vertical grey bars demarcate the geological periods.

4. Discussion

Here, we present a complete, high-quality reference genome for the long-distance migratory bat Pipistrellus nathusii. Combining high-accuracy PacBio HiFi long-read sequencing with long-range sequencing information from 3D Chromatin Conformation Capture (Hi-C), we were able to separate contigs from the two haplotypes into a set of two pseudo-haplomes, where the sequence of each chromosome originates from a single parent. In addition to providing two sets of complete chromosome sequences for the autosomes, this also allowed us to assemble both sex chromosomes, in particular the small Y chromosome.

Furthermore, by combining evidence from synteny mapping, protein sequences from vertebrates and transcript sequences from the Pipistrellus genus, we were able to construct a highly complete protein-coding sequence annotation for our primary assembly, consisting of one copy of each autosome and both sex chromosomes. Taken together, this set of resources will be invaluable to the investigation of the origin of flight in mammals, sex chromosome evolution and, in particular for this species, the evolution of long-distance migration.

Using this high-quality genomic resource, we have demonstrated how one can investigate the phylogenetic relationship between bat species on the molecular level. Furthermore, as the number of published annotated reference genomes in this clade increases, further investigation will be possible into proteins shared and unique to particular bat species and genera.

In summary, we have shown here that the genome assembly and annotation produced for Pipistrellus nathusii are of high quality, both relative to other reference bat genomes and in absolute terms of contiguity, accuracy and completeness. Having a chromosome-scale, highly accurate genome assembly is invaluable for researchers who wish to investigate chromosome evolution, genome arrangement and further phylogenetic analysis regarding the many unique phenotypes associated with bats and, particularly, the Pipistrellus genus.

Supplementary Material

dsae018_suppl_Supplementary_Data

Acknowledgements

The authors would like to thank the HPC Service of ZEDAT, Freie Universität Berlin, for computing time.57 We also thank the Long Read Team of the DRESDEN Concept Genome Center, part of the MPI-CBG and the technology platform of the CMCB at the TU Dresden. The authors would like to thank Luísa Schlude Marins for assistance with figures and artistic direction.

Contributor Information

Maximilian Driller, Berlin Center for Genomics in Biodiversity Research (BeGenDiv), Berlin, Germany; Evolutionary Genetics Department, Leibniz-Institut für Zoo- und Wildtierforschung (IZW), Berlin, Germany.

Thomas Brown, Berlin Center for Genomics in Biodiversity Research (BeGenDiv), Berlin, Germany; Evolutionary Genetics Department, Leibniz-Institut für Zoo- und Wildtierforschung (IZW), Berlin, Germany.

Shannon E Currie, Evolutionary Ecology Department, Leibniz-Institut für Zoo- und Wildtierforschung (IZW), Berlin, Germany; School of Biosciences, University of Melbourne, Parkville, 3010 Victoria, Australia.

Michael Hiller, LOEWE Centre for Translational Biodiversity Genomics, Senckenberganlage 25, 60325 Frankfurt, Germany; Senckenberg Research Institute, Senckenberganlage 25, 60325 Frankfurt, Germany; Institute of Cell Biology and Neuroscience, Faculty of Biosciences, Goethe University Frankfurt, Max-von-Laue-Str. 9, 60438 Frankfurt, Germany.

Sylke Winkler, Sequencing and Genotyping, Max Planck Institute of Molecular Cell Biology and Genetics, Pfotenhauerstraße 108, 01307 Dresden, Germany.

Martin Pippel, Sequencing and Genotyping, Max Planck Institute of Molecular Cell Biology and Genetics, Pfotenhauerstraße 108, 01307 Dresden, Germany.

Christian C Voigt, Evolutionary Ecology Department, Leibniz-Institut für Zoo- und Wildtierforschung (IZW), Berlin, Germany.

Jörns Fickel, Evolutionary Genetics Department, Leibniz-Institut für Zoo- und Wildtierforschung (IZW), Berlin, Germany; Institute for Biochemistry and Biology, University of Potsdam, 14476 Potsdam, Germany.

Camila J Mazzoni, Berlin Center for Genomics in Biodiversity Research (BeGenDiv), Berlin, Germany; Evolutionary Genetics Department, Leibniz-Institut für Zoo- und Wildtierforschung (IZW), Berlin, Germany.

Funding

This project was funded by the Leibniz Competitive Fund 2019-2021, K101/2018 and the LOEWE-Centre for Translational Biodiversity Genomics (TBG) funded by the Hessen State Ministry of Higher Education, Research and the Arts (HMWK). The publication of this article was funded by the Open Access Fund of the Leibniz Association.

Conflict of interest

The authors declare no conflicts of interest.

Data availability

Raw sequencing data and genome assembly are available on INSDC at the following BioProjects: PRJEB70415, accession GCA_963693515 (Primary assembly and annotation), PRJEB70416, accession GCA_963693525 (Alternate assembly), PRJEB71102, accession GCA_963855505 (mitochondrial assembly) and PRJEB70389 (Raw genome sequencing data). Software versions used are listed in Table 4. Scripts used to generate assemblies, protein-coding annotation and phylogenetic tree are available here: https://git.imp.fu-berlin.de/begendiv/mpipnat_genome. Repositories for TOGA and make_lastz_chains are available here: https://github.com/hillerlab/TOGA, https://github.com/hillerlab/make_lastz_chains

Table 4.

Software versions used.

Software | Version
CCS | 6.0.0
Cutadapt | 2.3
TrimGalore | 0.6.7
hifiasm | 0.17.4-r455
purge_dups | 1.2.6
YaHS | 1.2a.1-patch
bwa-mem2 | 2.2.1
pairtools | 1.0.2
tabix | 1.11
pairix | 0.3.7
cooler | 0.9.3
HiGlass | 1.4
fcs-gx | 0.4.0-3-g8096f62
oatk | 368a247
BUSCO | 5.4.7
Merqury | 1.3
D-GENIES | 1.5.0
samtools | 1.10
Ncbi-datasets (access date: 2023-10-02) | 15.20.0
RepeatModeler | 2.0.4
RepeatMasker | 4.1.5
TOGA | 1.1.4
Hisat2 | 2.2.1
Stringtie | 2.2.1
TransDecoder | 5.7.1
Braker | 3.0.2
EvidenceModeler | 2.1.0
diamond | 2.1.8
InterProScan | 5.55-88.0
agat | 1.0.0
OMArk | 0.3.0
OrthoVenn3-api | 1433b6b83e33e441c85db8577719fa3089eb968535e5451e9e5833e0e1cc79cc
OrthoVenn3-web | a0651994eb365d0138ccbafec60e4b62b5b68803489d4480186496a0722dcda0
OrthoVenn3-mysql | 081da04389be0fc86630feb55bb263b39f0f01ae185fac9bd6cb2d77e354b1f9
Diamond | 2.1.8
Orthofinder | 2.5.5
MAFFT | 7.475
trimAL | 1.4.rev15
geneStitcher | Commit 974bd21
iqtree | 2.2.5
mcmctree | 4.10.7

References

  • 1. Teeling, E.C., Vernes, S.C., Dávalos, L.M., Ray, D.A., Gilbert, M.T.P., and Myers, E.; Bat1K Consortium. 2018, Bat biology, genomes, and the bat1k project: To generate chromosome-level genomes for all living bat species, Annu. Rev. Anim. Biosci., 6, 23–46.
  • 2. Huang, Z., Whelan, C.V., Foley, N.M., et al. 2019, Longitudinal comparative transcriptomics reveals unique mechanisms underlying extended healthspan in bats, Nature Ecology & Evolution, 3, 1110–20.
  • 3. Vernes, S.C. and Wilkinson, G.S. 2019, Behaviour, biology and evolution of vocal learning in bats, Philos. Trans. R. Soc. B: Biol. Sci., 375, 20190061.
  • 4. Jones, G. and Holderied, M.W. 2007, Bat echolocation calls: adaptation and convergent evolution, Proc. Biol. Sci., 274, 905–12.
  • 5. Banerjee, A., Baker, M.L., Kulcsar, K., Misra, V., Plowright, R., and Mossman, K. 2020, Novel insights into immune systems of bats, Front. Immunol., 11, 1–15.
  • 6. Voigt, C.C., Kionka, J., Koblitz, J.C., Stilz, P.C., Pētersons, G., and Lindecke, O. 2023, Bidirectional movements of Nathusius’ pipistrelle bats (Pipistrellus nathusii) during autumn at a major migration corridor, Global Ecol. Conserv., 48, e02695.
  • 7. Dingle, H. and Drake, V.A. 2007, What Is Migration? BioScience, 57, 113–21.
  • 8. Strelkov, P. 1969, Migratory and stationary bats (Chiroptera) of the European part of the Soviet Union, Acta Zool. Cracov., 14, 393–438.
  • 9. Vasenkov, D., Desmet, J.-F., Popov, I., and Sidorchuk, N. 2022, Bats can migrate farther than it was previously known: a new longest migration record by Nathusius’ pipistrelle Pipistrellus nathusii (Chiroptera: Vespertilionidae), Mammalia, 86, 524–6.
  • 10. Pētersons, G. 2004, Seasonal migrations of north-eastern populations of Nathusius’ bat Pipistrellus nathusii (Chiroptera), Myotis, 41, 29–56.
  • 11. Kruszynski, C., Bailey, L.D., Courtiol, A., et al. 2021, Identifying migratory pathways of Nathusius’ pipistrelles (Pipistrellus nathusii) using stable hydrogen and strontium isotopes, Rapid Commun. Mass Spectrom., 35, e9031.
  • 12. Heim, O., Schröder, A., Eccard, J., Jung, K., and Voigt, C.C. 2016, Seasonal activity patterns of European bats above intensively used farmland, Agric. Ecosyst. Environ., 233, 130–9.
  • 13. Voigt, C.C., Borissov, I.M., and Voigt-Heucke, S.L. 2012, Terrestrial locomotion imposes high metabolic requirements on bats, J. Exp. Biol., 215, 4340–4.
  • 14. Troxell, S.A., Holderied, M.W., Pētersons, G., and Voigt, C.C. 2019, Nathusius’ bats optimize long-distance migration by flying at maximum range speed, J. Exp. Biol., 222, jeb176396.
  • 15. Currie, S.E., Johansson, L.C., Aumont, C., Voigt, C.C., and Hedenström, A. 2023, Conversion efficiency of flight power is low, but increases with flight speed in the migratory bat Pipistrellus nathusii, Proc. R. Soc. B: Biol. Sci., 290, 20230045.
  • 16. McGuire, L.P., Jonasson, K.A., and Guglielmo, C.G. 2014, Bats on a budget: torpor-assisted migration saves time and energy, PLoS One, 9, e115724.
  • 17. Lindecke, O., Voigt, C.C., Pētersons, G., and Holland, R.A. 2015, Polarized skylight does not calibrate the compass system of a migratory bat, Biol. Lett., 11, 20150525.
  • 18. Lindecke, O., Elksne, A., Holland, R.A., Pētersons, G., and Voigt, C.C. 2019, Experienced migratory bats integrate the sun’s position at dusk for navigation at night, Curr. Biol., 29, 1369–73.
  • 19. Paunović, M. and Juste, J. 2016, Pipistrellus nathusii. The IUCN Red List of Threatened Species 2016, e.T17316A22132621. 10.2305/IUCN.UK.2016-2.RLTS.T17316A22132621.en (4 June 2004, date last accessed).
  • 20. Rydell, J., Bach, L., Dubourg-Savage, M.-J., Green, M., Rodrigues, L., and Hedenström, A. 2010, Mortality of bats at wind turbines links to nocturnal insect migration? Eur. J. Wildlife Res., 56, 823–7.
  • 21. Frick, W.F., Baerwald, E.F., Pollock, J.F., et al. 2017, Fatalities at wind turbines may threaten population viability of a migratory bat, Biol. Conserv., 209, 172–7. [Google Scholar]
  • 22. Baid, G., Cook, D.E., Shafin, K., et al. 2022, DeepConsensus improves the accuracy of sequences with a gap-aware sequence transformer, Nat. Biotechnol., 41, 232–8. [DOI] [PubMed] [Google Scholar]
  • 23. Martin, M. 2011, Cutadapt removes adapter sequences from high-throughput sequencing reads, EMBnet.journal, 17, 10. [Google Scholar]
  • 24. Krueger, F., James, F., Ewels, P., Afyounian, E., and Schuster-Boeckler, B.. 2021, July 23, FelixKrueger/TrimGalore: V0.6.7 - Zenodo.
  • 25. Larivière, D., Abueg, L., Brajuka, N., et al. 2023, Scalable, accessible, and reproducible reference genome assembly and evaluation in Galaxy. Cold Spring Harbor Laboratory. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 26. Cheng, H., Concepcion, G.T., Feng, X., Zhang, H., and Li, H.. 2021, Haplotype-resolved de novo assembly using phased assembly graphs with hifiasm, Nat. Methods, 18, 170–5. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 27. Guan, D., McCarthy, S.A., Wood, J., Howe, K., Wang, Y., and Durbin, R.. 2020, Identifying and removing haplotypic duplication in primary genome assemblies, Bioinformatics, 36, 2896–8. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 28. Zhou, C., McCarthy, S.A., and Durbin, R.. 2022, YaHS: yet another Hi-C scaffolding tool, Bioinformatics, 39, 1–3. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 29. Astashyn, A., Tvedte, E. S., Sweeney, D., et al. 2023, Rapid and sensitive detection of genome contamination at scale with FCS-GX. Cold Spring Harbor Laboratory. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 30. Kerpedjiev, P., Abdennur, N., Lekschas, F., et al. 2018, HiGlass: web-based visual exploration and analysis of genome interaction maps, Genome Biol., 19, 125. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 31. Zhou, C. 2023, February 11, c-zhou/oatk: Oatk-0.1. Zenodo.
  • 32. Flynn, J.M., Hubley, R., Goubert, C., et al. 2020, RepeatModeler2 for automated genomic discovery of transposable element families, Proc. Natl. Acad. Sci. U.S.A., 117, 9451–7. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 33. Storer, J., Hubley, R., Rosen, J., Wheeler, T.J., and Smit, A.F.. 2021, The Dfam community resource of transposable element families, sequence models, and genome annotations, Mobile DNA, 12, 2. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 34. Smit, A., FA, Hubley, R., and Green, P., FA.. 2013, RepeatMaker. RepeatMasker Open-4.0. https://repeatmasker.org [Google Scholar]
  • 35. Kirilenko, B.M., Munegowda, C., Osipova, E., et al. ; Zoonomia Consortium‡. 2023, Integrating gene annotation with orthology inference at scale, Science, 380, eabn3107. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 36. Pertea, M., Pertea, G.M., Antonescu, C.M., Chang, T.-C., Mendell, J.T., and Salzberg, S.L.. 2015, StringTie enables improved reconstruction of a transcriptome from RNA-seq reads, Nat. Biotechnol., 33, 290–5. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 37. Gabriel, L., Brůna, T., Hoff, K. J., et al. 2023, BRAKER3: Fully Automated Genome Annotation Using RNA-Seq and Protein Evidence with GeneMark-ETP, AUGUSTUS and TSEBRA. Cold Spring Harbor Laboratory. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 38. Harris, R.S. 2007, Improved pairwise alignment of genomic DNA. PhD Thesis, The Pennsylvania State University. [Google Scholar]
  • 39. Kim, D., Paggi, J.M., Park, C., Bennett, C., and Salzberg, S.L.. 2019, Graph-based genome alignment and genotyping with HISAT2 and HISAT-genotype, Nat. Biotechnol., 37, 907–15. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 40. Kuznetsov, D., Tegenfeldt, F., Manni, M., et al. 2022, OrthoDB v11: annotation of orthologs in the widest sampling of organismal diversity, Nucleic Acids Res., 51, D445–51. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 41. Haas, B.J., Salzberg, S.L., Zhu, W., et al. 2008, Automated eukaryotic gene structure annotation using EVidenceModeler and the Program to Assemble Spliced Alignments, Genome Biol., 9, R7. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 42. Jebb, D., Huang, Z., Pippel, M., et al. 2020, Six reference-quality genomes reveal evolution of bat adaptations, Nature, 583, 578–84. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 43. Buchfink, B., Reuter, K., and Drost, H.-G.. 2021, Sensitive protein alignments at tree-of-life scale using DIAMOND, Nat. Methods, 18, 366–8. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 44. Jones, P., Binns, D., Chang, H.-Y., et al. 2014, InterProScan 5: genome-scale protein function classification, Bioinformatics, 30, 1236–40. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 45. Dainat J., AGAT: Another Gff Analysis Toolkit to handle annotations in any GTF/GFF format. (Version v0.7.0). Zenodo. 10.5281/zenodo.3552717 [DOI] [Google Scholar]
  • 46. Emms, D.M. and Kelly, S.. 2019, OrthoFinder: Phylogenetic orthology inference for comparative genomics, Genome Biol., 20, 238. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 47. Katoh, K. and Standley, D.M.. 2013, MAFFT multiple sequence alignment software version 7: Improvements in performance and usability, Mol. Biol. Evol., 30, 772–80. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 48. Capella-Gutiérrez, S., Silla-Martínez, J.M., and Gabaldón, T.. 2009, trimAl: A tool for automated alignment trimming in large-scale phylogenetic analyses, Bioinformatics, 25, 1972–3. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 49. Minh, B.Q., Schmidt, H.A., Chernomor, O., et al. 2020, IQ-TREE 2: New models and efficient methods for phylogenetic inference in the genomic era, Mol. Biol. Evol., 37, 1530–4. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 50. Yang, Z. 2007, PAML 4: Phylogenetic analysis by maximum likelihood, Mol. Biol. Evol., 24, 1586–91. [DOI] [PubMed] [Google Scholar]
  • 51. Sun, J., Lu, F., Luo, Y., Bie, L., Xu, L., and Wang, Y.. 2023, OrthoVenn3: An integrated platform for exploring and visualizing orthologous data across genomes, Nucleic Acids Res., 51, W397–403. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 52. Vine, C., Teeling, E.C., Smith, M., et al. 2021, The genome sequence of the common pipistrelle, Pipistrellus pipistrellus Schreber 1774, Wellcome Open Res. 6, 117. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 53. Cabanettes, F. and Klopp, C.. 2018, D-GENIES: Dot plot large genomes in an interactive, efficient and simple way, PeerJ, 6, e4958. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 54. Simão, F.A., Waterhouse, R.M., Ioannidis, P., Kriventseva, E.V., and Zdobnov, E.M.. 2015, BUSCO: Assessing genome assembly and annotation completeness with single-copy orthologs, Bioinformatics, 31, 3210–2. [DOI] [PubMed] [Google Scholar]
  • 55. Rhie, A., Walenz, B.P., Koren, S., and Phillippy, A.M.. 2020, Merqury: Reference-free quality, completeness, and phasing assessment for genome assemblies, Genome Biol., 21, 1–27. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 56. Nevers, Y., Rossier, V., Train, C. M., Altenhoff, A., Dessimoz, C., and Glover, N.. 2022, Multifaceted quality assessment of gene repertoire annotation with OMArk. Cold Spring Harbor Laboratory. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 57. Bennett, L., Melchers, B., and Proppe, B.. 2020, January 1, Curta: A General-purpose High-Performance Computer at ZEDAT, Freie Universität Berlin. [Google Scholar]

Associated Data

Supplementary Materials

dsae018_suppl_Supplementary_Data

Data Availability Statement

Raw sequencing data and genome assemblies are available from INSDC under the following BioProjects: PRJEB70415, accession GCA_963693515 (primary assembly and annotation); PRJEB70416, accession GCA_963693525 (alternate assembly); PRJEB71102, accession GCA_963855505 (mitochondrial assembly); and PRJEB70389 (raw genome sequencing data). Software versions used are listed in Table 4. Scripts used to generate the assemblies, protein-coding annotation and phylogenetic tree are available at https://git.imp.fu-berlin.de/begendiv/mpipnat_genome. Repositories for TOGA and make_lastz_chains are available at https://github.com/hillerlab/TOGA and https://github.com/hillerlab/make_lastz_chains
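
For readers who want to look up these records programmatically, the accessions above can be collected in a small lookup table. The following is a minimal sketch, not part of the published workflow: the function name `ena_links` and the dictionary layout are hypothetical, and the ENA browser URL pattern is an assumption about how INSDC accessions resolve at the time of reading.

```python
# Accessions copied from the Data Availability Statement.
# Each entry is (BioProject, assembly accession or None).
ACCESSIONS = {
    "primary assembly and annotation": ("PRJEB70415", "GCA_963693515"),
    "alternate assembly": ("PRJEB70416", "GCA_963693525"),
    "mitochondrial assembly": ("PRJEB71102", "GCA_963855505"),
    "raw genome sequencing data": ("PRJEB70389", None),
}

# Assumption: the ENA browser resolves INSDC accessions under this pattern.
ENA_VIEW = "https://www.ebi.ac.uk/ena/browser/view/{accession}"


def ena_links(accessions=ACCESSIONS):
    """Return {label: [url, ...]} for every accession that is present."""
    links = {}
    for label, ids in accessions.items():
        # Skip missing entries (the raw-data BioProject has no assembly).
        links[label] = [ENA_VIEW.format(accession=a) for a in ids if a]
    return links


if __name__ == "__main__":
    for label, urls in ena_links().items():
        print(label)
        for url in urls:
            print("  " + url)
```

Keeping the BioProject and assembly identifiers side by side makes it easy to check that a downloaded dataset corresponds to the intended haplotype before running any of the tools listed in Table 4.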

Table 4. Software versions used.

Software Version
CCS 6.0.0
Cutadapt 2.3
TrimGalore 0.6.7
hifiasm 0.17.4-r455
purge_dups 1.2.6
YaHS 1.2a.1-patch
bwa-mem2 2.2.1
pairtools 1.0.2
tabix 1.11
pairix 0.3.7
cooler 0.9.3
HiGlass 1.4
fcs-gx 0.4.0-3-g8096f62
oatk 368a247
BUSCO 5.4.7
Merqury 1.3
D-GENIES 1.5.0
samtools 1.10
Ncbi-datasets (access date: 2023-10-02) 15.20.0
RepeatModeler 2.0.4
RepeatMasker 4.1.5
TOGA 1.1.4
Hisat2 2.2.1
Stringtie 2.2.1
TransDecoder 5.7.1
Braker 3.0.2
EvidenceModeler 2.1.0
diamond 2.1.8
InterProScan 5.55-88.0
agat 1.0.0
OMArk 0.3.0
OrthoVenn3-api 1433b6b83e33e441c85db8577719fa3089eb968535e5451e9e5833e0e1cc79cc
OrthoVenn3-web a0651994eb365d0138ccbafec60e4b62b5b68803489d4480186496a0722dcda0
OrthoVenn3-mysql 081da04389be0fc86630feb55bb263b39f0f01ae185fac9bd6cb2d77e354b1f9
Diamond 2.1.8
Orthofinder 2.5.5
MAFFT 7.475
trimAL 1.4.rev15
geneStitcher Commit 974bd21
iqtree 2.2.5
mcmctree 4.10.7

Articles from DNA Research: An International Journal for Rapid Publication of Reports on Genes and Genomes are provided here courtesy of Oxford University Press
