Abstract
We present a genome assembly from an individual female Boloria selene (the small pearl-bordered fritillary, also known as the silver meadow fritillary; Arthropoda; Insecta; Lepidoptera; Nymphalidae). The genome sequence is 400 megabases in span. The complete assembly is scaffolded into 31 chromosomal pseudomolecules, with the W and Z sex chromosomes assembled.
Keywords: Boloria selene, small pearl-bordered fritillary, silver meadow fritillary, genome sequence, chromosomal, Lepidoptera
Species taxonomy
Eukaryota; Metazoa; Ecdysozoa; Arthropoda; Hexapoda; Insecta; Pterygota; Neoptera; Endopterygota; Lepidoptera; Glossata; Ditrysia; Papilionoidea; Nymphalidae; Heliconiinae; Argynnini; Boloria; Boloria selene (Schiffermüller, 1775) (NCBI:txid191398).
Background
The small pearl-bordered fritillary (Boloria selene) is a widespread butterfly of boreal habitats in Northern Europe, North America, and Asia (Filz et al., 2012; Roy et al., 2015). Nearctic and Palaearctic populations have been estimated to have diverged during the early Pleistocene, approximately 2.5 Mya (Maresova et al., 2019). Within the UK, B. selene is restricted to mainland Scotland, Wales, and the west coast of England. The species is known as enw gwyddonol in Welsh and an neamhnaideach beag in Scottish Gaelic. Larvae feed exclusively on violets (Swengel, 1997), while adults use a wide range of floral resources including buttercups and bird’s-foot-trefoil (Tudor et al., 2004) and may migrate short distances for reproduction (Dapporto & Dennis, 2013; Kuussaari et al., 2014). B. selene inhabits damp grassland clearings within deciduous woodland where the larval food source is abundant (Filz et al., 2012). Although the species is classed as Least Concern in the IUCN Red List (van Swaay et al., 2010), habitat loss and fragmentation due to agriculture, in combination with diet specificity and limited dispersal, have led to significant population declines over the last half-century (Filz et al., 2012). In the UK, B. selene is classified as a priority species under the Biodiversity Action Plan (Fox et al., 2015; Stewart et al., 2004). B. selene has 30 chromosome pairs (Federley, 1938; Lorković, 1941).
Genome sequence report
The genome was sequenced from a single female B. selene collected from Carrifran Wildwood, Scotland (latitude 55.4001, longitude -3.3352). A total of 60-fold coverage in Pacific Biosciences single-molecule circular consensus (HiFi) long reads and 77-fold coverage in 10X Genomics read clouds were generated. Primary assembly contigs were scaffolded with chromosome conformation Hi-C data. Manual assembly curation corrected 8 missing/misjoins, reducing the scaffold number by 16.22% and increasing the scaffold N50 by 0.16%.
The final assembly has a total length of 400 Mb in 31 sequence scaffolds with a scaffold N50 of 13.6 Mb (Table 1). The complete assembly sequence was assigned to 31 chromosomal-level scaffolds, representing 29 autosomes (numbered by sequence length) and the W and Z sex chromosomes (Figure 2–Figure 5; Table 2). The assembly has a BUSCO v5.1.2 (Manni et al., 2021) completeness of 98.6% (single 98.3%, duplicated 0.3%) using the lepidoptera_odb10 reference set (n=5286). While not fully phased, the assembly deposited is of one haplotype. Contigs corresponding to the second haplotype have also been deposited.
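The curation figures can be cross-checked with simple arithmetic. The sketch below is an illustration and not part of the reported analysis. It assumes that the 16.22% reduction is measured relative to the pre-curation scaffold count and that the final count of 31 excludes the mitochondrial sequence; the function names are hypothetical. Under those assumptions it searches for integer starting counts that are consistent with the reported percentage.

```python
def percent_reduction(before: int, after: int) -> float:
    """Percentage decrease when a count drops from `before` to `after`."""
    return 100.0 * (before - after) / before


def candidate_pre_curation_counts(after: int, reported_pct: float, max_before: int = 200) -> list:
    """Return integer starting counts whose reduction rounds to `reported_pct`."""
    return [
        before
        for before in range(after + 1, max_before + 1)
        if round(percent_reduction(before, after), 2) == reported_pct
    ]


FINAL_SCAFFOLDS = 31        # final scaffold count reported in the text
REPORTED_REDUCTION = 16.22  # percent reduction reported for manual curation

# Assumption: the percentage is relative to the pre-curation count.
candidates = candidate_pre_curation_counts(FINAL_SCAFFOLDS, REPORTED_REDUCTION)
print(candidates)
```

Under these assumptions the only starting count consistent with the reported reduction is 37 scaffolds, which would mean that curation removed six scaffolds. This is an inference from the published percentages, not a figure stated by the authors.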
Figure 2. Genome assembly of Boloria selene, ilBolSele5.2: metrics.
The BlobToolKit Snailplot shows N50 metrics and BUSCO gene completeness. The main plot is divided into 1,000 size-ordered bins around the circumference with each bin representing 0.1% of the 399,917,640 bp assembly. The distribution of chromosome lengths is shown in dark grey with the plot radius scaled to the longest chromosome present in the assembly (25,989,679 bp, shown in red). Orange and pale-orange arcs show the N50 and N90 chromosome lengths (13,620,786 and 9,102,699 bp), respectively. The pale grey spiral shows the cumulative chromosome count on a log scale with white scale lines showing successive orders of magnitude. The blue and pale-blue area around the outside of the plot shows the distribution of GC, AT and N percentages in the same bins as the inner plot. A summary of complete, fragmented, duplicated and missing BUSCO genes in the lepidoptera_odb10 set is shown in the top right. An interactive version of this figure is available at https://blobtoolkit.genomehubs.org/view/ilBolSele5.2/dataset/ilBolSele5_2/snail.
Figure 5. Genome assembly of Boloria selene, ilBolSele5.2: Hi-C contact map.
Hi-C contact map of the ilBolSele5.2 assembly, visualised in HiGlass. Chromosomes are shown in size order from left to right and top to bottom.
Table 1. Genome data for Boloria selene, ilBolSele5.2.
Project accession data | |
---|---|
Assembly identifier | ilBolSele5.2 |
Species | Boloria selene |
Specimen | ilBolSele5 (genome assembly, Hi-C), ilBolSele2 (additional 10X reads) |
NCBI taxonomy ID | NCBI:txid191398 |
BioProject | PRJEB43033 |
BioSample ID | SAMEA7523131 |
Isolate information | Female, whole organism (ilBolSele5); male, whole organism (ilBolSele2) |
Raw data accessions | |
Pacific Biosciences SEQUEL II | ERR6412358 |
10X Genomics Illumina | ERR6054466-ERR6054469 (ilBolSele5); ERR6054462-ERR6054465 (ilBolSele2) |
Hi-C Illumina | ERR6054470 |
Genome assembly | |
Assembly accession | GCA_905231865.2 |
Accession of alternate haplotype | GCA_905231875.1 |
Span (Mb) | 400 |
Number of contigs | 41 |
Contig N50 length (Mb) | 13.4 |
Number of scaffolds | 31 |
Scaffold N50 length (Mb) | 13.6 |
Longest scaffold (Mb) | 26.0 |
BUSCO* genome score | C:98.6%[S:98.3%,D:0.3%],F:0.2%,M:1.2%,n:5286 |
*BUSCO scores based on the lepidoptera_odb10 BUSCO set using v5.1.2. C= complete [S= single copy, D=duplicated], F=fragmented, M=missing, n=number of orthologues in comparison. A full set of BUSCO scores is available at https://blobtoolkit.genomehubs.org/view/ilBolSele5.2/dataset/ilBolSele5_2/busco.
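The BUSCO summary string in Table 1 follows a compact notation that is easy to check mechanically. The following is a minimal sketch, assuming the string has exactly the layout shown in the table; the names `parse_busco` and `is_consistent` are hypothetical helpers and not part of BUSCO. It parses the string and confirms that the categories add up as the footnote describes.

```python
import re

# Layout assumed from Table 1: C:..%[S:..%,D:..%],F:..%,M:..%,n:...
BUSCO_PATTERN = re.compile(
    r"C:(?P<C>[\d.]+)%\[S:(?P<S>[\d.]+)%,D:(?P<D>[\d.]+)%\],"
    r"F:(?P<F>[\d.]+)%,M:(?P<M>[\d.]+)%,n:(?P<n>\d+)"
)


def parse_busco(summary: str) -> dict:
    """Parse a BUSCO short summary string into percentages and the gene count."""
    match = BUSCO_PATTERN.search(summary.replace(" ", ""))
    if match is None:
        raise ValueError(f"Unrecognised BUSCO summary: {summary!r}")
    scores = {key: float(match[key]) for key in ("C", "S", "D", "F", "M")}
    scores["n"] = int(match["n"])
    return scores


def is_consistent(scores: dict, tol: float = 0.15) -> bool:
    """Check C = S + D and C + F + M = 100, allowing for one-decimal rounding."""
    complete_ok = abs(scores["C"] - (scores["S"] + scores["D"])) <= tol
    total_ok = abs(scores["C"] + scores["F"] + scores["M"] - 100.0) <= tol
    return complete_ok and total_ok


scores = parse_busco("C:98.6%[S:98.3%,D:0.3%],F: 0.2%,M:1.2%,n:5286")
print(scores, is_consistent(scores))
```

The tolerance is there because each percentage is reported to one decimal place, so the sums need not match exactly for every assembly.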
Table 2. Chromosomal pseudomolecules in the genome assembly of Boloria selene, ilBolSele5.2.
INSDC accession | Chromosome | Size (Mb) | GC% |
---|---|---|---|
HG993132.1 | 1 | 16.08 | 32.5 |
HG993133.1 | 2 | 16.07 | 32.8 |
HG993134.1 | 3 | 15.71 | 32.8 |
HG993135.1 | 4 | 15.18 | 32.7 |
HG993137.1 | 5 | 15.16 | 32.6 |
HG993138.1 | 6 | 15.13 | 32.4 |
HG993139.1 | 7 | 14.44 | 32.4 |
HG993140.1 | 8 | 14.43 | 32.3 |
HG993141.1 | 9 | 14.36 | 32.5 |
HG993142.1 | 10 | 13.68 | 32.2 |
HG993143.1 | 11 | 13.62 | 32.1 |
HG993144.1 | 12 | 13.60 | 32.6 |
HG993145.1 | 13 | 13.40 | 32.4 |
HG993146.1 | 14 | 13.16 | 32.2 |
HG993147.1 | 15 | 12.92 | 32.5 |
HG993148.1 | 16 | 12.72 | 32.5 |
HG993149.1 | 17 | 12.53 | 32.6 |
HG993150.1 | 18 | 12.34 | 32.8 |
HG993151.1 | 19 | 11.81 | 33.1 |
HG993152.1 | 20 | 11.69 | 32.5 |
HG993153.1 | 21 | 11.14 | 32.7 |
HG993154.1 | 22 | 10.33 | 33.4 |
HG993155.1 | 23 | 10.05 | 32.7 |
HG993156.1 | 24 | 9.14 | 32.8 |
HG993157.1 | 25 | 9.10 | 32.5 |
HG993158.1 | 26 | 8.95 | 34.1 |
HG993159.1 | 27 | 8.17 | 35.4 |
HG993160.1 | 28 | 6.92 | 35.0 |
HG993161.1 | 29 | 6.91 | 34.2 |
HG993136.1 | W | 15.17 | 36.7 |
HG993131.1 | Z | 25.99 | 32.0 |
HG998571.1 | MT | 0.02 | 19.0 |
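The scaffold N50 and N90 values quoted for this assembly can be reproduced approximately from the chromosome sizes in Table 2. The sketch below is illustrative only: it uses the rounded sizes from the table (in Mb), excludes the mitochondrial sequence as an assumption, and uses a hypothetical helper named `nx`. Nx is taken here as the length of the scaffold at which the cumulative length, summed from the longest scaffold downwards, first reaches the given fraction of the total.

```python
# Chromosome sizes in Mb, copied from Table 2 (MT excluded by assumption).
SIZES_MB = {
    "1": 16.08, "2": 16.07, "3": 15.71, "4": 15.18, "5": 15.16,
    "6": 15.13, "7": 14.44, "8": 14.43, "9": 14.36, "10": 13.68,
    "11": 13.62, "12": 13.60, "13": 13.40, "14": 13.16, "15": 12.92,
    "16": 12.72, "17": 12.53, "18": 12.34, "19": 11.81, "20": 11.69,
    "21": 11.14, "22": 10.33, "23": 10.05, "24": 9.14, "25": 9.10,
    "26": 8.95, "27": 8.17, "28": 6.92, "29": 6.91,
    "W": 15.17, "Z": 25.99,
}


def nx(lengths, fraction: float) -> float:
    """Length at which the cumulative sum of sorted lengths reaches `fraction` of the total."""
    lengths = sorted(lengths, reverse=True)
    threshold = fraction * sum(lengths)
    running = 0.0
    for length in lengths:
        running += length
        if running >= threshold:
            return length
    raise ValueError("fraction must be between 0 and 1")


total_mb = sum(SIZES_MB.values())
n50 = nx(SIZES_MB.values(), 0.5)
n90 = nx(SIZES_MB.values(), 0.9)
print(f"span={total_mb:.2f} Mb, N50={n50} Mb, N90={n90} Mb, longest={max(SIZES_MB.values())} Mb")
```

With the rounded sizes, the N50 falls on chromosome 11 (13.62 Mb) and the N90 on chromosome 25 (9.10 Mb), in line with the values given in the Figure 2 legend. The longest scaffold is the Z chromosome.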
Methods
Sample acquisition, DNA extraction and sequencing
A female B. selene specimen (ilBolSele5; Figure 1) was collected from Carrifran Wildwood, Scotland (latitude 55.4001, longitude -3.3352) using a net by Konrad Lohse, who also identified the sample. A male specimen (ilBolSele2) was collected from the same location by the same collector. Specimens were snap-frozen at -80°C.
Figure 1. Fore and hind wings of the Boloria selene specimen from which the genome was sequenced.
Dorsal (left) and ventral (right) surface view of wings from specimen UK_BS_1216 (ilBolSele5) from Carrifran Wildwood, Scotland, used to generate 10X, HiFi and Hi-C data.
Figure 3. Genome assembly of Boloria selene, ilBolSele5.2: GC coverage.
BlobToolKit GC-coverage plot. Scaffolds are coloured by phylum. Circles are sized in proportion to scaffold length. Histograms show the distribution of scaffold length sum along each axis. An interactive version of this figure is available at https://blobtoolkit.genomehubs.org/view/ilBolSele5.2/dataset/ilBolSele5_2/blob
Figure 4. Genome assembly of Boloria selene, ilBolSele5.2: cumulative sequence.
BlobToolKit cumulative sequence plot. The grey line shows cumulative length for all scaffolds. Coloured lines show cumulative lengths of scaffolds assigned to each phylum using the buscogenes taxrule. An interactive version of this figure is available at https://blobtoolkit.genomehubs.org/view/ilBolSele5.2/dataset/ilBolSele5_2/cumulative.
DNA was extracted from the whole organism of ilBolSele5 and ilBolSele2 at the Wellcome Sanger Institute (WSI) Scientific Operations core using the Qiagen MagAttract HMW DNA kit, according to the manufacturer’s instructions. Pacific Biosciences HiFi circular consensus (ilBolSele5) and 10X Genomics read cloud (ilBolSele5 and ilBolSele2) DNA sequencing libraries were constructed according to the manufacturers’ instructions. DNA sequencing was performed by the Scientific Operations core at the WSI on Pacific Biosciences SEQUEL II (HiFi) and Illumina HiSeq X (10X) instruments. Hi-C data were also generated from remaining whole organism tissue of ilBolSele5 using the Arima v2 Hi-C kit and sequenced on an Illumina NovaSeq 6000 instrument.
Genome assembly
Assembly was carried out with Hifiasm (Cheng et al., 2021); haplotypic duplication was identified and removed with purge_dups (Guan et al., 2020). One round of polishing was performed by aligning 10X Genomics read data to the assembly with longranger align and calling variants with freebayes (Garrison & Marth, 2012). The assembly was then scaffolded with Hi-C data (Rao et al., 2014) using SALSA2 (Ghurye et al., 2019). The assembly was checked for contamination and corrected using the gEVAL system (Chow et al., 2016) as described previously (Howe et al., 2021). Manual curation (Howe et al., 2021) was performed using gEVAL, HiGlass (Kerpedjiev et al., 2018) and Pretext. The mitochondrial genome was assembled using MitoHiFi (Uliano-Silva et al., 2021), which performed annotation using MitoFinder (Allio et al., 2020). The genome was analysed and BUSCO scores generated within the BlobToolKit environment (Challis et al., 2020). Table 3 contains a list of all software tool versions used, where appropriate.
Table 3. Software tools used.
Software tool | Version | Source |
---|---|---|
Hifiasm | 0.12 | Cheng et al., 2021 |
purge_dups | 1.2.3 | Guan et al., 2020 |
SALSA2 | 2.2 | Ghurye et al., 2019 |
longranger align | 2.2.2 | https://support.10xgenomics.com/genome-exome/software/pipelines/latest/advanced/other-pipelines |
freebayes | 1.3.1-17-gaa2ace8 | Garrison & Marth, 2012 |
MitoHiFi | 1 | Uliano-Silva et al., 2021 |
gEVAL | N/A | Chow et al., 2016 |
HiGlass | 1.11.6 | Kerpedjiev et al., 2018 |
PretextView | 0.1.x | https://github.com/wtsi-hpag/PretextView |
BlobToolKit | 2.6.4 | Challis et al., 2020 |
Data availability
European Nucleotide Archive: Boloria selene (silver meadow fritillary). Accession number PRJEB43033; https://identifiers.org/ena.embl/PRJEB43033.
The genome sequence is released openly for reuse. The B. selene genome sequencing initiative is part of the Darwin Tree of Life (DToL) project. The genome will be annotated and presented through the Ensembl pipeline at the European Bioinformatics Institute. All raw sequence data and the assembly have been deposited in INSDC databases. Raw data and assembly accession identifiers are reported in Table 1.
Author information
Members of the Darwin Tree of Life Barcoding collective are listed here: https://doi.org/10.5281/zenodo.5744972.
Members of the Wellcome Sanger Institute Tree of Life programme are listed here: https://doi.org/10.5281/zenodo.6125027.
Members of Wellcome Sanger Institute Scientific Operations: DNA Pipelines collective are listed here: https://doi.org/10.5281/zenodo.5746904.
Members of the Tree of Life Core Informatics collective are listed here: https://doi.org/10.5281/zenodo.6125046.
Members of the Darwin Tree of Life Consortium are listed here: https://doi.org/10.5281/zenodo.5638618.
Funding Statement
This work was supported by Wellcome through core funding to the Wellcome Sanger Institute (206194) and the Darwin Tree of Life Discretionary Award (218328).
The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
[version 1; peer review: 2 approved]
References
- Allio R, Schomaker-Bastos A, Romiguier J, et al.: MitoFinder: Efficient Automated Large-Scale Extraction of Mitogenomic Data in Target Enrichment Phylogenomics. Mol Ecol Resour. 2020;20(4):892–905. 10.1111/1755-0998.13160
- Challis R, Richards E, Rajan J, et al.: BlobToolKit - Interactive Quality Assessment of Genome Assemblies. G3 (Bethesda). 2020;10(4):1361–74. 10.1534/g3.119.400908
- Cheng H, Concepcion GT, Feng X, et al.: Haplotype-Resolved de Novo Assembly Using Phased Assembly Graphs with Hifiasm. Nat Methods. 2021;18(2):170–75. 10.1038/s41592-020-01056-5
- Chow W, Brugger K, Caccamo M, et al.: gEVAL - a web-based browser for evaluating genome assemblies. Bioinformatics. 2016;32(16):2508–10. 10.1093/bioinformatics/btw159
- Dapporto L, Dennis RLH: The generalist–specialist continuum: Testing predictions for distribution and trends in British butterflies. Biol Conserv. 2013;157:229–36. 10.1016/j.biocon.2012.09.016
- Federley H: Chromosomenzahlen Finnländischer Lepidopteren. Hereditas. 1938;24(4):397–464. 10.1111/j.1601-5223.1938.tb03219.x
- Filz KJ, Engler JO, Stoffels J, et al.: Missing the target? A critical view on butterfly conservation efforts on calcareous grasslands in south-western Germany. Biodivers Conserv. 2012;22(10):2223–41. 10.1007/s10531-012-0413-0
- Fox R, Brereton TM, Asher J, et al.: The state of the UK’s butterflies 2015. Wareham, Dorset: Butterfly Conservation, 2015.
- Garrison E, Marth G: Haplotype-Based Variant Detection from Short-Read Sequencing. arXiv: 1207.3907, 2012.
- Ghurye J, Rhie A, Walenz BP, et al.: Integrating Hi-C Links with Assembly Graphs for Chromosome-Scale Assembly. PLoS Comput Biol. 2019;15(8):e1007273. 10.1371/journal.pcbi.1007273
- Guan D, McCarthy SA, Wood J, et al.: Identifying and Removing Haplotypic Duplication in Primary Genome Assemblies. Bioinformatics. 2020;36(9):2896–98. 10.1093/bioinformatics/btaa025
- Howe K, Chow W, Collins J, et al.: Significantly Improving the Quality of Genome Assemblies through Curation. GigaScience. 2021;10(1):giaa153. 10.1093/gigascience/giaa153
- Kerpedjiev P, Abdennur N, Lekschas F, et al.: HiGlass: Web-Based Visual Exploration and Analysis of Genome Interaction Maps. Genome Biol. 2018;19(1):125. 10.1186/s13059-018-1486-1
- Kuussaari M, Saarinen M, Korpela EL, et al.: Higher mobility of butterflies than moths connected to habitat suitability and body size in a release experiment. Ecol Evol. 2014;4(19):3800–11. 10.1002/ece3.1187
- Lorković Z: Die Chromosomenzahlen in der Spermatogenese der Tagfalter. Chromosoma. 1941;2:155–91. 10.1007/BF00325958
- Manni M, Berkeley MR, Seppey M, et al.: BUSCO Update: Novel and Streamlined Workflows along with Broader and Deeper Phylogenetic Coverage for Scoring of Eukaryotic, Prokaryotic, and Viral Genomes. Mol Biol Evol. 2021;38(10):4647–54. 10.1093/molbev/msab199
- Maresova J, Habel JC, Neve G, et al.: Cross-continental phylogeography of two Holarctic Nymphalid butterflies, Boloria eunomia and Boloria selene. PLoS One. 2019;14(3):e0214483. 10.1371/journal.pone.0214483
- Rao SS, Huntley MH, Durand NC, et al.: A 3D Map of the Human Genome at Kilobase Resolution Reveals Principles of Chromatin Looping. Cell. 2014;159(7):1665–80. 10.1016/j.cell.2014.11.021
- Roy DB, Oliver TH, Botham MS, et al.: Similarities in butterfly emergence dates among populations suggest local adaptation to climate. Glob Chang Biol. 2015;21(9):3313–22. 10.1111/gcb.12920
- Stewart K, Bourn N, Watts K, et al.: The persistence of a key butterfly species Boloria selene (small pearl-bordered fritillary) in Clocaenog Forest and the impact of landscape-scale processes. In: Landscape Ecology of Trees and Forests–Proceedings of the 2004 IALE (UK) Conference, held at Cirencester Agricultural College, Gloucestershire, 21st–24th June 2004. IALE (UK), 2004.
- Swengel AB: Habitat Associations of Sympatric Violet-Feeding Fritillaries (Euptoieta, Speyeria, Boloria) (Lepidoptera: Nymphalidae) in Tallgrass Prairie. The Great Lakes Entomologist. 1997;30(1).
- Tudor O, Dennis RLH, Greatorex-Davies JN, et al.: Flower preferences of woodland butterflies in the UK: nectaring specialists are species of conservation concern. Biol Conserv. 2004;119(3):397–403. 10.1016/j.biocon.2004.01.002
- Uliano-Silva M, Nunes JGF, Krasheninnikova K, et al.: marcelauliano/MitoHiFi: mitohifi_v2.0. 2021. 10.5281/zenodo.5205678
- Van Swaay C, Wynhoff I, Verovnik R, et al.: IUCN Red List of Threatened Species: Melitaea Cinxia. IUCN Red List of Threatened Species. 2010.