Abstract
The clouded apollo (Parnassius mnemosyne) is a Palearctic butterfly distributed over a large part of western Eurasia, but population declines and fragmentation have been observed in many parts of the range. The development of genomic tools can help to shed light on the genetic consequences of the decline and to make informed decisions about direct conservation actions. Here, we present a high-contiguity, chromosome-level genome assembly of a female clouded apollo butterfly and provide detailed annotations of genes and transposable elements. We find that the large genome (1.5 Gb) of the clouded apollo is extraordinarily repeat rich (73%). Despite that, the combination of sequencing techniques allowed us to assemble all chromosomes (nc = 29) to a high degree of completeness. The annotation resulted in a relatively high number of protein-coding genes (22,854) compared with other Lepidoptera, of which a large proportion (21,635) could be assigned functions based on homology with other species. A comparative analysis indicates that overall genome structure has been largely conserved, both within the genus and compared with the ancestral lepidopteran karyotype. The high-quality genome assembly and detailed annotation presented here will constitute an important tool for forthcoming efforts aimed at understanding the genetic consequences of fragmentation and decline, as well as for assessments of genetic diversity, population structure, inbreeding, and genetic load in the clouded apollo butterfly.
Keywords: Parnassius mnemosyne, clouded apollo butterfly, genome assembly, gene annotation, repeat annotation, genome size
Significance
The quality of the assembly of the clouded apollo (Parnassius mnemosyne) genome and the annotation of genes are well in line with the standards of the Earth BioGenome Project (https://www.earthbiogenome.org/analysis-standards-report). The analyses revealed a comparatively large genome with a high repeat content for a butterfly and considerable synteny compared with both a close and a distant lepidopteran relative. We therefore predict that the genomic resources provided here will be important tools for forthcoming investigations aimed at understanding demographic history, loss of genetic variation, population structure, and mutation load in the clouded apollo butterfly in particular and for comparative genomic approaches in Lepidoptera in general.
Introduction
Genomic data can be used to inform practical conservation efforts of threatened species, a research field generally referred to as conservation genomics (e.g. Hohenlohe et al. 2021; DeWoody et al. 2022). Genomic approaches have, for example, proven useful to infer demographic histories, to estimate genetic diversity and inbreeding, and to quantify genetic load (Supple and Shapiro 2018; Wright et al. 2020; Bortoluzzi et al. 2023; Theissinger et al. 2023), information that is crucial for informed conservation actions for species that are rare or endangered and/or difficult to monitor with traditional methods (DeWoody et al. 2022; Hogg et al. 2022). Conservation genomic efforts have traditionally been focused on flagship species that comprise a small proportion of the extant biodiversity (Bortoluzzi et al. 2023). Butterflies and moths (Lepidoptera), for example, comprise over 150,000 species (Triant et al. 2018), but compared with mammals and birds, they are extensively overlooked in assessments of species under conservation concern (Cardoso et al. 2011; Podsiadlowski et al. 2021; Duffus and Morimoto 2022; Bortoluzzi et al. 2023). Many lepidopterans have also been reported to be rapidly declining in number (e.g. Cardoso et al. 2020; Warren et al. 2021), a major concern since they often constitute key indicator species for ecosystem functioning (Schowalter et al. 2018; Warren et al. 2021).
Several initiatives have been started, aiming at generating genomic data for specific organism groups (e.g. 5,000 insect genomes [i5k]; Sills et al. 2011), for biodiversity in certain countries (e.g. The Darwin Tree of Life [DToL] Project Consortium et al. 2022) or larger geographical regions (e.g. The European Reference Genome Atlas [ERGA]; Mc Cartney et al. 2023). While such initiatives are promising for future access to genome information for a large proportion of the extant biodiversity, targeted efforts are needed to swiftly get data for species under immediate conservation concern. As part of the ERGA pilot effort (Mc Cartney et al. 2023), we present a high-contiguity genome assembly and detailed annotation of the clouded apollo (Parnassius mnemosyne), a butterfly of general conservation concern (Gratton et al. 2008; Bolotov et al. 2013; Talla et al. 2023). The clouded apollo is widely distributed across the western Palearctic, but population numbers have declined dramatically in many regions (Kuussaari et al. 2015; Johansson et al. 2017; Talla et al. 2023), likely a consequence of a decline in the preferred habitat—mosaics of low-intensity grazed pastures intermixed with sheltering vegetation and rich access to host plants Corydalis sp. (Konvička and Kuras 1999; Luoto et al. 2001; Kuussaari et al. 2015; Johansson et al. 2017). Previous genetic analyses have unveiled considerable population structure, both across the distribution range in general (Gratton et al. 2008) and between populations in specific regions (e.g. Sweden; Talla et al. 2023), but currently available data are insufficient for detailed assessment of demography, genetic load, and levels of genetic diversity. The chromosome-level genome assembly we developed here will therefore be a useful tool for forthcoming investigations of inbreeding, local adaptation and loss of genetic diversity in the clouded apollo, and comparative genomic analyses in Lepidoptera in general (Bortoluzzi et al. 2023).
Results
Assembly statistics are presented in Table 1. We predicted a genome size of ∼1.4 Gbp and a genome-wide heterozygosity (π) of 0.74% (supplementary fig. S1, Supplementary Material online). The mean GC content was 37.8%, and nucleotide composition was stable across all chromosomes (Fig. 1). The assembly had high contiguity, with the longest 29 scaffolds (≥25 Mbp) comprising 95.2% of the total length. Of the 29 longest scaffolds, 10 contained telomere sequences at both ends, 13 at 1 end, and only 6 lacked telomere sequences completely (Fig. 1). We estimated the quality metrics recommended by the Earth BioGenome Project (https://www.earthbiogenome.org/analysis-standards-report) and found high base pair quality scores and a low rate of false duplications (supplementary table S1, Supplementary Material online).
Table 1.
Summary statistics for the P. mnemosyne genome assembly
| Assembly statistics | Value |
|---|---|
| Assembly length (bp) | 1,494,393,142 |
| # Scaffolds | 169 |
| Scaffold N50 (Mb) | 49.64 |
| Scaffold N90 (Mb) | 34.77 |
| # Contigs | 285 |
| Contig N50 (Mb) | 23.93 |
| Contig N90 (Mb) | 4.73 |
| GC content (%) | 38 |
| Gene annotation | |
| # Protein-coding genes | 22,854 |
| # Genes with functional information | 21,635 |
| # tRNA genes | 1,076 |
| # Putative pseudogenes | 4,788 |
| # Putative ncRNA genes | 16,610 |
| BUSCO analysis | |
| Complete | 99.0% |
| Single-copy | 97.8% |
| Duplicated | 1.2% |
| Fragmented | 0.2% |
| Missing | 0.8% |
bp, base pairs; Mb, megabases; #, number of. The BUSCO analysis was based on 5,286 lepidopteran genes.
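For readers unfamiliar with the contiguity metrics in Table 1, the following is a minimal sketch of how N50 and N90 are conventionally defined. The function name `nx` and the toy lengths are hypothetical illustrations; they are not taken from the assembly or from the tools used in this study.

```python
def nx(lengths, fraction):
    """Return the Nx length: the length L of the shortest sequence in the
    smallest set of longest sequences that together cover at least
    `fraction` of the total assembly length."""
    if not lengths:
        raise ValueError("no sequence lengths given")
    target = fraction * sum(lengths)
    running = 0
    # Walk from the longest sequence downwards, accumulating length.
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= target:
            return length


# Hypothetical toy lengths (arbitrary units), NOT the real scaffold lengths.
toy_lengths = [50, 40, 30, 20, 10]
toy_n50 = nx(toy_lengths, 0.5)
toy_n90 = nx(toy_lengths, 0.9)
```

Applied to the scaffold or contig lengths of an assembly, the same function would give the scaffold or contig N50 and N90, respectively.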
Fig. 1.
The structure of the clouded apollo (P. mnemosyne) genome assembly for the 29 largest scaffolds (>25 Mb). Scaffolds are scaled by relative size and regional repeat (heat map) and GC content (blue polygons to the right of each scaffold) are indicated. Colors below each scaffold indicate if the scaffold contains both telomeres (n = 10, orange), 1 telomere (n = 13, blue), or no telomeres (n = 6, pale yellow). Dark horizontal bars indicate the center of each scaffold.
The results from the Benchmarking Universal Single-Copy Orthologs (BUSCO) analysis indicate that the assembly completeness is very high, with 99.0% of the 5,286 analyzed genes completely covered (Table 1, supplementary table S2, Supplementary Material online). A BUSCO analysis using the bacteria_odb10 gene set did not identify any genes of bacterial origin in the final assembly.
The mtDNA contig was 15,425 bp, with 18.3% GC. The gene composition was typical for an insect mtDNA, with 13 protein-coding genes, 22 transfer RNA (tRNA) genes, and 2 ribosomal RNA (rRNA) genes (supplementary fig. S2, Supplementary Material online). The gene order was similar to that of the close relative Parnassius apollo (Chen et al. 2014) and more distantly related Lepidoptera such as the Monarch (Danaus plexippus; https://www.ncbi.nlm.nih.gov/nuccore/NC_021452.1) and the silkworm (Bombyx mori; https://www.ncbi.nlm.nih.gov/nuccore/NC_002355.1).
The automated repeat annotation revealed that the majority of classified repeats were retroelements, but a high proportion of repeats remained unclassified (supplementary table S3, Supplementary Material online). We therefore made a more detailed analysis of repeats using partly manually curated repeat libraries for other butterfly species. This analysis showed that the overall proportion of repeats was considerably higher (72.90%) than with the automated pipeline (61.53%) (supplementary table S4, Supplementary Material online). Both the clouded apollo and its close relative, the apollo (P. apollo; Podsiadlowski et al. 2021), have comparatively large genomes and a high repeat content. We therefore compared the relative contribution of different repeat classes to the genome sizes in both species using the same pipeline. This analysis revealed that the apollo has a higher repeat content than the clouded apollo and that the difference has predominantly been driven by expansions of long interspersed nuclear elements (LINEs), long terminal repeats (LTRs), and DNA transposons in the apollo (supplementary fig. S3, Supplementary Material online).
The structural annotation of coding sequences identified 22,854 genes and 27,378 transcripts (Table 1). A completeness assessment of the predicted proteins (longest isoform used) indicated complete coverage of 96.3% of the 5,286 lepidopteran BUSCOs (supplementary table S5, Supplementary Material online). A total of 21,635 (94.7%) of the identified genes could be assigned a functional annotation with 14,890 unique gene names (Table 1, supplementary table S6, Supplementary Material online). We also predicted 1,076 putative tRNA genes, 4,788 putative pseudogenes, and 16,610 ncRNAs (including the tRNAs; Table 1).
The alignments between larger scaffolds revealed a high degree of synteny conservation between the 2 Parnassius species (supplementary fig. S4, Supplementary Material online). The comparison between P. mnemosyne and B. mori unveiled a high degree of collinearity of aligned fragments within most scaffolds (supplementary fig. S5, Supplementary Material online).
Discussion
The assembly of the clouded apollo genome resulted in 29 chromosome-sized scaffolds, the majority of which (23 of 29) contain telomeric repeats at 1 or both ends (Fig. 1). This is compatible with a chromosome-scale assembly since the clouded apollo has been shown to have 29 chromosomes (Vlašánek et al. 2017). The shorter sequences, not associated with the larger scaffolds, constituted a small fraction (4.8% of the total length) of the entire assembly and can be omitted from most types of downstream analyses with little risk of biasing the results. We predicted a considerably larger set of coding genes (∼22,800) compared with most other lepidopterans with annotated genome sequences (e.g. Shipilina et al. 2022; Smolander et al. 2022; Höök et al. 2023). However, the gene number was lower than the estimated number (∼28,300) in the close relative P. apollo (Podsiadlowski et al. 2021), likely a consequence of the more fragmented genome assembly and/or the less strict annotation of repeats in that species (Shipilina et al. 2022). The repeat content is high in P. mnemosyne compared with other lepidopterans, and this can lead to overestimation of the number of coding genes (e.g. Baril and Hayward 2022; Shipilina et al. 2022; Höök et al. 2023). It should also be noted that, with the exception of well-conserved families, many of the predicted ncRNAs should be interpreted with caution.
The detailed repeat annotation of both Parnassius species confirmed that the extreme genome sizes are a consequence of accumulation of a high number of transposable elements (TEs) (Podsiadlowski et al. 2021). LINEs in particular constitute a large portion of the genome in both species, but also LTRs, DNA transposons, and short interspersed nuclear elements (SINEs) are present in high numbers. Another Parnassius species (Parnassius orleans) has an estimated genome size of 1.2 Gb (Liu et al. 2020), which shows that large genomes are ubiquitous in the genus and that the accumulation of TEs predominantly has occurred before the radiation of currently extant Parnassius lineages (Liu et al. 2020; Wiemers et al. 2020; Podsiadlowski et al. 2021). However, in contrast to other butterflies with large genomes and high TE content (e.g. Leptidea sp.; Talla et al. 2017; Höök et al. 2023) and in agreement with previous results (Podsiadlowski et al. 2021), our results show that the overall genome structure is relatively conserved in Parnassius.
Methods
DNA and RNA Extractions and Library Preparations
Two adult female clouded apollos from a captive population, recently established (2019) from a natural population (Blekinge; Talla et al. 2023) to aid the conservation actions in Sweden (https://en.nordensark.se/conservation/), were used for sequencing (sampled on 11 May 2021). One individual (PM1) was used for PacBio HiFi sequencing and the other individual (PM2) for Illumina Hi-C and RNA-sequencing. For the PacBio HiFi and Illumina Hi-C libraries, frozen muscle tissue from the thorax was ground with pestle and mortar. Muscle tissue was also used for RNA-sequencing with Illumina short inserts and PacBio Iso-Seq. High molecular weight DNA was extracted from the muscle tissue using SDS lysis, followed by phenol:chloroform:isoamyl alcohol (25:24:1) treatment. DNA was precipitated with a high salt/low ethanol solution, washed twice with 70% ethanol, and eluted in TE buffer. RNA was also extracted from muscle tissue using a standard TRIzol protocol including DNase treatment (RNeasy, Invitrogen). Eluted RNA was stored at −70 °C. All DNA and RNA extractions were done at Uppsala Genome Center and the University of Antwerp ERGA hub. The PacBio HiFi library preparation and sequencing were performed at the National Genomics Infrastructure (NGI) node in Uppsala, Sweden. Both RNA-seq libraries and the Illumina Hi-C (OmniC) library were prepared at the ERGA hub at the University of Antwerp, Belgium, and sequenced at the ERGA hub at the University of Florence, Italy. Both individuals were kept as voucher specimens (PM1; ERGA-ID: ilParMnem1 as UPSZTY 184741 and PM2 as UPSZTY 184901) and stored at the Evolution Museum at Uppsala University (supplementary fig. S6, Supplementary Material online).
Genome Assembly
The genome was assembled using a combination of PacBio (HiFi) and Illumina Hi-C data (supplementary table S7, Supplementary Material online). The HiFi reads had a coverage of ∼30× and were assembled using Hifiasm v0.16.0 (Cheng et al. 2021). Purge_Dups v1.2.5 (Guan et al. 2020) was used to remove putative duplications. The Hi-C sequence reads were aligned to the purged assembly and processed with pairtools v0.3.0 (Abdennur et al. 2023), and contigs were scaffolded with YaHS v1.1a (Zhou et al. 2022). Hi-C scaffolds were manually edited with JBAT v2.20.00 (Dudchenko et al. 2018), using the Hi-C contact maps and telomere motif annotation from tidk v0.2.31 (https://github.com/tolkit/telomeric-identifier) to produce the final assembly.
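As an illustration of how telomere motif annotation can inform curation, the sketch below classifies a scaffold by whether telomeric repeats occur at both ends, 1 end, or neither end. It is not the tidk algorithm. The motif TTAGG (the canonical insect telomeric repeat) is an assumption, since the motif is not stated above, and the window size, minimum repeat number, and function names are hypothetical.

```python
# Complement table for reverse-complementing DNA strings.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def has_telomere(end_seq, motif="TTAGG", min_repeats=5):
    """True if `end_seq` contains at least `min_repeats` tandem copies of
    the motif on either strand (motif and threshold are assumptions)."""
    unit = motif * min_repeats
    return unit in end_seq or revcomp(unit) in end_seq


def classify_scaffold(seq, window=100):
    """Classify a scaffold as 'both', 'one', or 'none' depending on how
    many of its two ends carry telomeric repeats. Assumes the scaffold is
    longer than 2 * window so that the end windows do not overlap."""
    seq = seq.upper()
    ends = (seq[:window], seq[-window:])
    n_ends = sum(has_telomere(end) for end in ends)
    return {2: "both", 1: "one", 0: "none"}[n_ends]


# Hypothetical toy scaffolds, NOT real sequence data.
toy_both = "CCTAA" * 6 + "ACGT" * 50 + "TTAGG" * 6
toy_one = "ACGT" * 50 + "TTAGG" * 6
toy_none = "ACGT" * 60
```

Counting the three categories over the largest scaffolds would give a tally of the kind summarized in Fig. 1.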
Potential contamination was assessed using Mash v2.3 (Ondov et al. 2019) and the National Center for Biotechnology Information (NCBI) RefSeq database (https://gembox.cbcb.umd.edu/mash/refseq.genomes.k21s1000.msh), and no substantial contamination was found. Genome properties were estimated from both the HiFi reads and the final assembly using the k-mer counter FastK (https://github.com/thegenemyers/FASTK) with 31 mers and GeneScopeFK (https://github.com/thegenemyers/GENESCOPE.FK), a modified version of GenomeScope v2.0 (Ranallo-Benavidez et al. 2020). Additionally, MERQURY.FK (https://github.com/thegenemyers/MERQURY.FK) was used to estimate the k-mer completeness and the false duplication rate. Assembly quality was assessed with general statistics (Table 1), and completeness and duplication rate were evaluated using BUSCO v5.4.6 (Manni et al. 2021), with the lepidoptera_odb10 gene set from OrthoDB v10 (Kriventseva et al. 2018). The mitochondrial genome was recovered from the primary assembly after haplotig removal using MitoHiFi v2.2 (Uliano-Silva et al. 2023) and annotated using MitoFinder v1.4.1 (Allio et al. 2020). The mtDNA assembly was reoriented to match the P. apollo mtDNA assembly (NCBI accession NC_024727.1; Chen et al. 2014).
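The principle behind k-mer-based genome size estimation can be sketched as follows. This is a deliberately simplified calculation (total number of non-error k-mers divided by the depth of the homozygous coverage peak), not the model fitted by GeneScopeFK or GenomeScope. The histogram values, the error cutoff `min_depth`, and the function names are hypothetical.

```python
def find_peak(hist, min_depth=5):
    """Return the depth with the highest k-mer count, ignoring depths
    below `min_depth`, which are assumed to be dominated by errors."""
    candidates = {d: c for d, c in hist.items() if d >= min_depth}
    return max(candidates, key=candidates.get)


def estimate_genome_size(hist, hom_peak, min_depth=5):
    """Estimate haploid genome size as the total number of k-mers
    (depth x count, excluding low-depth error k-mers) divided by the
    depth of the homozygous coverage peak."""
    total_kmers = sum(d * c for d, c in hist.items() if d >= min_depth)
    return total_kmers / hom_peak


# Hypothetical toy histogram {depth: number of distinct k-mers},
# NOT the real FastK output for P. mnemosyne.
toy_hist = {1: 1000, 2: 200, 28: 50, 29: 80, 30: 100, 31: 80, 32: 50}
toy_peak = find_peak(toy_hist)
toy_size = estimate_genome_size(toy_hist, toy_peak)
```

In a heterozygous genome the tallest peak may be the heterozygous one at half the homozygous depth, so in practice the homozygous peak should be identified from the fitted model rather than by the simple maximum used here.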
Genome Annotation
A custom repeat library was created with RepeatModeler2 v2.0.2a (Flynn et al. 2020), and the genome was soft-masked with RepeatMasker (https://www.repeatmasker.org/) v4.1.5 prior to gene annotation. The candidate repeats obtained by RepeatModeler were vetted against UniProt/SwissProt to exclude nucleotide motifs stemming from protein-coding sequences. For comparative purposes, we also generated a detailed annotation of the different classes of repeats present in the genomes of both the apollo (Podsiadlowski et al. 2021) and the clouded apollo butterfly and estimated the proportions of the genomes covered by each class of repeat, again using RepeatModeler2 v2.0.2a (Flynn et al. 2020), but including manually curated repeat libraries from other butterfly species (Shipilina et al. 2022; Höök et al. 2023).
Gene prediction was performed in 3 steps that were later combined, incorporating standard RNA-seq, protein sequences from multiple organisms, and PacBio Iso-Seq as evidence. (i) The RNA-seq reads for P. mnemosyne (Table 1) were aligned to the assembly with HiSat2 v2.1.0 (Kim et al. 2019), and BRAKER v3.0.3 (Gabriel et al. 2023) was used to extract splicing signals and to train and predict genes using Augustus. (ii) Arthropod proteins from OrthoDB v11 (https://bioinf.uni-greifswald.de/bioinf/partitioned_odb11; Kriventseva et al. 2018) were aligned to the assembly with miniprot v0.10-r226-dirty (Li 2023) and used to train and predict genes with Augustus within the GALBA pipeline v1.0.6 (Brůna et al. 2023). (iii) High-quality transcripts from PacBio Iso-Seq were aligned to the genome using minimap2 v2.26 (https://github.com/lh3/minimap2/releases) and used to obtain gene predictions with GeneMarkS-T v5.1 (Tang et al. 2015), following the long-read protocol from BRAKER (https://github.com/Gaius-Augustus/BRAKER/blob/master/docs/long_reads/long_read_protocol.md). Finally, the gene models from BRAKER, GALBA, and GeneMarkS-T were combined and filtered using TSEBRA (long_reads branch, commit 1f2614c; Gabriel et al. 2021). The combined gene models were processed with AGAT v1.2.0 (https://zenodo.org/record/8178877) to remove overlapping genes. Functional annotation was done using the NBIS functional_annotation nextflow pipeline v2.0.0 (https://github.com/NBISweden/pipelines-nextflow), which uses BLAST for similarity searches between the annotated proteins and the UniProtKB/SwissProt database (Magrane and UniProt Consortium 2011) (downloaded 2022-12; 568,363 proteins) and InterProScan (Jones et al. 2014) to query the proteins against InterPro v59-91 (Paysan-Lafosse et al. 2022), and merges the results using AGAT (https://zenodo.org/record/8178877). To reduce potential false positives, single-exon genes without any InterPro annotation were removed.
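The final filtering rule (removal of single-exon genes lacking any InterPro annotation) can be expressed as a simple predicate. The sketch below is illustrative only: the `GeneModel` record and its fields are hypothetical and do not correspond to the AGAT data model or to the scripts used in this study.

```python
from dataclasses import dataclass, field


@dataclass
class GeneModel:
    """Hypothetical minimal record for a predicted gene."""
    gene_id: str
    exon_count: int
    interpro_ids: list = field(default_factory=list)


def keep_gene(gene):
    """Keep multi-exon genes unconditionally; keep single-exon genes only
    if they have at least one InterPro annotation."""
    return gene.exon_count > 1 or bool(gene.interpro_ids)


def filter_genes(genes):
    """Return the genes that pass the false-positive filter."""
    return [g for g in genes if keep_gene(g)]


# Hypothetical toy gene models, NOT real annotation records.
toy_genes = [
    GeneModel("g1", exon_count=4),
    GeneModel("g2", exon_count=1, interpro_ids=["IPR000001"]),
    GeneModel("g3", exon_count=1),
]
toy_kept = [g.gene_id for g in filter_genes(toy_genes)]
```

Here only the single-exon gene without InterPro support is discarded, which reflects the conservative intent of the filter.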
Each predicted protein sequence was blasted against the UniProt/SwissProt reference data set (downloaded 2022-12) in order to infer, when available, the gene and protein name. The inference was made using the best hit with a maximum e-value cutoff of 1 × 10⁻⁶. Gene names that were encountered several times were suffixed to keep each gene name unique (complete list of gene names is available upon request). tRNAs were predicted using tRNAscan-SE v2.0.12 (https://github.com/UCSC-LoweLab/tRNAscan-SE) with default parameters for eukaryotes. Noncoding RNAs (ncRNAs) were predicted with Infernal v1.1.4 (Nawrocki and Eddy 2013) and the Rfam v14.1 covariance models (Nawrocki et al. 2014).
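The name inference described above can be sketched in two steps: selecting the best hit per protein under the e-value cutoff, and suffixing repeated names to keep them unique. This is a minimal illustration, not the NBIS pipeline code. Defining the best hit as the one with the lowest e-value, the `_1`, `_2` suffix format, and the simplified (query, name, e-value) hit tuples are all assumptions.

```python
from collections import defaultdict

E_VALUE_CUTOFF = 1e-6  # maximum e-value, as stated in the text


def best_hits(hits, cutoff=E_VALUE_CUTOFF):
    """Map each query to the gene name of its lowest e-value hit,
    discarding hits with an e-value above the cutoff."""
    best = {}
    for query, name, evalue in hits:
        if evalue > cutoff:
            continue
        if query not in best or evalue < best[query][1]:
            best[query] = (name, evalue)
    return {query: name for query, (name, _) in best.items()}


def make_unique(names_by_query):
    """Suffix gene names that occur more than once (assumed format:
    NAME_1, NAME_2, ...) so that every assigned name is unique."""
    totals = defaultdict(int)
    for name in names_by_query.values():
        totals[name] += 1
    seen = defaultdict(int)
    unique = {}
    for query, name in names_by_query.items():
        if totals[name] == 1:
            unique[query] = name
        else:
            seen[name] += 1
            unique[query] = f"{name}_{seen[name]}"
    return unique


# Hypothetical toy hits (query, gene name, e-value), NOT real BLAST output.
toy_hits = [
    ("g1", "ABC1", 1e-50),
    ("g1", "XYZ", 1e-10),
    ("g2", "ABC1", 1e-8),
    ("g3", "DEF2", 1e-3),   # above the cutoff, so g3 gets no name
    ("g4", "DEF2", 1e-20),
]
toy_names = make_unique(best_hits(toy_hits))
```

A suffixing step of this kind would explain why the number of unique gene names (14,890) is lower than the number of genes with a functional annotation (21,635).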
To assess synteny conservation, whole genome alignments between P. mnemosyne and P. apollo and B. mori, respectively, were done using D-GENIES (Cabanettes and Klopp 2018) with default settings. The scaffold plot in Fig. 1 was generated with the R package RIdeogram (Hao et al. 2020).
Acknowledgments
The authors would like to acknowledge the support from the Science for Life Laboratory (SciLifeLab), NGI, and Uppsala Multidisciplinary Center for Advanced Computational Science (UPPMAX) for providing assistance in massively parallel sequencing and computational infrastructure. The computations were performed on resources provided by the SNIC through UPPMAX under the Project SNIC 2020-5-20. We thank Veronika Mrazek for the help with the dissections. We thank the ERGA hubs at the University of Antwerp (Henrique Leitão, Genevieve Diedericks, Hannes Svardal) and University of Florence (Claudio Ciofi) for help with OmniC and RNA-seq. Agilent, Dovetail, and Illumina kits were sponsored by the manufacturers.
Contributor Information
Jacob Höglund, Animal Ecology Program, Department of Ecology and Genetics (IEG), Uppsala University, Uppsala SE-752 36, Sweden.
Guilherme Dias, National Bioinformatics Infrastructure Sweden (NBIS), Science for Life Laboratory, Uppsala 752 37, Sweden.
Remi-André Olsen, Science for Life Laboratory, Department of Biochemistry and Biophysics, Stockholm University, Solna 17165, Sweden.
André Soares, National Bioinformatics Infrastructure Sweden (NBIS), Science for Life Laboratory, Uppsala 752 37, Sweden.
Ignas Bunikis, Uppsala Genome Center, Department of Immunology, Genetics and Pathology, Uppsala University, National Genomics Infrastructure hosted by SciLifeLab, Uppsala, Sweden; Department of Medical Biochemistry and Microbiology, Uppsala University, Uppsala 752 37, Sweden.
Venkat Talla, Evolutionary Biology Program, Department of Ecology and Genetics (IEG), Uppsala University, Uppsala SE-752 36, Sweden.
Niclas Backström, Evolutionary Biology Program, Department of Ecology and Genetics (IEG), Uppsala University, Uppsala SE-752 36, Sweden.
Supplementary Material
Supplementary material is available at Genome Biology and Evolution online.
Author Contributions
J.H.: conceived of the project, project management, manuscript writing, and acquisition of funding. G.D.: assembly QC, structural and functional annotation, and manuscript writing. R.-A.O.: Hi-C scaffolding and Hi-C curation. A.S.: assembly QC, assembly curation, and mitogenome assembly. I.B.: contig assembly with Hifiasm. V.T.: analysis of repeat content and report writing. N.B.: conceived of the project, project management, synteny analysis, manuscript writing, and acquisition of funding.
Funding
This work was supported by the Swedish Research Council (VR research grant #019-04791 to N.B.), NBIS/SciLifeLab long-term bioinformatics support (WABI to N.B.), and Swedish Rescue Program for P. mnemosyne through the local administrative board (Länsstyrelsen) of Blekinge (to J.H. and N.B.).
Data Availability
All data have been deposited at the European Nucleotide Archive under accession PRJEB67749. Scripts used for the TE analysis are available at GitHub (https://github.com/EBC-butterfly-genomics-team/clouded_apollo_genomics). Assembly and annotation files are also available via the ERGA portal (https://portal.erga-biodiversity.eu/data_portal/Parnassius%20mnemosyne).
Literature Cited
- Abdennur N, Fudenberg G, Flyamer IM, Galitsyna AA, Goloborodko A, Imakaev M, Venev SV. Pairtools: from sequencing data to chromosome contacts. bioRxiv 2023.02.13.528389. 10.1101/2023.02.13.528389, 15 February 2023, preprint: not peer reviewed.
- Allio R, Schomaker-Bastos A, Romiguier J, Prosdocimi F, Nabholz B, Delsuc F. MitoFinder: efficient automated large-scale extraction of mitogenomic data in target enrichment phylogenomics. Mol Ecol Resour. 2020:20(4):892–905. 10.1111/1755-0998.13160.
- Baril T, Hayward A. Migrators within migrators: exploring transposable element dynamics in the monarch butterfly, Danaus plexippus. Mob DNA. 2022:13(1):5. 10.1186/s13100-022-00263-5.
- Bolotov I, Gofarov M, Frolov A, Kogut Y. Northern boundary of the range of the clouded apollo butterfly Parnassius mnemosyne (L.) (Papilionidae): climate influence or degradation of larval host plants? Nota Lepidopterol. 2013:36:19–33.
- Bortoluzzi C, Wright CJ, Lee S, Cousins T, Genez TAL, Thybert D, Martin FJ, Haggerty L, The Darwin Tree of Life Project Consortium, Blaxter M, et al. Lepidoptera genomics based on 88 chromosomal reference sequences informs population genetic parameters for conservation. bioRxiv 2023.04.14.536868. 10.1101/2023.04.14.536868, 14 April 2023, preprint: not peer reviewed.
- Brůna T, Li H, Guhlin J, Honsel D, Herbold S, Stanke M, Nenasheva N, Ebel M, Gabriel L, Hoff KJ. Galba: genome annotation with miniprot and AUGUSTUS. BMC Bioinformatics. 2023:24(1):327. 10.1186/s12859-023-05449-z.
- Cabanettes F, Klopp C. D-GENIES: dot plot large genomes in an interactive, efficient and simple way. PeerJ. 2018:6:e4958. 10.7717/peerj.4958.
- Cardoso P, Barton PS, Birkhofer K, Chichorro F, Deacon C, Fartmann T, Fukushima CS, Gaigher R, Habel JC, Hallmann CA, et al. Scientists’ warning to humanity on insect extinctions. Biol Conserv. 2020:242:108426. 10.1016/j.biocon.2020.108426.
- Cardoso P, Borges PAV, Triantis KA, Ferrández MA, Martín JL. Adapting the IUCN Red List criteria for invertebrates. Biol Conserv. 2011:144(10):2432–2440. 10.1016/j.biocon.2011.06.020.
- Chen Y, Huang D, Wang Y, Zhu C, Hao J. The complete mitochondrial genome of the endangered apollo butterfly, Parnassius apollo (Lepidoptera: Papilionidae) and its comparison to other Papilionidae species. J Asia Pac Entomol. 2014:17(4):663–671. 10.1016/j.aspen.2014.06.002.
- Cheng H, Concepcion GT, Feng X, Zhang H, Li H. Haplotype-resolved de novo assembly using phased assembly graphs with Hifiasm. Nat Methods. 2021:18(2):170–175. 10.1038/s41592-020-01056-5.
- DeWoody JA, Jeon JY, Bickham JW, Heenkenda EJ, Janjua S, Lamka GF, Mularo AJ, Black A, Brüniche-Olsen A, Willoughby JR. The threatened species imperative: conservation assessments would benefit from population genomic insights. Proc Natl Acad Sci. 2022:119(35):e2210685119. 10.1073/pnas.2210685119.
- Dudchenko O, Shamim MS, Batra SS, Durand NC, Musial NT, Mostofa R, Pham M, Hilaire BGS, Yao W, Stamenova E, et al. The Juicebox assembly tools module facilitates de novo assembly of mammalian genomes with chromosome-length scaffolds for under $1000. bioRxiv 254797. 10.1101/254797, 28 January 2018, preprint: not peer reviewed.
- Duffus NE, Morimoto J. Current conservation policies in the UK and Ireland overlook endangered insects and are taxonomically biased towards Lepidoptera. Biol Conserv. 2022:266:109464. 10.1016/j.biocon.2022.109464.
- Flynn JM, Hubley R, Goubert C, Rosen J, Clark AG, Feschotte C, Smit AF. RepeatModeler2 for automated genomic discovery of transposable element families. Proc Natl Acad Sci. 2020:117(17):9451–9457. 10.1073/pnas.1921046117.
- Gabriel L, Brůna T, Hoff KJ, Ebel M, Lomsadze A, Borodovsky M, Stanke M. BRAKER3: fully automated genome annotation using RNA-seq and protein evidence with GeneMark-ETP, AUGUSTUS and TSEBRA. bioRxiv 2023.06.10.544449. 10.1101/2023.06.10.544449, 12 June 2023, preprint: not peer reviewed.
- Gabriel L, Hoff KJ, Brůna T, Borodovsky M, Stanke M. TSEBRA: transcript selector for BRAKER. BMC Bioinformatics. 2021:22(1):566. 10.1186/s12859-021-04482-0.
- Gratton P, Konopinski MK, Sbordoni V. Pleistocene evolutionary history of the clouded apollo (Parnassius mnemosyne): genetic signatures of climate cycles and a ‘time-dependent’ mitochondrial substitution rate. Mol Ecol. 2008:17(19):4248–4262. 10.1111/j.1365-294X.2008.03901.x.
- Guan D, McCarthy SA, Wood J, Howe K, Wang Y, Durbin R. Identifying and removing haplotypic duplication in primary genome assemblies. Bioinformatics. 2020:36(9):2896–2898. 10.1093/bioinformatics/btaa025.
- Hao Z, Lv D, Ge Y, Shi J, Weijers D, Yu G, Chen J. RIdeogram: drawing SVG graphics to visualize and map genome-wide data on the idiograms. PeerJ Comput Sci. 2020:6:e251. 10.7717/peerj-cs.251.
- Hogg CJ, Ottewell K, Latch P, Rossetto M, Biggs J, Gilbert A, Richmond S, Belov K. Threatened Species Initiative: empowering conservation action using genomic resources. Proc Natl Acad Sci. 2022:119(4):e2115643118. 10.1073/pnas.2115643118.
- Hohenlohe PA, Funk WC, Rajora OP. Population genomics for wildlife conservation and management. Mol Ecol. 2021:30(1):62–82. 10.1111/mec.15720.
- Höök L, Näsvall K, Vila R, Wiklund C, Backström N. High-density linkage maps and chromosome level genome assemblies unveil direction and frequency of extensive structural rearrangements in wood white butterflies (Leptidea spp.). Chromosome Res. 2023:31(1):2. 10.1007/s10577-023-09713-z.
- Johansson V, Knape J, Franzén M. Population dynamics and future persistence of the clouded apollo butterfly in southern Scandinavia: the importance of low intensity grazing and creation of habitat patches. Biol Conserv. 2017:206:120–131. 10.1016/j.biocon.2016.12.029.
- Jones P, Binns D, Chang H-Y, Fraser M, Li W, McAnulla C, McWilliam H, Maslen J, Mitchell A, Nuka G, et al. InterProScan 5: genome-scale protein function classification. Bioinformatics. 2014:30(9):1236–1240. 10.1093/bioinformatics/btu031.
- Kim D, Paggi JM, Park C, Bennett C, Salzberg SL. Graph-based genome alignment and genotyping with HISAT2 and HISAT-genotype. Nat Biotechnol. 2019:37(8):907–915. 10.1038/s41587-019-0201-4.
- Konvička M, Kuras T. Population structure, behaviour and selection of oviposition sites of an endangered butterfly, Parnassius mnemosyne, in Litovelské Pomoraví, Czech Republic. J Insect Conserv. 1999:3(3):211–223. 10.1023/A:1009641618795.
- Kriventseva EV, Kuznetsov D, Tegenfeldt F, Manni M, Dias R, Simão FA, Zdobnov EM. OrthoDB v10: sampling the diversity of animal, plant, fungal, protist, bacterial and viral genomes for evolutionary and functional annotations of orthologs. Nucleic Acids Res. 2018:47(D1):D807–D811. 10.1093/nar/gky1053.
- Kuussaari M, Heikkinen RK, Heliölä J, Luoto M, Mayer M, Rytteri S, vonBagh P. Successful translocation of the threatened clouded apollo butterfly (Parnassius mnemosyne) and metapopulation establishment in southern Finland. Biol Conserv. 2015:190:51–59. 10.1016/j.biocon.2015.05.011.
- Li H. Protein-to-genome alignment with miniprot. Bioinformatics. 2023:39(1):btad014. 10.1093/bioinformatics/btad014.
- Liu G, Chang Z, Chen L, He J, Dong Z, Yang J, Lu S, Zhao R, Wan W, Ma G, et al. Genome size variation in butterflies (Insecta, Lepidotera, Papilionoidea): a thorough phylogenetic comparison. Syst Entomol. 2020:45(3):571–582. 10.1111/syen.12417.
- Luoto M, Kuussaari M, Rita H, Salminen J, von Bonsdorff T. Determinants of distribution and abundance in the clouded apollo butterfly: a landscape ecological approach. Ecography. 2001:24(5):601–617. 10.1111/j.1600-0587.2001.tb00494.x.
- Magrane M, UniProt Consortium. UniProt knowledgebase: a hub of integrated protein data. Database (Oxford). 2011:2011(0):bar009. 10.1093/database/bar009.
- Manni M, Berkeley MR, Seppey M, Simão FA, Zdobnov EM. BUSCO update: novel and streamlined workflows along with broader and deeper phylogenetic coverage for scoring of eukaryotic, prokaryotic, and viral genomes. Mol Biol Evol. 2021:38(10):4647–4654. 10.1093/molbev/msab199.
- Mc Cartney AM, Formenti G, Mouton A, Panis DD, Marins LS, Leitao HG, Diedericks G, Kirangwa J, Morselli M, Salces J, et al. The European Reference Genome Atlas: piloting a decentralised approach to equitable biodiversity genomics. bioRxiv 2023.09.25.559365. 10.1101/2023.09.25.559365, 28 September 2023, preprint: not peer reviewed.
- Nawrocki EP, Burge SW, Bateman A, Daub J, Eberhardt RY, Eddy SR, Floden EW, Gardner PP, Jones TA, Tate J, et al. Rfam 12.0: updates to the RNA families database. Nucleic Acids Res. 2014:43(D1):D130–D137. 10.1093/nar/gku1063.
- Nawrocki EP, Eddy SR. Infernal 1.1: 100-fold faster RNA homology searches. Bioinformatics. 2013:29(22):2933–2935. 10.1093/bioinformatics/btt509. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Ondov BD, Starrett GJ, Sappington A, Kostic A, Koren S, Buck CB, Phillippy AM. Mash screen: high-throughput sequence containment estimation for genome discovery. Genome Biol. 2019:20(1):232. 10.1186/s13059-019-1841-x. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Paysan-Lafosse T, Blum M, Chuguransky S, Grego T, Pinto BL, Salazar GA, Bileschi ML, Bork P, Bridge A, Colwell L, et al. InterPro in 2022. Nucleic Acids Res. 2022:51(D1):D418–D427. 10.1093/nar/gkac993. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Podsiadlowski L, Tunström K, Espeland M, Wheat CW. The genome assembly and annotation of the apollo butterfly Parnassius apollo, a flagship species for conservation biology. Genome Biol Evol. 2021:13(8):evab122. 10.1093/gbe/evab122. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Ranallo-Benavidez TR, Jaron KS, Schatz MC. GenomeScope 2.0 and Smudgeplot for reference-free profiling of polyploid genomes. Nat Commun. 2020:11(1):1432. 10.1038/s41467-020-14998-3. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Schowalter TD, Noriega JA, Tscharntke T. Insect effects on ecosystem services—introduction. Basic Appl Ecol. 2018:26:1–7. 10.1016/j.baae.2017.09.011. [DOI] [Google Scholar]
- Shipilina D, Näsvall K, Höök L, Vila R, Talavera G, Backström N. Linkage mapping and genome annotation give novel insights into gene family expansions and regional recombination rate variation in the painted lady (Vanessa cardui) butterfly. Genomics 2022:114(6):110481. 10.1016/j.ygeno.2022.110481. [DOI] [PubMed] [Google Scholar]
- Sills J, Robinson GE, Hackett KJ, Purcell-Miramontes M, Brown SJ, Evans JD, Goldsmith MR, Lawson D, Okamuro J, Robertson HM, et al. Creating a buzz about insect genomes. Science 2011:331(6023):1386–1386. 10.1126/science.331.6023.1386. [DOI] [PubMed] [Google Scholar]
- Smolander O-P, Blande D, Ahola V, Rastas P, Tanskanen J, Kammonen JI, Oostra V, Pellegrini L, Ikonen S, Dallas T, et al. Improved chromosome-level genome assembly of the Glanville fritillary butterfly (Melitaea cinxia) integrating Pacific Biosciences long reads and a high-density linkage map. GigaScience 2022:11:giab097. 10.1093/gigascience/giab097. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Supple MA, Shapiro B. Conservation of biodiversity in the genomics era. Genome Biol. 2018:19(1):131. 10.1186/s13059-018-1520-3. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Talla V, Mrazek V, Höglund J, Backström N. Whole genome re-sequencing uncovers significant population structure and low genetic diversity in the endangered clouded apollo (Parnassius mnemosyne) in Sweden. Conserv Genet. 2023:24(3):305–314. 10.1007/s10592-023-01502-9. [DOI] [Google Scholar]
- Talla V, Suh A, Kalsoom F, Dincă V, Vila R, Friberg M, Wiklund C, Backström N. Rapid increase in genome size as a consequence of transposable element hyperactivity in wood-white (Leptidea) butterflies. Genome Biol Evol. 2017:9(10):2491–2505. 10.1093/gbe/evx163. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Tang S, Lomsadze A, Borodovsky M. Identification of protein coding regions in RNA transcripts. Nucleic Acids Res. 2015:43(12):e78–e78. 10.1093/nar/gkv227. [DOI] [PMC free article] [PubMed] [Google Scholar]
- The Darwin Tree of Life Project Consortium, Mieszkowska N, Palma FD, Holland P, Durbin R, Richards T, Berriman M, Kersey P, Hollingsworth P, Wilson W, et al. Sequence locally, think globally: The Darwin Tree of Life Project. Proc Natl Acad Sci. 2022:119(4):e2115642118. 10.1073/pnas.2115642118. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Theissinger K, Fernandes C, Formenti G, Bista I, Berg PR, Bleidorn C, Bombarely A, Crottini A, Gallo GR, Godoy JA, et al. How genomics can help biodiversity conservation. Trends Genet. 2023:39(7):545–559. 10.1016/j.tig.2023.01.005. [DOI] [PubMed] [Google Scholar]
- Triant DA, Cinel SD, Kawahara AY. Lepidoptera genomes: current knowledge, gaps and future directions. Curr Opin Insect Sci. 2018:25:99–105. 10.1016/j.cois.2017.12.004. [DOI] [PubMed] [Google Scholar]
- Uliano-Silva M, Ferreira JGRN, Krasheninnikova K, Blaxter M, Mieszkowska N, Hall N, Holland P, Durbin R, Richards T, Kersey P, et al. MitoHiFi: a python pipeline for mitochondrial genome assembly from PacBio high fidelity reads. BMC Bioinformatics. 2023:24(1):288. 10.1186/s12859-023-05385-y. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Vlašánek P, Bartonova AS, Marec F, Konvicka M. Elusive Parnassius mnemosyne (Linnaeus, 1758) larvae: habitat selection, sex determination and sex ratio (Lepidoptera: Papilionidae). SHILAP Revista de Lepidopterología. 2017:45:180. https://www.redalyc.org/journal/455/45553890003/movil/ [Google Scholar]
- Warren MS, Maes D, van Swaay CAM, Goffart P, Dyck HV, Bourn NAD, Wynhoff I, Hoare D, Ellis S. The decline of butterflies in Europe: problems, significance, and possible solutions. Proc Natl Acad Sci. 2021:118(2):e2002551117. 10.1073/pnas.2002551117. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Wiemers M, Chazot N, Wheat CW, Schweiger O, Wahlberg N. A complete time-calibrated multi-gene phylogeny of the European butterflies. ZooKeys 2020:938:97–124. 10.3897/zookeys.938.50878. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Wright BR, Farquharson KA, McLennan EA, Belov K, Hogg CJ, Grueber CE. A demonstration of conservation genomics for threatened species management. Mol Ecol Resour. 2020:20(6):1526–1541. 10.1111/1755-0998.13211. [DOI] [PubMed] [Google Scholar]
- Zhou C, McCarthy SA, Durbin R. YaHS: yet another Hi-C scaffolding tool. Bioinformatics. 2022:39(1):btac808. 10.1093/bioinformatics/btac808. [DOI] [PMC free article] [PubMed] [Google Scholar]
Data Availability Statement
All data have been deposited at the European Nucleotide Archive under accession PRJEB67749. Scripts used for the TE analysis are available at GitHub (https://github.com/EBC-butterfly-genomics-team/clouded_apollo_genomics). Assembly and annotation files are also available via the ERGA portal (https://portal.erga-biodiversity.eu/data_portal/Parnassius%20mnemosyne).

