Scientific Data. 2024 Aug 3;11:844. doi: 10.1038/s41597-024-03666-6

Chromosome-level genome assembly of marmalade hoverfly Episyrphus balteatus (Diptera: Syrphidae)

Jichao Ji 1,2,3,✉,#, Yue Gao 1,4,#, Chao Xu 1,4, Kaixin Zhang 1,3, Dongyang Li 1,3, Bingbing Li 5, Lulu Chen 6, Mengxue Gao 5, Ningbo Huangfu 1,4, Punniyakotti Elumalai 1, Xueke Gao 1,3, Xiangzhen Zhu 1,3, Li Wang 1,3, Junyu Luo 1,3, Jinjie Cui 1,3
PMCID: PMC11298007  PMID: 39097648

Abstract

Episyrphus balteatus provides dual ecosystem services, pest control and pollination: the larvae are excellent predators of aphid pests, whereas the adults are efficient pollinators. In this study, we assembled a high-quality chromosome-level genome of E. balteatus from a northern China geographical population using Illumina, PacBio long-read, and Hi-C technologies. The 467.42 Mb genome was obtained from 723 contigs, with a contig N50 of 9.16 Mb and a scaffold N50 of 118.85 Mb, and 90.25% (431.75 Mb) of the assembly was anchored to 4 pseudo-autosomes and one pseudo-heterosome. In total, 14,848 protein-coding genes were annotated, and 95.14% of them were functionally annotated in the NR, GO, KEGG and other databases. We also obtained the mitochondrial genome of E. balteatus, which is 16,837 bp long and contains 37 typical mitochondrial genes. Overall, this high-quality genome is valuable for evolutionary and genetic studies of E. balteatus and other Syrphidae hoverfly species.

Subject terms: Comparative genomics, DNA sequencing, Entomology

Background & Summary

Episyrphus balteatus, also known as the marmalade hoverfly, acts as both predator and pollinator: the larvae control aphid pests, while the adults feed on pollen and nectar and pollinate plants (Fig. 1a). As a long-distance migratory insect, it travels at high altitude between high- and low-latitude regions each season, and its annual fluxes transport billions of pollen grains, consume trillions of aphids, and make billions of flower visits1. Because the populations of many pollinating insects, especially bees (Fig. 1b), are seriously declining2, hoverflies including E. balteatus are becoming more important in the ecosystem3. To date, numerous studies of the genome, reproduction, behavior, and phylogenetic relationships of various hoverfly species have been performed. However, the lack of a high-quality reference genome has hindered deeper exploration of gene function in E. balteatus.

Fig. 1.


Morphological characteristics of Episyrphus balteatus (a) and honeybee (b), GenomeScope profile of the 19-mer analysis (c), and Hi-C interaction heatmap of the four pseudo-chromosomes of the Episyrphus balteatus genome (d). Color indicates the intensity of the interaction signal: the darker the color, the higher the intensity.

In this study, we present a high-quality chromosome-level genome assembly and a whole-life-cycle transcriptome of E. balteatus, generated using a combination of Illumina short-read sequencing, PacBio continuous long read (CLR) sequencing, and chromosome conformation capture (Hi-C) techniques (Table 1). CLR sequencing yielded 8,740,850 continuous long reads with an N50 of 39.791 kb, from which a 517.92 Mb genome was assembled in 729 contigs with a contig N50 of 9.16 Mb (Tables 2, 3). With Hi-C, 90.25% (467.42 Mb) of the assembly was anchored to 4 chromosomes (Table 4) with a scaffold N50 of 118.85 Mb (Tables 2, 3), while Contig 1996 was identified as the pseudo-X chromosome via whole-genome synteny with other hoverfly species (Fig. 3). We also predicted that transposable elements and tandem repeats accounted for 35.23% (13.90% retroelements and 21.33% DNA transposons) and 8.31% of the total genome, respectively (Table 5). In total, 1,611 noncoding RNAs, 29 pseudogenes, and 14,848 protein-coding genes were obtained (Tables 6, 7), and 95.14% of the protein-coding genes could be annotated in the NR, GO, KEGG and other databases (Table 8). We compared the genomic characteristics of E. balteatus with those of other insects and identified 1,060 expanded and 1,535 contracted gene families in the genome of E. balteatus (Fig. 7). In addition, the mitochondrial genome of E. balteatus was assembled and annotated; it is 16,837 bp long and encodes 37 typical mitochondrial genes (Fig. 10; Table 11). Overall, the high-quality E. balteatus nuclear and mitochondrial genomes produced here provide a genetic basis for further studies of the biology and ecology of this hoverfly species.

Table 1.

Statistics of sequencing data of Episyrphus balteatus genome.

Library type Usage Insert Size (bp) Clean Data (Gb) Coverage (×)
Illumina Genome survey 350 57.50 123.68
PacBio Genome assembly 20000 247.38 484.58
Hi-C Hi-C assembly 150 63.97 140.14
RNA-Seq (Illumina) Anno-evidence 150 123.90 271.44

Table 2.

Statistics of genome assembly of Episyrphus balteatus at the chromosomal level.

Features Values
Genome size (Mb) in estimated survey 464.94
Genome size (Mb) in Hi-C assembly 517.92
Anchored to chromosomes (Mb, %) 467.42 (90.25%)
Longest contig length 48,396,096
Contig numbers 729
Contig N50 (bp) 9,160,856
Longest scaffold length (bp) 120,199,706
Scaffold N50 (bp) 118,853,273
Scaffold N90 (bp) 217,961
GC (%) 31.52

Table 3.

Statistics of PacBio assembly of Episyrphus balteatus results.

Number of total reads Base number of total reads N50 length of total reads Mean length of total reads Max length in total reads
8,740,850 247,378,379,889 39,791 28,301 438,334

Table 4.

Statistics of Hi-C assembly of Episyrphus balteatus results.

Pseudo-chromosomes No. Cluster Cluster Length (bp) No. Order Order Length (bp)
Chr1 114 130,705,984 25 118,850,873
Chr2 75 124,230,291 38 120,196,006
Chr3 158 130,667,350 43 119,801,647
Chr4 72 81,817,590 10 72,900,320
Total (Ratio %) 419 (57.95) 467,421,215 (90.25) 116 (27.68) 431,748,846 (92.37)
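
The ratios in the Total row of Table 4 can be reproduced from the counts and lengths reported in the tables. The short Python sketch below does that arithmetic. It assumes, since the table does not state its denominators, that the clustered fractions are taken relative to 723 input contigs and to the rounded 517.92 Mb assembly size in Table 2, and that the ordered fractions are relative to the clustered totals. All variable names are illustrative.

```python
# Reproduce the "Total (Ratio %)" row of Table 4 from the reported values.
# Assumed denominators (not stated in the table): 723 contigs and the rounded
# 517.92 Mb assembly size from Table 2 for the clustered fractions, and the
# clustered totals for the ordered fractions.
TOTAL_CONTIGS = 723
ASSEMBLY_BP = 517.92e6

cluster_counts = {"Chr1": 114, "Chr2": 75, "Chr3": 158, "Chr4": 72}
cluster_bp = {"Chr1": 130_705_984, "Chr2": 124_230_291,
              "Chr3": 130_667_350, "Chr4": 81_817_590}
order_counts = {"Chr1": 25, "Chr2": 38, "Chr3": 43, "Chr4": 10}
order_bp = {"Chr1": 118_850_873, "Chr2": 120_196_006,
            "Chr3": 119_801_647, "Chr4": 72_900_320}


def percent(part, whole):
    """Return part/whole as a percentage rounded to two decimals."""
    return round(100 * part / whole, 2)


clustered_n = sum(cluster_counts.values())
clustered_len = sum(cluster_bp.values())
ordered_n = sum(order_counts.values())
ordered_len = sum(order_bp.values())

print(clustered_n, percent(clustered_n, TOTAL_CONTIGS))
print(clustered_len, percent(clustered_len, ASSEMBLY_BP))
print(ordered_n, percent(ordered_n, clustered_n))
print(ordered_len, percent(ordered_len, clustered_len))
```

Under these assumed denominators the sums and percentages agree with the Total row as printed.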

Fig. 3.


Whole-genome synteny between Episyrphus balteatus and other hoverfly species. E. balteatus Ay, Episyrphus balteatus population from Anyang City, Henan Province, China; E. balteatus Orf, Episyrphus balteatus population from Wytham Woods, Oxfordshire, UK; E. corollae, Eupeodes corollae; X. sylvarum, Xylota sylvarum; E. tenax, Eristalis tenax; V. inanis, Volucella inanis; S. pyrastri, Scaeva pyrastri; S. pipiens, Syritta pipiens.

Table 5.

Statistics of repeat elements in Episyrphus balteatus genome.

Repeat class Repeat type Number Length (bp) Percent (%) Class total (%)
Retroelement LINE 54,895 14,584,724 2.82 13.90
SINE 3,591 393,983 0.08
LTR/Copia 19,081 5,602,513 1.08
LTR/ERV 473 28,346 0.01
LTR/Gypsy 104,641 37,687,552 7.28
LTR/Ngaro 146 13,894 2.56E-3
LTR/Pao 771 165,882 0.03
LTR/Unknown 62,749 13,523,276 2.61
DNA transposon Academ 2,404 556,953 0.11 21.33
CACTA 4,146 924,985 0.18
Crypton 5 255 4.70E-5
Dada 68 3,455 6.36E-4
Ginger 70 7,498 1.38E-3
Helitron 82,587 13,068,574 2.52
IS3EU 14 978 1.80E-4
Kolobok 53 7,132 1.31E-3
Maverick 176 240,051 0.05
Merlin 18 1,217 2.24E-4
Mutator 1,229 152,024 0.03
P 237 26,204 0.01
PIF-Harbinger 593 103,241 0.02
PiggyBac 1,473 475,840 0.09
Sola 6 318 5.86E-5
Tc1-Mariner 19,996 7,392,204 1.43
Unknown 336,237 85,796,409 16.57
Zisupton 24 1,660 3.06E-4
hAT 6,419 1,700,529 0.33
Unknown Unknown 213 27,648 0.01 0.01
Tandem repeat Microsatellite (1–9 bp units) 532,332 11,416,388 2.20 8.31
Minisatellite (10–99 bp units) 93,334 12,988,803 2.51
Satellite (≥ 100 bp units) 11,171 18,649,395 3.60
Total 1,339,152 225,541,931 43.55

Table 6.

Statistics of noncoding RNA and pseudogene in Episyrphus balteatus genome.

Number of tRNA Number of rRNA Number of miRNA Number of snRNA Number of snoRNA Number of pseudogene
1,103 402 49 33 24 29

Table 7.

Gene annotation statistics of Episyrphus balteatus genome.

Features Results
Number of annotated genes 14,848
Number of core genes in the BUSCO Diptera database 3,285
Complete BUSCOs and Ratio 3,095 (94.22%)
Complete and single-copy BUSCOs (S) 2710 (82.50%)
Complete and duplicated BUSCOs (D) 385 (11.72%)
Fragmented BUSCOs (F) 9 (0.27%)
Missing BUSCOs (M) 181 (5.51%)
Average gene length (bp) 14629.57
Number of Exon 75,039
Average Total Exon length (bp) per gene 1714.63
Average Exon count per gene 5.05
Number of CDS 75,039
Average CDS length (bp) 1714.47
Average CDS per gene 5.05
Number of Intron 60,191
Average Intron length (bp) 12914.95
Average Intron per gene 4.05

Table 8.

Functional annotation statistics of Episyrphus balteatus genome.

Annotation type Genes number Percent (%)
GO 11,956 80.52
KEGG 11,149 75.09
KOG 9,487 63.89
Pfam 12,429 83.71
Swissprot 9,204 61.99
TrEMBL 13,984 94.18
eggNOG 11,797 79.45
NR 13,706 92.31
Total annotated genes 14,126 95.14

Fig. 7.


Genome evolution of Episyrphus balteatus and 19 other insects. Time-calibrated phylogenetic tree inferred from 545 single-copy orthologs using IQ-TREE version 1.6.11 was constructed based on maximum likelihood. The branch node values indicate the inferred divergence time between species. At the base of the tree is geological time, and at the top of the tree is absolute age in millions of years, with shadows defining each geological period. D. is Devonian; Ca. is Carboniferous; Pe. is Permian; Tr. is Triassic; Ju. is Jurassic; Cr. is Cretaceous; Pa. is Paleogene; N. is Neogene. The numbers of expanded gene families (red) and contracted gene families (blue) are displayed to the right of each species branch.

Fig. 10.


Circular map of Episyrphus balteatus mitochondrial genome with 37 annotated genes of different functional groups.

Table 11.

Annotation of Episyrphus balteatus mitochondrial genome.

Gene Start Position (bp) End Position (bp) Length (bp) Strand
trnI(gat) 1 66 66 +
trnQ(ttg) 137 69 69 -
trnM(cat) 152 220 69 +
nad2 221 1252 1032 +
trnW(tca) 1256 1323 68 +
trnC(gca) 1398 1332 67 -
trnY(gta) 1478 1413 66 -
cox1 1484 3038 1555 +
trnL2(taa) 3039 3104 66 +
cox2 3107 3790 684 +
trnK(ctt) 3796 3866 71 +
trnD(gtc) 3912 3978 67 +
atp8 3979 4140 162 +
atp6 4134 4811 678 +
cox3 4847 5635 789 +
trnG(tcc) 5639 5704 66 +
nad3 5705 6058 354 +
trnA(tgc) 6063 6131 69 +
trnR(tcg) 6131 6194 64 +
trnN(gtt) 6197 6263 67 +
trnS1(gct) 6264 6330 67 +
trnE(ttc) 6331 6395 65 +
trnF(gaa) 6485 6419 67 -
nad5 8221 6485 1737 -
trnH(gtg) 8288 8222 67 -
nad4 9628 8288 1341 -
nad4l 9918 9622 297 -
trnT(tgt) 9921 9985 65 +
trnP(tgg) 10051 9986 66 -
nad6 10054 10578 525 +
CYTB 10582 11718 1137 +
trnS2(tga) 11720 11787 68 +
nad1 12742 11804 939 -
trnL1(tag) 12817 12753 65 -
rrn16S 14155 12818 1338 -
trnV(tac) 14227 14156 72 -
rrn12S 15031 14228 804 -
D-loop 15032 16837 1806 +
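
The coordinates in Table 11 can be sanity-checked with a few lines of Python. The sketch below assumes the table's convention is that a start position larger than the end position marks a feature on the reverse strand, and that lengths are inclusive of both end coordinates; the function name is hypothetical, and only a handful of rows are shown.

```python
def describe_feature(start, end):
    """Return (length_bp, strand) for 1-based, inclusive coordinates.

    Assumption: start > end means the feature lies on the reverse strand.
    """
    length = abs(end - start) + 1
    strand = "+" if start <= end else "-"
    return length, strand


# (gene, start, end, reported length) copied from Table 11
rows = [
    ("trnI(gat)", 1, 66, 66),
    ("trnQ(ttg)", 137, 69, 69),
    ("nad5", 8221, 6485, 1737),
    ("D-loop", 15032, 16837, 1806),
]

for gene, start, end, reported in rows:
    length, strand = describe_feature(start, end)
    print(gene, length, strand, length == reported)
```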

Methods

Sample preparation and genomic DNA sequencing

A pair of E. balteatus adults (one female and one male) was originally captured from flowers of Hibiscus syriacus in Anyang City, Henan Province, China, and their offspring were reared on Megoura crassicauda with inbreeding for approximately 10 generations in the laboratory. The insect colony was maintained in a climate chamber at 21 ± 1 °C with 70 ± 2% relative humidity and a photoperiod of 14 h light: 10 h dark. Individuals were immediately frozen in liquid nitrogen and stored at −80 °C prior to DNA extraction. Genomic DNA for both Illumina and PacBio CLR sequencing was extracted from 10 newly emerged, surface-sterilized female adults using the Genomic-tip Kit (QIAGEN) according to the manufacturer’s instructions; for Hi-C sequencing, it was obtained from a single newly emerged female adult. The purity and integrity of the genomic DNA were assessed with the NanoDrop 2000 (Thermo Fisher Scientific, USA) and by agarose gel electrophoresis (1.2%), respectively.

For Illumina sequencing, paired-end libraries with a 350 bp insert size were constructed and sequenced on the Illumina NovaSeq 6000 platform (Illumina, CA, USA), and 57.50 Gb (Table 1) of clean reads were obtained after removing adapter sequences and low-quality reads with HTQC (v1.92.310). The Illumina short reads were quality filtered with Fastp using the parameters ‘-q 10 -u 50 -y -g -Y 10 -e 20 -l 100 -b 150 -B 150’.

For PacBio CLR sequencing, genomic DNA was sheared into fragments of approximately 20 kb using the Megaruptor® 2. The SMRTbell library was then prepared using the SMRTbell Express Template Prep Kit 2.0 (Pacific Biosciences) according to the manufacturer's guidelines. Following ligation, the SMRTbell library was digested with exonuclease and purified with 0.45X AMPure PB beads. After library characterization, SMRTbell templates of 15–18 kb were collected using the Sage ELF system (Sage Science, Beverly, MA). Sequencing primers were annealed and Sequel II DNA polymerase was bound to the final SMRTbell library. Finally, SMRT sequencing was performed using a single 8M SMRT Cell on the Sequel II System. The sequencing yielded 247.38 Gb (484.58× coverage) of continuous long reads (CLR) with an N50 length of 39,791 bp and an average length of 28,301 bp (Table 3).

For Hi-C (high-throughput chromatin conformation capture) scaffolding, a library of 150 bp paired-end reads was constructed first; read pairs whose mates mapped to different contigs were later used for scaffolding. The Hi-C library was sequenced on the Illumina NovaSeq 6000 platform and, after applying the same filtering criteria as for the short reads, yielded a total of 63.97 Gb (140.14× coverage) of clean data (Table 1).

Hi-C library preparation and sequencing

The Hi-C technique was used to construct the chromosome-level genome assembly of Episyrphus balteatus, and fresh tissues from one female adult (excluding abdomen and wings) were used to construct the Hi-C library. Formaldehyde was used to fix the sample, cross-linking intracellular proteins with DNA and DNA strands with each other, which preserves their interactions and the 3D structure of chromatin in the cell. The cross-linked DNA was digested with the restriction enzyme DpnII, producing sticky ends on both sides of each cross-link. The DNA was then fragmented to 300–700 base pairs (bp), and streptavidin magnetic beads were used to selectively capture the DNA fragments representing interacting regions for library construction. After the library passed quality inspection, it was sequenced on the Illumina platform with paired-end 150 bp reads. The Hi-C library was constructed following the standard library preparation protocol, and 63.97 Gb of clean data was generated (Table 1).

Transcriptome sequencing

Transcriptomic samples were collected from all developmental stages of E. balteatus: eggs, 1st, 2nd, and 3rd instar larvae, pupae, and female adults. Total RNA was extracted from the egg, larval, pupal, and adult samples using TRIzol reagent (Thermo Fisher Scientific, USA). Complementary DNA (cDNA) libraries were constructed, and their concentration and insert size were determined using Qubit 3.0 and Agilent 2100; Q-PCR was used to quantify the effective library concentration precisely and ensure library quality. The libraries were sequenced on an Illumina NovaSeq 6000 platform. Ultimately, a total of 123.90 Gb of clean RNA-seq data was obtained after removing adapters, low-quality reads, and reads with a high content of unknown bases (Table 1).

Estimation of genomic characteristics

Based on the 57.50 Gb of genome survey data, the short reads from the Illumina platform were quality filtered with Fastp (version 0.21.0)4 using the parameters ‘-q 10 -u 50 -y -g -Y 10 -e 20 -l 100 -b 150 -B 150’. The high-quality filtered reads were used for genome size estimation. To assess potential contamination in the DNA of the collected samples, 10,000 single-end reads were randomly selected from the sequenced 350 bp library and compared to the NT database by BLAST (ncbi-blast+, version 2.9.0)5 with the parameters ‘-num_descriptions 100 -num_alignments 100 -evalue 1e-05’. The Illumina libraries were also compared with the published organelle genome of E. balteatus6 using SOAP (version 2.21)7 with the parameters ‘-m 260 -x 440’ to evaluate the extranuclear DNA content of the libraries and ensure the integrity of the genome assembly. The 19-mers were counted using Jellyfish (version 2.1.4)8 with the parameter ‘-h 10000000000’, and genome features were then calculated using GenomeScope (version 2.0)9 with the parameters ‘-k 19 -p 6 -m 100000’. Across the ploidy models tested, the k-mer distribution was fitted best under a diploid model (Fig. 1c), and the estimated genome size was approximately 464.94 Mb (Table 2). From the k-mer distribution, the repetitive sequence content was estimated at about 28.05%, the heterozygosity at about 3.7%, and the GC content of the genome at about 30.52% (Fig. 1c).
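
The idea behind a k-mer based size estimate is that the total number of k-mers in the reads, divided by the sequencing depth at the main peak of the k-mer histogram, approximates the genome size. The Python sketch below illustrates only this relation on toy data. It is not how Jellyfish or GenomeScope work internally (GenomeScope fits a mixture model and accounts for heterozygosity and repeats), it does not use canonical k-mers, and all function names are hypothetical.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every k-mer occurrence across all reads (no canonical form)."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def kmer_histogram(counts):
    """Map multiplicity -> number of distinct k-mers seen that many times."""
    return Counter(counts.values())


def estimate_genome_size(histogram, min_depth=2):
    """Naive estimate: total k-mers divided by the depth of the main peak.

    Multiplicities below min_depth are treated as sequencing errors.
    """
    usable = {depth: n for depth, n in histogram.items() if depth >= min_depth}
    peak_depth = max(usable, key=usable.get)
    total_kmers = sum(depth * n for depth, n in usable.items())
    return total_kmers / peak_depth


# Toy example: one 6 bp "genome" sequenced four times, k = 3.
toy_reads = ["ACGTAG"] * 4
hist = kmer_histogram(count_kmers(toy_reads, k=3))
toy_estimate = estimate_genome_size(hist)
print(dict(hist), toy_estimate)
```

In the toy case, four distinct 3-mers each occur four times, so the estimate equals the number of distinct k-mer positions in the sequence.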

De novo genome assembly

The genome was assembled from the CLR data obtained above using Smartdenovo10 with default parameters, followed by three rounds of correction with Illumina reads using Pilon11. In highly heterozygous genomes, the primary assembly may retain both copies of heterozygous regions, inflating the assembly beyond the expected genome size. To resolve haplotigs and overlaps in the primary assembly, purge_dups12 was used, and the assembly was further polished with Pilon11 and Racon13.

Using the Hi-C data14, the scaffolds of the genome were broken into 50 kb segments and then reassembled. Locations that could not be restored to their original position in the assembly were treated as candidate error regions, and within these regions error points were identified from the Hi-C coverage depth, focusing on areas of low coverage. In this way, the initial assembly was error-corrected. To anchor contigs, clean read pairs from the Hi-C library were mapped to the polished Episyrphus balteatus genome using BWA (version 0.7.17)15 with default parameters.

Read pairs whose two ends mapped to different contigs were used for Hi-C scaffolding. Invalid read pairs, including self-circle ligation, non-ligation, and other types such as dangling ends, re-ligation, and dumped pairs, were filtered out. HiC-Pro (version 2.10.0)16, which distinguishes valid from invalid interaction pairs in Hi-C sequencing data, was used to assess the quality of the Hi-C library. Using the agglomerative hierarchical clustering method in LACHESIS (version 2e27abb)16, 419 of the 723 contigs, with a total length of 467,421,215 bp, were clustered into 4 groups (Table 4).
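
Agglomerative hierarchical clustering starts with every contig in its own group and repeatedly merges the two groups with the strongest average Hi-C linkage until the requested number of groups remains. The Python sketch below is a toy illustration of that idea only. It is not the LACHESIS implementation, the link densities are made up, and the function names are hypothetical.

```python
from itertools import combinations


def average_link(cluster_a, cluster_b, links):
    """Average link density over all contig pairs between two clusters."""
    total = sum(links.get(frozenset((a, b)), 0.0)
                for a in cluster_a for b in cluster_b)
    return total / (len(cluster_a) * len(cluster_b))


def cluster_contigs(contigs, links, n_groups):
    """Merge the most strongly linked pair of clusters until n_groups remain."""
    clusters = [frozenset([c]) for c in contigs]
    while len(clusters) > n_groups:
        a, b = max(combinations(clusters, 2),
                   key=lambda pair: average_link(pair[0], pair[1], links))
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
    return clusters


# Made-up, normalized Hi-C link densities between four contigs.
links = {
    frozenset(("ctg1", "ctg2")): 90.0,
    frozenset(("ctg3", "ctg4")): 80.0,
    frozenset(("ctg1", "ctg3")): 5.0,
    frozenset(("ctg2", "ctg4")): 3.0,
}
groups = cluster_contigs(["ctg1", "ctg2", "ctg3", "ctg4"], links, n_groups=2)
print(sorted(sorted(g) for g in groups))
```

With these toy values, the strongly linked pairs end up in the same group, mirroring how contigs from one chromosome are expected to share more Hi-C contacts with each other than with contigs from other chromosomes.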

Finally, we obtained a high-quality chromosome-level genome assembly of Episyrphus balteatus. The genome consisted of 723 contigs with a total length of 467.42 Mb, which was similar to the predicted size of 464.94 Mb, with a scaffold N50 of 118.85 Mb, a maximum scaffold length of 120.20 Mb, and a GC content of 31.52% (Table 2). The Hi-C data allowed 419 (57.95%) sequences with a total length of 467.42 Mb (90.25% of the genome) to be assigned to 4 pseudo-chromosomes, which were clearly distinguishable in the chromatin interaction heatmap (Tables 2 and 4; Fig. 1d). In addition, the mitochondrial genome of E. balteatus was assembled with mitoZ17 and NOVOplasty18 and subsequently annotated using MITOS19 and GeSeq20 (Fig. 10; Table 11).
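
For readers unfamiliar with the N50 and N90 statistics quoted for the assembly, the following Python sketch shows the standard definition: sort the sequence lengths from longest to shortest and report the length at which the running total first reaches the chosen fraction of the total assembly length. The function name and the toy lengths are illustrative and are not taken from the assembly itself.

```python
def nx(lengths, fraction=0.5):
    """Return the Nx statistic (N50 when fraction=0.5, N90 when 0.9).

    Sequences are sorted from longest to shortest; the result is the length
    of the sequence at which the running total first reaches `fraction` of
    the summed length.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    threshold = fraction * sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= threshold:
            return length


toy_lengths = [10, 8, 6, 4, 2]      # total = 30
print(nx(toy_lengths))              # N50 of the toy set
print(nx(toy_lengths, 0.9))         # N90 of the toy set
```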

Repetitive elements and noncoding RNA annotation

A combination of homology-based and de novo approaches was used to identify transposable elements (TEs) and tandem repeats. First, a customized repeat library for the genome was generated with RepeatModeler (version 2.0.1)21 (http://www.repeatmasker.org/RepeatModeler/) using the parameters “-name & RepeatModeler -pa 12”; this software automatically runs two de novo repeat discovery programs, RECON (version 1.0.8)22 and RepeatScout (version 1.0.6)23. Full-length long terminal repeat retrotransposons (fl-LTR-RTs) were identified using both LTR_FINDER (version 1.07)24 with the parameters “-w 2 -C -D” and LTRharvest (version 1.5.10)25 with default parameters. High-quality intact fl-LTR-RTs and a non-redundant LTR library were then produced by LTR_retriever (version 2.9.0)26 with default parameters. A non-redundant species-specific TE library was constructed by combining the de novo TE sequence library above with the Dfam (version 3.5)27 database. The final TE sequences in the Episyrphus balteatus genome were identified and classified by homology search against this library using RepeatMasker (version 4.1.2)28 with the parameters ‘-nolow -no_is -norna -engine wublast -parallel 8 -qq’. Tandem repeats were annotated with Tandem Repeats Finder (TRF) (version 409)29 with the parameters ‘2 7 7 80 10 50 500 -d -h’ and the MIcroSAtellite identification tool (MISA) (version 2.1)30 with default parameters. In total, 43.55% of the assembled genome was classified as repetitive sequence, including TEs with a total length of 182,487,345 bp (35.23% of the genome) and tandem repeats with a total length of 43,054,586 bp (8.31% of the genome) (Table 5).

Non-coding RNAs, such as microRNA, rRNA, tRNA, and other RNAs of unknown function, are RNAs that are not translated into proteins. Different strategies were used to predict the different classes of non-coding RNA according to their structural characteristics. tRNAs were identified using tRNAscan-SE (version 1.3.1)31 with default parameters. rRNAs were predicted primarily using barrnap (version 0.9)32 with the parameters ‘--kingdom euk --threads 1’. miRNAs, snoRNAs, and snRNAs were predicted by searching the Rfam (version 14.5)33 database with INFERNAL (version 1.1)34 using the parameters ‘--cpu 3 --rfam’. In total, 1,103 tRNAs, 402 rRNAs, 33 snRNAs, 24 snoRNAs, and 49 miRNAs were obtained (Table 6).

Gene Prediction and Functional Annotation

Three approaches, namely de novo prediction, homology search, and transcript-based assembly, were combined to annotate the protein-coding genes in the genome. Using the genome sequence assembled above, Augustus (version 3.1.0)35 and SNAP (version 2006-07-28)36 were used for ab initio gene prediction with default parameters. For the homology-based approach, GeneModelMapper (GeMoMa) (version 1.7)37 was run with Drosophila melanogaster and Eupeodes corollae as references with the parameters ‘run.sh mmseqs’. For the transcript-based approach, the RNA-seq reads were aligned to the assembled genome using Hisat (version 2.1.0)38 with the parameters ‘--dta -p 10’, and the aligned reads were assembled into transcripts using Stringtie (version 2.1.4)39 with the parameter ‘-p 2’. Genes were predicted from the assembled transcripts using GeneMarkS-T (version 5.1)40 with default parameters. In addition, PASA (version 2.4.1)41 was used to predict genes from the unigenes assembled by Trinity (version 2.11)36 with the parameter ‘--genome_guided_bam’. Finally, the genes predicted by the homology-based, de novo, and transcript-based approaches were integrated into a high-confidence gene set using EVidenceModeler (version 1.1.1)42 in combination with PASA (version 2.4.1)41 with default parameters. In total, 14,848 protein-coding genes with an average length of 14,629.57 bp were obtained in the assembled E. balteatus genome (Table 7). The average coding sequence (CDS) length per gene was 1,714.47 bp, with 75,039 CDS segments in total; the average total exon length per gene was 1,714.63 bp, and the average number of exons per gene was 5.05. The completeness of the gene prediction was evaluated using Benchmarking Universal Single-Copy Orthologs (BUSCO, version 5.2.2)43 with the parameter ‘-m prot’. BUSCO’s Diptera database contains 3,285 conserved core genes, of which 3,095 (94.22%) were complete in the predicted gene set, only 0.27% were fragmented, and 5.51% were missing (Table 7), indicating good completeness of gene prediction in E. balteatus.
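
The BUSCO percentages follow directly from the category counts in Table 7. The short Python sketch below recomputes them; the dictionary keys and the helper name are illustrative, and the counts are copied from the table.

```python
# BUSCO category counts from Table 7 (Diptera lineage dataset).
busco_counts = {
    "single_copy": 2710,
    "duplicated": 385,
    "fragmented": 9,
    "missing": 181,
}

total_buscos = sum(busco_counts.values())
complete = busco_counts["single_copy"] + busco_counts["duplicated"]


def pct(n):
    """Percentage of all BUSCO groups, rounded to two decimals."""
    return round(100 * n / total_buscos, 2)


print("total:", total_buscos)
print("complete:", complete, pct(complete))
for category, n in busco_counts.items():
    print(category, n, pct(n))
```

The four categories sum to the size of the lineage dataset, and complete BUSCOs are the sum of the single-copy and duplicated categories.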

Gene functions were annotated by aligning the predicted genes against public databases, including the National Center for Biotechnology Information (NCBI) Non-Redundant (NR), EggNOG, KOG, TrEMBL, InterPro, and Swiss-Prot protein databases, using diamond (2.0.4.142) with the parameters ‘--masking 0 -e 0.001’. The data were also compared with the Kyoto Encyclopedia of Genes and Genomes (KEGG) database, with an E-value threshold of 1E-5. InterProScan (version 5.34-73.0) was used with the InterPro protein database to annotate protein domains44, with the parameters ‘-iprlookup -pa -f xml -dp -t p -cpu 10’. The motifs and domains within gene models were identified using the Pfam database45. In total, 95.14% (14,126 of 14,848) of the predicted protein-coding genes could be annotated in the aforementioned databases (Table 8). Finally, we tallied the genes integrated by EVM according to which of the three prediction approaches supported them; 11,048 genes were predicted consistently by all three methods (Fig. 2b).

Fig. 2.


Genome assembly and temporal transcriptome of Episyrphus balteatus. (a) Circle genome landscape of Episyrphus balteatus. Circle a represents chromosomes, while circles b-e indicate TE density, SSR density, SSR, and gene density of each respective chromosome, respectively. (b) Protein-coding gene prediction of Episyrphus balteatus genome through three strategies.

Genome synteny analysis

Gene synteny analysis can reveal genomic structural variation. Diamond (version 0.9.29.130)46 was used to compare the gene sequences of Episyrphus balteatus with those of six Syrphidae hoverflies with different pseudo-chromosome numbers (4 pseudo-chromosomes in Eupeodes corollae and Scaeva pyrastri, 5 in Xylota sylvarum and Syritta pipiens, and 6 in Eristalis tenax and Volucella inanis) and to identify similar gene pairs (e < 1e-5, C-score > 0.5). Next, MCScanX47 was used to determine whether similar gene pairs were adjacent on the chromosomes, and all genes within synteny blocks were obtained with the parameter ‘-m 15’ (Figs. 2a, 3). Each chromosome of E. balteatus showed good correspondence with chromosomes of the selected species; in particular, Contig 1996 corresponded to the presumptive X chromosome in the hoverfly genomes compared, indicating a high degree of consistency among them and making Contig 1996 the candidate X chromosome of E. balteatus (Fig. 3).

Temporal transcriptome of Episyrphus balteatus across egg, larval, pupal, and adult stages

Samples were collected from all developmental stages of E. balteatus, namely eggs, 1st, 2nd, and 3rd instar larvae, pupae, and female adults. For each stage, three replicates were prepared, each consisting of 30 individuals (postembryonic stages) or 300 eggs in a 2.0 mL Eppendorf tube. TRIzol reagent (Thermo, USA) was used for RNA extraction. The transcriptome sequences were obtained using the methodology described in the “Transcriptome sequencing” section above.

Hisat (version 2.1.0)38 was used with default parameters to map the clean transcriptome reads to the assembled genome of E. balteatus. The mapped reads were then assembled into transcripts using Stringtie (version 2.1.4)39 with default parameters. The expression level of each transcript or gene was quantified as FPKM (Fragments Per Kilobase of transcript per Million mapped fragments)48. Principal components analysis (PCA) and heatmap clustering were used together to evaluate the relationships between samples within and between groups (Fig. 11).
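
As a reminder of what the FPKM unit means, the Python sketch below implements the standard formula: the fragment count for a transcript is divided by the transcript length in kilobases and by the total number of mapped fragments in millions. The function name and the example numbers are illustrative and are not values from this study.

```python
def fpkm(fragments, transcript_length_bp, total_mapped_fragments):
    """Fragments Per Kilobase of transcript per Million mapped fragments.

    FPKM = fragments * 1e9 / (transcript length in bp * total mapped fragments)
    """
    if transcript_length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("length and total fragments must be positive")
    return fragments * 1e9 / (transcript_length_bp * total_mapped_fragments)


# Illustrative numbers: 500 fragments on a 2 kb transcript in a library
# with 10 million mapped fragments.
example = fpkm(500, 2_000, 10_000_000)
print(example)
```

Normalizing by both transcript length and library size makes values comparable between genes of different lengths and between samples sequenced to different depths.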

Fig. 11.


PCA (principal components analysis) diagrams of transcriptome samples of Episyrphus balteatus at different development periods (a) and corresponding clustering heat maps of transcriptome samples associated with each other at different developmental periods of Episyrphus balteatus (b). Different periods are represented by different colored circles. In terms of correlation, the darker the red color, the higher the correlation, and the darker the blue color, the lower the correlation. L1, L2, L3 represent 1st, 2nd, 3rd instar larvae of Episyrphus balteatus, respectively.

Here, we define genes as differentially expressed genes (DEGs) if their expression levels differ significantly between developmental stages; DEGs are divided into up-regulated and down-regulated genes. Differential expression analysis was conducted using DESeq2 based on the gene count values in each sample49. To limit false positives arising from testing many genes simultaneously, the p-values were adjusted to control the false discovery rate (FDR), which allows the significance of differences to be evaluated more accurately50. Fold change (FC) is the ratio of a gene's expression levels between two groups of samples and, for convenience of comparison, was expressed as log2FC. DEGs were called with the criteria |log2FC| ≥ 1 and FDR < 0.01. The more pronounced the difference in expression between the two sample groups, the higher the absolute value of log2FC and the lower the FDR. The numbers of DEGs in the pairwise comparisons between consecutive stages of Episyrphus balteatus, from egg through 1st (L1), 2nd (L2), and 3rd (L3) instar larvae and pupa to female adult, are given in Table 9. Enrichment analysis was conducted to detect over-represented GO terms and KEGG pathways using the hypergeometric test with a q-value ≤ 0.05 as the cutoff. The DEGs were then mapped to these GO terms and KEGG pathways to evaluate their roles (Figs. 4, 5).
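
The screening criteria above can be illustrated with a small Python sketch. It assumes the Benjamini-Hochberg procedure for the FDR adjustment, which is the default adjustment in DESeq2 but is not named explicitly in the text, and it applies the |log2FC| ≥ 1 and FDR < 0.01 thresholds to made-up genes. The function names and all values are hypothetical; in practice DESeq2 produces the adjusted p-values itself.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(n, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * n / rank)
        adjusted[idx] = running_min
    return adjusted


def is_deg(log2fc, fdr, min_abs_log2fc=1.0, max_fdr=0.01):
    """Apply the thresholds |log2FC| >= 1 and FDR < 0.01."""
    return abs(log2fc) >= min_abs_log2fc and fdr < max_fdr


# Made-up results: (gene, log2FC, raw p-value)
results = [("g1", 2.5, 0.0001), ("g2", -1.2, 0.002),
           ("g3", 0.4, 0.0005), ("g4", 3.0, 0.2)]

fdrs = bh_adjust([p for _, _, p in results])
up = [g for (g, fc, _), q in zip(results, fdrs) if is_deg(fc, q) and fc > 0]
down = [g for (g, fc, _), q in zip(results, fdrs) if is_deg(fc, q) and fc < 0]
print(up, down)
```

In this toy set, g3 fails the fold-change threshold and g4 fails the FDR threshold, so only g1 (up) and g2 (down) would be counted as DEGs.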

Table 9.

Statistics of differentially expressed genes in the pairwise comparisons of Episyrphus balteatus across egg, 1st instar larvae (L1), 2nd instar larvae (L2), 3rd instar larvae, pupa, and female adult.

Groups DEGs_total DEGs_up DEGs_down
Egg_vs_L1 5047 2883 2164
L1_vs_L2 2208 1412 796
L2_vs_L3 3973 1784 2189
L3_vs_Pupa 5403 2692 2710
Pupa_vs_Adult 3899 2097 1802

Fig. 4.


Differentially expressed genes in pairwise comparisons of Episyrphus balteatus across different developmental stages. (a–e) Volcano plots of differentially expressed genes in each comparison between developmental stages. The red and green dots represent the significantly up- and down-regulated genes, respectively. (f) Venn diagram of DEGs among the pairwise comparisons. L1, L2, L3 represent 1st, 2nd, 3rd instar larvae of Episyrphus balteatus, respectively.

Fig. 5.

Significantly enriched KEGG pathways of DEGs in the pairwise comparisons of Episyrphus balteatus at different developmental stages. a, 1st instar larvae vs. egg; b, 2nd instar larvae vs. 1st instar larvae; c, 3rd instar larvae vs. 2nd instar larvae; d, pupa vs. 3rd instar larvae; e, female adult vs. pupa.
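The hypergeometric over-representation test used for the GO and KEGG enrichment can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function name and argument names are assumptions, and in practice the resulting p-values across all pathways would still need multiple-testing correction to obtain the q-values that are compared against the 0.05 cutoff.

```python
from math import comb


def hypergeom_enrichment_p(n_background, n_annotated, n_degs, n_degs_annotated):
    """Upper-tail hypergeometric p-value, P(X >= n_degs_annotated).

    n_background      total number of genes considered
    n_annotated       background genes annotated to the pathway or term
    n_degs            number of DEGs drawn from the background
    n_degs_annotated  DEGs annotated to the pathway or term
    """
    upper = min(n_annotated, n_degs)
    total = comb(n_background, n_degs)
    tail = sum(
        comb(n_annotated, i) * comb(n_background - n_annotated, n_degs - i)
        for i in range(n_degs_annotated, upper + 1)
    )
    return tail / total


# Hypothetical toy case: 10 genes, 4 in the pathway, 3 DEGs, 2 of them in the pathway.
p_toy = hypergeom_enrichment_p(10, 4, 3, 2)
```

Exact integer arithmetic with `math.comb` avoids rounding problems for small examples; for genome-scale inputs a statistics library would normally be used instead.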

Data Records

The Episyrphus balteatus genome sequencing dataset is available at NCBI under accession number PRJNA1049652 (ref. 51), which includes four subsets deposited in the Sequence Read Archive at NCBI: genomic Illumina sequencing data (SRR27128105, ref. 52), genomic PacBio sequencing data (SRR27167876, ref. 53), Hi-C sequencing data (SRR27129026, ref. 54), and mitochondrial genome data (SRR27204218, ref. 55). RNA-seq data were deposited in the Sequence Read Archive at NCBI under accession number PRJNA1050889 (ref. 56). The annotation files of the E. balteatus genome have been deposited at figshare (https://figshare.com/articles/dataset/Genome_annotation_information_of_Episyrphus_balteatus/24797310) (ref. 57). The final assembled Episyrphus balteatus genome has been submitted to the National Center for Biotechnology Information (https://www.ncbi.nlm.nih.gov/) and is publicly accessible under accession number GCA_040182855.1 (ref. 58).
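For convenience, the accessions listed above can be collected in a small lookup table. The sketch below is illustrative only: the dictionary keys and the helper name `identifiers_url` are assumptions, and the URL patterns simply follow the identifiers.org links given in refs. 51 to 58 of the reference list.

```python
# Accessions as given in the Data Records text and refs. 51-58.
SRA_RUNS = {
    "genomic_illumina": "SRR27128105",
    "genomic_pacbio": "SRR27167876",
    "hic": "SRR27129026",
    "mitochondrial_genome": "SRR27204218",
}
BIOPROJECTS = {
    "genome_sequencing": "PRJNA1049652",
    "rna_seq": "PRJNA1050889",
}
ASSEMBLY = "GCA_040182855.1"


def identifiers_url(accession):
    """Build an identifiers.org URL following the patterns in the reference list."""
    if accession.startswith("SRR"):
        return f"https://identifiers.org/ncbi/insdc.sra:{accession}"
    if accession.startswith("PRJNA"):
        return f"https://identifiers.org/ncbi/bioproject:{accession}"
    if accession.startswith("GCA_"):
        return f"https://identifiers.org/ncbi/insdc.gca:{accession}"
    raise ValueError(f"Unrecognised accession prefix: {accession}")


run_urls = {name: identifiers_url(acc) for name, acc in SRA_RUNS.items()}
```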

Technical Validation

DNA integrity

A Nanodrop 2000 spectrophotometer (Thermo Fisher Scientific, USA) and a Qubit™ 3 Fluorometer (Thermo Fisher Scientific, USA) were used to measure the concentration and quality of the extracted DNA. The 260/280 nm and 260/230 nm absorbance ratios of the extracted DNA were both about 1.8. The quality of genomic DNA was checked by agarose gel electrophoresis. The main band of DNA fragments was ≥ 23 kb, and the degradation band was > 5 kb. The absence of any contamination in the sample wells confirmed that DNA integrity was preserved throughout this investigation.

Assessment of genome assemblies

We assessed the integrity and accuracy of the genome in three main ways. First, we aligned the short reads obtained from the Illumina sequencing platform to the assembled genome using BWA (version 0.7.17)59. From the mapping rate, the proportion of the genome covered, and the depth distribution, we assessed both the integrity of the assembled genome and the evenness of sequencing coverage. In total, 383,884,382 clean reads were obtained, of which 368,506,902 mapped to the reference genome, accounting for 95.99% of all clean reads (Table 10). Second, we used the Core Eukaryotic Genes Mapping Approach (CEGMA, version 2.5) with default parameters and a core gene library of 458 genes from eukaryotic model organisms. Of these 458 core genes, 446 were identified in the assembled genome, accounting for 97.38% of the total (Table 10). Finally, the single-copy gene set from BUSCO (version 5.2.2) was compared against our assembled genome through BLAST (ref. 5), and the completeness of this comparison was assessed. BUSCO completeness was 96.71%, with only 0.44% fragmented BUSCOs and 2.85% missing BUSCOs (Table 10). Together, these results indicate that our assembled E. balteatus genome has high integrity and accuracy. The BUSCO results were also compared with those of other hoverfly genomes, and all were comparable (Table 12). In addition, the genome structure statistics of Episyrphus balteatus were compared with those of several representative dipteran insects, and all were comparable (Table 13).

Table 10.

Assessment metrics for the final genome assembly of Episyrphus balteatus.

Items Types Number (Ratios)
Genome completeness assessment by BUSCO Complete BUSCOs (C) 1322 (96.71%)
Complete and single-copy BUSCOs (S) 1041 (76.15%)
Complete and duplicated BUSCOs (D) 281 (20.56%)
Fragmented BUSCOs (F) 6 (0.44%)
Missing BUSCOs (M) 39 (2.85%)
Total lineage BUSCOs 1,367
Genome completeness assessment by CEGMA Number (%) of 458 CEG* present in assembly 446 (97.38%)
Number (%) of 248 highly conserved CEGs present 240 (96.77%)
Genome accuracy Short-read mapping rate 368,506,902 (95.99%)

Notes: CEG, Core Eukaryotic Genes.
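The percentages in Table 10 can be recomputed from the raw counts reported in the text and table. The short Python check below is only an arithmetic illustration; the variable names are assumptions, and the counts are copied from Table 10 and the preceding paragraph.

```python
def percent(part, whole):
    """Percentage of `part` in `whole`, rounded to two decimals."""
    return round(100.0 * part / whole, 2)


# BUSCO counts from Table 10
busco_total = 1367
busco = {
    "complete": 1322,
    "single_copy": 1041,
    "duplicated": 281,
    "fragmented": 6,
    "missing": 39,
}
busco_pct = {name: percent(count, busco_total) for name, count in busco.items()}

# CEGMA: core genes found out of 458, and highly conserved CEGs out of 248
cegma_pct = percent(446, 458)
cegma_conserved_pct = percent(240, 248)

# Short-read mapping rate: mapped clean reads out of all clean reads
mapping_pct = percent(368_506_902, 383_884_382)

# Internal consistency of the BUSCO categories
categories_add_up = (
    busco["single_copy"] + busco["duplicated"] == busco["complete"]
    and busco["complete"] + busco["fragmented"] + busco["missing"] == busco_total
)
```

Recomputing the values this way reproduces the ratios listed in Table 10 and confirms that the BUSCO categories sum to the total lineage set.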

Table 12.

Comparison of the Episyrphus balteatus genome assembly with published chromosome-level genomes of hoverfly species.

Species Chromosome number Genome size (Mb) Scaffold N50 (Mb) No. scaffolds Contig N50 (Mb) No. contigs BUSCO (%) GC (%) References
Episyrphus balteatus Ay 5 467.42 118.85 611 9.16 723 94.22 31.52 In this study
Episyrphus balteatus Orf 5 535.3 133.6 18 5.9 186 98.4 31.1 65
Eristalis tenax 6 487.0 77.07 157 6.65 290 96.6 42.1 66
Scaeva pyrastri 4 320.0 86.2 4 8.7 183 96.7 32.6 67
Syritta pipiens 5 318.5 86.5 - - - 98.9 38.4 68
Eristalis pertinax 6 487.0 77.5 257 3.5 574 96.3 42.4 69
Volucella inanis 6 961.4 163.5 52 30.1 405 97.0 37.4 70
Xylota sylvarum 5 534.8 124.8 - - - 98.7 38.3 71
Eupeodes corollae 4 648.2 158.6 783 2.3 2107 96.7 33.9 72
Chrysotoxum bicinctum 5 913 265.8 92 5.7 412 96.6 34.1 73
Eristalis arbustorum 6 451 78.3 - - - 96.5 43.0 74
Melanostoma mellinum 5 731 235.0 76 4.6 479 96.3 37.0 75
Xanthogramma pedissequum 6 977 248.7 484 7.8 959 95.5 34.0 76
Eupeodes latifasciatus 4 846.0 189.4 436 2.7 1233 96.3 33.6 77
Cheilosia vulpina 6 405 69.4 - - - 97.1 37.3 78

Table 13.

Genome structure statistics of Episyrphus balteatus and several representative dipteran insects.

Species Gene Number Gene Length Average Gene length Exon Length Average Exon Length Exon Number Average Exon Number CDS Length Average CDS Length CDS Number Average CDS Number Intron Length Average Intron Length Intron Number Average Intron Number
E. balteatus Ay 14848 217219911 14629.57 25458760 1714.63 75039 5.05 25456441 1714.47 75039 5.05 191761151 12914.95 60191 4.05
E. balteatus Orf 13616 302838542 22241.37 32442487 2382.67 72318 5.31 23384950 1717.46 65094 4.78 270396055 19858.7 58702 4.31
D. melanogaster 13874 96769168 6974.86 30856435 2224.05 59913 4.32 22285411 1606.27 55037 3.97 65912733 4750.81 46039 3.32
C. capitata 12479 208366414 16697.36 31266924 2505.56 68053 5.45 22031422 1765.48 60638 4.86 177099490 14191.8 55574 4.45
A. aegypti 14528 678668889 46714.54 35651444 2453.98 70018 4.82 23681915 1630.09 62186 4.28 643017445 44260.56 55490 3.82
H. illucens 13958 554616600 39734.68 34102472 2443.22 81669 5.85 22664625 1623.77 72830 5.22 520514128 37291.45 67711 4.85
E. corollae 13645 329022604 24113.05 34552922 2532.28 71337 5.23 23954984 1755.59 64737 4.74 294469682 21580.78 57692 4.23

Notes: E. balteatus Ay, Episyrphus balteatus population from Anyang City, Henan Province, China; E. balteatus Orf, Episyrphus balteatus population from Wytham Woods, Oxfordshire, UK; D. melanogaster, Drosophila melanogaster; C. riparius, Chironomus riparius; A. aegypti, Aedes aegypti; H. illucens, Hermetia illucens; E. corollae, Eupeodes corollae.
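The derived columns in Table 13 appear to follow simple relations among the primary counts. The Python sketch below checks this for the E. balteatus Ay row; it is an inference from the numbers rather than a documented procedure. In particular, the "average" columns seem to be per-gene averages (total divided by gene number), which is an assumption that happens to reproduce the tabulated values.

```python
# Primary values for E. balteatus Ay, copied from Table 13
gene_number = 14_848
gene_length = 217_219_911
exon_length = 25_458_760
exon_number = 75_039

# Introns taken as the non-exonic part of genes; one fewer intron than exons per gene
intron_length = gene_length - exon_length
intron_number = exon_number - gene_number

# Per-gene averages, rounded as in the table
avg_gene_length = round(gene_length / gene_number, 2)
avg_exon_length = round(exon_length / gene_number, 2)
avg_exon_number = round(exon_number / gene_number, 2)
avg_intron_length = round(intron_length / gene_number, 2)
avg_intron_number = round(intron_number / gene_number, 2)
```

With these relations, the intron totals and all five averages match the values listed for E. balteatus Ay.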

Assessment of genome reliability

To evaluate the reliability of the genome assembly and annotation of E. balteatus, comparative genomic analysis and phylogenetic reconstruction were performed with 19 closely related species within the order Diptera (including 7 Syrphidae species) and one pollinating insect (Apis mellifera, as the outgroup) (Fig. 6; Table 14). To identify single-copy orthologous genes, the protein sequence of the longest transcript of each gene was retrieved for E. balteatus and for the 20 selected species. For gene family clustering, we used OrthoFinder (version 2.4)60 to compare the protein-coding sequences of the 21 genomes (Fig. 6a). The comparison was performed using DIAMOND with an e-value threshold of 0.00001. As a result, 10,037 gene families were constructed for E. balteatus in this work (Table 14). Many gene families were shared among these species, with more shared in the comparison among five hoverfly species than in the comparisons with A. mellifera and other Diptera insects (Fig. 6b,c,d).
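Selecting the longest transcript per gene, as described above, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the input is a hypothetical list of (gene ID, transcript ID, protein sequence) tuples, and ties are resolved by keeping the first transcript seen. It is not the authors' script.

```python
def longest_isoform_per_gene(records):
    """Keep, for each gene, the transcript with the longest protein sequence.

    records: iterable of (gene_id, transcript_id, protein_sequence) tuples.
    Returns a dict mapping gene_id -> (transcript_id, protein_sequence).
    """
    best = {}
    for gene_id, transcript_id, sequence in records:
        current = best.get(gene_id)
        # Strictly longer only, so the first transcript wins a tie.
        if current is None or len(sequence) > len(current[1]):
            best[gene_id] = (transcript_id, sequence)
    return best


# Hypothetical toy records
toy_records = [
    ("g1", "g1.t1", "MKT"),
    ("g1", "g1.t2", "MKTAYL"),
    ("g2", "g2.t1", "MA"),
]
representatives = longest_isoform_per_gene(toy_records)
```

Reducing each gene to one representative protein prevents alternative isoforms from being counted as extra gene copies during orthology clustering.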

Fig. 6.

Statistics of gene families of Episyrphus balteatus and 19 other insects. Petal diagram of shared and unique gene families of Episyrphus balteatus and 19 other insects (a). Venn diagrams of gene families among two hoverfly species (Episyrphus balteatus and E. corollae) and three Diptera insects (b), three other hoverfly species (c), and A. mellifera (d). Gene families of E. balteatus Ref are from ref. 65 (Tables 12 and 14).

Table 14.

Genome datasets used for genomic reliability evaluation of Episyrphus balteatus in this study.

Order Family Species Gene families Download link
Diptera Syrphidae Episyrphus balteatus Ay 10,037 In this study
Eupeodes corollae 10,101 https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/945/859/685/GCF_945859685.1_idEupCoro1.1/
Syritta pipiens 7,862 https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/905/187/475/GCA_905187475.1_idSyrPipi1.1/;
https://ftp.ensembl.org/pub/rapid-release/species/Syritta_pipiens/GCA_905187475.1/ensembl/geneset/2021_12/Syritta_pipiens-GCA_905187475.1-2021_12-genes.gff3.gz
Volucella inanis 8,238 https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/907/269/105/GCA_907269105.1_idVolInan1.1/;
http://ftp.ensembl.org/pub/rapid-release/species/Volucella_inanis/GCA_907269105.1/ensembl/geneset/2021_12/Volucella_inanis-GCA_907269105.1-2021_12-genes.gff3.gz
Xylota sylvarum 8,631 https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/905/220/385/GCA_905220385.1_idXylSylv2.1/;
https://ftp.ensembl.org/pub/rapid-release/species/Xylota_sylvarum/GCA_905220385.1/ensembl/geneset/2021_12/Xylota_sylvarum-GCA_905220385.1-2021_12-genes.gff3.gz
Scaeva pyrastri 8,491 https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/905/146/935/GCA_905146935.1_idScaPyra1.1/;
https://ftp.ensembl.org/pub/rapid-release/species/Scaeva_pyrastri/GCA_905146935.1/ensembl/geneset/2021_12/Scaeva_pyrastri-GCA_905146935.1-2021_12-genes.gff3.gz
Eristalis tenax 9,770 http://v2.insect-genome.com/Organism/372
Episyrphus balteatus Orf 10,337 https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/945/859/705/GCF_945859705.1_idEpiBalt1.1/
Psychodidae Lutzomyia longipalpis 8,850 https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/024/334/085/GCF_024334085.1_ASM2433408v1/
Phlebotomus papatasi 8,926 https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/024/763/615/GCF_024763615.1_Ppap_2.1/
Cecidomyiidae Sitodiplosis mosellana 7,527 https://ngdc.cncb.ac.cn/gwh/Assembly/22236/show
Sciaridae Bradysia coprophila 9,648 https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/014/529/535/GCF_014529535.1_BU_Bcop_v1/
Calliphoridae Lucilia cuprina 9,717 https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/022/045/245/GCF_022045245.1_ASM2204524v1/
Tephritidae Bactrocera dorsalis 9,605 https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/023/373/825/GCF_023373825.1_ASM2337382v1/
Chironomidae Chironomus riparius 8,416 https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/917/627/325/GCA_917627325.4_PGI_CHIRRI_v4/
Muscidae Stomoxys calcitrans 9,744 https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/963/082/655/GCF_963082655.1_idStoCalc2.1/
Stratiomyidae Hermetia illucens 9,360 https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/905/115/235/GCF_905115235.1_iHerIll2.2.curated.20191125/
Diopsidae Teleopsis dalmanni 11,192 https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/002/237/135/GCF_002237135.1_ASM223713v2/
Culicidae Aedes aegypti 8,914 https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/002/204/515/GCF_002204515.2_AaegL5.0/
Drosophilidae Drosophila melanogaster 9,086 https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/001/215/GCF_000001215.4_Release_6_plus_ISO1_MT/
Hymenoptera Apidae Apis mellifera 6,807 https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/003/254/395/GCF_003254395.2_Amel_HAv3.1/

To determine the phylogenetic relationships between E. balteatus and other closely related species, we reconstructed a phylogenetic tree. MAFFT (version 7.205)61 was used with the parameters ‘--localpair --maxiterate 1000’ to align the protein sequences of the single-copy orthologous genes. Under these criteria, we obtained a total of 545 single-copy genes. We used the LG + F + I + G4 model to build the phylogenetic tree with the maximum likelihood method in IQ-TREE (version 1.6.11)62, with bootstrap replicates set to 1000. The maximum likelihood method was then used to estimate divergence times with MCMCTREE in PAML (version 4.9i)63.
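The selection of single-copy orthologous genes for the alignment can be illustrated with a short Python sketch. It assumes a per-species gene-count table for each orthogroup, similar in spirit to the gene-count output of orthology clustering tools; the data structure, function name, and species labels are hypothetical.

```python
def single_copy_orthogroups(gene_counts, species):
    """Return orthogroup IDs with exactly one gene in every listed species.

    gene_counts: dict mapping orthogroup ID -> {species name: gene count}.
    species: list of all species that must be represented.
    """
    return sorted(
        orthogroup
        for orthogroup, counts in gene_counts.items()
        if all(counts.get(name, 0) == 1 for name in species)
    )


# Hypothetical toy counts for three species
toy_counts = {
    "OG0000001": {"spA": 1, "spB": 1, "spC": 1},
    "OG0000002": {"spA": 2, "spB": 1, "spC": 1},  # duplicated in spA
    "OG0000003": {"spA": 1, "spB": 1},            # missing in spC
}
single_copy = single_copy_orthogroups(toy_counts, ["spA", "spB", "spC"])
```

Only orthogroups present exactly once in every species are kept, which is why the number of usable genes (545 in this study) is far smaller than the number of gene families.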

The phylogenetic analysis indicated that Chironomidae and Culicidae were the first to diverge from the ancestral Diptera lineage (~248.31 Mya), followed by Psychodidae (~228.15 Mya), Cecidomyiidae and Sciaridae (~208.69 Mya), Stratiomyidae (~179.96 Mya), Diopsidae and Trypetidae (~115.19 Mya), Drosophilidae (~99.36 Mya), and Muscidae and Calliphoridae (~58.81 Mya) (Fig. 7). Interestingly, the geographical populations of E. balteatus from Anyang (N36°3’, E114°20’, Henan, China) and Oxfordshire (latitude 51.77, longitude –1.34, Berkshire, UK), approximately 12,000 kilometers apart (Fig. 12), diverged from each other 5.11 (2.92–7.55) Mya and were the closest relatives of E. corollae (Fig. 7).

Fig. 12.

Episyrphus balteatus sample acquisition locations. In this study, a female E. balteatus was originally collected from Anyang City, Henan Province, China, and reared by inbreeding under controlled laboratory conditions for approximately 10 generations before the start of all experiments. In contrast, in the study of Hawkes et al.65, a female E. balteatus specimen was collected from Wytham Woods, Oxfordshire, UK (latitude 51.77, longitude –1.34), and used directly for DNA sequencing. In the study of Doyle et al.79, two female Episyrphus individuals (KatzBiotech strain) captured at the mountain pass of Bujaruelo (Puerto de Bujaruelo), a 2273 m pass on the French–Spanish border in the Pyrenees, were used directly for DNA sequencing without species identification. (The map image is derived from ArcGIS Online, copyrighted by Esri, and can be used in academic publications.)

Based on the identified gene families and the phylogenetic tree with estimated divergence times, we used CAFE (version 4.2)64 to predict the expansion and contraction of gene families relative to their ancestors. KEGG pathway enrichment revealed that the 1,060 expanded gene families in the E. balteatus genome were mainly related to “nucleotide excision repair”, “Toll and Imd signaling pathway”, and “starch and sucrose metabolism” (Fig. 8a), whereas the 1,535 contracted gene families were mostly related to “glycerolipid metabolism”, “fatty acid degradation”, and “insect hormone biosynthesis” (Fig. 9a). GO enrichment analysis showed that the expanded gene families in the E. balteatus genome were mainly involved in “DNA integration”, “obsolete membrane part”, and “ion transmembrane transporter activity” (Figs. 7, 8b,c,d), while the contracted gene families were mainly involved in “lipid metabolic process”, “plasma membrane”, and “monooxygenase activity” (Figs. 7, 9b,c,d).

Fig. 8.

KEGG (a) and GO (b–d) enrichment analyses of expanded gene families in the Episyrphus balteatus genome.

Fig. 9.

KEGG (a) and GO (b–d) enrichment analyses of contracted gene families in the Episyrphus balteatus genome.

Acknowledgements

This work is supported by The Science and Technology Innovation 2030 (2023ZD04062), National Key R&D Program of China (2022YFD1400300), Agricultural Science and Technology Innovation Program of Chinese Academy of Agricultural Sciences, and China Agriculture Research System.

Author contributions

J.J., J.C., J.L., L.W. and X.Z. conceived the project; J.J., L.C. and B.L. performed the experiments; J.J. and Y.G. wrote the manuscript; K.Z., C.X., N.H.F., X.G. and M.G. evaluated the results and edited the manuscript; Y.G. and J.J. performed the bioinformatic analyses; D.L. and P.E. reviewed and edited the manuscript. All authors read and approved the final manuscript.

Code availability

All bioinformatics software and tools in this research were used in accordance with published protocols and manuals obtained from public databases. The Methods section describes the software versions and parameters; no custom code or scripts were used.

Competing interests

The authors declare no competing interests.

Footnotes

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

These authors contributed equally: Jichao Ji, Yue Gao.

Contributor Information

Jichao Ji, Email: hnnydxjc@163.com.

Xiangzhen Zhu, Email: zhuxiangzhen318@163.com.

Li Wang, Email: wangli08zb@126.com.

Junyu Luo, Email: luojunyu1818@126.com.

Jinjie Cui, Email: aycuijinjie@163.com.

References

  • 1. Wotton, K. R. et al. Mass seasonal migrations of hoverflies provide extensive pollination and crop protection services. Current biology: CB 29, 2167–2173.e2165, 10.1016/j.cub.2019.05.036 (2019).
  • 2. Powney, G. D. et al. Widespread losses of pollinating insects in Britain. Nature communications 10, 1018, 10.1038/s41467-019-08974-9 (2019).
  • 3. Yuan, H. et al. Genome of the hoverfly Eupeodes corollae provides insights into the evolution of predation and pollination in insects. BMC biology 20, 157, 10.1186/s12915-022-01356-6 (2022).
  • 4. Chen, S., Zhou, Y., Chen, Y. & Gu, J. fastp: an ultra-fast all-in-one FASTQ preprocessor. Bioinformatics (Oxford, England) 34, i884–i890, 10.1093/bioinformatics/bty560 (2018).
  • 5. Altschul, S. F., Gish, W., Miller, W., Myers, E. W. & Lipman, D. J. Basic local alignment search tool. Journal of molecular biology 215, 403–410, 10.1016/s0022-2836(05)80360-2 (1990).
  • 6. Pu, D. Q. et al. Mitochondrial genomes of the hoverflies Episyrphus balteatus and Eupeodes corollae (Diptera: Syrphidae), with a phylogenetic analysis of Muscomorpha. Scientific reports 7, 44300, 10.1038/srep44300 (2017).
  • 7. Li, R., Li, Y., Kristiansen, K. & Wang, J. SOAP: short oligonucleotide alignment program. Bioinformatics (Oxford, England) 24, 713–714, 10.1093/bioinformatics/btn025 (2008).
  • 8. Marçais, G. & Kingsford, C. A fast, lock-free approach for efficient parallel counting of occurrences of k-mers. Bioinformatics (Oxford, England) 27, 764–770, 10.1093/bioinformatics/btr011 (2011).
  • 9. Ranallo-Benavidez, T. R., Jaron, K. S. & Schatz, M. C. GenomeScope 2.0 and Smudgeplot for reference-free profiling of polyploid genomes. Nature communications 11, 1432, 10.1038/s41467-020-14998-3 (2020).
  • 10. Liu, H., Wu, S., Li, A. & Ruan, J. SMARTdenovo: a de novo assembler using long noisy reads. GigaByte (Hong Kong, China) 2021, gigabyte15, 10.46471/gigabyte.15 (2021).
  • 11. Walker, B. J. et al. Pilon: an integrated tool for comprehensive microbial variant detection and genome assembly improvement. PloS one 9, e112963, 10.1371/journal.pone.0112963 (2014).
  • 12. Guan, D. et al. Identifying and removing haplotypic duplication in primary genome assemblies. Bioinformatics (Oxford, England) 36, 2896–2898, 10.1093/bioinformatics/btaa025 (2020).
  • 13. Vaser, R., Sović, I., Nagarajan, N. & Šikić, M. Fast and accurate de novo genome assembly from long uncorrected reads. Genome research 27, 737–746, 10.1101/gr.214270.116 (2017).
  • 14. Rao, S. S. et al. A 3D map of the human genome at kilobase resolution reveals principles of chromatin looping. Cell 159, 1665–1680, 10.1016/j.cell.2014.11.021 (2014).
  • 15. Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows–Wheeler transform. Bioinformatics (Oxford, England) 25, 1754–1760, 10.1093/bioinformatics/btp324 (2009).
  • 16. Burton, J. N. et al. Chromosome-scale scaffolding of de novo genome assemblies based on chromatin interactions. Nature biotechnology 31, 1119–1125, 10.1038/nbt.2727 (2013).
  • 17. Meng, G., Li, Y., Yang, C. & Liu, S. MitoZ: a toolkit for animal mitochondrial genome assembly, annotation and visualization. Nucleic acids research 47, e63, 10.1093/nar/gkz173 (2019).
  • 18. Dierckxsens, N., Mardulyn, P. & Smits, G. NOVOPlasty: de novo assembly of organelle genomes from whole genome data. Nucleic acids research 45, e18, 10.1093/nar/gkw955 (2017).
  • 19. Bernt, M. et al. MITOS: improved de novo metazoan mitochondrial genome annotation. Molecular phylogenetics and evolution 69, 313–319, 10.1016/j.ympev.2012.08.023 (2013).
  • 20. Tillich, M. et al. GeSeq - versatile and accurate annotation of organelle genomes. Nucleic acids research 45, W6–w11, 10.1093/nar/gkx391 (2017).
  • 21. Flynn, J. M. et al. RepeatModeler2 for automated genomic discovery of transposable element families. Proceedings of the National Academy of Sciences of the United States of America 117, 9451–9457, 10.1073/pnas.1921046117 (2020).
  • 22. Bao, Z. & Eddy, S. R. Automated de novo identification of repeat sequence families in sequenced genomes. Genome research 12, 1269–1276, 10.1101/gr.88502 (2002).
  • 23. Price, A. L., Jones, N. C. & Pevzner, P. A. De novo identification of repeat families in large genomes. Bioinformatics (Oxford, England) 21(Suppl 1), i351–358, 10.1093/bioinformatics/bti1018 (2005).
  • 24. Xu, Z. & Wang, H. LTR_FINDER: an efficient tool for the prediction of full-length LTR retrotransposons. Nucleic acids research 35, W265–268, 10.1093/nar/gkm286 (2007).
  • 25. Ellinghaus, D., Kurtz, S. & Willhoeft, U. LTRharvest, an efficient and flexible software for de novo detection of LTR retrotransposons. BMC bioinformatics 9, 18, 10.1186/1471-2105-9-18 (2008).
  • 26. Ou, S. & Jiang, N. LTR_retriever: A Highly Accurate and Sensitive Program for Identification of Long Terminal Repeat Retrotransposons. Plant physiology 176, 1410–1422, 10.1104/pp.17.01310 (2018).
  • 27. Wheeler, T. J. et al. Dfam: a database of repetitive DNA based on profile hidden Markov models. Nucleic acids research 41, D70–82, 10.1093/nar/gks1265 (2013).
  • 28. Tarailo-Graovac, M. & Chen, N. Using RepeatMasker to identify repetitive elements in genomic sequences. Current protocols in bioinformatics Chapter 4, 4.10.11-14.10.14, 10.1002/0471250953.bi0410s25 (2009).
  • 29. Benson, G. Tandem repeats finder: a program to analyze DNA sequences. Nucleic acids research 27, 573–580, 10.1093/nar/27.2.573 (1999).
  • 30. Beier, S., Thiel, T., Münch, T., Scholz, U. & Mascher, M. MISA-web: a web server for microsatellite prediction. Bioinformatics (Oxford, England) 33, 2583–2585, 10.1093/bioinformatics/btx198 (2017).
  • 31. Lowe, T. M. & Eddy, S. R. tRNAscan-SE: a program for improved detection of transfer RNA genes in genomic sequence. Nucleic acids research 25, 955–964, 10.1093/nar/25.5.955 (1997).
  • 32. Loman, T. A Novel Method for Predicting Ribosomal RNA Genes in Prokaryotic Genomes. (2017).
  • 33. Griffiths-Jones, S. et al. Rfam: annotating non-coding RNAs in complete genomes. Nucleic acids research 33, D121–124, 10.1093/nar/gki081 (2005).
  • 34. Nawrocki, E. P. & Eddy, S. R. Infernal 1.1: 100-fold faster RNA homology searches. Bioinformatics (Oxford, England) 29, 2933–2935, 10.1093/bioinformatics/btt509 (2013).
  • 35. Stanke, M., Diekhans, M., Baertsch, R. & Haussler, D. Using native and syntenically mapped cDNA alignments to improve de novo gene finding. Bioinformatics (Oxford, England) 24, 637–644, 10.1093/bioinformatics/btn013 (2008).
  • 36. Korf, I. Gene finding in novel genomes. BMC bioinformatics 5, 59, 10.1186/1471-2105-5-59 (2004).
  • 37. Keilwagen, J. et al. Using intron position conservation for homology-based gene prediction. Nucleic acids research 44, e89, 10.1093/nar/gkw092 (2016).
  • 38. Kim, D., Langmead, B. & Salzberg, S. L. HISAT: a fast spliced aligner with low memory requirements. Nature methods 12, 357–360, 10.1038/nmeth.3317 (2015).
  • 39. Pertea, M. et al. StringTie enables improved reconstruction of a transcriptome from RNA-seq reads. Nature biotechnology 33, 290–295, 10.1038/nbt.3122 (2015).
  • 40. Tang, S., Lomsadze, A. & Borodovsky, M. Identification of protein coding regions in RNA transcripts. Nucleic acids research 43, e78, 10.1093/nar/gkv227 (2015).
  • 41. Haas, B. J. et al. Improving the Arabidopsis genome annotation using maximal transcript alignment assemblies. Nucleic acids research 31, 5654–5666, 10.1093/nar/gkg770 (2003).
  • 42. Haas, B. J. et al. Automated eukaryotic gene structure annotation using EVidenceModeler and the Program to Assemble Spliced Alignments. Genome biology 9, R7, 10.1186/gb-2008-9-1-r7 (2008).
  • 43. Simão, F. A., Waterhouse, R. M., Ioannidis, P., Kriventseva, E. V. & Zdobnov, E. M. BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics (Oxford, England) 31, 3210–3212, 10.1093/bioinformatics/btv351 (2015).
  • 44. Jones, P. et al. InterProScan 5: genome-scale protein function classification. Bioinformatics (Oxford, England) 30, 1236–1240, 10.1093/bioinformatics/btu031 (2014).
  • 45. Mistry, J. et al. Pfam: The protein families database in 2021. Nucleic acids research 49, D412–D419, 10.1093/nar/gkaa913 (2021).
  • 46. Buchfink, B., Xie, C. & Huson, D. H. Fast and sensitive protein alignment using DIAMOND. Nature methods 12, 59–60, 10.1038/nmeth.3176 (2015).
  • 47. Wang, Y. et al. MCScanX: a toolkit for detection and evolutionary analysis of gene synteny and collinearity. Nucleic acids research 40, e49-e49, 10.1093/nar/gkr1293 (2012).
  • 48. Trapnell, C. et al. Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation. Nature biotechnology 28, 511–515, 10.1038/nbt.1621 (2010).
  • 49. Love, M. I., Huber, W. & Anders, S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome biology 15, 550, 10.1186/s13059-014-0550-8 (2014).
  • 50. Köster, J., Dijkstra, L. J., Marschall, T. & Schönhuth, A. Varlociraptor: enhancing sensitivity and controlling false discovery rate in somatic indel discovery. Genome biology 21, 98, 10.1186/s13059-020-01993-6 (2020).
  • 51. NCBI Sequence Read Archive. https://identifiers.org/ncbi/bioproject:PRJNA1049652 (2024).
  • 52. NCBI Sequence Read Archive. https://identifiers.org/ncbi/insdc.sra:SRR27128105 (2024).
  • 53. NCBI Sequence Read Archive. https://identifiers.org/ncbi/insdc.sra:SRR27167876 (2024).
  • 54. NCBI Sequence Read Archive. https://identifiers.org/ncbi/insdc.sra:SRR27129026 (2024).
  • 55.NCBI Sequence Read Archive.https://identifiers.org/ncbi/insdc.sra:SRR27204218 (2024).
  • 56.NCBI Sequence Read Archive.https://identifiers.org/ncbi/insdc.sra:SRP477240 (2024).
  • 57.Ji, J. The annotation files of Episyrphus balteatus genome.10.6084/m9.figshare.24797310.v2 (2023). 10.6084/m9.figshare.24797310.v2 [DOI]
  • 58.Ji, J. Episyrphus balteatus isolate JJ-2024, whole genome shotgun sequencing project. Genbankhttps://identifiers.org/ncbi/insdc.gca:GCA_040182855.1 (2024).
  • 59.Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics (Oxford, England)25, 1754–1760, 10.1093/bioinformatics/btp324 (2009). 10.1093/bioinformatics/btp324 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 60.Emms, D. M. & Kelly, S. OrthoFinder: phylogenetic orthology inference for comparative genomics. Genome biology20, 238, 10.1186/s13059-019-1832-y (2019). 10.1186/s13059-019-1832-y [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 61.Katoh, K., Asimenos, G. & Toh, H. Multiple alignment of DNA sequences with MAFFT. Methods in molecular biology (Clifton, N.J.)537, 39–64, 10.1007/978-1-59745-251-9_3 (2009). 10.1007/978-1-59745-251-9_3 [DOI] [PubMed] [Google Scholar]
  • 62.Nguyen, L.-T., Schmidt, H. A., von Haeseler, A. & Minh, B. Q. IQ-TREE: A Fast and Effective Stochastic Algorithm for Estimating Maximum-Likelihood Phylogenies. Molecular Biology and Evolution32, 268–274, 10.1093/molbev/msu300 (2014). [DOI] [PMC free article] [PubMed]
  • 63.Yang, Z. PAML: a program package for phylogenetic analysis by maximum likelihood. Computer applications in the biosciences: CABIOS13, 555–556, 10.1093/bioinformatics/13.5.555 (1997). 10.1093/bioinformatics/13.5.555 [DOI] [PubMed] [Google Scholar]
  • 64.Han, M. V., Thomas, G. W. C., Lugo-Martinez, J. & Hahn, M. W. Estimating Gene Gain and Loss Rates in the Presence of Error in Genome Assembly and Annotation Using CAFE 3. Molecular Biology and Evolution30, 1987–1997, 10.1093/molbev/mst100 (2013). [DOI] [PubMed]
  • 65. Hawkes, W., Sivell, O. & Wotton, K. The genome sequence of the Marmalade Hoverfly, Episyrphus balteatus (De Geer, 1776). Wellcome Open Res 8, 106, 10.12688/wellcomeopenres.19073.1 (2023).
  • 66. Hawkes, W. & Wotton, K. The genome sequence of the drone fly, Eristalis tenax (Linnaeus, 1758). Wellcome Open Res 6, 307, 10.12688/wellcomeopenres.17357.1 (2021).
  • 67. Hawkes, W., Sivell, O., Sivell, D., Massy, R. & Wotton, K. The genome sequence of the pied hoverfly, Scaeva pyrastri (Linnaeus, 1758). Wellcome Open Res 8, 83, 10.12688/wellcomeopenres.18892.1 (2023).
  • 68. Crowley, L., Ashworth, M. & Wawman, D. The genome sequence of the Thick-legged Hoverfly, Syritta pipiens (Linnaeus, 1758). Wellcome Open Res, 349, 10.12688/wellcomeopenres.19848.1 (2023).
  • 69. Hawkes, W. & Wotton, K. The genome sequence of the tapered dronefly, Eristalis pertinax (Scopoli, 1763). Wellcome Open Res 6, 292, 10.12688/wellcomeopenres.17267.2 (2021).
  • 70. Crowley, L. M., Mitchell, R., Weston, S. T. & Wotton, K. R. The genome sequence of the Lesser Hornet Hoverfly, Volucella inanis (Linnaeus, 1758). Wellcome Open Res 8, 69, 10.12688/wellcomeopenres.18897.1 (2023).
  • 71. Crowley, L. & Nash, W. The genome sequence of the Golden-tailed Leafwalker, Xylota sylvarum (Linnaeus, 1758). Wellcome Open Res, 10.12688/wellcomeopenres.19241.1 (2023).
  • 72. Sivell, D., Sivell, O., Hawkes, W. L. & Wotton, K. R. The genome sequence of the Vagrant Hoverfly, Eupeodes corollae (Fabricius, 1794). Wellcome Open Res 8, 112, 10.12688/wellcomeopenres.19099.1 (2023).
  • 73. Hawkes, W., Wotton, K. & Smith, M. The genome sequence of the two-banded wasp hoverfly, Chrysotoxum bicinctum (Linnaeus, 1758). Wellcome Open Res 6, 321, 10.12688/wellcomeopenres.17382.1 (2021).
  • 74. Hawkes, W. & Wotton, K. The genome sequence of the plain-faced dronefly, Eristalis arbustorum (Linnaeus, 1758). Wellcome Open Res, 61, 10.12688/wellcomeopenres.17580.1 (2022).
  • 75. Liu, H., Zhao, L., Li, G., He, Y. & Huo, K. The complete mitochondrial genome of Melanostoma mellinum (Linnaeus, 1758) (Diptera: Syrphidae) and phylogenetic analysis. Mitochondrial DNA B Resour 7, 1664–1665, 10.1080/23802359.2022.2107452 (2022).
  • 76. Sivell, O. & Sivell, D. The genome sequence of a hoverfly, Xanthogramma pedissequum (Harris, 1776). Wellcome Open Res 7, 38, 10.12688/wellcomeopenres.17559.1 (2022).
  • 77. Falk, S. & Chua, P. The genome sequence of the meadow field syrph, Eupeodes latifasciatus (Macquart, 1829). Wellcome Open Res 7, 253, 10.12688/wellcomeopenres.18113.1 (2022).
  • 78. Falk, S. The genome sequence of the large burdock Cheilosia, Cheilosia vulpina (Meigen, 1822). Wellcome Open Res 6, 351, 10.12688/wellcomeopenres.17491.1 (2021).
  • 79. Doyle, T. et al. Genome-wide transcriptomic changes reveal the genetic pathways involved in insect migration. Molecular ecology 31, 4332–4350, 10.1111/mec.16588 (2022).

Associated Data

Data Availability Statement

All bioinformatics software and tools used in this research were run in accordance with their published protocols and manuals, which are available from public databases. The Methods section describes the software versions and parameters used; no custom code or scripts were employed.


Articles from Scientific Data are provided here courtesy of Nature Publishing Group