Mobile DNA. 2026 Feb 5;17:8. doi: 10.1186/s13100-026-00393-0

Tracing ancient viral footprints: a comprehensive study of endogenous viral elements in Bombus species

Lucas Barbosa de Amorim Conceição 1, João Pedro Nunes Santos 1, Lucas Yago Melo Ferreira 1, Gabriel Victor Pina Rodrigues 1, Marco Antônio Costa 1, Eric Roberto Guimarães Rocha Aguiar 2
PMCID: PMC12922265  PMID: 41645309

Abstract

Background

Endogenous Viral Elements (EVEs) are viral sequences integrated into the host germline and passed to offspring. Most virus types can integrate, often with the help of host retroelements, especially for non-retroviral RNA viruses. It is known that EVEs are widespread across insect species and related to an extensive range of virus taxa, many of which might share similar evolutionary origins. Bombus bees are essential pollinators that have been experiencing worldwide colony declines in recent decades. Therefore, uncovering genetic elements and pathways to better understand host–pathogen interactions is crucial in conserving biodiversity.

Results

Non-retroviral Integrated RNA Virus Sequences (NIRVS) were widespread in Bombus genomes, without a clear correlation between genome size and the number of EVEs. Most of the EVEs were single-copy, ranging from 111 to 3,729 bp with an average of 504 bp. Most of them share similarities with unclassified viruses and known viruses belonging to the families Partitiviridae and Virgaviridae, as well as the order Martellivirales. We observed that over 25% of the NIRVS contain conserved domains, with larger ones having a higher probability of functional annotation. Most NIRVS with conserved domains contained Polymerase-related motifs, the most represented group of domains among Bombus species. A comprehensive analysis of the NIRVS sharing pattern suggests that they are more likely to be inherited from a common ancestor than to result from integration events after speciation. Also, viral elements are widely conserved amongst species. Furthermore, we investigated transcriptional activity and the potential of the NIRVS to function as a priming agent for antiviral responses against exogenous viruses. On that note, most NIRVS in Bombus are transcriptionally active, and some share 15 nt of contiguity with exogenous bee viruses and could potentially be used as templates for piRNA production.

Conclusions

The integration of non-retroviral RNA viruses into bumblebee genomes is ancient and represents a dynamic evolutionary process in which many viral elements are conserved, shared, and may be functionally active in Bombus bees.

Supplementary Information

The online version contains supplementary material available at 10.1186/s13100-026-00393-0.

Keywords: NIRVS, Endogenous Viral Elements, Bombus, Bee, Martellivirales

Background

Endogenous Viral Elements (EVEs) are viral sequences integrated into host genomes and transmitted vertically to offspring due to insertion in the germline [1]. Viruses of all genome types may undergo insertion, driven mainly by the action of cellular retroelements. This includes non-retroviral integrated RNA virus sequences (NIRVS), which are often found in regions of repetitive DNA and are frequently flanked by Long Terminal Repeat (LTR) retroelements. The association between non-retroviral RNA EVEs and retroelements is supported by the facts that reverse transcription for these elements occurs in the cytoplasm, that it is active in germline cells, and that some of them may harbor virus-like domains [2].

Those insertions may evolve neutrally and accumulate mutations, losing their viral identity when inserted in silenced regions of host DNA, or be inserted in transcriptionally active areas where they could act as templates for the biogenesis of small RNAs (sRNAs) or even be translated into defective viral proteins, both of which can contribute to antiviral immunity. Some EVEs may even undergo exaptation, assuming different advantageous roles in their hosts [3]. Some examples of functional EVEs in animals include syncytin genes, elements derived from endogenous retroviruses that play a crucial role in placental development in mammals [4], and viral elements in microgastroid parasitoid wasps that are translated into bracovirus and ichnovirus virions to modulate host physiology and facilitate parasitism [5].

In mosquito genomes, the diversity and structure of EVEs have been screened [6–8], and a significant source of EVE transcription, related to the generation of sRNAs, has been confirmed [9]. Most importantly, a functional link between non-retroviral EVEs and antiviral immunity has also been tested and confirmed, in which antiviral piRNAs are produced in the presence of a naturally occurring EVE and its cognate virus, limiting its replication in mosquitoes [10].

This discovery provides further evidence of the potentially crucial roles that non-retroviral EVEs may play in insect immunity. It offers new perspectives on how to mitigate infections and control natural viral hazards to species that are key to our survival, inspiring novel antiviral strategies, and also provides a better understanding of how these integrations have shaped eukaryote evolution. Beyond medically relevant species such as mosquito vectors of human arboviruses, we highlight in this study the need for a deeper understanding of the presence of EVEs in insect pollinators, specifically bees.

Approximately 87.5% of all terrestrial angiosperms rely upon animals for pollination [11]. The total economic value of insect pollination worldwide is estimated at more than EUR 153 billion annually [12]. In the USA, bees contributed USD 15.12 billion [13] to the economic value of agricultural production. They are the primary pollinators of plants and visit more than 90% of the world’s leading 107 crop types [14].

Among managed and wild species, more than 20,000 bee species have been described worldwide [15]. The western honeybee (Apis mellifera), the eastern honeybee (Apis cerana), some bumblebees (Bombus spp.), stingless bees, and solitary bees are part of the small group of managed species (~50 species) across the world. Alongside them, there is evidence of the roles of wild pollinators and diverse pollinator assemblages in improving the quality and quantity of global crop production [16], which also provides crucial benefits for maintaining current biodiversity and ecosystem stability [17]. However, climate change, pollution, habitat loss, and pathogens, among others, are factors increasingly linked to colony loss worldwide, which could severely hinder human food security and disrupt ecosystem functions [18–21]. Thus, it is urgent to find solutions to colony losses by mitigating these factors, among which viral pathogens are some of the most concerning.

Most bee viruses have single-stranded, positive-sense RNA genomes, including the globally significant Deformed Wing Virus (DWV), which replicates in many bee species, including honeybees and bumblebees. Kashmir Bee Virus (KBV), Israeli Acute Paralysis Virus (IAPV), and Acute Bee Paralysis Virus (ABPV) have also been identified in those bees and contribute to high mortality in Bombus populations [22–26].

Many bumblebees, Bombus spp. (Hymenoptera: Apidae), have experienced significant population declines over the last few decades, raising concerns about wild bee populations, agricultural production, and the maintenance of biodiversity as we know it [17, 27–39]. They are crucial pollinators, increasing pollination efficiency by vibrating their bodies [40], remaining active in cold weather, and being larger and hairy, which enables them to carry larger pollen loads [41].

This study aims to assess the presence and diversity of conserved non-retroviral RNA Endogenous Viral Elements in publicly available Bombus genomes to better understand the endogenous virome of bees and to enhance current knowledge of virus-host interactions, including potential antiviral activity from EVEs.

Methods

Retrieval of public data

Genomic data for forty-one bumble bee species were sourced from the National Center for Biotechnology Information (NCBI) GenBank database (https://www.ncbi.nlm.nih.gov/genome), with species details and metadata in Table 1. We also collected public Bombus spp. RNAseq libraries (Additional File 1). Furthermore, RefSeq viral proteomes were obtained from GenBank (https://ftp.ncbi.nlm.nih.gov/refseq/release/viral/; accessed in March 2025).

Table 1.

Overview of Bombus genomes collected from the NCBI genome database

Organism Name Taxon. ID Assembly Accession Release Date Assembly Level
B. affinis 309941 GCF_024516045.1 05/08/2022 Chromosome
B. balteatus 85657 GCA_019201815.1 13/07/2021 Contig
B. bifarius 103933 GCF_011952205.1 03/04/2020 Scaffold
B. breviceps 395515 GCA_014825925.1 02/10/2020 Chromosome
B. campestris 207624 GCA_905333015.3 16/10/2022 Chromosome
B. confusus 217217 GCA_014737475.1 25/09/2020 Scaffold
B. consobrinus 130686 GCA_014737455.1 25/09/2020 Scaffold
B. cullumanus 2562068 GCA_014737535.1 25/09/2020 Scaffold
B. dahlbomii 85658 GCA_037178635.1 18/03/2024 Chromosome
B. difficillimus 395520 GCA_014737525.1 25/09/2020 Scaffold
B. fervidus 203811 GCA_041682495.1 03/09/2024 Chromosome
B. flavifrons 103934 GCF_040668555.1 12/07/2024 Scaffold
B. haemorrhoidalis 207636 GCA_014825975.1 02/10/2020 Chromosome
B. hortorum 85660 GCA_905332935.1 19/03/2021 Chromosome
B. huntii 85661 GCF_024542735.1 09/08/2022 Chromosome
B. hypnorum 30191 GCA_911387925.2 14/10/2022 Chromosome
B. ignitus 130704 GCA_014825875.1 02/10/2020 Chromosome
B. impatiens 132113 GCA_043295415.1 17/10/2024 Chromosome
B. jonellus 85663 GCA_964197665.1 03/07/2024 Chromosome
B. lapidarius 30192 GCA_964186655.1 07/06/2024 Chromosome
B. muscorum 203813 GCA_963971185.1 04/03/2024 Chromosome
B. opulentus 2024865 GCA_034509555.1 26/12/2023 Chromosome
B. pascuorum 65598 GCF_905332965.1 19/03/2021 Chromosome
B. picipes 309970 GCA_014737485.1 25/09/2020 Scaffold
B. polaris 130708 GCA_014737335.1 24/09/2020 Scaffold
B. pratorum 30194 GCA_930367275.1 13/02/2022 Chromosome
B. pyrosoma 396416 GCF_014825855.1 02/10/2020 Chromosome
B. rufofasciatus 309971 GCA_040285845.1 28/06/2024 Chromosome
B. sibiricus 421273 GCA_014737505.1 25/09/2020 Scaffold
B. skorikovi 395565 GCA_014737355.1 24/09/2020 Scaffold
B. sonorus 203818 GCA_029958995.1 09/05/2023 Scaffold
B. soroeensis 184059 GCA_014737365.1 24/09/2020 Scaffold
B. superbus 1869276 GCA_014737385.1 24/09/2020 Scaffold
B. sylvestris 30201 GCA_911622165.2 15/10/2022 Chromosome
B. sylvicola 309975 GCA_019677175.1 18/08/2021 Contig
B. terrestris 30195 GCF_910591885.1 09/02/2022 Chromosome
B. turneri 686820 GCA_014825825.1 02/10/2020 Chromosome
B. vancouverensis 2705178 GCF_011952275.1 03/04/2020 Scaffold
B. vestalis 30202 GCA_963556215.1 10/10/2023 Chromosome
B. vosnesenskii 207650 GCA_026744215.1 08/12/2022 Scaffold
B. waltoni 395577 GCA_014737395.1 24/09/2020 Scaffold

Identification of endogenous viral elements

Our six-step workflow to identify putative NIRVS commenced with an initial screening phase on the Galaxy platform (https://usegalaxy.org/). This first step involved Open Reading Frame (ORF) prediction using Getorf (v5.0.0.1, EMBOSS), targeting ORFs of 100–6000 nucleotides defined by start/stop codons [42]. The resulting ORFs were then subjected to a homology search against the viral protein database using Diamond (v2.1.11+galaxy0) in Blastx mode. For this alignment, we applied a maximum E-value cutoff of 1 × 10⁻¹⁰ and considered up to 5 database matches per query sequence [43].

To refine the Diamond output, we employed in-house Python scripts to select the optimal alignment hit for each query, prioritizing the lowest E-value and the highest bit score. Scripts are available in the following GitHub repository: https://github.com/Bioinscripts/Pipeline-to-Endogenous-Viral-Elements-EVEs-. The sequences were then taxonomically classified by querying the NCBI Entrez system. This classification guided a final manual filtering step designed to isolate RNA virus-like sequences by removing any hits corresponding to retroviruses, DNA viruses, transposable elements, and non-viral organisms.
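The best-hit selection described above can be sketched as follows. This is a minimal illustration, not the authors' script: it assumes the Diamond output uses the standard 12-column BLAST tabular layout, and the function name `best_hits` and the example rows are hypothetical.

```python
import csv
import io

# Standard 12-column BLAST/DIAMOND tabular layout (assumed here).
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]

def best_hits(handle):
    """Keep one hit per query: lowest E-value, ties broken by highest bit score."""
    best = {}
    for values in csv.reader(handle, delimiter="\t"):
        if not values:
            continue  # skip blank lines
        row = dict(zip(FIELDS, values))
        # Smaller key is better: low E-value first, then high bit score.
        key = (float(row["evalue"]), -float(row["bitscore"]))
        current = best.get(row["qseqid"])
        if current is None or key < current[0]:
            best[row["qseqid"]] = (key, row)
    return {query: row for query, (key, row) in best.items()}

# Hypothetical example rows, for illustration only.
example = (
    "orf1\tvirusA\t45.0\t120\t60\t2\t1\t360\t10\t129\t1e-20\t95.1\n"
    "orf1\tvirusB\t50.0\t118\t55\t1\t1\t354\t12\t129\t1e-25\t110.0\n"
    "orf2\tvirusC\t38.0\t90\t50\t3\t4\t273\t5\t94\t1e-12\t70.2\n"
)
hits = best_hits(io.StringIO(example))
print(hits["orf1"]["sseqid"])
```

With a real file, the same function could be called on an open file handle instead of the in-memory example.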

To reduce redundancy amongst the identified sequences arising from paralogous insertions (copies), we performed a CD-HIT [44] run with a similarity threshold of 0.9 to cluster single sequences with their respective copies. Thus, we could reduce the amount of data for the next step, in which we aligned only the single sequences (or non-redundant) against the nr and core_nt databases (release gb263.0) on the online Blast platform (https://blast.ncbi.nlm.nih.gov/Blast.cgi). Amongst the 10 best hits in each analysis on Blastx, we selected the first viral hit (based on the lowest E-value) as main evidence to catalogue the query sequence as a putative NIRVS (Additional File 1).

Assessment of viral diversity, conserved domains and GC percentage

Taxonomic classification of the putative NIRVS was determined based on the NCBI taxonomic lineage of their respective best Blastx hits, referencing the NCBI Taxonomy Browser (https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi). For conserved domain assessment, ORFs and their corresponding amino acid sequences were first predicted from the endogenous viral sequences using NCBI's ORFfinder (https://www.ncbi.nlm.nih.gov/orffinder). These amino acid sequences were subsequently analyzed for conserved domains using both NCBI's Conserved Domain Database (CD-Search) [45] and InterProScan [46]. To measure GC content on viral sequences, we used geecee (Galaxy Version 5.0.0) with default parameters [42].

Retrieval of circulating viruses in bees and alignment with viral elements

To assemble a comprehensive dataset of viral sequences associated with arthropod hosts, we conducted a systematic search of the NCBI Nucleotide database. Recognizing the variability in host annotations across records, we employed a broad search strategy incorporating taxonomic terms such as "Arthropoda," "Insecta," and specific orders including "Hymenoptera," "Diptera," "Lepidoptera," "Coleoptera," "Hemiptera," "Orthoptera," "Odonata," "Blattodea," "Mantodea," "Araneae," "Scorpiones," "Acari," and "Decapoda." Additionally, we included terms related to specific genera and groups of interest, such as "Apoidea," "Apis," "Bombus," "Melipona," and "Euglossini," to capture sequences pertinent to bees and related taxa. This approach aimed to capture a wide array of arthropod-associated viral sequences, accounting for potential inconsistencies in metadata annotations. All retrieved sequences were downloaded in FASTA format for subsequent analyses. This comparison aimed to assess the alignment of 15 contiguous bases between the EVEs and known viruses, based on the literature to predict a possible silencing function [47].
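The 15-nt contiguity criterion can be illustrated with a simple exact k-mer comparison. This is a sketch under stated assumptions: the text does not say which tool performed the comparison, so the functions below (`kmers`, `reverse_complement`, `shared_kmers`) are hypothetical, both strands of the virus are checked, and windows containing ambiguous bases are ignored.

```python
def kmers(seq, k=15):
    """Set of all k-length substrings made only of A, C, G, T."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)
            if set(seq[i:i + k]) <= set("ACGT")}

def reverse_complement(seq):
    """Reverse complement of a DNA sequence."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]

def shared_kmers(eve, virus, k=15):
    """k-mers of the EVE that occur on either strand of the virus sequence."""
    virus_kmers = kmers(virus, k) | kmers(reverse_complement(virus), k)
    return kmers(eve, k) & virus_kmers

# Toy sequences (hypothetical) sharing one 15-nt stretch.
core = "ACGTACGTAGCTAGC"
eve = "TTTT" + core + "GGGG"
virus = "CCC" + core + "AAA"
print(len(shared_kmers(eve, virus)) > 0)
```

An EVE would be flagged as a candidate when `shared_kmers` returns a non-empty set for at least one exogenous virus sequence.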

Assessing shared EVEs between different Bombus species

To assess the similarity of the NIRVS shared among different species of the Bombus genus, we used BLASTn alignment data. The identifiers of each alignment pair were sorted and combined, ensuring that each unique pair was counted only once. The NIRVS pairs were then counted by species, yielding a count for each species pair. These counts were arranged into a matrix and standardized for dissimilarity analysis using the Bray–Curtis distance, implemented via the vegdist function from the vegan package (v2.6-10) [48]. The heatmap was drawn with the ComplexHeatmap package (v2.22.0). This approach helped identify patterns of NIRVS sharing across the analyzed species, highlighting potential evolutionary relationships or conserved functional roles of these genetic elements. Note that no stringent filtering (e.g., identity or coverage thresholds) was applied to the BLASTn alignments, which allowed us to observe the raw sharing patterns and distribution of EVEs across species.
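For reference, the Bray–Curtis dissimilarity between two count vectors x and y is sum(|x_i - y_i|) / sum(x_i + y_i). The analysis itself used vegdist in R; the short Python sketch below only illustrates the formula, and the example counts are hypothetical.

```python
def bray_curtis(x, y):
    """Bray-Curtis dissimilarity between two equal-length count vectors."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    total = sum(a + b for a, b in zip(x, y))
    if total == 0:
        return 0.0  # convention chosen here for two empty profiles
    return sum(abs(a - b) for a, b in zip(x, y)) / total

def distance_matrix(rows):
    """Pairwise Bray-Curtis dissimilarities between the rows of a count matrix."""
    return [[bray_curtis(r1, r2) for r2 in rows] for r1 in rows]

# Hypothetical shared-NIRVS counts for three species (rows) against three partners.
counts = [[6, 7, 4], [10, 0, 6], [6, 7, 4]]
matrix = distance_matrix(counts)
print(round(matrix[0][1], 3))
```

A value of 0 means identical sharing profiles and 1 means no overlap at all.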

de novo transcriptome assembly and EVE-derived transcriptional activity

A total of thirty RNA-seq libraries, one per unique species (Additional File 1), were retrieved using the iseq tool v1.5.0 [49]. Quality assessment of raw reads was performed with MultiQC v1.28 [50], followed by adapter trimming and removal of low-quality bases (Phred score < 20) using Fastp v0.24.0 [51]. High-quality paired-end reads were de novo assembled with SPAdes v4.1.0 in RNA mode [52]. Redundant transcripts were clustered at 95% sequence identity using CD-HIT-EST v4.8.1 [44]. ORFs were predicted with TIdeS v1.3.5 [53], employing a custom-trained machine learning model based on a non-redundant Bombus genus proteome obtained from NCBI. Transcript abundance was quantified at the transcript level with Salmon v1.10.3 [54] and normalized by transcripts per million (TPM).

Mapping small RNA libraries and piRNA analysis

To investigate the potential production of piRNAs by Bombus species targeting the characterized NIRVS, we analyzed publicly available small RNA sequencing (sRNA-seq) libraries retrieved from the NCBI Sequence Read Archive (SRA). All datasets available up to January 10th, 2026 were considered, with the exception of Bombus terrestris, whose piRNA pathway has been previously characterized and described in detail [55]. The libraries analyzed, and their associated metadata, are provided in Additional File 1.

Raw sequencing reads were initially subjected to quality control and adapter trimming using fastp v1.0.1 [56]. Only high-quality reads passing the default quality filtering parameters were retained for downstream analyses. Filtered reads were subsequently mapped against the NIRVS sequences using Bowtie v1.3.1 [57], allowing up to one mismatch and retaining only valid alignments.

To evaluate piRNA-associated features, the size distribution of mapped reads was assessed, with particular emphasis on the 24–32 nt length range. Additionally, nucleotide composition analyses were performed to examine the characteristic 5′ uridine (1U) bias and adenine enrichment at the tenth position (10A), which are hallmark signatures of primary and secondary piRNA populations [58]. Read length profiles and nucleotide preference plots were generated using ggplot2 in R.
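The piRNA-associated features described above (read length profile, 1U bias, and 10A enrichment) can be computed from mapped read sequences along the following lines. This is a minimal sketch assuming the reads are available as plain strings; the function name `pirna_signatures` and the toy reads are hypothetical, and the plots in the study were produced with ggplot2 in R.

```python
from collections import Counter

def pirna_signatures(reads, min_len=24, max_len=32):
    """Return (length distribution, 1U fraction, 10A fraction).

    The two fractions are computed only on reads within the piRNA size range.
    """
    lengths = Counter(len(read) for read in reads)
    # Treat T and U as equivalent so DNA-encoded reads are handled too.
    window = [read.upper().replace("T", "U") for read in reads
              if min_len <= len(read) <= max_len]
    n = len(window)
    frac_1u = sum(read[0] == "U" for read in window) / n if n else 0.0
    frac_10a = sum(read[9] == "A" for read in window) / n if n else 0.0
    return lengths, frac_1u, frac_10a

# Toy reads (hypothetical): one 26-nt read with 1U and 10A, one 26-nt read
# without either signature, and one 21-nt read outside the size range.
reads = ["T" + "G" * 8 + "A" + "C" * 16, "G" * 26, "T" * 21]
lengths, frac_1u, frac_10a = pirna_signatures(reads)
print(lengths[26], frac_1u, frac_10a)
```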

NIRVS GC context

For the analysis of NIRVS in Bombus terrestris, we further refined our selection based on GC content profiles across the flanking regions. We prioritized sequences that exhibited abrupt inflection points in GC content at the NIRVS insertion site. Each selected region comprised 20,000 nucleotides—10,000 upstream and 10,000 downstream of the NIRVS. To aid in this selection, we used the EMBOSS tool cpgplot to identify and visualize CpG islands across the extracted sequences (cpgplot -sequence extracted_sequences.fasta -graph ps -outfile cpgplot_results.txt) using the following parameters: window size of 500, minimum island length of 200, observed/expected CpG ratio threshold of 0.3, and minimum GC percentage of 30%.
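The two per-window quantities behind the thresholds above (GC percentage and observed/expected CpG ratio) can be sketched as follows. This is an illustration and not a reimplementation of cpgplot: it assumes the commonly used definition in which the expected CpG count is (number of C × number of G) / window length, and the function names are hypothetical.

```python
def cpg_stats(window):
    """Return (GC percentage, observed/expected CpG ratio) for one window."""
    window = window.upper()
    n = len(window)
    c_count = window.count("C")
    g_count = window.count("G")
    observed = window.count("CG")
    gc_percent = 100.0 * (c_count + g_count) / n if n else 0.0
    # Assumed definition of the expected CpG count.
    expected = c_count * g_count / n if n else 0.0
    obs_exp = observed / expected if expected else 0.0
    return gc_percent, obs_exp

def passes_thresholds(window, min_obs_exp=0.3, min_gc_percent=30.0):
    """Apply the ratio and GC thresholds quoted in the text to one window."""
    gc_percent, obs_exp = cpg_stats(window)
    return gc_percent >= min_gc_percent and obs_exp >= min_obs_exp

print(cpg_stats("CGCGCGCG"))
```

Sliding such a window along the 20,000-nt regions gives a GC profile in which abrupt changes at the insertion site can be inspected.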

These regions were then examined for their genomic context, distinguishing whether the NIRVS was located within exons, introns, or intergenic regions. This enabled more targeted analyses, including specific alignments between the NIRVS and the exon into which it was inserted.

Phylogenetic analysis

For this step, our goal was to assess the phylogenetic relationships among all 41 species in this work to infer how the shared EVEs relate to one another. We based our study on the most comprehensive phylogeny for the Bombus genus found in the literature [59]. We recreated their phylogenetic tree using the same sequences as the authors and included species not present in the original tree but available in GenBank. Sequences were edited and aligned (default parameters, CLUSTAL W) in MEGA v12 [60]. We applied the same logic to manual editing of the sequences; however, we excluded PEPCK (Phosphoenolpyruvate Carboxykinase) sequences from our analysis due to high alignment ambiguity. We selected the best substitution models in ModelFinder [61]. The tree was estimated in IQ-TREE v2.4.0 [62] with default parameters and up to 1000 bootstrap replicates.

We also estimated phylogenetic trees for the order Martellivirales, including predicted endogenous viral elements specific to the taxon. We retrieved all Martellivirales RdRp protein sequences from RefSeq (Search string: "Martellivirales[Organism] AND RNA-dependent RNA polymerase[Protein Name] OR RdRp[Protein Name] OR replicase[Protein Name] OR RNA dependent RNA polymerase[Protein Name] OR replication-associated polyprotein[Protein Name] OR replicase protein[Protein Name] OR non-structural polyprotein[Protein Name] OR nonstructural polyprotein[Protein Name]") and one sequence from each family in Hepelivirales and Tymovirales for the outgroup. Before aligning our sequences, we used the latest ICTV Taxonomy Release (https://ictv.global/msl/current) to classify the retrieved sequences into families and orders, filtering out all sequences that did not belong to Martellivirales but were collected with the search string, and later used the family classification to correctly label the monophyletic clades in the final tree. Using MAFFT v7 [63] with the FFT-NS-i strategy, the amino acid sequences were aligned with the Xiangshan martelli-like virus 1, Xiangshan martelli-like virus 2, and Xiangshan martelli-like virus 3 RdRp sequences, alongside all NIRVS that were similar and aligned to the same region in Xiangshan martelli-like viruses with blastx alignments. After manual trimming in MEGA v12, which shortened the sequences to a length of 161 amino acids due to the short size of NIRVS, the tree was inferred using IQ-TREE Galaxy Version 2.4.0+galaxy1 [62] with the Q.pfam+F+I+R4 substitution model, selected as the best-fit model by ModelFinder [61], and visualized with iTOL [64]. A thorough optimization strategy was employed, including 200 initial trees, 50 retained top trees, and an exhaustive Nearest Neighbor Interchange search. We assessed branch support using 1000 replicates of both SH-aLRT and Ultrafast Bootstrap.
The final command line with personalized parameters is as follows: “iqtree --prefix PREF -T ${GALAXY_SLOTS:-10} --redo -s '/mnt/pulsar/files/staging/13893363/inputs/dataset_6ebff454-1cb5-4c94-ae97-e2af9d62d8f1.dat' --seqtype AA --lmap '1000' -m '' --msub 'nuclear' --cmin '2' --cmax '10' --merit 'AIC' --ninit '200' --ntop '50' --nbest '5' --nstop '100' --radius '6' --perturb '0.5' -allnni --alrt '1000' --sup-min '0.0' --ufboot '1000' --nmax '1000' --nstep '100' --bcor '0.99' --beps '0.5'”.

Genomic context characterization

We characterized the genomic context of all eight annotated Bombus species available in NCBI (B. affinis, B. bifarius, B. flavifrons, B. huntii, B. pascuorum, B. pyrosoma, B. terrestris, and B. vancouverensis) in order to find common patterns of integration or shared context among species, as well as proximity to transposable elements. To evaluate the identity and coverage shared among all flanking regions, NIRVS loci were first aggregated into clusters based on genomic proximity, merging intervals located within 10 kb of one another. We then extracted and concatenated regions spanning 30 kb upstream and downstream of each NIRVS cluster. Structural similarity was assessed using ProgressiveMauve [65], generating percent identity matrices that were clustered in R using the Ward.D2 method. Sequence similarity was evaluated with BLASTn, retaining the best high-scoring pair (HSP) per comparison. Mean nucleotide identity and cumulative coverage were calculated, and results were visualized as bubble plots using ggplot2, integrating Mauve clustering order with BLAST similarity.
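The proximity-based clustering and flank extraction described above can be sketched as follows. This is a minimal illustration for loci on a single contig, assuming (start, end) coordinates with start ≤ end; the function names `cluster_loci` and `flanking_window` and the coordinates are hypothetical.

```python
def cluster_loci(intervals, max_gap=10_000):
    """Merge (start, end) intervals lying within max_gap bases of one another."""
    clusters = []
    for start, end in sorted(intervals):
        if clusters and start - clusters[-1][1] <= max_gap:
            # Close enough to the previous cluster: extend it.
            clusters[-1][1] = max(clusters[-1][1], end)
        else:
            clusters.append([start, end])
    return [tuple(cluster) for cluster in clusters]

def flanking_window(cluster, flank=30_000):
    """Coordinates spanning `flank` bases upstream and downstream of a cluster."""
    start, end = cluster
    return max(0, start - flank), end + flank

# Hypothetical NIRVS coordinates on one contig.
loci = [(100, 500), (8_000, 9_000), (30_000, 31_000)]
clusters = cluster_loci(loci)
print(clusters)
print([flanking_window(cluster) for cluster in clusters])
```

In practice the upstream and downstream limits would also need to be clipped to the contig length, which this sketch does not track.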

Transposable Elements were annotated using a custom automated pipeline integrating de novo discovery with homology-based detection. Repetitive consensus sequences were identified using RepeatModeler2 [66] and merged with the Dfam database. To minimize redundancy, the combined repetitive consensus and Dfam database were clustered using cd-hit-est (version 4.8.1) [44] with a sequence identity threshold of 98% (-c 0.98) and alignment coverage of 80% (-aS 0.80). The resulting non-redundant library served as the reference for genomic masking and annotation using RepeatMasker version 4.2.2 [67]. To evaluate the spatial association between NIRVS and transposable elements (TEs), TE annotations were filtered to exclude non-TE features (e.g., rRNA, simple repeats) and merged if located within 1 kb. We calculated TE base-pair density within strand-corrected 500-bp bins extending 10 kb upstream and downstream of NIRVS insertion sites. Enrichment relative to the genomic background was assessed using Fisher’s exact tests.
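The binned TE density calculation can be illustrated with the sketch below. It is a simplified example under stated assumptions: TE intervals are already merged and non-overlapping, coordinates are half-open, strand correction and the Fisher's exact test are omitted, and the function name `te_density` is hypothetical.

```python
def te_density(site, te_intervals, flank=10_000, bin_size=500):
    """Fraction of each bin covered by TEs, from site - flank to site + flank.

    Assumes te_intervals are non-overlapping, half-open (start, end) pairs.
    """
    densities = []
    for bin_start in range(site - flank, site + flank, bin_size):
        bin_end = bin_start + bin_size
        # Sum the overlap between this bin and every TE interval.
        covered = sum(max(0, min(end, bin_end) - max(start, bin_start))
                      for start, end in te_intervals)
        densities.append(covered / bin_size)
    return densities

# Hypothetical insertion site with one 500-bp TE straddling it.
density = te_density(20_000, [(19_750, 20_250)])
print(len(density), density[19], density[20])
```

Averaging these per-bin densities over all insertion sites gives a profile that can be compared against the genomic background.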

Finally, to compare and investigate the genomic neighborhoods of NIRVS insertions across different Bombus species, we used the same window extending 30 kb upstream and downstream of the insertion boundaries as before, from which we mapped cellular genes using the GFF3 annotations available in NCBI, and repetitive elements using the RepeatMasker outputs, filtering out simple repeats and low-complexity regions. Transposable elements were color-coded by classification (LTR, LINE, SINE, DNA, RC) and visualized alongside EVEs and host genes using the matplotlib package.

Features composition comparison

In this analysis, we focused exclusively on viral coding sequences previously identified in Bombus spp. and on NIRVS found in Bombus terrestris. To reduce redundancy, the viral CDSs were filtered using CD-HIT with a 90% similarity threshold. Subsequently, we selected those viral CDSs that clustered with NIRVS, prioritizing sequences likely to share similar genomic features.

To investigate patterns of similarity between NIRVS and exogenous viral sequences, we performed hierarchical clustering based on codon usage, GC content, and sequence length. Codon usage profiles were generated using the cusp tool from EMBOSS, and GC content was calculated using geecee [42]. To identify the variables most strongly contributing to the observed clustering, we conducted a principal component analysis (PCA). In addition, to assess whether these features can differentiate between exogenous and endogenous viral sequences, we ran a t-distributed stochastic neighbor embedding (t-SNE) with two dimensions and a perplexity of 6 [68].

PCR and Sanger sequencing

We collected three Bombus spp. bees from wild colonies near Montana State University in Bozeman, Montana, USA. DNA was extracted following a phenol–chloroform-isoamyl alcohol protocol. Four pairs of PCR primers were designed from the EVEs that best represented a broad range of Bombus species, based on blastn alignments (Additional File 2 – Fig. 10A) against Dumyat virus and Xiangshan martelli-like virus 2. After running the PCR products on an electrophoresis gel, we purified the amplified DNA sequences and Sanger-sequenced them. A final blastn alignment was performed to confirm the similarity between the sequenced element and the identified EVE.

Results

NIRVS are widespread in Bombus genomes

The genus Bombus (bumble bees) is well represented in GenBank, with numerous high-quality sequenced genomes, many of which are assembled to the chromosome level. Key contributions to this resource have been made by institutions such as the Chinese Academy of Agricultural Sciences and the Wellcome Sanger Institute. Bombus genome sizes exhibit considerable variation, ranging from approximately 230 Mb (Bombus superbus) to 479 Mb (Bombus vosnesenskii) (Fig. 1A). While the number of predicted ORFs generally correlates with genome size, the subsequent counts of Diamond Blastx hits and the final set of putative NIRVS (Additional File 2 – Fig. 1) do not maintain this direct proportionality. The marked reduction in candidate elements through these successive analytical stages underscores the critical role of stringent filtering in accurately identifying NIRVS.

Fig. 1.


Overview of the genomic features and endogenous viral elements identified in Bombus bees. A multiple bar graph exploring genome size, number of predicted ORFs, filtered DIAMOND viral hits, and putative EVEs after manual curation for each bee species. On the left, species are associated with their respective subgenera classification: Al, Alpinobombus; Ag, Alpigenobombus; Bi, Bombias; Bo, Bombus; Cu, Cullumanobombus; Kl, Kallobombus; Md, Mendacibombus; Mg, Megabombus; Ml, Melanobombus; Or, Orientalibombus; Pr, Pyrobombus; Ps, Psithyrus; Sb, Sibiricobombus; St, Subterraneobombus; Th, Thoracobombus. B Sankey plot displaying the NIRVS classification based on sequence similarity best hit across different taxa and genetic types

Overall, our comprehensive analysis of publicly available genome assemblies from 41 Bombus species identified 467 putative NIRVS sequences. These elements, identified via Blastx searches and subsequent filtering, ranged from 100 to 6000 bp in length, with 252 (approximately 54%) being shorter than 500 bp (details in Additional File 1). The distribution of NIRVS varied significantly among species: for instance, B. terrestris harbored 37 NIRVS, whereas B. hortorum and B. opulentus each contained only a single identified NIRVS. Notably, B. waltoni and B. superbus were the only species in which no NIRVS were detected under our analytical conditions.

Dominance of unclassified and plant/insect/fungi viral signatures in Bombus NIRVS

A substantial portion of the identified NIRVS (n = 231) remained unassigned to any specific viral family or order. Among the classifiable sequences, NIRVS showed similarities with several viral families: Partitiviridae (n = 32), Virgaviridae (n = 29), Phasmaviridae (n = 25), Totiviridae (n = 15), and Rhabdoviridae (n = 10). Additionally, some NIRVS could only be resolved to the order level, including 4 sequences assigned to Bunyavirales and 121 to Martellivirales. Consequently, the majority of these classified NIRVS likely originated from positive-sense single-stranded RNA (+ssRNA) viruses, followed by those from double-stranded RNA (dsRNA) and negative-sense single-stranded RNA (-ssRNA) viruses (Fig. 1B).

Polymerase-related domains are prevalent in Bombus NIRVS

To confirm or refute the protein domains identified by Blastx alignments, we performed functional annotation of conserved viral domains within the identified NIRVS, revealing a prevalence of RNA-dependent RNA polymerase (RdRp) domains, detected in 66 NIRVS. Other common domains included polymerase-related ones (methyltransferase: 20; helicase: 20) and coat protein domains (9 NIRVS) (Fig. 2A). However, a considerable number of sequences (292 NIRVS) lacked any detectable conserved viral domains, potentially indicating that they are highly degraded or divergent viral remnants. Also, in most species, RdRp integrations were more abundant than integrations of other domains, in both Blastx and CD-Blast hits (Fig. 2B and C). Interestingly, while nucleotide-level alignments for many full-length NIRVS were not conclusive in establishing clear viral resemblance, the protein domains identified through functional annotation consistently corresponded to the viral protein families suggested by the initial Blastx alignments. This was particularly evident for NIRVS related to Xiangshan martelli-like virus 2 and Xiangshan martelli-like virus 3, from which most functional annotations originated. Many NIRVS contained multiple domains attributed to these specific martelli-like viruses. For example, in Bombus affinis, Bombus flavifrons, and Bombus terrestris, individual NIRVS related to these viruses were found to harbor up to three distinct conserved domains.

Fig. 2.


Screening of conserved viral domains in NIRVS. A Compared prevalence of predicted conserved domains (Interpro and CD-Blast) and protein hits on Blastx. B Shared Blastx protein hits between NIRVS across bee species. C Shared conserved domains between NIRVS across bee species. D multiple bar graph for conserved domains distribution across bee species and identified by related viral taxon

We observed a distinct profile of viral sequence integration associated with each taxon. NIRVS associated with Partitiviridae are mostly derived from polymerase sequences containing RdRp domains. In contrast, those linked to Phasmaviridae, Rhabdoviridae, Totiviridae, and unclassified Bunyavirales were mostly associated with nucleoprotein and coat/capsid proteins and often lacked detectable conserved domains. Furthermore, some Virgaviridae-like NIRVS possessed both helicase and RdRp domains, which supports a broader pattern of diverse integrations from the order Martellivirales (see Fig. 2D and Additional File 2 – Fig. 2).

NIRVS size correlates with presence of conserved domains and host GC similarity

We separately analyzed the correlation between protein domains and two key genomic features of NIRVS sequences. To provide evidence on the origin and timing of integration, we analyzed the GC content of the viral sequences and compared it with that of the host sequences. The GC content difference (ΔGC = Host GC – EVE GC) varied across viral taxa: NIRVS related to Totiviridae, Martellivirales, and Virgaviridae (positive-sense ssRNA and dsRNA virus groups), as well as most of the unclassified sequences, exhibited greater ΔGC values (Fig. 3). On the other hand, NIRVS related to Phasmaviridae, Rhabdoviridae, and Bunyavirales (all negative-sense ssRNA virus groups) had GC content more similar to that of their hosts.
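The ΔGC definition above can be illustrated with a minimal Python sketch. The function names and toy sequences are hypothetical, and the choice of host region to compare against (whole genome, flanking window, or otherwise) is an assumption left to the caller, since the text does not specify it.

```python
def gc_fraction(seq: str) -> float:
    """Return the GC fraction of a sequence, counting only unambiguous A/C/G/T bases."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        raise ValueError("sequence contains no unambiguous bases")
    return (seq.count("G") + seq.count("C")) / acgt


def delta_gc(host_seq: str, eve_seq: str) -> float:
    """ΔGC as defined in the text (Host GC - EVE GC), in percentage points."""
    return 100.0 * (gc_fraction(host_seq) - gc_fraction(eve_seq))


# Toy sequences for illustration only (not real data).
host = "ATGCATGCAT"  # 4 of 10 bases are G/C
eve = "ATATATGCAT"   # 2 of 10 bases are G/C
print(round(delta_gc(host, eve), 1))
```

A positive value means the EVE is more AT-rich than the host sequence it is compared with; values near zero indicate similar composition.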

Fig. 3

Relationship between GC content and presence of conserved domains. Each NIRVS classified by Blastx hit was screened for confirmation of conserved domains. Blastx hits without predicted conserved domains are shown in green. Blastx hits with conserved domains predicted by InterPro and CD-Blast are shown in blue. Significance of the difference in GC content between NIRVS with and without domains was calculated using the Wilcoxon test

Statistical comparisons revealed a significant tendency for sequences with functional annotation to show smaller differences from their host GC content (p = 0.0037), suggesting that larger viral insertions with identifiable conserved domains may be more structurally adapted to the host genome (Fig. 3). Some domain-containing sequences, especially those with RdRp and coat protein domains (p < 2e-16 and p = 0.00092, respectively), had significantly lower ΔGC than their counterparts without domains. In contrast, sequences lacking conserved domains exhibited higher ΔGC values and showed no significant correlation with genome size (R = 0.054, p = 0.57). This could reflect selective pressure favoring convergence of GC content toward that of the host genome in functional EVEs.

Regarding sequence length, the second genomic feature considered here, annotated sequences tended to be longer than non-annotated sequences in some cases. NIRVS with RdRp and polyprotein domains were significantly longer (p = 4.3e-12 and p = 0.00043, respectively), whilst no significant differences were observed for other virus-related proteins (Additional File 2 – Fig. 3). NIRVS related to the families Virgaviridae and Totiviridae generally had longer sequences, while those related to Partitiviridae and Phasmaviridae exhibited shorter average lengths.

NIRVS sharing patterns and shared genomic context suggest a high prevalence of inheritance events

To determine whether NIRVS were shared across Bombus species, we performed a Bray–Curtis dissimilarity analysis on all identified NIRVS sequences, which resulted in their clustering into five distinct groups (Fig. 4). When compared with results from the Bombus phylogeny, these groupings suggest potential shared histories of viral integration events among different subgenera. Our species are grouped according to previous phylogenies [59, 69–71]. The complete version of the tree is found in Additional File 2 – Fig. 4.
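For readers unfamiliar with the metric, Bray–Curtis dissimilarity between two abundance profiles can be sketched as follows. This is a minimal illustration of the standard formula, not the pipeline used in the study; the species labels and counts are invented, and the choice of what each vector position represents (here, NIRVS counts per related virus) is an assumption.

```python
def bray_curtis(a, b):
    """Bray-Curtis dissimilarity: sum(|a_i - b_i|) / sum(a_i + b_i).

    Returns 0.0 for identical profiles and 1.0 for profiles with no shared items.
    """
    if len(a) != len(b):
        raise ValueError("profiles must have the same length")
    total = sum(a) + sum(b)
    if total == 0:
        return 0.0  # two empty profiles are treated as identical
    return sum(abs(x - y) for x, y in zip(a, b)) / total


# Hypothetical NIRVS counts per related virus for two species (toy values).
species_a = [3, 1, 0]
species_b = [1, 1, 2]
print(bray_curtis(species_a, species_b))
```

Computing this value for every pair of species yields the dissimilarity matrix that a hierarchical clustering or heatmap can then be built on.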

Fig. 4

Phylogeny correlates with NIRVS sharing patterns in Bombus species. NIRVS were clustered using Bray–Curtis dissimilarity, which quantifies differences in sequence abundance between species, and visualized as a heatmap. These patterns were compared against a reduced phylogeny of 41 Bombus species inferred from four genes (16S, EF1-α, Opsin, and ArgK) using a maximum likelihood approach with 1,000 bootstrap replicates. The full phylogeny, comprising > 200 species, is provided in Additional File 2 – Fig. 4. Species are associated with their respective subgenera classification and colored accordingly: Al, Alpinobombus; Ag, Alpigenobombus; Bi, Bombias; Bo, Bombus; Cu, Cullumanobombus; Kl, Kallobombus; Md, Mendacibombus; Mg, Megabombus; Ml, Melanobombus; Or, Orientalibombus; Pr, Pyrobombus; Ps, Psithyrus; Sb, Sibiricobombus; St, Subterraneobombus; Th, Thoracobombus

Alpinobombus (Al), Bombus (Bo), and Pyrobombus (Pr) bees have the most identified NIRVS among all groups and are relatively close in the phylogenetic tree. This lineage may have retained more endogenous viral sequences over time rather than acquiring them through recent integrations: although its members do not share some sequences with other branches, they do share EVEs related to “Dumyat virus”, “Wuhan insect virus 22”, “Xiangshan martelli-like virus 2”, and “Xiangshan martelli-like virus 3” with many other species.

In total, only 6 NIRVS were species-specific, highlighting the high prevalence of NIRVS sharing and, possibly, inheritance. For instance, we observed Dumyat virus-related sequences in 38 out of 41 species, a distribution that correlated with the speciation of subgenus Mendacibombus (Additional File 2 – Fig. 4). Genomic context analysis of species with available annotations revealed a high level of nucleotide conservation and sharing for orthologous insertions of Dumyat virus and Xiangshan martelli-like viruses (Additional File 2 – Figs. 5, 6, 7 and 8). Those regions display high microsynteny and conserved flanking genes, suggesting a single orthologous insertion event rather than multiple insertions. They also appear to be inserted within clusters of diverse transposable elements (DNA, LINE, LTR), while the NIRVS locus itself remains relatively low in TE insertions (Additional File 2 – Fig. 9).

Many NIRVS are transcriptionally active in Bombus

To investigate the potential transcriptional activity and assess the distribution of the analyzed non-retroviral integrated RNA virus sequences within the Bombus genus, we mapped reads from each retrieved RNA-seq library against a set of 467 NIRVS sequences alongside constitutive host marker genes.

From the total, 83 (~ 20%) NIRVS showed transcriptional activity in at least one Bombus species. Interestingly, B. bifarius EVE5 was the most widespread, exhibiting notable transcript abundance in 20 distinct Bombus libraries derived from various species. Excluding NIRVS that showed no detectable transcript levels, the least widespread elements included B. terrestris EVE22, B. rufofasciatus EVE2, and B. picipes EVE15 (Fig. 5 and Additional File 1).

Fig. 5

Assessment of transcriptional activity of NIRVS sequences in Bombus species. Heatmap showing NIRVS abundance across different Bombus species. Rows were clustered based on Pearson correlation to group sequences with similar abundance, calculated as transcripts per million (TPM). Only transcriptionally active sequences are shown

Notably, several NIRVS displayed high expression levels. Specifically, B. breviceps EVE3, B. terrestris EVE7, and B. impatiens EVE1 exhibited the highest Transcripts Per Million (TPM) values. In some libraries, the TPM levels for these NIRVS were comparable to those observed for constitutively expressed genes such as Histone H4 and Cathepsin L. Some NIRVS appeared to be expressed in several species, likely because highly similar copies of those sequences are shared across species.
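As a reminder of what the TPM values above represent, the following Python sketch applies the standard TPM definition (length-normalized read counts rescaled to sum to one million). It is an illustration under stated assumptions: the text does not say which tool produced the TPM values, and the feature names, counts, and lengths below are hypothetical.

```python
def tpm(counts, lengths_bp):
    """Convert raw read counts to transcripts per million (TPM).

    counts:     dict mapping feature name -> mapped read count
    lengths_bp: dict mapping feature name -> feature length in base pairs
    """
    # Reads per kilobase: normalize each count by feature length.
    rates = {name: counts[name] / (lengths_bp[name] / 1000.0) for name in counts}
    total = sum(rates.values())
    if total == 0:
        return {name: 0.0 for name in counts}
    # Rescale so that all features in the library sum to one million.
    return {name: rate / total * 1e6 for name, rate in rates.items()}


# Toy library with one NIRVS and one host marker gene (invented numbers).
counts = {"EVE_example": 100, "marker_gene": 300}
lengths = {"EVE_example": 500, "marker_gene": 500}
values = tpm(counts, lengths)
print(values)
```

Because TPM is a within-library proportion, comparing an EVE with a constitutive marker from the same library is more meaningful than comparing raw values across libraries.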

To check the reliability of these findings, we assembled the reads from each RNA-seq library that mapped against each EVE and performed blastn alignments to confirm that the assembled contigs match the integrations in size and identity (Additional File 2 – Fig. 13).

Potential for Bombus NIRVS to template piRNAs targeting exogenous viruses

Among the identified NIRVS in Bombus, 11 exhibited a minimum of 15 nucleotides (nt) of perfect contiguous identity when aligned against Arthropoda-associated viral sequences from the NCBI RefSeq database (Fig. 6). This observation is notable because PIWI-clade Argonaute proteins utilize piRNAs as guides to combat viral infections, among other functions [72]. Effective PIWI-catalyzed target slicing requires piRNAs to have at least 15 nt of contiguous base-pairing with the target sequence, while mismatches at other positions can be tolerated [47].
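The 15-nt criterion can be illustrated with a simple exact k-mer search. This is a minimal sketch, not the alignment procedure used in the study; the function names and toy sequences are hypothetical, and checking both strands of the viral sequence is an assumption made here for illustration.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an A/C/G/T sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def shared_kmers(eve: str, virus: str, k: int = 15):
    """Return start positions in `eve` of k-mers found perfectly in `virus` (either strand)."""
    eve, virus = eve.upper(), virus.upper()
    targets = set()
    for strand in (virus, reverse_complement(virus)):
        targets.update(strand[i:i + k] for i in range(len(strand) - k + 1))
    return [i for i in range(len(eve) - k + 1) if eve[i:i + k] in targets]


# Toy example: the EVE carries a 17-nt stretch copied from the virus,
# flanked by unrelated bases (sequences invented for illustration).
virus = "ACGTTGCAAGGCTTACCGATGGA"
eve = "TTT" + virus[3:20] + "CCC"
print(shared_kmers(eve, virus))
```

A shared stretch of length n produces n - k + 1 consecutive hits, so runs of adjacent positions in the output indicate a longer contiguous match.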

Fig. 6

Similarity of NIRVS and circulating viruses in Bombus. Sequences are displayed according to nucleotide alignment between putative NIRVS and viruses from the RefSeq database. Identity and sequence lengths are shown. Only sequences with at least 15 contiguous identical nucleotides were considered

Therefore, even though Bombus NIRVS generally show low overall nucleotide identity and coverage when compared to contemporary circulating viruses, these 15-nt contiguous regions could hypothetically allow NIRVS-derived transcripts to serve as templates for piRNA biogenesis, thereby potentially contributing to antiviral defense mechanisms.

We also tested this hypothesis by mapping small RNA reads from public libraries against the identified NIRVS. Most EVEs had few or no matching reads, and where reads did map, they were too few to be confidently classified as piRNAs (Additional File 2 – Figs. 14 and 15).

NIRVS and exogenous viruses show similar codon usage and GC content

To investigate whether NIRVS maintained viral molecular signatures or converged to the host characteristics, we used Bombus terrestris as a model to assess nucleotide composition that could indicate adaptation and help to differentiate exogenous from endogenous sequences. Through a Principal Component Analysis (PCA) based on codon usage, GC content, and sequence length, we observed that a few codons and, most notably, GC content were key factors shaping the distribution of sequences in multivariate space (Additional File 2 – Fig. 11). Also, NIRVS encoding conserved protein domains tended to cluster apart from those lacking annotated domains, suggesting potential functional constraints associated with recognizable protein-coding capacity.
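One way to derive codon usage features for such a PCA is to compute relative codon frequencies per sequence, as in the sketch below. The exact codon usage metric used in the study is not stated in this section, so plain relative frequency is an assumption, and the function name and toy sequence are hypothetical.

```python
from collections import Counter


def codon_frequencies(cds: str) -> dict:
    """Return the relative frequency of each codon in an in-frame coding sequence.

    Trailing bases that do not complete a codon and codons containing
    ambiguous characters are ignored.
    """
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    codons = [c for c in codons if set(c) <= set("ACGT")]
    counts = Counter(codons)
    total = sum(counts.values())
    return {codon: n / total for codon, n in counts.items()} if total else {}


# Toy in-frame sequence (invented): ATG GCC GCC TAA, plus one dangling base.
freqs = codon_frequencies("ATGGCCGCCTAAG")
print(freqs)
```

Each sequence then becomes a vector of up to 64 frequencies, which can be combined with GC content and length before dimensionality reduction.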

We also applied a second strategy to differentiate endogenous from exogenous sequences. Comparison of NIRVS with related exogenous viruses identified in Bombus terrestris revealed that, for some NIRVS, the patterns of codon usage closely match those of related circulating viruses. This suggests two possibilities: (i) these NIRVS are products of recent integrations or (ii) they are under positive selection that preserves exogenous viral characteristics (Additional File 2 – Fig. 12).

We also compared NIRVS in Bombus terrestris with similar viral elements from other species and exogenous viruses, all related to the order Martellivirales, through a phylogenetic analysis (Fig. 7). We aimed to identify which lineage within this taxon is most closely related to the NIRVS we previously identified and how the sequences are related to one another. All viral elements were grouped into a highly supported monophyletic clade (SH-alrt = 98.2/Bootstrap = 99), suggesting a shared evolutionary event or a specific viral lineage that diversified within these bee hosts, rather than random, unrelated insertions. They are placed as a sister clade to Xiangshan martelli-like virus 2 and Xiangshan martelli-like virus 3, strongly suggesting that they constitute a distinct lineage within the broader Martellivirales order.

Fig. 7

Maximum likelihood phylogenetic tree (partial RdRp amino acid sequences) of NIRVS identified in Bombus genomes. The phylogenetic reconstruction places the identified NIRVS within the order Martellivirales. The tree was inferred using IQ-TREE (Galaxy Version 2.4.0 + galaxy1) with the Q.pfam + F + I + R4 substitution model, selected as the best-fit model by ModelFinder, and visualized with iTOL v7. A thorough optimization strategy was employed, including 200 initial trees, 50 retained top trees, and an exhaustive Nearest Neighbor Interchange (-allnni) search. Nodal support values are indicated as Shimodaira–Hasegawa—approximate likelihood ratio test (%)/Ultrafast Bootstrap (%) based on 1000 replicates for both. The EVEs (colored in red) form a well-supported monophyletic clade (98.2/99) and are positioned as the sister group to the Xiangshan martelli-like viruses 2 and 3. Unclassified Martelli-viruses are depicted as single branches rather than collapsed within families. The complete version of this tree, without any collapsed branches, is shown in Additional File 2, Fig. 16

Discussion

The integration of non-retroviral RNA viruses into bumblebee genomes represents a dynamic evolutionary process that reflects both past viral infections and bee-virus coevolution. In this study, we characterized NIRVS across several Bombus genomes, revealing significant variation in their sequence composition, functional domains, and taxonomic origins. Our findings suggest that viral integrations have occurred repeatedly throughout the evolutionary history of bumblebees, and many are shared among Bombus species, likely due to inheritance from a common ancestor. The PCR results further validate our findings and show the actual presence of the two most widespread NIRVS across Bombus species (Additional File 2, Fig. 10B, C and D).

The RNA virus families most frequently related to EVEs in arthropod genomes include Rhabdoviridae, Flaviviridae, and Chuviridae [2, 55, 73]. The last two were not identified in our analysis. The viral elements we identified showed similarities to viruses from families without significant known pathogenic importance in bees, since the main viruses affecting commercially important populations belong to the families Dicistroviridae and Iflaviridae, both in the order Picornavirales [74, 75]. However, the origin of these NIRVS from viral groups with currently limited recognized pathogenicity in Bombus does not necessarily negate their biological relevance. Indeed, endogenous viral elements, regardless of the pathogenic potential of their ancestral viruses, are increasingly understood to contribute to host biology, including the potential modulation of antiviral immune responses.

When assessing the potential immune role of NIRVS in Bombus species, we searched for exogenous viruses that shared a 15-nt contiguous sequence identical to a region within the integrated elements. This criterion is based on how the piRNA pathway functions in antiviral immunity and provides initial evidence of a possible role for these integrations [58]. We included sRNA sequencing analysis to further explore this possibility, but there were very few publicly available libraries, and our results were inconclusive. We only detected a small number of small RNAs mapping to PASCUORUM_EVE2 from library SRR29032815. The absolute number of reads was very modest (fewer than 20), and even though their molecular profile displayed some canonical hallmarks of primary piRNAs (such as a size of 27–29 nt and a strong uracil bias at the 5' position), they lacked ping-pong signatures, which are robust hallmarks of secondary piRNAs. The absence of secondary piRNAs might be related to the absence of the target transcript needed to drive the ping-pong cycle. We still consider it possible that secondary piRNA reads were underrepresented because germline tissue makes up a small fraction of the whole bee, since all samples for the screened libraries were from whole individuals.
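The two primary-piRNA hallmarks mentioned above (read size of 27–29 nt and a 5' uracil bias) can be summarized from a list of mapped reads as in the following sketch. The function name and toy reads are hypothetical, and the ping-pong signature (which requires paired sense and antisense reads) is deliberately not covered here.

```python
def pirna_like_summary(reads, min_len=27, max_len=29):
    """Summarize read size and 5' U bias for a collection of small RNA reads."""
    sized = [r.upper() for r in reads if min_len <= len(r) <= max_len]
    # Sequencing reads are usually stored as DNA, so T stands in for U.
    with_5p_u = sum(1 for r in sized if r[0] in "TU")
    return {
        "total_reads": len(reads),
        "in_size_range": len(sized),
        "frac_5p_U": with_5p_u / len(sized) if sized else 0.0,
    }


# Toy reads (invented): three fall in the 27-29 nt window, two of those start with T.
toy_reads = [
    "T" + "ACG" * 9,          # 28 nt, 5' T
    "T" + "GCA" * 9 + "A",    # 29 nt, 5' T
    "A" + "CGT" * 9,          # 28 nt, 5' A
    "TACGTACGTACGTACGTACGT",  # 21 nt, outside the size window
]
summary = pirna_like_summary(toy_reads)
print(summary)
```

With read counts as low as those reported here, such fractions are only descriptive and should not be read as statistical evidence.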

Apart from antiviral immunity, EVEs may be conserved or exapted to perform different roles within the cell [2]. Many of our findings suggest that the integrations do not retain fully conserved protein domains or do not show similarity with known viral domains. Also, some viral lineages frequently undergo modular gene exchange. Such events can produce genomes composed of domains with distinct evolutionary origins, which in turn affects how individual modules are retained or lost during endogenization. The presence of EVEs encoding only capsid genes, or only replicase domains, could reflect ancestral recombination events.

The virus families Partitiviridae and Totiviridae (dsRNA viruses), along with Virgaviridae (a +ssRNA virus family within the order Martellivirales), are recognized for infecting a broad range of hosts, including fungi, plants, and protozoa [76–78]. The detection of NIRVS related to these families across various insect species has been previously documented [79], consistent with our findings in Bombus.

The family Rhabdoviridae and the order Bunyavirales (which includes the family Phasmaviridae) consist primarily of -ssRNA viruses, many of which are known to infect insects [80]. NIRVS derived from these viral taxa are frequently reported in arthropods, where Rhabdoviridae-related integrations, in particular, appear to be notably abundant [55, 79, 81–83].

From an evolutionary perspective, the strong correspondence between NIRVS grouping and the Bombus evolutionary history suggests vertical inheritance of many viral sequences from common ancestors prior to species diversification. Groups in which we identified more sequences are probably under similar pressures to conserve those elements, or their integrations are more likely to be recent. The same logic applies to groups that harbor few conserved viral sequences, which may have lost them due to evolutionary events. GC content, codon usage, transcriptional activity, and genomic context analyses were key to understanding the intricacies of NIRVS inheritance and conservation, and from them we identified patterns of conservation and activity in viral sequences across the analyzed Bombus species. Recently integrated sequences tend to have GC% closer to that of their source virus, while older insertions tend to have GC% similar to that of their host. The same logic applies to codon usage: the endogenous viral sequence might undergo neutral mutations and purifying selection that bring it closer to the nucleotide composition of the host sequence.

We hypothesized that the frequent interactions between plants and bees, and the presence of commensal organisms such as gut microbiota and parasites, might enable the integration of different viruses into bee genomes. More recently, evidence of arthropod-infecting viruses belonging to those families, especially in dipterans and ticks, has been reported (reviewed by [84]). This indicates the need for further investigation of insect viromes to better understand their diversity and the mechanisms underlying cross-species transmission. Moreover, insects are known to vector a variety of plant viruses, some of which establish propagative-persistent infections in which they can fully replicate in this intermediate host, even in germline cells (reviewed by [85]). It is important to consider that the ubiquitous presence of arthropods across global biomes and ecological niches, the high diversity of arthropod viruses, and the reliance on solely innate antiviral responses all contribute to the complexity of arthropod-virus interactions.

Non-retroviral EVEs in Bombus species were assessed in two previous works. The first [61] presented data on the presence of EVEs in the genomes of 48 arthropods, including A. mellifera and B. terrestris. Although their findings support the hypothesis of the present study, they reported a different number of EVEs for B. terrestris (51) than we found here (37). This discrepancy may be explained by the use of a different genome assembly, different database versions, or differing selection criteria. They included EVEs smaller than 100 bp and used a less stringent e-value for alignment searches. Additionally, sequences similar to single-stranded DNA viral sequences were included. A second study also assessed the presence of endogenous viral elements in various insects but focused on virga/nege-like elements [86]. Their findings align with ours regarding the presence of Martellivirales-related endogenous elements and a similar range of conserved domains. Both works included only a few Bombus species and, because GenBank databases have been updated frequently since then, had access to less comprehensive sequence data than our work.

Our findings suggest that bumblebees may have historically interacted with a diverse array of novel or highly divergent RNA viruses. Understanding the bumblebee-associated ancient virome is essential for finding new solutions to modern problems, and it also exposes the limitations of existing reference databases in capturing the full extent of viral diversity in ecologically important insects. However, our study has some limitations. For instance, the identification of NIRVS relies primarily on sequence homology searches against reference genomes (not always assembled into chromosomes) and lacks experimental validation. Only the most conserved regions are detectable at the protein level due to the high divergence of viral sequences. In addition, our methodology excludes DNA viruses, is limited to publicly available genomes and annotations, and does not include laboratory validation to assess potential immune-related functions.

Conclusion

This study revealed that NIRVS in Bombus genomes vary significantly in sequence composition, functional domains, and taxonomic origin, with a great number of elements shared across species, probably due to inheritance from common ancestors. Interestingly, the elements detected were related to viral groups with currently limited recognized pathogenicity in bumblebees. Nonetheless, the biological relevance of these NIRVS should not be underestimated. Endogenous viral elements are increasingly acknowledged for their potential roles in host genome evolution, immune regulation, and antiviral defense. Collectively, these findings emphasize that NIRVS are not merely remnants of past infections but may also be conserved in many species for potential genomic roles.

Supplementary Information

13100_2026_393_MOESM1_ESM.xlsx (266.6KB, xlsx)

Additional file 1. Genomes Metadata, SRA libraries, Data Overview, NIRVS per bee species, NIRVS per virus family, 15nt alignment data

13100_2026_393_MOESM2_ESM.pdf (11.3MB, pdf)

Additional file 2.

Figure 1. Correlation between genome size and EVE abundance. Scatterplot displaying the relationship between total genome size (Mb) and the number of identified Endogenous Viral Elements. A linear regression analysis is represented by a blue line. Each red dot represents a distinct genome assembly.

Figure 2. Relative abundance of distributions of viral domains per viral taxa. Blastx protein hits are displayed on the left, and conserved domains are shown on the right. Total abundance is shown.

Figure 3. Correlation between sequence size and presence of conserved domains. Each Blastx protein hit type was screened for conserved domains. P-value for correlation significance between sequence size and domain distribution is shown. Each blastx hit is colored according to the respective viral taxa.

Figure 4. Phylogenetic tree of the Bombus genus. Maximum Likelihood tree inferred from a four-gene dataset (16S, EF1-α, Opsin, and ArgK) covering 211 species. The phylogeny was reconstructed using IQ-TREE v2.4.0 with best-fit substitution models (ModelFinder) and 1000 bootstrap replicates. Colored vertical bars denote subgeneric classifications: Al, Alpinobombus; Ag, Alpigenobombus; Bi, Bombias; Bo, Bombus; Cu, Cullumanobombus; Kl, Kallobombus; Md, Mendacibombus; Mg, Megabombus; Ml, Melanobombus; Or, Orientalibombus; Pr, Pyrobombus; Ps, Psithyrus; Sb, Sibiricobombus; St, Subterraneobombus; Th, Thoracobombus.

Figure 5. Pairwise structural and sequence similarity of EVE-containing genomic regions of annotated Bombus species. Bubble plot displaying the mean nucleotide identity (color scale) and cumulative coverage (bubble size) between genomic contexts (60 kb concatenated flanking regions) of clustered EVEs. Similarity metrics were calculated using the best high-scoring pair (HSP) from BLASTn. The matrix ordering is determined by hierarchical clustering (Ward.D2) of structural identity matrices generated by ProgressiveMauve.

Figure 6. Orthologous insertion of a Dumyat virus-derived Endogenous Viral Element within the skywalker gene across different Bombus subgenera. Genomic windows extending 30 kb upstream and downstream of the viral insertion were extracted to analyze the local genomic context. Gene annotations (NCBI) and repetitive elements (RepeatMasker) are displayed for both sense and antisense strands. The EVEs are consistently located within the GTPase-activating protein skywalker gene, suggesting an orthologous insertion event flanked by conserved cellular genes. The region displays high microsynteny, with conserved flanking genes including cell cycle checkpoint control protein and bcl-2-related ovarian killer protein-like.

Figure 7. Orthologous insertions of two Endogenous Viral Elements derived from Xiangshan martelli-like virus 2 (coat protein) and Pedersore virga-like virus across different subgenera. Genomic windows extending 30 kb upstream and downstream of the viral insertion were extracted to analyze the local genomic context. Gene annotations (NCBI) and repetitive elements (RepeatMasker) are displayed for both sense and antisense strands. The EVEs are consistently located in the opposite strand to the Skeletor gene, suggesting an orthologous insertion event flanked by conserved cellular genes. The region displays high microsynteny, with conserved flanking genes including DDB1- and CUL4-associated factor 10 and sulfotransferase-like proteins.

Figure 8. Orthologous insertion of a Xiangshan martelli-like virus 2-derived Endogenous Viral Element (Vmethyltransferase domain) across different Bombus subgenera. Genomic windows extending 30 kb upstream and downstream of the viral insertion were extracted to analyze the local genomic context. Gene annotations (NCBI) and repetitive elements (RepeatMasker) are displayed for both sense and antisense strands. The EVEs are consistently located next to the GABA neurotransmitter transporter-1A gene, suggesting an orthologous insertion event flanked by conserved cellular genes. The region displays high microsynteny, with conserved flanking genes including PHD finger protein rhinoceros and calcium binding protein Mo25.

Figure 9. Spatial distribution and statistical enrichment of Transposable Elements flanking EVE insertion sites across six annotated Bombus species. The top panels for each species show the frequency of TE classes (DNA, LTR, LINE, SINE) mapped within 10 kb upstream and downstream of EVE loci. TEs were filtered and merged if they were located within 1 kb of each other to reduce fragmentation. In the bottom panels, enrichment of TE density relative to the genomic background was assessed via Fisher’s exact tests in strand-corrected 500-bp bins. The curves track the Odds Ratio at each distance, with red points indicating statistical significance (p < 0.05).

Figure 10. Experimental validation of the most widespread viral insertions in Bombus genomes. A) Blastn alignment between virus genomes (Dumyat virus and Xiangshan martelli-like virus 2) and related EVEs. Highlighted subjects were used to design primers. B) Electrophoresis gels for screening the results of PCR analysis with 4 different primers (plus control primer) in three different samples of Bombus spp. In the bottom right corner, the photo shows one of the samples before DNA extraction. C) PCR products (Forward and Reverse) sequenced by Sanger and aligned with respective EVEs. Alignment plots were exported from Geneious.

Figure 11. Principal Component Analysis (PCA) of Bombus-associated viruses and Endogenous Viral Elements (EVEs). We based our analysis on codon usage in Bombus terrestris, sequence size, and GC content. The plot shows the distribution of data points along the first two principal components, which capture the major variation in the dataset. Arrows indicate the contribution of each feature to the components, with direction and length reflecting their influence on the variance.

Figure 12. T-distributed Stochastic Neighbor Embedding (t-SNE) of Bombus-associated viruses and Endogenous Viral Elements (EVEs). We based our analysis on codon usage in Bombus terrestris, sequence size, and GC content. The plot shows the clustering of data points along the first two t-SNE dimensions.

Figure 13. Blastn alignments between assembled reads and EVEs. For each SRA library, reads were mapped (samtools) against EVEs and assembled (Trinity). Identity and coverage parameters are stated in columns “pident” and “qcovs” respectively.

Figure 14. Mapping profiles and nucleotide signatures of EVE-derived small RNAs in Bombus species. Size distribution of small RNA reads (16–34 nt) mapping to Endogenous Viral Elements identified in B. hortorum, B. lapidarius, and B. impatiens sRNAseq libraries. Histograms show the count of sense (positive y-axis) and antisense (negative y-axis) reads.

Figure 15. Mapping profiles and nucleotide signatures of EVE-derived small RNAs in Bombus pascuorum. (A) Size distribution of small RNA reads (16–34 nt) mapping to PASCUORUM_EVE1 and PASCUORUM_EVE2 in B. pascuorum sRNAseq libraries. Histograms show the count of sense (positive y-axis) and antisense (negative y-axis) reads. (B) Sequence logos displaying the nucleotide composition of the 27–29 nt antisense reads identified when mapping the reads from SRR29032815 against PASCUORUM_EVE2, showing 5′-terminal nucleotide probabilities.

Figure 16. Maximum likelihood phylogenetic tree (partial RdRp amino acid sequences) of NIRVS identified in Bombus genomes. The phylogenetic reconstruction places the identified NIRVS within the order Martellivirales. The tree was inferred using IQ-TREE (Galaxy Version 2.4.0+galaxy1) with the Q.pfam+F+I+R4 substitution model, selected as the best-fit model by ModelFinder, and visualized with iTOL v7. A thorough optimization strategy was employed, including 200 initial trees, 50 retained top trees, and an exhaustive Nearest Neighbor Interchange (-allnni) search. Nodal support values are indicated as Shimodaira–Hasegawa - approximate likelihood ratio test (%)/Ultrafast Bootstrap (%) based on 1000 replicates for both. The EVEs (colored in red) form a well-supported monophyletic clade (98.2/99) and are positioned as the sister group to the Xiangshan martelli-like viruses 2 and 3. Unclassified Martelli-viruses are depicted as single branches. Viral families are monophyletic and colored accordingly

Acknowledgements

We thank all the members of the Virus Bioinformatics Laboratory—UESC for the fruitful discussions and the Laboratory of Molecular Biodiversity and Omics—LabiÔmicas for the local infrastructure.

Abbreviations

LTR: Long Terminal Repeat
NIRVS: Non-retroviral Integrated RNA Virus Sequences
ORF: Open Reading Frame
PCA: Principal Component Analysis
piRNA: PIWI-interacting RNA
PIWI: P-element Induced WImpy testis
RdRp: RNA-dependent RNA polymerase
sRNA: Small RNA

Authors’ contributions

Conceptualization, E.R.G.R.A. and M.A.C.; methodology, L.B.d.A.C., J.P.N.S., L.Y.M.F. and G.V.P.R; formal analysis, L.B.d.A.C., J.P.N.S., L.Y.M.F., and G.V.P.R.; resources, E.R.G.R.A.; data curation, L.B.d.A.C., J.P.N.S., L.Y.M.F. and G.V.P.R.; writing—original draft preparation, L.B.d.A.C., J.P.N.S., L.Y.M.F., G.V.P.R. and E.R.G.R.A.; writing—review and editing, L.B.d.A.C, J.P.N.S., L.Y.M.F., G.V.P.R, and E.R.G.R.A.; visualization, L.B.d.A.C., J.P.N.S., G.V.P.R., and L.Y.M.F.; supervision, E.R.G.R.A. and M.A.C.; project administration, E.R.G.R.A.; funding acquisition, E.R.G.R.A. All authors have read and agreed to the published version of the manuscript.

Funding

This study was financed in part by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior—Brazil (CAPES)—Finance Code 001. E.R.G.R.A. is a researcher of the Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq).

Data availability

All datasets analyzed during the current study are available in GenBank, RefSeq, and the Sequence Read Archive (SRA) of the National Center for Biotechnology Information (NCBI). The accession codes are available in Additional File 1.

Declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare no competing interests.

Footnotes

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.





Articles from Mobile DNA are provided here courtesy of BMC
