Abstract
Honey’s DNA mixture originates from various organisms like plants, arthropods, fungi, bacteria, and viruses. Conventional methods like melissopalynological analysis and targeted honey DNA metabarcoding offer a limited view of honey’s biological composition. We conducted a honey bulk DNA metagenomic analysis to characterize the honey’s taxonomic composition and identify honey bee-related pathogens and parasites based on 266 Estonian and 103 foreign honey samples. 70.4% of the DNA in Estonian honey was derived from green plant families like Brassicaceae, Rosaceae, Fabaceae, and Pinaceae. Geographical distribution analysis revealed distinct botanical compositions between Estonian mainland and island samples. The bacterial family Lactobacillaceae was prevalent overall, reflecting the honey bee microbiota in honey. We detected 12 honey bee pathogens and parasites, including Paenibacillus larvae, Nosema ceranae, Varroa destructor, and Aethina tumida. In conclusion, the study underscores the potential of bulk DNA-based non-targeted metagenomic approaches for monitoring honey bee health, environment, and honey composition, origin, and authenticity.
Subject terms: Metagenomics, Pathogens, Food microbiology, Plant sciences, Agriculture
Introduction
Honey bees are considered effective environmental monitors due to their large-scale foraging activities. Their hive products, especially honey, provide a snapshot of the honey bee and honey production environment, containing nectar and pollen DNA from various plant species and DNA sequences from arthropods, fungi, bacteria, and viruses1,2. Previous studies of the biological composition of North European honey based on DNA analysis have identified predominant floral sources such as Brassica, Trifolium, Malus, Prunus, Fragaria, Medicago, Populus, and Solanum3,4, plant genera that are also widespread in Estonian nature. Apis mellifera, as anticipated, is the most commonly detected arthropod species in honey DNA analyses1,3. Additionally, DNA from several other arthropods from honey bee foraging environments, such as plant-sucking, honeydew-producing aphids from the order Hemiptera, has been detected not only in honeydew honey but also in blossom honey5. Among DNA viruses, mainly Apis mellifera filamentous virus (AmFV) has been identified in honey DNA; it is a ubiquitous dsDNA virus that affects many apiaries throughout Europe and can have mild pathogenic effects on honey bees6. However, most pathogenic viruses affecting honey bees are RNA viruses, requiring RNA-based screening methods for detection7. Fungi have also been observed in honey samples, mostly yeasts that tolerate high sugar concentrations and are recognized for their roles as fermentative agents in food and beverage production, such as species of Zygosaccharomyces, as well as fungal pathogens affecting insects or plants, such as Metarhizium spp., Aspergillus spp., Nosema (Vairimorpha) ceranae, Bettsia alvei, or Alternaria alternata1,3.
Honey DNA has been found to contain common microorganisms from the honey bee gut microbiota, such as Lactobacillus kunkeei, as well as pathogens affecting honey bees or plants, and ubiquitous bacterial species like Escherichia coli from honey bee foraging and honey production environments1,8. Honey DNA analysis has been used to detect several potential honey bee pathogens, such as Paenibacillus larvae (the causative agent of American foulbrood), Melissococcus plutonius (the etiological agent of European foulbrood), and Spiroplasma species (the agents of spiroplasmosis)1,3,9. Screening for pathogens is essential for several reasons. It aids in identifying and managing diseases affecting honey bee colonies at an early stage. Colony losses have been linked to pathogens such as Varroa destructor or Nosema ceranae10. Sensitive bulk DNA-based screening allows the detection of infections before visual symptoms appear. For hive health, early detection of pathogens can facilitate timely intervention, potentially saving colonies from devastating diseases. Additionally, understanding the prevalence and spread of pathogens locally and on larger scales can help monitor and manage diseases and invasive honey bee parasites. For example, the recent first report of established Tropilaelaps mercedesae mite populations in Europe provides a basis for further screening11. Another significant colony pest affecting both wild and managed honey bees, the small hive beetle (Aethina tumida), also requires effective monitoring methods for early detection12.
Considering the above, the honey composition reflects the surrounding ecological landscape. It helps detect pathogens, map hive health, describe the honey bee foraging and honey production environment, and describe geographical peculiarities, creating a fingerprint of common regional honey and combating food fraud. Traditional methods, such as melissopalynological analysis or DNA metabarcoding, offer a limited view of honey composition. The melissopalynological analysis is restricted to detecting pollen plants, ignoring nectar and honeydew plants and other organisms, including pathogens, that leave DNA traces in honey13. DNA metabarcoding expands this scope by targeting a broader range of organisms, but it remains a targeted approach, limited to detecting only targeted taxa based on a few successfully preamplified genomic regions14. To use an unbiased approach, we used shotgun metagenomics sequencing of all DNA extracted from the honey sample, which describes the complexity of samples containing thousands of distinct species belonging to different kingdoms or phyla14. We conducted a thorough all-DNA-sequencing-based metagenomic analysis on 266 Estonian centrifugally-extracted honey samples to map the botanical composition of Estonian honey with geographical distribution analysis. Additionally, we included 103 foreign honey samples for comprehensive honey bee-related pathogens and parasites analysis, as not all honey bee-related pathogens and parasites of interest are present in Estonia.
Results
Our study presents a metagenomic analysis of honey bulk DNA to identify its taxonomic composition and monitor honey bee pathogens.
Estonian honey DNA taxonomic composition
In our analysis of the Estonian honey DNA taxonomic composition, we characterized the proportions of bacteria, fungi, animals (Animalia, Metazoa), green plants (Viridiplantae), and viruses (Fig. 1, S1). As anticipated, most of the DNA was derived from green plants (70.4% ± 0.12), with bacteria constituting a secondary component (22.7% ± 0.07).
Fig. 1. Bulk DNA taxonomic composition of Estonian honey.
The symbol “±” represents the standard deviation (s.d.).
Although Viridiplantae dominated the honey composition, the dominant family identified was the bacterial Lactobacillaceae (average relative read abundance of 19.5%) (Fig. 2A). Within the family Lactobacillaceae, the prevalent genus was Apilactobacillus (Fig. S2). Other common bacterial families were Pseudomonadaceae (1.7%) and Erwiniaceae (1.5%). The top five prevalent families of Viridiplantae in Estonian honey were Brassicaceae (19.1%), Rosaceae (13.1%), Fabaceae (12.0%), Pinaceae (9.1%), and Salicaceae (7.4%) (Fig. 2A). As expected, the genera with the highest abundance were Brassica, Picea, Trifolium, Rubus, and Salix (Fig. S2).
Fig. 2. Average relative read abundances of bacteria, fungi, animals (Animalia), and green plants (Plantae) detected in Estonian honey.
Panel A shows the relative read abundances for families with an average abundance greater than 0.2% across all samples. Panel B shows families with an average abundance of less than or equal to 0.2%.
The prominent Animalia families detected in honey DNA were Hominidae and Apidae, containing among others human (genus Homo), honey bee (genus Apis), and bumblebee (genus Bombus) DNA (Figs. 2A, B, S2). Interestingly, the analysis revealed DNA traces belonging to the mammal families Canidae and Bovidae, albeit in proportions under 0.2% (Fig. 2A, B, S2). Also, DNA from arthropod families containing honey bee or hive parasites or pests can be detected, e.g., Varroidae, Pyralidae, and Vespidae (Fig. 2B).
The predominant fungal families detected in honey DNA were Saccharomycetaceae and Metschnikowiaceae, mainly from yeasts’ genera Zygosaccharomyces, Saccharomyces, and Metschnikowia (Fig. S2). Viral DNA was predominantly from the Apis mellifera filamentous virus (Fig. S2).
Estonian honey bulk DNA botanical composition and geographical distribution of plant genera
We investigated the geographical distribution of plant genera identified in Estonian honey samples based on their average relative sequencing read abundances (Fig. 3). The most widely distributed plant genus in Estonian honey DNA was Brassica, as confirmed by Fig. 2, where Brassicaceae was the most common plant family. While Brassica was common and contributed in most areas, there were exceptions. For example, Brassica was not the dominant plant genus in the samples from Estonian islands like Vilsandi, Ruhnu, Muhu, Kihnu, and Vormsi (Fig. 3). Additionally, the islands’ honey samples had different prevalent plant genera compared to the Estonian mainland honey, such as Frangula, Geum, Rhamnus, and a considerable proportion of other plant genera (categorized as “Other”) (Figs. 3, S3). From north to south, the mainland featured common plant genera such as Brassica, Picea, Trifolium, Salix and Rubus. From east to west, there was an increase in Rhamnus and Frangula prevalence. Other plant genera, such as Aegopodium, Vicia, and Melilotus, were also prevalent in Estonian honey.
Fig. 3. Honey bulk DNA botanical composition and geographical distribution of 16 most prevalent plant genera across Estonia.
Each pie chart represents a county, showing the proportional composition of plant genera identified in honey samples. The color-coded legend indicates the corresponding plant genera.
Honey bee pathogens and parasites in honey bulk DNA
In addition to plants, our methodology also detects DNA traces from animals, fungi and bacteria, including honey bee-related pathogens and parasites. We pre-selected and monitored 20 honey bee pathogens and parasites in Estonian and foreign honey samples (see Methods). Specific DNA sequences from 12 of the 20 monitored pathogens or parasites were detected in numerous samples, including samples with laboratory-confirmed pathogens, visually confirmed parasites, or beekeeper-suspected issues, as well as samples without any confirmation (Figs. 4, 5, S4). For instance, DNA from the bacterium Paenibacillus larvae, which causes the honey bee disease American foulbrood, was detected in both laboratory-confirmed control honey samples, each with a fraction of sequencing reads exceeding 2%. In all the samples where the microsporidian parasite Nosema sp. was detected, including two samples from hives suspected of nosematosis, only Nosema ceranae was detected, not Nosema apis. DNA traces of Aethina tumida (small hive beetle) were observed only in some foreign samples obtained from the honey market: samples labeled as honey from the USA, Ghana, and Spain, and two samples of blended EU and non-EU honey. This beetle is not known to be present in Estonia or other European countries. DNA traces from the flour mite Acarus siro were detected in one Estonian honey sample. The widespread parasitic honey bee mite (Varroa destructor) and the greater wax moth (Galleria mellonella) were found in many Estonian and foreign apiaries (Figs. 5, S4). Also, DNA sequences from honey bee pathogens or pests like Ascosphaera apis (fungus causing chalkbrood), Melissococcus plutonius (causing European foulbrood), Spiroplasma species (related to spiroplasmosis, May disease), Bettsia alvei (causing pollen mold), and even from Forficula auricularia (insect, European earwig) were detected in numerous Estonian and foreign honey samples (Figs. 4, 5, S4).
Our analysis did not detect Tropilaelaps clareae or Tropilaelaps mercedesae in any of the honey samples tested.
Fig. 4. Detection of pathogens and parasites in Estonian and foreign honey samples.
Red triangles indicate laboratory-confirmed pathogens or visually confirmed parasites, while orange triangles represent beekeeper-suspected issues. Grey points (“No information”) depict samples with no information. Honey samples that did not yield any sequencing reads assigned to the pathogens listed in the Methods section are excluded from this figure. A fraction close to 0% signifies a very low proportion of sequencing reads assigned to a particular pathogen, but indicates presence. Notably, certain pathogens were detected exclusively in either Estonian or foreign honey samples. For example, Aethina tumida presence was found only in foreign samples (panel B), whereas Acarus siro was detected in only one Estonian sample (panel A). Sequencing reads originating from Acarapis woodi were not detected in any of the samples analyzed.
Fig. 5. Comparison of the percentage of honey samples from Estonia (n = 266) and foreign origins (n = 103) affected by screened pathogens and parasites.
The figure includes only the 12 detected pathogens out of 20 that were screened. The following pathogens were not detected in any of the analyzed samples: Acarapis woodi, Achroia grisella, Braula coeca, Nosema apis, Oplostomus fuligineus, Senotainia tricuspis, Tropilaelaps clareae, and Tropilaelaps mercedesae.
Discussion
The honey bulk DNA metagenomic analysis provides a more unbiased and less restricted overview of honey’s biological composition compared to targeted DNA-based approaches. Unlike the DNA metabarcoding method, which targets a limited set of selected genes of specific organisms, the honey bulk DNA approach provides a comprehensive overview of honey botanical, microbial, fungal, viral, and animal (including entomological) diversity, including honey bee pathogens and parasites15. We conducted thorough analyses on 266 Estonian and 103 foreign honey samples. Unlike honeycomb-scraped samples, these samples were collected from centrifugally extracted honey, which contains honey DNA from various honeycombs and hives of the apiaries from different locations. At least one million metagenomic DNA sequencing reads per honey sample enable us to describe the biological environment of honey bee foraging and honey production. Rather than only the DNA from plant pollen, this method analyses all DNA traces in the sample, including cell-free DNA, which allows us to detect pollen, nectar, and honeydew plants as well as DNA from other organisms.
We demonstrate that green plants (Viridiplantae) constitute the majority of the DNA content in honey, accounting for 70.4% of the total honey DNA composition, with Brassicaceae, Rosaceae, Fabaceae, Pinaceae, and Salicaceae being the most common plant families identified in Estonian honey (Figs. 1, 2). As expected, the most common plant genera were Brassica, Picea, Trifolium, Rubus, and Salix (Fig. S2). These results are concordant with the observations made for the composition of honey pollen plants in Estonia16, indicating that a significant part of plant DNA in honey may originate from plant pollen.
Interestingly, the most abundant genus detected in honey DNA based on the number of sequencing reads was not a plant genus but the bacterial genus Apilactobacillus, consistent with its known association with the honey bee microbiota (Fig. S2), as also shown in previous studies3,17. This finding reinforces the idea that honey metagenomics can provide insights beyond botanical composition, extending to pollinator health and microbiome dynamics.
Other notable bacterial families, like Pseudomonadaceae and Erwiniaceae (1.7% and 1.5%, respectively), were also detected, although in much lower proportions; both include species known for their roles in various ecological functions and interactions with plants and insects (Fig. 2)18. These findings demonstrate that the taxonomic diversity of plant genera in honey DNA surpasses that of bacterial genera. Moreover, bacterial taxa are often represented by a few dominant families, while plant DNA is more evenly distributed across numerous genera.
As expected, the most common Animalia families detected in honey DNA were the mammalian family Hominidae and the arthropod family Apidae, containing mostly human (genus Homo), honey bee (genus Apis), and bumblebee (genus Bombus) DNA from the honey bee foraging and honey production environment. Interestingly, DNA from arthropod families containing common honey bee or hive parasites or pests from the honey bee or honey production environment can be detected in honey DNA, e.g., Varroidae, Pyralidae, and Vespidae (Fig. 2B). The family Vespidae includes species detrimental to honey bees, such as hornets. Although hornet DNA detected in our samples was mainly from the European hornet Vespa crabro, this finding could be valuable when developing methods for monitoring and early detection of the Asian hornet (Vespa velutina), a species known to be devastating for honey bee populations in warmer areas of Europe, but not yet detected in Estonia19. The widespread parasitic honey bee mite (Varroa destructor) from the arthropod family Varroidae and the greater wax moth (Galleria mellonella) from the family Pyralidae were detected in many Estonian and foreign honey samples (Figs. 4, 5)20,21.
In contrast, the small hive beetle (Aethina tumida), known to cause colony collapses in weak colonies, was found in only five samples: according to their labels, three originated from the US, Spain, and Ghana, and two were honey blends of undetermined geographical origin from EU and non-EU countries (Fig. 4)22. Importantly, Aethina tumida, known to be absent in Estonia, was not detected in any Estonian honey samples (Fig. 4). Interestingly, DNA traces of the small hive beetle, which is not known to be present in Europe, were detected in a sample labeled as Spanish-origin honey (Fig. S4). Since all foreign honey samples in which Aethina tumida was detected came from the honey market, we cannot confidently determine the true geographical origin of these samples. This result demonstrates that honey bulk DNA metagenomic analysis could be a valuable screening tool to monitor the prevalence and geographical distribution of agriculturally significant honey bee parasites.
Our analysis revealed the presence of several other honey bee-related pathogens and parasites (Figs. 4, S4). Notably, the bacterial species Paenibacillus larvae, which is known to cause American foulbrood disease in honey bees, was detected in several samples, including two positive control honey samples from hives that were confirmed to have American foulbrood disease23. In both control samples, a substantial proportion of sequencing reads was attributed to Paenibacillus larvae (Fig. 4, 8.5% and 2.4%). Also, DNA traces from honey bee pathogens or parasites like Ascosphaera apis (fungus causing chalkbrood), Melissococcus plutonius (causing European foulbrood), Nosema ceranae (microsporidian parasite, causing nosematosis), Spiroplasma species (related to spiroplasmosis, May disease), Bettsia alvei (causing pollen mold), and even from Forficula auricularia (insect, European earwig) were detected in several Estonian and foreign honey samples. We did not detect DNA of the following honey bee pathogens or parasites in any analyzed Estonian or foreign honey sample: Acarapis woodi (parasitic honey bee mite, causes acarapiosis), Achroia grisella (lesser wax moth), Braula coeca (Braula fly, bee louse), Oplostomus fuligineus (large African hive beetle), Senotainia tricuspis (fly, causes senotainiosis), Tropilaelaps clareae (parasitic honey bee mite, causes tropilaelapsosis) and Tropilaelaps mercedesae (parasitic honey bee mite, causes tropilaelapsosis). This might be because these important honey bee pathogen and parasite species are not widespread worldwide, and none of them has been seen in Estonian apiaries yet. We also did not detect the microsporidian parasite Nosema apis in our samples, even though it has been identified as the primary Nosema species responsible for nosematosis in Estonia24. Research has shown that N. ceranae has replaced N. apis in many countries, including Italy, Argentina or even northern countries such as Lithuania24–28. Essentially, N. ceranae has spread rapidly worldwide24. Therefore, it is possible that N. ceranae has also replaced N. apis in Estonia by now. Overall, our study highlights the potential of honey bulk DNA analysis as a powerful tool for monitoring honey bee health post-honey collection.
Interestingly, we even detected trace amounts of DNA sequences from mammals, probably originating from domestic or pest animals, possibly due to contamination from the honey bee foraging, honeycombs’ storage, hive or honey production environment. For example, honey bees often collect brackish water enriched with mineral salts, which could be contaminated by mammal excreta and DNA (Canidae and Bovidae, Fig. 2)29. This result shows the sensitivity of DNA analysis and indicates possible DNA transfer through the honey bees’ diet. This is in accordance with a study that demonstrated the presence in honey of DNA from plant-sucking insects that produce the sticky excretion collected by honey bees5. DNA contamination from pest animals, such as mice representing <0.2% of sequencing reads, may result from their contact with the honeycombs or the hive environment.
The fungal community was primarily represented by the yeast families Saccharomycetaceae and Metschnikowiaceae, mainly the genera Zygosaccharomyces, Saccharomyces, and Metschnikowia, which are commonly involved in fermentation processes (Fig. S2). The presence of Saccharomycetaceae has also been detected in previous honey-related studies1,3,30. We also detected viral DNA, predominantly from the Apis mellifera filamentous virus (Fig. S2), which is known to infect honey bees but has little to no pathogenicity and has been detected in past studies6,31. The difference between our finding of 2.9% of sequencing reads assigned to DNA viruses and the 40.2% (±30.0%) reported by Wirta et al.3 can be explained by differences in the reference database and the number of samples analyzed (Fig. 1). However, most disease-causing honey bee viruses, such as Deformed Wing Virus and many others, are RNA viruses that need RNA-based screening methods7.
We also investigated the botanical composition of Estonian honey DNA and the geographical distribution of the most prevalent plant genera in honey based on 266 Estonian honey samples (Fig. 3). Consistent with previous records from North European honey, we observed frequent occurrences of Brassica, Malus, and Trifolium (Fig. 3)3,16,32. Interestingly, we observed distinct differences in the plant genera compositions in honey between the Estonian mainland and the islands’ honey samples, with the islands showing a higher proportion of Frangula and species categorized as “Other” compared to the mainland (Figs. 3, S3). In the honey DNA samples from small islands in Estonia, the proportion of Brassica was substantially lower compared to the other regions. This could be explained by the lack of large agricultural fields on small islands in Estonia. Furthermore, the diverse DNA taxonomical composition of honey creates a unique fingerprint for every honey sample, containing not only hundreds of different species of plants but also bacteria, fungi, animals, arthropods and other organisms. Therefore, we hypothesize that metagenomic analysis of all extracted DNA could be utilized to analyze the authenticity and geographical origin of honey (Figs. 2, 3). This geographic variation also highlights the utility of honey metagenomics for environmental monitoring. By tracking shifts in floral composition, researchers can assess the impact of land-use changes, urbanization, and climate fluctuations on plant-pollinator interactions. Additionally, our findings demonstrate that honey can serve as a bioindicator of ecosystem health, with potential applications in conservation biology and sustainable agriculture.
Metagenomic analysis of honey DNA presents inherent challenges, primarily because the accuracy of the results heavily relies on the public reference database used for analysis, as also pointed out by other researchers33. If a genus is absent from the database, it can introduce biases and potentially reduce the accuracy of the analysis33. As comprehensive databases for plants are still under development and there is a predominance of complete genome sequences for bacteria and viruses in existing databases, we created a custom Kraken 2 reference database in our study (including partial genome sequences), with an extended number of honey-related plants. Our Kraken 2 reference database was sourced from three main collections: NCBI nt collection, The One Thousand Plants Project, and NCBI’s Sequence Read Archive34–36. This approach enables the detection of an increased number of plants in honey DNA. In addition, the majority of foreign honey samples used in the pathogen analysis in this study were acquired from shops. The contents of the honey jars were not validated, and we had to rely on the label information. However, as we were using foreign honey samples only for pathogen analysis in this study, the accuracy of the label did not affect the proof-of-concept of detecting known pathogens in honey samples.
In conclusion, our metagenomic analysis of honey DNA provided a detailed and comprehensive overview of its biological composition, highlighting its significant diversity of botanical composition, and possibilities to monitor honey bee-related pathogens and parasites. This study mapped the botanical composition of Estonian honey with the geographical distribution of different plant genera in Estonia, and conducted honey bee-related pathogen analysis, underscoring the potential of all DNA sequencing-based metagenomic approaches not only for describing the botanical composition of honey, monitoring honey bee health and apiary environment but also for identifying authenticity and origin of honey by using untargeted analysis of all DNA sequences extracted from honey.
Methods
Honey samples
A total of 264 honey samples were collected from various regions across Estonia to describe the DNA taxonomic composition of Estonian honey (Fig. S1). The samples included a diverse range of honey types, covering both monofloral and polyfloral honeys, as well as honeys from different foraging seasons (spring, summer, and autumn). Samples were also collected from various foraging environments, including urban apiaries, and from honey produced by different honey bee subspecies. Additionally, two positive control samples from hives with diagnosed American foulbrood infection caused by Paenibacillus larvae were included. Their specific locations were not disclosed, so they were included in the honey bee pathogen analysis but not in the analysis of Estonian honey DNA botanical composition and geographical distribution. For honey bee pathogen analysis, in addition to the Estonian honey samples, 103 foreign samples were obtained directly from beekeepers, shops, or honey markets (Table S1).
The foreign honey samples analyzed in this study were obtained from beekeepers, shops, and honey markets, with provenance information based solely on label descriptions. As no formal chain of custody was available, we cannot verify the exact production dates, country of origin, or purity of these samples. Given the high prevalence of honey adulteration in the international market, the authenticity of these samples remains uncertain. Due to these limitations, foreign honey samples were included only in the pathogen analysis and were excluded from the botanical composition assessment to avoid potential biases. In contrast, local Estonian samples were carefully collected with verified provenance, ensuring reliable background information for geographic analysis.
All samples were produced during the summers of 2020 to 2022 and collected for analysis between 2020 and 2023. All honey samples were collected from centrifugally extracted honey and not directly from the honeycomb. Centrifugally extracted honey samples contain DNA traces from several honeycombs and several hives in the apiary and provide a more comprehensive DNA taxonomical composition picture of the honey that is sold on the market as well as the honey bees’ foraging, hives’, and honey production environment in an apiary.
DNA extraction and sequencing
Each honey sample was preheated at 40 °C and homogenized by mixing with a clean spoon. 30 g of honey was weighed into a 50 ml centrifuge tube and diluted in 25 ml of preheated MilliQ water. After centrifugation at 4000 rpm, the supernatant was removed, and bulk DNA from the pellet was extracted with the NucleoSpin Food Mini kit (MACHEREY‑NAGEL) and eluted in 60 µl of elution buffer. For the subsequent library preparation, the DNA was fragmented to 150-200 bp with a Covaris M220 focused-ultrasonicator (Covaris) and concentrated with the NucleoSpin Gel and PCR Clean-up kit (MACHEREY‑NAGEL). The quality and quantity of the DNA fragments were assessed on an Agilent 2200 TapeStation (Agilent Technologies) to monitor the concentration and size distribution of honey bulk DNA fragments (expected fragment size 150-200 bp and concentration at least 1 ng/µl). Illumina-compatible DNA libraries were prepared using the FOCUS protocol developed in-house by Celvia CC AS. Briefly, 25 µl of fragmented honey bulk DNA (1 ng/µl) was end-repaired and A-tailed by a specific enzymatic mixture. Short double-stranded and index-labeled DNA adapters were ligated to both ends of the pre-treated DNA fragments. The full adapter sequence and a sufficient quantity of ready-made Illumina-compatible library were ensured by 12-cycle PCR. 36 samples were pooled equimolarly, and the quality and quantity of the pool were assessed on an Agilent 2200 TapeStation (Agilent Technologies). The honey bulk DNA pooled library was sequenced using the Illumina NextSeq 500 instrument (Illumina Inc.) and an 85 bp single-read protocol.
Metagenomic analysis
Sequencing read counts ranged from 1 to 27 million, with a median of 13.7 million reads per sample. Sequencing data analysis was managed with Nextflow (23.09.3-edge) and conducted using the computational resources of the High Performance Computing Center of the University of Tartu37,38. Sample sequencing lanes were concatenated, assessed for quality using SeqKit stats (2.4.0), and subsequently analyzed for taxonomic composition39. No additional pre-processing was applied to the FASTQ files. We did not characterize batch effects and did not account for them in the statistical analysis.
To classify the taxonomic composition of the Estonian honey by assigning taxonomic labels to sequence reads, we utilized the taxonomic sequence classifier Kraken 2 (2.1.3) with a custom reference database40. The minimum hit groups required for classification were set to 3, and the confidence threshold representing a non-probabilistic scoring scheme for taxonomic assignment was set to 0.5. Across 264 Estonian honey samples, the median percentage of classified sequencing reads was 36.5%, ranging from 9.12% to 71.5%, with a standard deviation of 10.1%.
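The classification settings described above can be sketched as a command line. The following is a minimal illustration in Python, not the authors' pipeline: the file names, database path, and thread count are hypothetical placeholders, and only the confidence threshold (0.5) and minimum hit groups (3) come from the text.

```python
import shlex

def kraken2_command(reads, db, report, output, threads=8):
    """Build an argv list for a single-read Kraken 2 run.

    Only --confidence 0.5 and --minimum-hit-groups 3 are taken from the text;
    paths and thread count are illustrative assumptions.
    """
    return [
        "kraken2",
        "--db", db,
        "--threads", str(threads),
        "--confidence", "0.5",
        "--minimum-hit-groups", "3",
        "--report", report,   # per-taxon summary (.kreport)
        "--output", output,   # per-read assignments
        reads,
    ]

# Hypothetical sample and database names.
cmd = kraken2_command("sample01.fastq.gz", "honey_db",
                      "sample01.kreport", "sample01.kraken")
print(shlex.join(cmd))
# To actually run it: subprocess.run(cmd, check=True)
```

Building the argument list separately from executing it makes the parameters easy to inspect or log before any compute time is spent.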
We constructed a comprehensive reference database by combining data from three sources: the NCBI nucleotide (nt) collection, the One Thousand Plants Project (1KP), and the NCBI Sequence Read Archive (SRA)34–36. To ensure the database is highly relevant, we focused on species commonly associated with honey. This process involved two steps: identifying gaps in the coverage of known honey plants and analyzing unclassified DNA reads from honey samples. For the latter, we used computational tools to match these unclassified sequences with entries in the SRA database36. By selecting the most relevant sequences, we enriched the database to better capture the diversity of organisms found in honey. Sequences from the SRA were further processed to ensure high quality (fastp with the parameters cut_front, cut_right, and correction)41. Raw sequencing reads were cleaned and assembled into longer fragments (SPAdes 3.14.1), which were then screened to remove contaminants such as human or bacterial DNA42. This ensured that the final database included only meaningful sequences for honey DNA analysis. To improve efficiency, we reduced redundancy in the dataset by removing duplicate or overlapping sequences within each species.
To characterize the overall taxonomic composition of Estonian honey, we analyzed Kraken 2 (.kreport) output, retaining only reads classified at the domain (D) and kingdom (K) levels. To ensure comparability, taxonomic read counts (Clade_Fragment_Count) were normalized by total sequencing reads per sample. Taxonomic groups were summarized by calculating their mean and standard deviation of relative abundance across all samples, and final proportions were expressed as percentages.
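The normalization described above could look roughly like the following sketch. It assumes the standard tab-separated Kraken 2 report layout (percentage, clade fragment count, taxon fragment count, rank code, taxonomy ID, indented name), takes the total read count as the sum of the unclassified (`U`) and root (`R`) clade counts, and treats a taxon missing from a sample as zero. The function names are hypothetical, and the original analysis was done in R rather than Python.

```python
from statistics import mean, stdev
from typing import Dict, List, Sequence, Tuple


def rank_percentages(kreport_text: str,
                     rank_codes: Sequence[str] = ("D", "K")) -> Dict[str, float]:
    """Return {taxon: percent of all sequenced reads} for the given rank codes."""
    rows = []
    total_reads = 0
    for line in kreport_text.splitlines():
        fields = line.split("\t")
        if len(fields) < 6:
            continue  # skip blank or malformed lines
        clade_count = int(fields[1])      # Clade_Fragment_Count
        rank = fields[-3].strip()         # rank code, e.g. U, R, D, K, F, G
        name = fields[-1].strip()         # names are indented in the report
        if rank in ("U", "R"):
            total_reads += clade_count    # unclassified + root = all reads
        rows.append((rank, name, clade_count))
    if total_reads == 0:
        raise ValueError("report contains no reads")
    return {name: 100.0 * count / total_reads
            for rank, name, count in rows if rank in rank_codes}


def summarize(samples: List[Dict[str, float]]) -> Dict[str, Tuple[float, float]]:
    """Mean and standard deviation of each taxon's percentage across samples."""
    taxa = sorted({taxon for sample in samples for taxon in sample})
    summary = {}
    for taxon in taxa:
        values = [sample.get(taxon, 0.0) for sample in samples]
        sd = stdev(values) if len(values) > 1 else 0.0
        summary[taxon] = (mean(values), sd)
    return summary
```

Passing `rank_codes=("F",)` gives the family-level variant of the same calculation.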
To analyze the taxonomic composition at the family level (F), we processed Kraken 2 output, retaining only classifications at the family rank. Taxonomic read counts were normalized by sequencing depth, and relative abundances were computed as percentages of total reads.
To describe and estimate the abundances of honey bee pathogens and parasites in honey DNA at the species level, we used Bracken (2.8)43 with the read length set to 80, the taxonomic level set to species, and the threshold for abundance estimation set to 10. We analyzed the fraction of total sequencing reads (fraction_total_reads) assigned to 20 pre-selected honey bee-related pathogens and parasites from the .bracken output file. These included: Acarapis woodi, Acarus siro, Achroia grisella, Aethina tumida, Ascosphaera apis, Bettsia alvei, Braula coeca, Forficula auricularia, Galleria mellonella, Melissococcus plutonius, Nosema apis, Nosema ceranae, Oplostomus fuligineus, Paenibacillus larvae, Senotainia tricuspis, Spiroplasma apis, Spiroplasma melliferum, Tropilaelaps clareae, Tropilaelaps mercedesae, and Varroa destructor.
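As an illustration of this step, the sketch below builds a Bracken call with the stated settings and then extracts `fraction_total_reads` for the pre-selected species. It assumes the usual tab-separated Bracken output with a header row containing `name` and `fraction_total_reads`, and it reports species absent from the file as 0.0. The function names and file paths are hypothetical, and exact name matching against the database's species labels is an assumption.

```python
import csv
import io
from typing import Dict, List, Sequence

TARGET_SPECIES = [
    "Acarapis woodi", "Acarus siro", "Achroia grisella", "Aethina tumida",
    "Ascosphaera apis", "Bettsia alvei", "Braula coeca", "Forficula auricularia",
    "Galleria mellonella", "Melissococcus plutonius", "Nosema apis",
    "Nosema ceranae", "Oplostomus fuligineus", "Paenibacillus larvae",
    "Senotainia tricuspis", "Spiroplasma apis", "Spiroplasma melliferum",
    "Tropilaelaps clareae", "Tropilaelaps mercedesae", "Varroa destructor",
]


def bracken_command(db: str, kreport: str, output: str) -> List[str]:
    """Bracken call with read length 80, species level, threshold 10."""
    return ["bracken", "-d", db, "-i", kreport, "-o", output,
            "-r", "80", "-l", "S", "-t", "10"]


def pathogen_fractions(bracken_text: str,
                       targets: Sequence[str] = TARGET_SPECIES) -> Dict[str, float]:
    """Return {species: fraction_total_reads} for the target species only."""
    reader = csv.DictReader(io.StringIO(bracken_text), delimiter="\t")
    found = {row["name"].strip(): float(row["fraction_total_reads"])
             for row in reader}
    # Species not reported by Bracken are recorded as 0.0 (an assumption).
    return {species: found.get(species, 0.0) for species in targets}
```

A per-sample table can then be assembled by calling `pathogen_fractions` on each `.bracken` file and collecting the resulting dictionaries.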
For geographical distribution analysis of plant genera, Kraken 2 output was filtered to retain only genus-level classifications (Rank_Code = “G”) within Streptophyta. The Clade_Fragment_Count column was used as a measure of relative abundance and was normalized against the total read count per sample to account for sequencing depth. To adjust for differences in sample sizes across counties, the data were further normalized by the number of samples per county. The 16 most abundant plant genera were analyzed individually, with the remaining genera grouped under “Other”.
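The two normalization steps and the "Other" grouping can be sketched as follows. This is a hedged illustration, not the authors' R code: the input structure, function name, and county labels are hypothetical, and ranking genera by their summed per-county mean abundance is an assumption, since the text does not state how the 16 most abundant genera were chosen.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# One record per sample: (county, total read count, {genus: Clade_Fragment_Count})
Sample = Tuple[str, int, Dict[str, int]]


def county_genus_profile(samples: List[Sample],
                         top_n: int = 16) -> Dict[str, Dict[str, float]]:
    """Per-county mean relative abundance of the top genera plus 'Other'."""
    county_sums: Dict[str, Dict[str, float]] = defaultdict(lambda: defaultdict(float))
    county_n: Dict[str, int] = defaultdict(int)
    for county, total_reads, genus_counts in samples:
        county_n[county] += 1
        for genus, count in genus_counts.items():
            # Step 1: normalize by the sample's sequencing depth.
            county_sums[county][genus] += count / total_reads
    # Step 2: normalize by the number of samples in the county.
    county_mean = {c: {g: v / county_n[c] for g, v in genera.items()}
                   for c, genera in county_sums.items()}
    # Rank genera by abundance summed over counties (assumed criterion).
    overall: Dict[str, float] = defaultdict(float)
    for genera in county_mean.values():
        for genus, value in genera.items():
            overall[genus] += value
    top = sorted(overall, key=lambda g: (-overall[g], g))[:top_n]
    top_set = set(top)
    profile = {}
    for county, genera in county_mean.items():
        row = {g: genera.get(g, 0.0) for g in top}
        row["Other"] = sum(v for g, v in genera.items() if g not in top_set)
        profile[county] = row
    return profile
```

Dividing by the county's sample count makes counties with many samples comparable to counties with few, which is the stated purpose of the second normalization.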
Statistical analysis of the Bracken and Kraken 2 outputs and data visualization were conducted in R (4.4.1)44.
Acknowledgements
This work was supported by the European Agricultural Fund for Rural Development (Estonian Rural Development Plan 2014-2020, 616219790085).
Author contributions
P.P. analyzed and interpreted honey DNA metagenome sequencing data, visualized the results, wrote the original draft, and reviewed and edited the manuscript. H.T. and M.V. created the initial computational pipeline for sequencing data analysis. M.V. created the custom Kraken 2 reference database. K.R. conceptualized the study and methodology, collected and curated honey samples, optimized laboratory protocols for honey bulk DNA extraction, supervised, reviewed and edited the manuscript, and administered the project. A.S. conceptualized the study and reviewed and edited the manuscript. K.K. designed and optimized laboratory protocols for the honey bulk DNA extraction, library preparation, and sequencing, conceptualized the results and visualization of the study, and reviewed and edited the manuscript. All authors read and approved the final manuscript.
Data availability
The data generated during this study are available in the Sequence Read Archive (SRA) repository under BioProject PRJNA1135913 (https://www.ncbi.nlm.nih.gov/sra/PRJNA1135913).
Competing interests
The authors declare no competing interests.
Footnotes
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary information
The online version contains supplementary material available at 10.1038/s41538-025-00464-1.
References
- 1. Bovo, S. et al. Shotgun metagenomics of honey DNA: Evaluation of a methodological approach to describe a multi-kingdom honey bee derived environmental DNA signature. PLoS One 13, e0205575 (2018).
- 2. Wirta, H., Abrego, N., Miller, K., Roslin, T. & Vesterinen, E. DNA traces the origin of honey by identifying plants, bacteria and fungi. Sci. Rep. 11, 1–14 (2021).
- 3. Wirta, H. K., Bahram, M., Miller, K., Roslin, T. & Vesterinen, E. Reconstructing the ecosystem context of a species: Honey-borne DNA reveals the roles of the honeybee. PLoS One 17, e0268250 (2022).
- 4. Varis, A.-L., Helenius, J. & Koivulehto, K. Pollen spectrum of Finnish honey. Agric. Food Sci. 54, 403–420 (1982).
- 5. Utzeri, V. J. et al. Entomological signatures in honey: an environmental DNA metabarcoding approach can disclose information on plant-sucking insects in agricultural and forest landscapes. Sci. Rep. 8, 9996 (2018).
- 6. Papp, M. et al. Apis mellifera filamentous virus from a honey bee gut microbiome survey in Hungary. Sci. Rep. 14, 1–8 (2024).
- 7. Highfield, A. C. et al. Deformed Wing Virus Implicated in Overwintering Honeybee Colony Losses. Appl. Environ. Microbiol. 75, 7212 (2009).
- 8. Ilyasov, R. A., Boguslavsky, D. V., Ilyasova, A. Y., Sattarov, V. N. & Danilenko, V. N. A multifaceted bioactivity of honey: interactions between bees, plants and microorganisms. Uludag Bee J. 24, 356–385 (2024).
- 9. Ebeling, J., Knispel, H., Hertlein, G., Fünfhaus, A. & Genersch, E. Biology of Paenibacillus larvae, a deadly pathogen of honey bee larvae. Appl. Microbiol. Biotechnol. 100, 7387–7395 (2016).
- 10. Ravoet, J. et al. Comprehensive bee pathogen screening in Belgium reveals Crithidia mellificae as a new contributory factor to winter mortality. PLoS One 8, e72443 (2013).
- 11. Brandorf, A., Ivoilova, M. M., Yañez, O., Neumann, P. & Soroker, V. First report of established mite populations, Tropilaelaps mercedesae, in Europe. 10.1080/00218839.2024.2343976.
- 12. Ribani, A., Taurisano, V., Utzeri, V. J. & Fontanesi, L. Honey environmental DNA can be used to detect and monitor honey bee pests: development of methods useful to identify Aethina tumida and Galleria mellonella infestations. Vet. Sci. 9, 10.3390/VETSCI9050213 (2022).
- 13. Louveaux, J., Maurizio, A. & Vorwohl, G. Methods of Melissopalynology. Bee World 59, 139–157 (1978).
- 14. Porter, T. M. & Hajibabaei, M. Scaling up: A guide to high-throughput genomic approaches for biodiversity analysis. Mol. Ecol. 27, 313–338 (2018).
- 15. Taberlet, P., Coissac, E., Pompanon, F., Brochmann, C. & Willerslev, E. Towards next-generation biodiversity assessment using DNA metabarcoding. Mol. Ecol. 21, 2045–2050 (2012).
- 16. Puusepp, L. & Koff, T. Pollen analysis of honey from the Baltic region, Estonia. Grana 53, 54–61 (2014).
- 17. Brar, G. et al. High abundance of lactobacilli in the gut microbiome of honey bees during winter. Sci. Rep. 15, 1–16 (2025).
- 18. Cambronero-Heinrichs, J. C. et al. Erwiniaceae bacteria play defensive and nutritional roles in two widespread ambrosia beetles. FEMS Microbiol. Ecol. 99, 1–11 (2023).
- 19. Requier, F. et al. Predation of the invasive Asian hornet affects foraging activity and survival probability of honey bees in Western Europe. J. Pest Sci. 92, 567–578 (2019).
- 20. Rosenkranz, P., Aumeier, P. & Ziegelmann, B. Biology and control of Varroa destructor. J. Invertebr. Pathol. 103, S96–S119 (2010).
- 21. Kwadha, C. A. et al. The biology and control of the greater Wax Moth, Galleria mellonella. Insects 8, 61 (2017).
- 22. Claing, G. et al. Prevalence of pathogens in honey bee colonies and association with clinical signs in southwestern Quebec, Canada. Can. J. Vet. Res. 88, 45 (2024).
- 23. Hansen, H. & Brødsgaard, C. J. American foulbrood: a review of its biology, diagnosis and control. Bee World 80, 5–23 (1999).
- 24. Naudi, S. et al. Variation in the distribution of Nosema species in honeybees (Apis mellifera Linnaeus) between the neighboring countries Estonia and Latvia. Vet. Sci. 8, 58 (2021).
- 25. Pacini, A. et al. Distribution and prevalence of Nosema apis and N. ceranae in temperate and subtropical eco-regions of Argentina. J. Invertebr. Pathol. 141, 34–37 (2016).
- 26. Papini, R. et al. Prevalence of the microsporidian Nosema ceranae in honeybee (Apis mellifera) apiaries in Central Italy. Saudi J. Biol. Sci. 24, 979–982 (2017).
- 27. Klee, J. et al. Widespread dispersal of the microsporidian Nosema ceranae, an emergent pathogen of the western honey bee, Apis mellifera. J. Invertebr. Pathol. 96, 1–10 (2007).
- 28. Sinpoo, C., Paxton, R. J., Disayathanoowat, T., Krongdang, S. & Chantawannakul, P. Impact of Nosema ceranae and Nosema apis on individual worker bees of the two host species (Apis cerana and Apis mellifera) and regulation of host immune response. J. Insect Physiol. 105, 1–8 (2018).
- 29. Mahefarisoa, K. L., Simon Delso, N., Zaninotto, V., Colin, M. E. & Bonmatin, J. M. The threat of veterinary medicinal products and biocides on pollinators: A One Health perspective. One Heal. 12, 100237, 10.1016/J.ONEHLT.2021.100237 (2021).
- 30. Marvin, G. E. The occurrence and characteristics of certain yeasts found in fermented honey. J. Econ. Entomol. 21, 363–370 (1928).
- 31. Hartmann, U., Forsgren, E., Charrière, J. D., Neumann, P. & Gauthier, L. Dynamics of Apis mellifera Filamentous Virus (AmFV) infections in honey bees and relationships with other parasites. Viruses 7, 2654–2667 (2015).
- 32. Salonen, A., Ollikka, T., Grönlund, E., Ruottinen, L. & Julkunen-Tiitto, R. Pollen analyses of honey from Finland. Grana 48, 281–289 (2009).
- 33. Lu, J. et al. Metagenome analysis using the Kraken software suite. Nat. Protoc. 17, 2815–2839 (2022).
- 34. Benson, D. A., Karsch-Mizrachi, I., Lipman, D. J., Ostell, J. & Wheeler, D. L. GenBank. Nucleic Acids Res. 35, D21–D25 (2007).
- 35. Leebens-Mack, J. H. et al. One thousand plant transcriptomes and the phylogenomics of green plants. Nature 574, 679–685 (2019).
- 36. Kodama, Y., Shumway, M. & Leinonen, R., on behalf of the INSD Collaboration. The sequence read archive: explosive growth of sequencing data. Nucleic Acids Res. 40, D54–D56 (2012).
- 37. University of Tartu. UT Rocket. Published online 2018. 10.23673/PH6N-0144.
- 38. Di Tommaso, P. et al. Nextflow enables reproducible computational workflows. Nat. Biotechnol. 35, 316–319 (2017).
- 39. Shen, W., Sipos, B. & Zhao, L. SeqKit2: A Swiss army knife for sequence and alignment processing. iMeta 3, e191 (2024).
- 40. Wood, D. E., Lu, J. & Langmead, B. Improved metagenomic analysis with Kraken 2. Genome Biol. 20, 1–13 (2019).
- 41. Chen, S., Zhou, Y., Chen, Y. & Gu, J. fastp: an ultra-fast all-in-one FASTQ preprocessor. Bioinformatics 34, i884–i890 (2018).
- 42. Prjibelski, A., Antipov, D., Meleshko, D., Lapidus, A. & Korobeynikov, A. Using SPAdes De Novo Assembler. Curr. Protoc. Bioinforma. 70, e102, 10.1002/CPBI.102 (2020).
- 43. Lu, J., Breitwieser, F. P., Thielen, P. & Salzberg, S. L. Bracken: Estimating species abundance in metagenomics data. PeerJ Comput. Sci. 3, e104, 10.7717/PEERJ-CS.104 (2017).
- 44. R Core Team. R: A Language and Environment for Statistical Computing. Published online 2024. https://www.r-project.org/.