BMC Ecology and Evolution. 2025 Nov 13;25:121. doi: 10.1186/s12862-025-02429-0

CODABEILLES: a reliable reference library of COI DNA barcodes for French wild bees monitoring (Apoidea: Anthophila)

Mélodie Ollivier 1,2, Anaïs Marquisseau 1, Eric Dufrêne 3, Rémi Rudelle 4, Rodolphe Rougerie 5, Adrien Perrard 6,7, Magalie Pichon 1; The CODABEILLES Consortium
PMCID: PMC12613473  PMID: 41233723

Abstract

In the Anthropocene, the decline of insect pollinators poses a significant threat to ecosystem services, particularly to wild bee populations essential for plant biodiversity and agricultural productivity. France, with 983 species, hosts one of the most diverse bee faunas in Europe, yet these species face growing pressures from habitat loss, climate change, and intensive agriculture. Addressing this crisis requires robust taxonomic frameworks and efficient species identification methods to support long-term monitoring initiatives such as the European Pollinator Monitoring Scheme, EU-PoMS. DNA barcoding, utilizing the COI-5P gene, has proven effective for species delineation and biodiversity monitoring, particularly in detecting cryptic diversity among genera with large numbers of species such as Andrena, Nomada or Lasioglossum. However, significant gaps remain in reference libraries, particularly for the species from the Mediterranean Basin. To bridge this gap, the CODABEILLES initiative was launched in 2021 to enhance barcode data for the French bee fauna. Initially, only 25% of species had barcodes from French voucher specimens, increasing to 62% when considering voucher specimens from other countries. By 2025, thanks to collaboration with sixteen specialists and institutions, CODABEILLES contributed 1477 reference barcodes, covering approximately 560 species and raising barcode coverage to 82%. When integrating data published under other initiatives over the same period the coverage reaches 94% of the French bee fauna. This dataset significantly enhances species identification accuracy and supports large-scale pollinator monitoring through metabarcoding and environmental DNA approaches. Despite the success of COI-5P barcoding, taxonomic inconsistencies persist, necessitating further integrative research. 
This study underscores the need for continued collaboration among taxonomists, molecular biologists, and conservationists to refine species classifications and ensure comprehensive reference databases. The improved barcode coverage provided by CODABEILLES paves the way for more accurate DNA-based monitoring of wild bee populations and their ecological interactions, crucial for guiding conservation strategies in the face of ongoing environmental change.

Supplementary Information

The online version contains supplementary material available at 10.1186/s12862-025-02429-0.

Keywords: Hymenoptera, Pollinators, DNA barcoding, Integrative taxonomy, Cytochrome c oxidase subunit I (CO1-5P), Molecular identification, Cryptic diversity

Background

In the Anthropocene, growing evidence points to massive declines in insect populations (in abundance, biomass, diversity, or spatial distribution), which threaten ecosystem services [1]. Animal pollination, mainly provided by a diversity of wild bee species (Hymenoptera, Apoidea, Anthophila), is essential for the production of 75% of cultivated plants and for the conservation of wild plant biodiversity [2–4]. Wild bees are a diverse group among hymenopterans, counting over 20,000 species worldwide [5]. Global bee distribution shows a specific pattern: while tropical environments are renowned for their extraordinary richness in many insect species, relatively xeric areas surpass other regions in terms of bee species richness [6, 7]. The Mediterranean Basin is therefore considered a hotspot of bee diversity [8, 9]. With 983 species, France (Mainland and Corsica) harbours the 4th most diversified fauna in Europe after Greece, Spain and Italy [10, 11]. This accounts for nearly half of Europe’s 2,138 bee species [10].

The loss of insect pollinators is attributed to a combination of global factors that cause significant disruptions to wildlife. Among the well-documented drivers of pollinator collapse are climate change and land use change, which result in the loss of natural habitats. This trend affects both bumblebees [12–14] and solitary bees [15] as today’s agricultural landscapes fail to provide adequate nesting sites and floral resources [16, 17]. Intensive agriculture, often linked to the use of pesticides and fertilizers, is another significant stressor contributing to the decline of wild bees [17]. However, the local impacts of agricultural practices can be mitigated at landscape scale by the presence of semi-natural habitats such as woods and grasslands [18–20]. The incorporation or restoration of flower-rich areas into farmland is a practical measure that should be enhanced. For instance, the research project RestPoll (Restore Pollinator Habitats in Europe), along with agri-environmental schemes at both national and European levels (French “Plan national en faveur des insectes pollinisateurs et de la pollinisation 2021–2026” and European Pollinator Monitoring Scheme, EU-PoMS, JRC138660), represent recent initiatives aimed at addressing a shared challenge: counteracting or slowing the collapse of pollinator populations in the coming decades.

The Europe-wide long-term monitoring conducted under EU-PoMS relies on standardized sampling methods [21]. Like any monitoring program in insect ecology and conservation studies, it requires two critical elements to ensure effectiveness: 1) a robust taxonomic framework, and 2) an accurate, rapid and cost-effective identification of bee specimens at the species level. Despite significant efforts to advance knowledge of the European bee fauna [22], new species are still frequently described [23, 24] and species boundaries within certain groups remain unresolved. DNA barcoding, which uses the 5’ region of the mitochondrial gene cytochrome c oxidase 1 (COI-5P) as a standardized marker for species delineation, has progressively been adopted as part of an integrative taxonomy approach [25, 26]. It has helped to clarify historically controversial species groups and has frequently demonstrated its capacity to uncover cryptic bee species [27, 28]. For instance, European bees of the genus Andrena subgenus Taeniandrena have been investigated by combining DNA barcodes and morphology, revealing an unexpected level of cryptic diversity in a geographically restricted area [29].

For several bee genera, an accurate species-level identification requires advanced taxonomic expertise [30]. Although training opportunities are beginning to emerge, these skills remain confined to a few experts and are unavailable altogether in many countries. The integration of DNA barcoding with high-throughput sequencing (HTS) technologies presents a cost- and time-efficient solution for scaling up routine pollinator monitoring [31]. This approach is referred to as metabarcoding or megabarcoding, depending on how individuals are pooled during the process and the resulting ability to obtain abundance data [32, 33]. The accuracy of a specimen assignment to a species through DNA barcoding heavily depends on the completeness of the DNA barcode reference libraries and on the sufficient characterization of intraspecific variability [34].

In the past decades, only a few national and regional libraries have been established for wild bees, for example in Chile [35], Canada [36], Ireland [37], United Kingdom [38], Luxembourg [39] and more recently Slovenia [40]. Despite global efforts to develop DNA barcode references for wild bee species, the reference library for French species remains incomplete. Significant gaps exist for species from the Mediterranean Basin, barcodes from French vouchers are necessary for a better characterization of intraspecific diversity, and, where barcodes are available, often a very limited number of specimens has been sequenced. In early 2021, when the CODABEILLES initiative was launched to enhance reference data on the French bee fauna, only 25% of species were covered by barcodes obtained from French specimens, mainly collected in the Loire Valley [41]. Coverage increased to 62% when including German bee specimens provided by an extensive work by Schmidt et al. [42]. Very recently, Wood et al. [9] made a considerable contribution to documenting wild bees in the Iberian Peninsula providing useful data for some French species of the Mediterranean Basin. However, given the potential cryptic diversity that may still exist among diverse groups of southern Europe [29, 43], providing barcode references from French vouchers remains highly valuable.

The lack of reference sequences for pollinators poses a serious impediment for biodiversity monitoring that relies on meta- or megabarcoding, as well as for developing non-lethal approaches using environmental DNA (eDNA) detection [44, 45]. This challenge is further compounded by the need for alternative DNA barcode markers in cases where COI either lacks sufficient resolution or is unsuitable for analyzing degraded eDNA [46–48]. Addressing this issue, a recent study generated reference data for the alternative 16S marker, covering 148 species collected in Occitania, South-West France [49]. However, this highlights the substantial progress still required to achieve comprehensive coverage for the 982 wild bee species known from France (excluding the domesticated species, Apis mellifera LINNAEUS, 1758; [11]).

In the present article, we provide 1477 reference barcodes covering ~ 560 French bee species. This contribution has increased coverage from 62 to 82% of French species with barcodes and from 25 to 72% with French voucher specimens since the beginning of the project in 2021. This was made possible through the collaboration with sixteen specialists and institutions who opened their collections for tissue material sampling. Relying on experts’ collections provides access to broader taxonomic coverage and highly reliable identifications within a short timeframe [50], addressing the need for rapid database completion. Collaboration with experts was also crucial for the subsequent validation of the data and for addressing inconsistencies that arise between morphological identification and molecular specimen identification. Thus, in this study, our objectives were: 1) to provide COI-5P reference barcode data for the French bee fauna, specifically species from the Mediterranean Basin, 2) to confirm the global suitability of the COI-5P marker for bee species delineation, despite some groups not exhibiting a proper barcode gap, 3) to highlight inconsistencies opening avenues for taxonomic investigations in the coming years, and 4) to propose a rigorous procedure for establishing reliable DNA barcode libraries based on collection specimens.

Construction and content

The reference library presented here was built following the procedures described in Fig. 1, which we suggest as good practices for the establishment of reference barcode libraries. Key steps are detailed below and involve: 1) Assessment of existing data and identification of gaps, 2) Data acquisition through biological material sampling and sequencing, 3) Sequence analyses to confirm reference barcodes.

Fig. 1.

Good practices for the establishment of reference barcode libraries

  1. Assessment of existing data and identification of gaps
    • Existing data extraction: In spring 2021, the checklist of the wild bee species of France (Hymenoptera: Apoidea: Anthophila) was obtained from the taxonomic repository (TAXREF version 15.0; [51]) implemented by the National Inventory of Natural Heritage. The domestic honey bee, Apis mellifera, was excluded from the list and for the remaining 967 species, the public database BOLD system was queried to assess the coverage in reference barcode sequences. Sequence data extraction was done using the R package bold. Only a subset of sequences meeting barcode standards were kept to assess species sampling priority (sequences > 500 bp-long, < 1% undefined base and country specification).
    • Prioritising needs: Three priority levels were defined: level 0 for low priority species (species covered by at least one barcode sequence obtained from a French voucher), level 1 for intermediate priority species (species covered by at least one barcode obtained from a voucher outside of France) and level 2 for high priority species (species lacking a reference barcode). Sampling focused on priority levels 1 and 2 in order to 1) improve barcode coverage for the French wild bee fauna and 2) better characterize intraspecific genetic diversity. We also considered a few low-priority species when collaborators needed molecular validation of a specimen’s identification.
  2. Data acquisition: biological material sampling and sequencing
    • Biological material sampling: Biological material for DNA extraction was exclusively obtained from dry collection specimens, following the procedures recommended in Ollivier et al. [52]. Sixteen contributors from the CODABEILLES Consortium (supplementary material 1), including institutional and personal reference collections, provided tissue material. Targeting specimens from expert collections was preferred to the sampling of fresh material. Expert collections often gather male and female specimens of a species, specimens belonging to rare species and specimens sampled over long time spans, which offers a choice of specimens for tissue sampling. Beyond the fact that it avoids killing new specimens to find the right species, collaborating with experts allows a wide taxonomic range of species to be covered rapidly with reliably identified specimens. To account for intraspecific genetic divergence, three specimens per species, from different locations, were typically sampled (min = 1, median = 3, max = 13). Figure 2 shows sites where specimens used in this study were collected in France. A minimum amount of tissue, i.e. a single fore or hind leg (depending on the need to access diagnostic characters), was sampled for each specimen, allowing for subsequent re-observation of the voucher specimen. Using flame-decontaminated forceps, tissue samples were placed in 96-well plates (with the exception of well H12, which was left empty to serve as a negative control).
    • Metadata acquisition: Pictures of the voucher specimen were taken and metadata (including taxonomic identification, identifier, sex, and sampling date and locality) were recorded in an Excel spreadsheet (Microsoft Corporation, 2025) for BOLD submission. When a specimen could not be confidently assigned to a species with 100% certainty, we retained a group-level name (e.g., Bombus gr. terrestris, Halictus gr. simplex) and provided taxonomic notes in the metadata explaining the morphological ambiguities relative to existing species descriptions. Metadata and pictures for all samples were uploaded to BOLD, under a dedicated project, called CODAB (publicly accessible, see section Availability of data and materials).
    • Molecular and bioinformatic processes: Complete 96-well plates were sent to the Canadian Centre for DNA Barcoding (CCDB, Guelph, Canada) for sequencing of DNA barcodes. Molecular operations were carried out according to standard procedures for processing high-throughput samples (protocols available on https://ccdb.ca/resources/). For PCR amplification of the full-length (658 bp) DNA barcode region of the COI-5P gene, two methods were employed depending on the age of the voucher specimen. For recently collected specimens, a primer cocktail combining the universal LCO1490/HCO2198 primer pair [53] with the LepF1/LepR1 primer pair [54] was used. This procedure was carried out on 23 plates (2165 specimens) out of 24. One plate (95 specimens), containing decades-old specimens collected between 1905 and 1984 from Robert Delmas’s reference collection, was processed using the primer sets developed by Prosser et al. [55] for degraded DNA. This primer configuration allows generating up to 12 overlapping amplicons of the standard COI-5P marker [56]. PCR fragments for all samples (2260) were sequenced with single molecule real-time sequencing (SMRT) on the PacBio Sequel platform at the CCDB [57]. SMRT sequences were analysed using mBRAVE (Multiplex Barcoding Research and Visualization Environment, www.mbrave.net) with a standard pipeline involving sequence trimming, quality filtering, de-replication, identification, and OTU generation.
  3. Sequence analyses to confirm reference barcodes
    • Removal of non-biological sequences: Following molecular and bioinformatic processing, sequences were made accessible on BOLD under CODAB project where users can obtain information on their quality. Some records were flagged due to 1) the detection of stop codon (45 sequences), and 2) the suspicion of contamination (129 sequences). The contaminated sequences were confirmed with batch ID Engine (using analytical tools of BOLD workbench) and eliminated. Using MEGA11 software [58], sequences flagged for stop codon detection were aligned against sequences from conspecifics, or if not available, from closely related species (same subgenus or genus) in order to point out the possible reading frame shift caused by the presence of indels and leading to the introduction of stop codons. Sequences were edited to correct for these minor errors.
    • Bee sequence analyses: To estimate the relevance of the COI-5P marker for the delimitation of French wild bee species, we implemented classical methods based on sequence divergence. Intraspecific and interspecific variations were assessed with pairwise distances using the Kimura 2-parameter (K2P) distance model [59] and pairwise deletion method. To provide a graphical representation of species divergence, neighbor-joining (NJ) trees were inferred using sequences produced in this study along with sequences already available on BOLD for the same species or genera. To confirm the existence and extent of the barcode gap in French wild bee species, we plotted the maximum intraspecific distances against the interspecific (nearest neighbour) distances. A histogram was generated to illustrate the distribution of normalized divergence at both species and genus level. These analyses were conducted using sequences longer than 400 bp, with less than 1% unidentified bases and for which species identification was not doubtful. For sequences meeting barcode standards (sequences > 500 bp-long, < 1% undefined base) a Barcode Index Number (BIN) was assigned automatically by BOLD [60]. BINs are sequence clusters that often correspond to a biological species and help to assess the congruence between molecular clusters and morphological identifications. This combination of approaches was necessary to ensure the reliability of sequences as reference barcodes and to highlight some inconsistencies that led experts to re-identify some of the voucher specimens. Several scenarios were observed and are discussed in the next section: (1) congruence between morphological and molecular species-level identification, and (2) incongruence between morphological and molecular results, which may be due to putative misidentifications of the voucher specimen, specimen belonging to a known species complex, unstable taxonomy, or suspected cryptic diversity. 
Records attributed to these cases made up the actual CODABEILLES library, a dataset accessible through BOLD at http://dx.doi.org/10.5883/DS-CODAB01.
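
The gap assessment described in step 1 (barcode-standard filter, then priority levels 0 to 2) can be sketched as follows. This is a minimal illustration in Python, not the R workflow used in the study: the record structure, function names and the `home_country` parameter are hypothetical, and only the thresholds (> 500 bp, < 1% undefined bases, country specified) come from the text.

```python
from dataclasses import dataclass

MIN_LENGTH_BP = 500             # sequences must be longer than 500 bp
MAX_UNDEFINED_FRACTION = 0.01   # and contain less than 1% undefined bases


@dataclass
class BarcodeRecord:
    """Hypothetical minimal view of a public barcode record."""
    species: str
    sequence: str
    country: str  # empty string when no country is specified


def meets_barcode_standard(record: BarcodeRecord) -> bool:
    """Apply the three filters used to assess sampling priority."""
    seq = record.sequence.upper().replace("-", "")
    if len(seq) <= MIN_LENGTH_BP or not record.country:
        return False
    undefined = sum(1 for base in seq if base not in "ACGT")
    return undefined / len(seq) < MAX_UNDEFINED_FRACTION


def priority_level(species: str, records: list[BarcodeRecord],
                   home_country: str = "France") -> int:
    """Return 0 (home-country barcode), 1 (foreign barcode only) or 2 (no barcode)."""
    countries = {
        r.country for r in records
        if r.species == species and meets_barcode_standard(r)
    }
    if home_country in countries:
        return 0
    if countries:
        return 1
    return 2
```

Species returning level 1 or 2 are those on which sampling would be concentrated; level 0 species would only be sampled when a collaborator needs molecular validation.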

Fig. 2.

French sampling locations of the specimens processed in CODABEILLES project. 34 specimens were collected in other countries but corresponded to species found in France (4 in Belgium, 3 in Greece, 7 in Italy, 4 in Morocco, 3 in Portugal, 7 in Spain, 3 in Switzerland, 2 in Tunisia)

Finally, to account for barcodes generated outside the CODABEILLES initiative that also contributed to French wild bee species coverage over the same period, the BOLD checklists tool was used on the 982 French wild bee species [11].

Utility and discussion

The reference library generated under the CODABEILLES initiative for French wild bees is available on the BOLD Public Data Portal. It can be downloaded in TSV format, including metadata and sequences, at 10.5883/DS-CODAB01. The data follow the standardized format provided by the Barcode Core Data Model (BCDM) records (see the GitHub page: https://github.com/boldsystems-central/BCDM/tree/main and, for details on field descriptions, refer to the field_definitions.tsv file). Records are also searchable within BOLD using the search engine. As an alternative way to explore and use these reference data, users can log in and access the library dataset directly through BOLD via the workbench platform.

  1. Sequencing success from specimens in collection varies across taxonomic groups

The 2260 specimens provided by the 16 institutional and private collections represented 649 taxa (630 species formally identified morphologically and 19 ambiguous taxa) across 56 genera within the six European bee families. After filtering out contaminations and low-quality sequences, 1477 sequences were retained, belonging to 561 taxa and 53 genera (Table 1). Among the 2165 specimens processed using the standard protocol targeting the 658 bp COI-5P marker, the overall sequencing success rate reached 70%. A total of 1439 sequences met barcode standards (> 500 bp in length, < 1% ambiguous bases), while 10 sequences were shorter, ranging from 444 to 499 bp. This result is encouraging, given that most collections providing bee tissue samples lacked storage conditions specifically designed for DNA preservation, which may have contributed to amplification or sequencing failures in some samples [55, 61]. The sequencing success rate also aligns with previous studies using comparable molecular approaches, which reported recovery rates ranging from 67% [42] to 83% [41], the latter based on specimens sequenced within four years after their capture in the field. However, despite employing an NGS approach suitable for old material [55], only 28 partially informative sequences (ranging from 130 to 456 bp) were recovered from the 95 specimens in Robert Delmas’s reference collection (Andrena spp. and Bombus spp. collected between 1905 and 1984).

Table 1.

Overall data regarding the CODABEILLES library: number of bee specimens sampled, number of sequences generated (> 500 bp and < 500 bp), and number of taxa and genera sampled and covered with a sequence. Taxa refers to either species or species groups in cases of uncertain species assignment

Nb of sampled specimens: 2260
Nb of sequences generated: 1477
Nb of sequences meeting barcode standard (> 500 bp & < 1% N): 1439
Nb of partial sequences (< 500 bp): 38
Nb of taxa with a sequence / Nb of taxa sampled: 561 / 649
Nb of genera with a sequence / Nb of genera sampled: 53 / 56

Combining both approaches enabled sequence coverage for 86% of the sampled taxa (Table 1). As reported in previous studies, sequencing success was variable amongst bee taxonomic groups [36, 37, 41]. For instance, in this study, 81% of Megachilidae specimens were successfully sequenced, highlighting that the molecular protocol was particularly effective for the genera Osmia and Megachile, with success rates of 90% and 87% respectively (Table 2, a and b). In contrast, only 43% of the specimens belonging to genus Hylaeus and 49% belonging to genus Andrena provided a sequence (Table 2, b). In the worst-case scenarios, despite multiple attempts, certain species proved particularly resistant to amplification (Table 2, c). In the case of Andrena and Lasioglossum, this low success rate may be explained by inefficient primer annealing due to either polymorphic sequences or single nucleotide insertion or deletion within the primer binding site, resulting in a lower binding efficiency or non-specific amplifications [37]. Otherwise, it may be attributed to the presence of heteroplasmy in Hylaeus species [62]. In such cases, alternative primer pairs or barcode markers are recommended. The approach implemented by Wood et al. [9] for Iberian bees, which uses two different primer pairs to amplify overlapping fragments of 325 bp and 418 bp within the standard 658 bp COI-5P barcode region, successfully improved DNA amplification for five of these recalcitrant species (but see section 4 Planned future developments). Additionally, targeting the complementary 16S marker may help overcome this limitation, as reference barcodes for two of these recalcitrant species were recently deposited in the BOLD public database [49].

  2. A significant improvement in reference barcode coverage for the French bee fauna

Table 2.

Sequencing success rates calculated for a) each taxonomic family, b) the 10 most abundant genera, and c) the 9 most problematic species within the CODABEILLES project. These success rates can range from 0% (no successful sequencing) to 100% (all specimens successfully sequenced) and are visually represented using a color gradient from red (null or low success) through yellow (moderate success) to green (high success)


The CODABEILLES initiative has played a key role in expanding the reference library for the six European bee families (Fig. 3). At the project’s launch in 2021, only 25% of the French bee fauna was covered with a reference barcode obtained from a voucher from France, increasing to 62% when including data from other countries [41, 42]. At the date of writing this manuscript, reference data for French species has significantly improved: 72% of the 982 wild bee species from France [11] are now covered by a French reference barcode. When incorporating reference sequences from projects conducted in other geographical areas over the same period (e.g. Iberian bee species, [9]), overall coverage reaches 94% of the French bee fauna.

Fig. 3.

Improvement in barcode coverage for French wild bees since 2021

Reference sequences were obtained for 561 taxa, including 545 formally identified species (544 French species and one Algerian species). Additionally, 16 taxa could not be accurately identified, as they likely belong to species complexes requiring further taxonomic investigation for unambiguous morphological identification. Supplementary material 2 provides the status of the species as of February 2025, along with the corresponding synonyms under which a few specimens have been processed in the CODABEILLES dataset. For four species, the voucher specimens providing reference sequences were collected outside of France: Amegilla quadrifasciata (DE VILLERS, 1789) obtained from Morocco; Andrena subopaca NYLANDER, 1848 obtained from Belgium; Nomada fallax PÉREZ, 1913 obtained from Portugal; and Nomada numida LEPELETIER, 1841 obtained from Italy. This information is accessible in supplementary material 2 and 3. The CODABEILLES initiative contributed new French reference barcodes for 454 of the 982 French species [11], 197 of which had no public reference barcode in the BOLD world database (Fig. 3). It also provided additional reference sequences that improved the characterization of intraspecific genetic divergence for 78 species. Alongside the recent work of Wood et al. [9], this project delivers new and complementary reference data to enhance taxonomic knowledge on the bee species from the Mediterranean Basin [8, 9].

A total of 59 wild bee species (6%) are still lacking a barcode reference for the COI-5P marker. This includes species across the six European bee families: eight Andrenidae species, 21 Apidae, eight Colletidae, six Halictidae, 15 Megachilidae, and one species from Melittidae (Fig. 3). This number includes four species with a barcode but for which we lack confidence in the identification of the voucher. At a time when nature conservation is a priority, taxonomic research on bees is particularly active in Europe. Revisions are occurring across most genera and families, leading to frequent updates of wild bee checklists [10, 11, 22, 23, 63–70]. Of the 59 species lacking a reference barcode, a significant share was either recently discovered on the French territory or recently described as new to science (16, including Lasioglossum inexpectatum FLAMINIO & PAULY, 2024; Aglaoapis sparsepunctata [23], Chelostoma incisa [23], Hoplitis agnielae [23], Hoplitis corsaria (WARNCKE, 1991); [23, 67, 71]). They were included late in the CODABEILLES initiative, or not at all. Other taxa were rare (16 cleptoparasitic bees and notoriously rare taxa such as Thyreus hellenicus, [11]) or we faced sequencing or identification issues with the specimens we sampled (13 cases). We are currently troubleshooting the sequencing issues and we may obtain new barcodes for up to nine additional species (Section 4 Planned future developments and Supplementary material 2).
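
As a quick consistency check on the figures above, the per-family counts of species still lacking a COI-5P reference can be summed and compared with the 982 French wild bee species. The snippet below is only an arithmetic sketch in Python; the variable and function names are illustrative, and the numbers are those quoted in the text.

```python
# Species still lacking a COI-5P reference barcode, per family (from the text).
MISSING_BY_FAMILY = {
    "Andrenidae": 8,
    "Apidae": 21,
    "Colletidae": 8,
    "Halictidae": 6,
    "Megachilidae": 15,
    "Melittidae": 1,
}
TOTAL_WILD_BEE_SPECIES = 982  # French wild bee species, Apis mellifera excluded


def coverage_summary(missing_by_family: dict[str, int], total_species: int):
    """Return (number missing, % missing, % covered)."""
    missing = sum(missing_by_family.values())
    pct_missing = 100 * missing / total_species
    pct_covered = 100 * (total_species - missing) / total_species
    return missing, pct_missing, pct_covered


missing, pct_missing, pct_covered = coverage_summary(
    MISSING_BY_FAMILY, TOTAL_WILD_BEE_SPECIES
)
print(f"{missing} species without reference ({pct_missing:.0f}%), "
      f"{pct_covered:.0f}% covered")
```

The family counts sum to 59, which corresponds to about 6% of the fauna and is consistent with the 94% overall coverage reported.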

  3. COI DNA barcoding is effective for the delineation of most French bee species but highlights outstanding taxonomic inconsistencies

Out of the 1477 sequences obtained in the CODABEILLES dataset, a total of 1051 sequences exhibited congruence between morphological and molecular classifications, where each BIN was assigned to a single species and each species corresponded to a single BIN (Supplementary material 3). Excluding putative misidentifications and short sequences that could not be assigned to a BIN, this accounted for 76% of the bee samples, enabling the unambiguous classification of 78% of the taxa in the CODABEILLES dataset (i.e. 437 species). The scatterplot (Fig. 4, a) illustrates the overlap between maximum intraspecific distances (singletons excluded) and interspecific (nearest-neighbour) distances, for 540 pairwise comparisons based on 1386 sequences (length > 400 bp). With this subset of data, we found that for 90% of these pairwise comparisons the distance to the nearest-neighbour species exceeds the maximum intraspecific distance by at least 2%. Overall, these results suggest that COI DNA barcoding is a valuable approach for the identification of specimens and the delineation of a majority of bee species (437 species in the present study), as reported in comparable studies that partially covered the French bee fauna [41, 42]. However, our results are less conclusive regarding the relevance of COI DNA barcoding than findings for Iberian bees, where 95% of specimens identified at the species level were assigned to unique BINs [9].
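
The congruence criterion used here (each BIN assigned to a single species and each species corresponding to a single BIN) can be sketched as follows. This is a minimal illustration in Python rather than the procedure actually run on BOLD; the function name, the status labels and the placeholder BIN identifiers used to exercise it are hypothetical.

```python
from collections import defaultdict


def classify_bin_congruence(records):
    """Classify each species from (species, bin_id) pairs.

    Returns a dict mapping species to one of:
      "congruent": one BIN, not shared with any other species
      "split":     several BINs, none shared
      "shared":    one BIN, shared with at least one other species
      "mixed":     several BINs, at least one of them shared
    """
    bins_by_species = defaultdict(set)
    species_by_bin = defaultdict(set)
    for species, bin_id in records:
        bins_by_species[species].add(bin_id)
        species_by_bin[bin_id].add(species)

    status = {}
    for species, bins in bins_by_species.items():
        is_split = len(bins) > 1
        is_shared = any(len(species_by_bin[b]) > 1 for b in bins)
        if is_split and is_shared:
            status[species] = "mixed"
        elif is_split:
            status[species] = "split"
        elif is_shared:
            status[species] = "shared"
        else:
            status[species] = "congruent"
    return status
```

Only the "congruent" category corresponds to the one-to-one match between morphology and BIN described above; the other three categories flag records that would need re-examination by a taxonomist.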

Fig. 4.

Barcoding gap analysis. a Scatterplot showing the overlap of the maximum intraspecific distances against the interspecific (nearest neighbour) distances. b Normalized divergence histogram

Indeed, these conclusions should be interpreted with caution, as we also observed clear cases of taxonomic discrepancies. Some specimens morphologically assigned to a single species were split into multiple BINs, while others shared the same BIN despite belonging to distinct species (Supplementary material 3). Among the mismatches between morphological identifications and molecular results (BIN assignment), 333 were linked to taxonomic inconsistencies probably due to unstable systematics—such as unclear species boundaries within known species complexes or potential cryptic diversity—while 66 mismatches were attributed to potential misidentifications or labelling errors. As of the writing of this manuscript, 145 specimens have been re-examined, resulting in either species reassignment for the specimens (75 specimens) or confirmation of the initial identification (70 specimens), further highlighting unresolved questions regarding species delineation in certain groups. More details on these discrepancies within the genera Andrena and Nomada are provided below. Phylogenetic relationships remain unresolved for many bee species. Generic and subgeneric concepts have not been consistently revised, and robust phylogenies are lacking for a substantial portion of bee taxa [72]. However, taxonomic identification keys are generally available for most described species. In unresolved species complexes, identifications are typically based on a combination of morphological characters, ecological traits, phenology, and distribution data. These sources are used collectively to support identifications, though in some cases uncertainty remains.

The histogram (Fig. 4, b) depicts the distribution of normalized divergence at both species and genus levels. Sequence divergence ranged from 0 to 18.21%, with a mean of 0.94%, within species, and from 0 to 25.78%, with a mean of 13.18%, within genera. Although a bimodal pattern was observed between intra- and interspecific distances, no clear barcoding gap emerged. A barcoding gap occurs when the intraspecific genetic divergence is an order of magnitude smaller than the interspecific genetic divergence within the group of organisms considered [73]. For example, one might expect the maximum genetic distance within a species not to exceed 3%, while the minimum distance between species (interspecific distance) would be around 7%. In such a case, a barcoding gap is observed between 3 and 7%. In the present study, intra- and interspecific distances overlapped, indicating that this clear separation was not consistently observed. Some species exhibited unexpectedly high intraspecific divergence, and others showed low (< 2%) or even null interspecific distances, potentially rendering DNA barcoding ineffective for their delineation.
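The definition above reduces to a simple comparison: a gap exists only if the largest intraspecific distance is smaller than the smallest interspecific distance. The sketch below illustrates this with two short lists of distances in percent. The first list reproduces the hypothetical 3% and 7% example from the text. The second is built only from the minimum, mean and maximum values reported for this dataset and is not the full distribution. The function name `barcoding_gap` is an assumption introduced for illustration.

```python
def barcoding_gap(intra, inter):
    """Return (max intraspecific, min interspecific) if a gap separates them, else None."""
    highest_intra = max(intra)
    lowest_inter = min(inter)
    if highest_intra < lowest_inter:
        return (highest_intra, lowest_inter)
    return None


# Hypothetical example from the text: a gap between 3% and 7%.
print(barcoding_gap([0.0, 1.2, 3.0], [7.0, 9.5, 13.2]))

# Illustrative values taken from the reported ranges: the distributions overlap.
print(barcoding_gap([0.0, 0.94, 18.21], [0.0, 13.18, 25.78]))
```

Because the test depends only on the two extremes, a single divergent specimen or a single pair of species with near-identical barcodes is enough to close the gap.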

The concept of a barcoding gap, typically defined as a predefined sequence divergence threshold between species, was initially introduced in the literature as a convenient guideline for species identification, not only for Anthophila but for animals in general [25]. A recent study analysing all available COI-5P sequences from BOLD for European bee species reported an overlap between intra- and interspecific genetic distances, indicating the absence of an arbitrary barcoding gap across all Anthophila [40], although filtering the distance dataset to remove erroneous sequences helped define a barcoding gap ranging between 6.5% (maximum intraspecific distance) and 9% (minimum interspecific distance). The absence of a well-defined barcoding gap was also reported for Irish solitary bees [37] and for species of the genus Lasioglossum in North America [74], one of the most species-rich bee genera. These studies on wild bees concluded that such a barcoding gap was biologically unlikely.

This may be attributed to several factors: (1) a sequence divergence threshold specific to a given bee genus, probably due to different coalescence times for each lineage [40, 49, 75, 76], (2) uneven barcoding efforts across geographical regions, leading to underrepresentation of certain areas and their diversity [29], and (3) erroneous data due to misidentified specimens in public repositories, which bias genetic distance estimates [40, 75]. Indeed, overlap between intra- and interspecific distances frequently emerges with increased sampling across the species’ range. Several studies have suggested that the apparent presence of barcoding gaps may be an artifact of insufficient sampling, especially when the full geographic and ecological range of the species was not adequately covered [77, 78].

As a clade that emerged around 125 million years ago and now comprises over 20,000 species [79], bees exhibit extreme diversity, which may be incompatible with the existence of a universal COI DNA barcoding gap. The analysis of intraspecific and interspecific distances at the family level and within the most represented genera in the CODABEILLES dataset revealed varying distribution patterns across taxonomic groups, with no clear barcoding gap observed between intra- and interspecific distances (Supplementary material 4). Mean intraspecific divergence was 2.12% (range 0–18.14) and mean interspecific divergence was 16.94% (range 0–24.56) for species from Andrena, while mean intraspecific divergence was 0.61% (range 0–4.32) and mean interspecific divergence was 13.12% (range 2.34–20.43) for species from Osmia. Pending a more stable consensus on bee classification, sequence trimming, as applied by Janko et al. [40] to a larger dataset, may help filter taxonomically ambiguous cases and clarify the barcoding gap observed at the European level for the genera Andrena (10%–13%) and Osmia (2.5%–5.5%).

In the present dataset, the genera Andrena and Nomada exhibited particularly high levels of mismatch between morphospecies and BINs. They were also the genera covered by the highest number of sequences in the present database (Table 2). The following paragraphs aim to highlight the taxonomic discrepancies observed within these two genera, emphasizing the need for further taxonomic investigations in the coming years.

  • Notable inconsistencies within the Andrena genus

Most of the inconsistencies observed within the Andrena genus corresponded to single species splitting into multiple BINs, while a few cases involved different species merging into a single BIN or a mix of both situations (Fig. 5). Notably, the specimen BCA0624, morphologically identified as A. assimilis RADOSZKOWSKI, 1876 (sampled under the synonym A. gallica SCHMIEDEKNECHT, 1883 in the CODABEILLES dataset), and the specimen BCA0829, identified as A. thoracica (FABRICIUS, 1775), were both attributed to BOLD BIN #AAE1815. Public information provided on BOLD indicated that this genetic cluster encompassed mostly female specimens belonging to the following species: A. thoracica, A. limata SMITH, 1853 and A. nitida (MÜLLER, 1776). As females of A. assimilis are morphologically close to those of A. thoracica and A. limata, differing in the punctation of the disc of the first tergite [80], we cannot exclude a misidentification of the specimen BCA0624 if this diagnostic character is variable within species. However, the boundaries between these three species are unclear, and Wood [80] has even recently highlighted the existence of three clades formed by different A. limata specimens, with no geographic pattern. Given the current state of systematics, the specimen A. gallica (BCA0624) would likely belong to A. limata clade #2 (sensu [80]). Meanwhile, the specimen identified as A. thoracica (BCA0829), although belonging to the same BIN, appears to be correctly identified, as it is part of a monophyletic group distinct from A. limata clade #2 (Supplementary material 5). The identity of these specimens remains to be confirmed, as the situation within this species group could not be clarified using COI alone [80]. This would require further investigations and raises questions about the robustness of the morphological criteria used to distinguish them.

Fig. 5.

Sankey diagrams illustrating the mismatches observed between morphological and molecular species-level identifications within the a Andrena and b Nomada genera. The thickness of the link is proportional to the number of specimens (ranging from 1 to 6). A blue link represents different species sharing a common BIN, while a green link represents a single species splitting into multiple BINs

Two more pairs of species also shared common genetic clusters: A. mitis (NYLANDER, 1852) (2 specimens) and A. apicata SMITH, 1847 (1 specimen) under BIN #AAJ2193, as well as A. confinis STÖCKHERT, 1930 (3 specimens) and A. congruens SCHMIEDEKNECHT, 1882 (2 specimens) under BIN #AAF0994 (Fig. 5). Nonetheless, the A. mitis specimens exhibited about 0.82% divergence from the A. apicata specimen. Although the two species are grouped under the same BIN, this slight genetic divergence enables their distinction using a reference barcode. This suggests that COI DNA barcoding remains effective for species delineation even below the conventional 2% interspecific divergence threshold, particularly for species that may have undergone a recent speciation event [81]. Such cases are highlighted in light green on the scatterplot (Fig. 4a).

Among the discrepancies observed within the Andrena genus, some cases involved mixed patterns (Fig. 5). Specimens identified as A. ovatula (KIRBY, 1802) and A. wilkella (KIRBY, 1802) exhibited both BIN sharing and fragmentation into multiple BINs. This observation was expected, as these species belong to a well-known species complex. Recent taxonomic investigations, using short diagnostic barcodes, have enabled the separation of A. ovatula from A. afzeliella (KIRBY, 1802), previously identified as A. ovatula sensu lato [29]. The NJ tree inferred using all publicly available sequences from the Taeniandrena subgenus helped clarify the identity of the specimens from the CODABEILLES library (Supplementary material 6). Specimens of A. wilkella formed a monophyletic group attributed to BIN #AAA8959. The BIN #AAK0399 corresponded to specimens of A. ovatula from diverse origins (Portugal, Spain, UK and France) that nevertheless exhibited a very low mean divergence of 0.09%. The BIN #AAP2754 should refer to the A. afzeliella genetic cluster, as demonstrated by Praz et al. [29]. This suggests that the following specimens should be re-examined in light of the description of Praz et al. [29], probably leading to a revision of the species attribution: specimens BCA0343, BCA0225 and BCA0911 (all assigned to BIN #AAP2754) may correspond to A. afzeliella.

A second instance of a mixed pattern was observed between specimens of A. humilis IMHOFF, 1832 and A. paucisquama NOSKIEWICZ, 1924, which yielded a rather unexpected result (Fig. 5). One specimen (BCA0743) morphologically attributed to A. paucisquama was assigned to BIN #AAP2755 (Supplementary material 7). This genetic cluster consisted of five specimens from various locations (Austria, Croatia, Greece and France), all identified as A. paucisquama and showing no genetic variation, an outcome that aligns with expectations (Supplementary material 7). Unexpectedly, however, a second A. paucisquama specimen (BCA1634, from Hérault, FR) was assigned to a different BIN (#AEL0191). This BIN also included another specimen (BCA0052, from Gers, FR) that belonged to the species complex composed of A. humilis and A. cinerea BRULLÉ, 1832. The BIN #AEL0191 exhibited 18.14% divergence from BIN #AAP2755 (A. paucisquama cluster), while showing 11.02% mean divergence from BIN #AEO0783 (A. cinerea cluster) and 12.19% mean divergence from a clade of five BINs associated with A. humilis (Supplementary material 7). Both specimens (BCA1634 and BCA0052) from the BIN #AEL0191 are stored in independent collections and were initially identified by different experts. Given their genetic similarity and the isolated clade they form, distinct from any other genetic reference, they warrant closer examination: if this is not a case of misidentification, it could be a sign of cryptic diversity. Moreover, we observed a situation of high diversification within A. humilis, with specimens splitting into five different BINs (Fig. 5). Wood [80] and Schmidt et al. [42] already reported the particularly high intraspecific variation exhibited by this species and attributed it to its wide distribution range, A. humilis being the most widespread West Palaearctic Chlorandrena. In the present dataset, three specimens identified as A. humilis supported this broad species concept, being distributed across BINs that group A. humilis specimens from Germany, Austria and Belgium (specimen BCA0821 attributed to BIN #AAK0283, as well as specimens BCA0051 and BCA0367 attributed to BIN #AAP2740). However, two more specimens (BCA0369 in BIN #AER0534 and BCA0801 in BIN #AER0533) were strongly separated from the broad A. humilis clade by a mean distance of 15.79%. They appeared genetically relatively close to A. rhenana STÖCKHERT, 1930 (with a 4.47% mean divergence), while remaining isolated from any other genetic reference and raising further questions about the potential for cryptic diversity. Complementary genetic markers and thorough morphological examination are needed to precisely determine the status of such specimens.

Numerous additional cases of species splitting into multiple BINs were observed within the Andrena genus (Fig. 5). Specimens of A. hedikae JAEGER, 1934 were attributed to two different BINs (2.04% apart). When considering publicly available data, the species formed three clades. One clade contained specimens from Morocco, while the other two included specimens from various locations (Portugal, Spain, France). Although our data provide a better characterization of intraspecific divergence for this species, with specimens collected from South West and East France (Gironde and Drôme), additional sequences, especially from South Eastern Europe, are still needed to further elucidate barcode variation in A. hedikae, as highlighted by Wood [80].

Four specimens originally identified as A. ampla WARNCKE, 1967 were attributed to two BINs (BCA0772 and BCA0774 in BIN #AAE4950, BCA0614 and BCA0615 in BIN #ABA2611). The Andrena proxima-complex has been the focus of recent investigations based on COI and UCE phylogenetic analyses, which enabled the separation of the species in this group: A. ampla, A. proxima (KIRBY, 1802) and A. alutacea STÖCKHERT, 1942 [82]. Beyond genetic differentiation, the three species also display distinct phenologies, A. proxima flying significantly earlier in the season than A. alutacea, while A. ampla exhibits an intermediate flight period [82]. Although the identification of the specimens BCA0614 and BCA0615 was consistent with publicly available data, that of specimens BCA0772 and BCA0774 raised questions, as they were genetically identical to specimens of A. proxima (null divergence). Moreover, the phenological data for these female specimens, collected on 23 May (143rd or 144th day of the year), aligned with the flight period of A. proxima, suggesting that the specimens should be re-examined and their identification revised.
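The day-of-year values quoted above for 23 May can be checked with Python's standard library. The collection years are not given in the text, so the two years below are hypothetical and serve only to show the common-year and leap-year cases. The helper name `day_of_year` is likewise introduced for illustration.

```python
from datetime import date


def day_of_year(year, month, day):
    """Ordinal day within the year (1 January = 1)."""
    return date(year, month, day).timetuple().tm_yday


# Hypothetical years: one common year and one leap year.
print(day_of_year(2023, 5, 23))  # 143
print(day_of_year(2024, 5, 23))  # 144
```

Expressing collection dates as ordinal days makes flight periods directly comparable across years and sites.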

Phenological shifts among closely related species frequently occur within the genus Andrena [82]. Although A. flavilabris SCHENK, 1874 and A. decipiens SCHENK, 1861 are nearly indistinguishable, a genetic study confirmed a recent speciation event and the existence of the two taxa, a spring-flying species (A. flavilabris) and a summer-flying species (A. decipiens) [83]. However, the sequences provided in the present study did not align with expectations (Fig. 5). The four specimens originally identified as A. flavilabris were assigned to two distinct BINs that exhibited a 2.56% mean divergence (BCA0348 and BCA0750 in BIN #AER0529, BCA0349 and BCA0350 in BIN #AEO2326). The BIN #AEO2326 exhibited null genetic variation and was also composed of specimens identified as A. decipiens from Italy and Spain (publicly available data). The collection dates for A. decipiens specimens ranged from 1st July to 9th July. In contrast, the specimens BCA0349 and BCA0350 were collected on 16th April and 18th May, respectively, which is consistent with their assignment to A. flavilabris but rather unexpected, since they share similar barcode sequences with A. decipiens. The genetic differentiation between the two taxa reported by Mandery et al. [83] was established using a 500 bp fragment of the 16S rRNA gene. In this case of evolutionarily young taxa, it is possible that the standard COI barcode lacks diagnostic substitutions and therefore cannot resolve them. Besides, and more surprisingly, the specimens BCA0348 and BCA0750 were the only representatives of the new BIN #AER0529. This unexpected level of intraspecific variation did not appear to be linked to their geographical origin (both collected in Rhône, FR) nor explained by their collection dates (17th October and 20th April, respectively). However, this raises questions about the potential bivoltine nature of the species. Further investigation is needed, involving barcoding of additional individuals to better characterize the extent of this variation.

High levels of intra-specific variations within a nominal species may draw attention to cases of unresolved taxonomy, differentiation of isolated populations or potential cryptic diversity. For instance, in the present data, several cases of newly divergent BINs were observed within species (Fig. 5). In A. lavandulae PÉREZ, 1902 (sampled under the synonym A. impressa WARNCKE, 1967 in the present dataset), specimen BCA1575 formed the new BIN #AFG7238, exhibiting a 2.25% mean divergence from BIN #AEO5002 which included multiple conspecific specimens from Spain, Portugal, Morocco and France. In A. niveata FRIESE, 1887, specimen BCA0394 formed the new BIN #AER0517, showing a 2.39% mean divergence from the BIN #AER0516. In A. ranunculi SCHMIEDEKNECHT, 1883, specimen BCA0783 formed the new BIN #AER0520, diverging by 5.96% from the BIN #AEL3586, which comprises conspecific specimens from Spain and France (notably, BCA0782 was collected at the exact same location on the same date). In A. nana (KIRBY, 1802), specimen BCA0772 formed the new BIN #AER0515, distant by 6.19% from the BIN #AAR3413, which contained conspecific references. Collected in Eastern France (Isère), specimen BCA0772 represented the first genetic record from this geographical area, while specimens from BIN #AAR3413 originated from Germany, Portugal, Morocco, Spain and South France (Gers). In A. pusilla PÉREZ, 1903, specimen BCA0758 also raised questions. It formed the new BIN #AER0519, exhibiting a high divergence (11.92%) from conspecific specimens assigned to BIN #ADM2268, yet showing relative genetic proximity (5.13%) to BINs #AAV9726 and #ADZ6919, which included specimens from A. simontornyella NOSKIEWICZ, 1939. The morphological re-examination of specimen BCA0758 did not align with the genetic findings and, at this stage, failed to provide further clarification.

These cases may reflect overlooked isolated populations or even possible undescribed species, further supporting the theory of rapid diversification reported in the Andrena genus [29, 80]. Nonetheless, the classification of specimens from a single species into multiple genetic clusters is insufficient on its own to draw conclusions on new species boundaries. It is possible that the highlighted specimens represent additional cryptic species; however, proper case-by-case investigations are needed, including the genotyping of particularly divergent populations along with the integration of complementary data (phenological, ecological, distributional) within an integrative taxonomy framework. By providing a more detailed characterization of inter- and intraspecific genetic variations, this study establishes a solid baseline for future taxonomic revisions of French bee species distributed across the Mediterranean Basin.

  • Notable inconsistencies within the Nomada genus

All Nomada specimens with inconsistent barcode results (Fig. 5) were re-examined by a taxonomist of the group (Eric Dufrêne). Specimen BCA1335, initially identified as N. mutabilis MORAWITZ, 1870, was the only specimen invalidated (but see details below). Most of the inconsistencies observed within the Nomada genus corresponded to different species merging into a single BIN while few cases referred to single species splitting into multiple BINs (Fig. 5).

Nomada alboguttata HERRICH-SCHÄFFER, 1839 (6 specimens) and N. baccata SMITH, 1844 (3 specimens) were grouped within the same genetic cluster (BIN #AAC8572). However, a slight interspecific divergence was observed, separating the species into two clades with a mean genetic distance of 0.36%. This finding aligns with the results of Mignot [84], who analyzed specimens from the same N. baccata population (collected by E. Dufrêne in Yvelines, France) using both a unilocus mitochondrial COI marker and multilocus nuclear UCEs. Nomada alboguttata is known to consist of three forms with phenological lags and different hosts for the first two forms, while the host of the third form remains unknown [85, 86]. Nomada baccata is clearly distinguished from N. alboguttata by its later flight period, different host, and subtle but consistent morphological differences. Although the interspecific distance is below 2% (Fig. 4a; see light green dots on the scatterplot), COI DNA barcoding can still serve as a diagnostic tool to distinguish between the two species.

A similar situation was observed for N. zonata PANZER, 1797 and N. piccioliana MAGRETTI, 1883. While these two species are morphologically similar, they can be reliably distinguished by a specialist. Despite sharing the same BIN (BIN #AAF3496), they formed two distinct clades with an average genetic divergence of 0.69% (Supplementary Material 8). This low but consistent divergence allows for their genetic differentiation. A notable point to highlight is that one specimen, BCA2251 collected in Corsica, and belonging to the subspecies N. zonata pulcherrima STOECKHERT, 1944 is genetically identical to the nominate subspecies N. zonata zonata PANZER, 1797.

Likewise, one specimen (BCA0939) identified as N. numida manni MORAWITZ, 1877, from Sardinia (IT) was assigned to the same BIN as specimens of N. illustris SCHMIEDEKNECHT, 1882, from France, Spain, and Portugal. The two species were separated into two clades by a mean distance of 1.08%, and exhibited clear morphological differences, despite belonging to the same species group (sensu [87, 88]) and the same subgenus (sensu [89]).

We also observed that two specimens of N. villosa THOMSON, 1870 were grouped in the same BIN as N. striata FABRICIUS, 1793, along with other publicly available specimens from both species. These two species are genetically close [89] and also morphologically close, forming a small, highly homogeneous group alongside N. symphyti STOECKHERT, 1930. Interestingly, in this BIN (#ABY7961), we observed three distinct clades (Supplementary Material 8): (1) the two N. striata specimens from Corsica, which also differ in their coloration; (2) the N. striata specimen from the Pyrénées-Orientales, grouped with Iberian specimens; and (3) the remaining specimens, comprising a mix of N. striata and N. villosa, from France and other parts of Europe, forming the third clade. This slight genetic differentiation, seemingly linked to geographical patterns and partially supported by morphological variations, would warrant further investigation.

In contrast to previous observations, we found a few cases where a single species was split into two distinct BINs. One notable example is N. fulvicornis FABRICIUS, 1793, which was divided into two highly divergent BINs (5.63% genetic distance apart). In BIN #ACE0147, we observed other publicly available specimens of N. fulvicornis alongside N. subcornuta (KIRBY, 1802), despite the latter’s recent reinstatement as a distinct species—though based on a geographically restricted sampling [89, 90]. In the second BIN (#ACF5896), we observed specimens BCA1319 and BCA1320 (collected from Ariège and Dordogne, France, respectively), along with additional N. fulvicornis specimens from other projects, originating from Switzerland and Slovakia. Furthermore, two additional genetically close BINs containing N. fulvicornis were found in publicly available data. Several subspecies of N. fulvicornis have been described, typically with clear geographic differentiation [90], but this pattern does not seem to apply in this case. Additionally, multiple morphological forms have been documented, and while some authors consider certain forms to represent valid species, a dedicated study would be required to clarify the taxonomic status of these lineages.

The last notable point concerns specimen BCA1335, initially identified as N. mutabilis MORAWITZ, 1870. This specimen formed a new and distinct BIN (#AFA1844) on its own. It is a male belonging to the N. armata group (sensu [87, 88]), within the subgenus Gestamen (sensu [89]). All closely related BINs in the cladogram corresponded to species from this group (Supplementary Material 8), which is geographically restricted to the western Palearctic. A detailed morphological examination of BCA1335, in comparison with all known species of the N. armata group, suggested that it may represent a new, as yet undescribed species (Dufrêne & Philippe, in progress).

  • 4) Planned future developments

The next steps of the CODABEILLES project will involve: 1) a second amplification attempt for the samples that failed, 2) the addition of barcode sequences for rare species currently lacking reference data, 3) the re-examination of specimens potentially misidentified or presenting a mismatch between morphological and genetic results, and 4) the combination of multiple species delimitation methods to address complex situations.

  • Samples that failed to amplify at the first attempt targeting the full-length (658 bp) COI-5P barcode fragment (716 specimens) will instead be processed using internal primers targeting shorter fragments. This approach is recommended as a second step for degraded DNA from specimens stored in collections [91]. Two overlapping fragments of the COI-5P gene, 307 bp and 407 bp long, will be obtained using two primer cocktail sets: C_LepFolF + MLepR2 and MLepF1 + C_LepFolR, respectively [92, 93]. Once obtained, the new sequences will be deposited under a complementary dataset (DS-CODAB02) on BOLD. This will improve coverage for 64 species with a voucher from France and 9 species with a voucher from another country (Fig. 3).

  • In the coming months, the focus will be placed on the 59 species of the French bee fauna that currently lack a barcode reference. This will be investigated as part of the ongoing related project “IDMYBEES” (https://www.idmybee.com/the-project.html). Since the launch of the CODABEILLES project, close collaboration with experts has been a sine qua non for establishing the library, and it will remain essential for achieving this objective. If the target species are not available in existing collections, dedicated field sampling sessions may be required, focusing on specific habitats, flight periods, or host plants. The additional reference barcodes produced will also meet the expectations of the international scientific community working on European wild bee species, particularly within the frame of the ORBIT initiative, which aims to develop resources for bee inventory and taxonomy (https://orbitproject.wordpress.com/about-the-project/).

  • As discussed above, some specimens will require closer re-examination, either to revise their identification or to resolve more complex cases of taxonomic discrepancies between morphological and genetic species-level identifications. Such discrepancies may result from unclear boundaries within species complexes or from cryptic diversity. In this context, DNA barcodes can serve as a valuable diagnostic character for primary species hypotheses, a first step in a longer integrative taxonomy process [26]. For instance, several cases of unexpected genetic variation, confirmed by morphologically distinct characters, were observed within the Nomada and Bombus genera and should be further investigated. Confirming the existence of a new species will require dedicated studies on additional specimens from these divergent populations, integrating phenotypic, genetic and ecological data. The use of complementary mitochondrial markers (e.g. 16S [49]) and nuclear markers (e.g. UCEs [94]), along with the recent sequencing of chromosome-length genomes [95–97], will enhance resolution and improve the characterization of wild bee diversity. Close collaboration between traditional taxonomy and innovative molecular tools will undoubtedly advance our understanding of wild bee systematics. An integrative taxonomic approach, particularly one that includes phylogenetic inference, would help resolve these species complexes and may reveal new diagnostic morphological features, as demonstrated in the case of the Lasioglossum villosum (KIRBY, 1802) species complex [98]. This integration is essential for generating reliable reference data, as erroneous information could compromise the effectiveness of future DNA-based monitoring efforts [40, 99].

  • As recommended within an integrative taxonomy framework, species delimitation should rely not only on multiple markers but also on the combination of several analytical methods, as each tool presents inherent limitations. In the present study, hypothetical species were grouped based on sequence data using the BIN system [60], the clustering tool provided by the BOLD platform, on which the barcode data were deposited. However, numerous approaches exist to assess how DNA barcode data from distinct genetic clusters may correspond to biological species. Addressing species complex situations properly would require the integration of dedicated analytical methods within a more comprehensive framework [74, 100–102]. For instance, the well-known Automatic Barcode Gap Discovery (ABGD, [103]) and the tree-based multi-rate Poisson Tree Processes (mPTP, [104]) are effective methods for delineating species from single-locus data. More recently, the program ASAP was introduced [105], offering the advantage of providing a score to identify the best species partition. These tools would be particularly well-suited for application to a dataset such as the one presented here.
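To make the idea of distance-based clustering in the last point concrete, the sketch below groups aligned sequences by single linkage under a fixed distance threshold. It is a deliberately simplified illustration and not the algorithm behind BINs, ABGD, ASAP or mPTP, each of which uses its own, more elaborate procedure. The 2% threshold, the uncorrected p-distance, the synthetic 100-site sequences and the specimen labels (`spec1` to `spec3`) are all assumptions made for the example.

```python
def p_distance(seq_a, seq_b):
    """Uncorrected p-distance between two aligned sequences of equal length."""
    return sum(a != b for a, b in zip(seq_a, seq_b)) / len(seq_a)


def cluster_by_threshold(seqs, threshold=0.02):
    """Single-linkage clusters: two specimens are linked if their distance <= threshold.

    `seqs` maps a specimen id to its aligned sequence; returns sorted lists of ids.
    """
    ids = sorted(seqs)
    parent = {i: i for i in ids}

    def find(i):
        # Follow parent links to the cluster root, compressing the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for n, a in enumerate(ids):
        for b in ids[n + 1:]:
            if p_distance(seqs[a], seqs[b]) <= threshold:
                parent[find(a)] = find(b)

    groups = {}
    for i in ids:
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


SWAP = {"A": "C", "C": "G", "G": "T", "T": "A"}


def mutate(seq, positions):
    """Return a copy of seq with a substitution at each listed position."""
    chars = list(seq)
    for i in positions:
        chars[i] = SWAP[chars[i]]
    return "".join(chars)


# Synthetic 100-site sequences with hypothetical specimen labels.
base = "ACGT" * 25
seqs = {
    "spec1": base,
    "spec2": mutate(base, [0]),                 # 1% from spec1
    "spec3": mutate(base, range(0, 100, 10)),   # 10% from spec1
}
clusters = cluster_by_threshold(seqs)
print(clusters)
```

Changing `threshold` changes the partition, which is one reason why comparing several delimitation methods is preferable to relying on a single cutoff.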

  • 5) Towards the DNA-based monitoring of wild bee species and their interactions with flowers

The reference data provided by this study serves as a solid foundation for further taxonomic investigations and also constitutes reliable data for the implementation of DNA-based field monitoring of pollinators and their interactions, which is sought at both national and European scales. The incorporation of DNA barcoding, metabarcoding and environmental DNA (eDNA) approaches in biodiversity monitoring offers an opportunity to establish large-scale and long-term monitoring schemes [31, 106]. For instance, Creedy et al. [38] laid the groundwork for using DNA metabarcoding techniques to assess species diversity and abundance. Their study focused on UK bees collected from mass-trapped catches. Steinke et al. [107] showed that metabarcoding allows for large-scale monitoring of changes in species composition. This approach goes beyond the biomass measurements that have previously been the primary metric for tracking changes in arthropod communities. Moreover, new proof-of-concept methods for non-invasive insect DNA collection have the potential to transform insect monitoring, although further research is needed to assess their scalability and feasibility for routine use. Airborne eDNA metabarcoding demonstrated the ability to detect traces of various pollinators, such as butterflies and wild bees [108], while eDNA from flowers has been reported in several studies as capable of revealing a diversity of wild bee species [45, 109, 110]. Such an approach would enhance our understanding of species interactions and the ecological and evolutionary processes they support in ecosystems.

Conclusions

Thanks to the collaboration of sixteen specialists and institutions who provided access to their collections for tissue sampling, we present the CODABEILLES dataset (DS-CODAB01), which includes 1,477 reference barcodes covering approximately 560 French bee species. This contribution has significantly improved barcode coverage, increasing from 62 to 82% of French species with barcodes and from 25 to 72% with French voucher specimens since the project’s launch in 2021. The inclusion of data from other independent initiatives raises the coverage to 94% of the French bee fauna with a barcode, and to 86% when considering only barcodes linked to a French voucher specimen.
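As a quick consistency check on these figures, the overall coverage can be recomputed from two numbers given elsewhere in the article: the 983 species listed for France in the abstract and the 59 species reported as still lacking a barcode reference. The variable and function names below are introduced for illustration only.

```python
TOTAL_SPECIES = 983   # French bee species, as stated in the abstract
MISSING = 59          # species reported as still lacking a barcode reference


def coverage_percent(covered, total):
    """Share of species with at least one barcode, in percent."""
    return 100 * covered / total


covered = TOTAL_SPECIES - MISSING
print(covered)                                          # 924
print(round(coverage_percent(covered, TOTAL_SPECIES)))  # 94
```

The result matches the 94% coverage reported once data from other initiatives are included.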

Our work allows the identification of most French bee species using their COI-5P gene barcode, paving the way for DNA-based routine monitoring of pollinators. Additionally, our study confirms that Apoidea Anthophila is a highly diverse and taxonomically challenging group, in which species diversity is likely still underestimated. Integrating molecular tools within an integrative taxonomic framework provides experts with new opportunities to contribute to species discovery and classification.

Supplementary Information

Supplementary Material 1. Institutional and personal reference collections providing tissue materials

12862_2025_2429_MOESM2_ESM.xlsx (75.9KB, xlsx)

Supplementary Material 2. Dataset DS-CODAB01

12862_2025_2429_MOESM3_ESM.xlsx (53.6KB, xlsx)

Supplementary Material 3. Barcode coverage for French wild bees

12862_2025_2429_MOESM4_ESM.png (317.4KB, png)

Supplementary Material 4. Divergence histograms per taxonomic family and for the 8 most abundant genera

12862_2025_2429_MOESM5_ESM.pdf (17.8KB, pdf)

Supplementary Material 5. Melandrena subgenus NJ tree including CODABEILLES data along with publicly available barcodes from BOLD

12862_2025_2429_MOESM6_ESM.pdf (11.8KB, pdf)

Supplementary Material 6. Taeniandrena subgenus NJ tree including CODABEILLES data along with publicly available barcodes from BOLD

12862_2025_2429_MOESM7_ESM.pdf (11.4KB, pdf)

Supplementary Material 7. Chlorandrena subgenus NJ tree including CODABEILLES data along with publicly available barcodes from BOLD

12862_2025_2429_MOESM8_ESM.pdf (47.7KB, pdf)

Supplementary Material 8. Nomada genus NJ tree including CODABEILLES data along with publicly available barcodes from BOLD

Acknowledgements

The authors warmly thank Kamila Canale-Tabet, Thibault Leroy, and their research team, as well as Jérôme Willm and Catherine Bonnet, for their valuable assistance with tissue sampling.

The CODABEILLES Consortium is composed of (in alphabetical order; numbers in parentheses refer to the affiliations listed below): Emilie Andrieu (1), Emmanuelle Artige (2), Matthieu Aubert (3), Yvan Brugerolles (4), Alexandre Cornuel-Willermoz (5), Raphaël Da Silva Ropio (1), Adeline Dumet (6), David Genoud (7), Benoît Geslin (8), Laurent Guilbaud (9), Bernard Kaufmann (6), Lara Konecny (6), Anne-Laure Jacquemart (10), Danny Lebreton (4), Vincent Leclercq (11), Gabriel Nève (12), Annie Ouin (1), Christophe Philippe (13), Bertrand Schatz (14), Jean-Claude Streito (2), and Héloïse Vallod (1).

1 UMR Dynafor, INRAE, INP-AgroToulouse, Toulouse, France

2 UMR CBGP, INRAE, CIRAD, IRD, Montpellier SupAgro, Univ Montpellier, Montpellier, France

3 4 chem. de la Foux, hameau du Méjanel, 34380 Pégairolles-de-Buèges, France

4 Arthropologia—60 chemin du Jacquemet, 69890 La Tour-de-Salvagny, France

5 Office de l’Environnement de la Corse, Observatoire Conservatoire des Invétérés de Corse. 14, Avenue Jean Nicoli, 20250 Corte, France

6 Université Claude Bernard Lyon 1, LEHNA UMR 5023, CNRS, ENTPE, F-69622, Villeurbanne, France

7 9 rue Hector Berlioz 87240 Ambazac, France

8 Université de Rennes (UNIR), UMR 6553 ECOBIO, CNRS, 263 avenue du Général Leclerc, 35042 Rennes cedex, France

9 UR 406 Abeilles et Environnement Site Agroparc, Domaine Saint-Paul 84914 Avignon Cedex 9, France

10 Earth and Life Institute, UClouvain, Louvain-la-Neuve, Belgium

11 5, rue de l’Esplanade, Résidence de l’Arche, Bâtiment Prokofiev, Étage 4, 13090 Aix-en-Provence, France

12 IMBE, CNRS, IRD, Avignon University, Aix Marseille University, France

13 Amateur entomologist; Observatoire des Abeilles; 15, rue de l’Auxerrois 46000 Cahors, France

14 CEFE, CNRS, Univ Montpellier, EPHE, IRD, Montpellier, France

Authors’ contributions

MO, AP, and MP conceived and planned the study and coordinated sample collection with partners from the CODABEILLES Consortium. The CODABEILLES Consortium assisted with bee tissue sampling or specimen re-examination. RRo acted as the liaison with the Canadian Center for DNA Barcoding (Guelph) platform. MO, RRo, AP, and MP oversaw data analysis and interpretation. MO and AP analysed the sequences. ED and RRu contributed their taxonomic expertise on the genera Nomada and Andrena, respectively, and assisted with result interpretation. MO led the manuscript writing by providing the first draft, while ED contributed the initial draft for the Nomada section. MO, AM and AP generated figures and tables for the manuscript. All authors reviewed, provided critical feedback, and approved the final manuscript.

Funding

This study was carried out thanks to the Pollinéco network and the financial support of INEE-CNRS and, above all, the French Ministry of the Environment (Ministère de la Transition Ecologique et de la Cohésion des Territoires). It was also financially supported by the Office Français de la Biodiversité and the French Ministry of Agriculture and Food (Ministère de l’Agriculture et de la Souveraineté Alimentaire) within the framework of the National Plan Ecophyto2. The Institut National Polytechnique de Toulouse also provided financial support for this initiative, as did the French National Research Agency through the ANR JCJC research grant IDMYBEES (ANR-22-CE02-0028) supporting A. Perrard.

Data availability

The reference library generated under the CODABEILLES initiative for French wild bees is available on the BOLD Public Data Portal. It can be downloaded in TSV format, including metadata and sequences, under the DOI 10.5883/DS-CODAB01. The data follow the standardized format of the Barcode Core Data Model (BCDM); see the GitHub page https://github.com/boldsystems-central/BCDM/tree/main, and refer to the field_definitions.tsv file for details on field descriptions. Records are also searchable within BOLD using the search engine. Alternatively, users can log in and access the library dataset directly through the BOLD workbench platform.
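
As an illustration of how the downloaded TSV might be used, the following minimal Python sketch counts which species of a checklist have at least one usable barcode, optionally restricted to vouchers from one country. It is not the workflow used in this study. The column names (`species`, `marker_code`, `nuc`, `country/ocean`) are assumed BCDM field names and should be checked against field_definitions.tsv; the `min_length` threshold, the helper names and the toy records are hypothetical and serve only to make the example runnable.

```python
import csv
import io
from collections import defaultdict

# ASSUMED BCDM column names; verify them against field_definitions.tsv.
COL_SPECIES = "species"
COL_MARKER = "marker_code"
COL_SEQ = "nuc"
COL_COUNTRY = "country/ocean"


def species_with_barcodes(tsv_text, marker="COI-5P", min_length=500):
    """Map each species to the set of countries of its records that carry
    a sequence of the given marker with at least min_length bases.
    min_length is an illustrative threshold, not a criterion from the study."""
    found = defaultdict(set)
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    for row in reader:
        species = (row.get(COL_SPECIES) or "").strip()
        # Drop alignment gaps before measuring sequence length.
        seq = (row.get(COL_SEQ) or "").replace("-", "").strip()
        if not species or row.get(COL_MARKER) != marker or len(seq) < min_length:
            continue
        found[species].add((row.get(COL_COUNTRY) or "").strip())
    return dict(found)


def coverage(found, checklist, country=None):
    """Percentage of checklist species with a barcode, optionally counting
    only species that have a voucher from the given country."""
    if not checklist:
        raise ValueError("checklist must not be empty")
    if country is None:
        hits = [s for s in checklist if s in found]
    else:
        hits = [s for s in checklist if country in found.get(s, set())]
    return 100.0 * len(hits) / len(checklist)


# Toy records (invented identifiers and sequences, for demonstration only).
TOY_ROWS = [
    ["processid", COL_SPECIES, COL_MARKER, COL_COUNTRY, COL_SEQ],
    ["TOY001", "Andrena bicolor", "COI-5P", "France", "ACGTACGTACGT"],
    ["TOY002", "Nomada stigma", "COI-5P", "Spain", "ACGTACGTACGT"],
    ["TOY003", "Osmia cornuta", "COI-5P", "France", "ACGT"],  # too short
    ["TOY004", "", "COI-5P", "France", "ACGTACGTACGT"],  # no species name
]
TOY_TSV = "\n".join("\t".join(row) for row in TOY_ROWS)
CHECKLIST = ["Andrena bicolor", "Nomada stigma", "Osmia cornuta", "Bombus terrestris"]

found = species_with_barcodes(TOY_TSV, min_length=10)
print(coverage(found, CHECKLIST))                    # 50.0
print(coverage(found, CHECKLIST, country="France"))  # 25.0
```

With the real file, the text passed to `species_with_barcodes` would be the content of the downloaded TSV, and the checklist would be a national species list. Separating "any voucher" from "voucher from a given country" mirrors the two coverage figures reported in the Conclusions.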

Declarations

Ethics approval and consent to participate

Not applicable

Consent for publication

Not applicable

Competing interests

The authors declare no competing interests.

Footnotes

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Contributor Information

Mélodie Ollivier, Email: melodie.ollivier@toulouse-inp.fr.

References

1. Wagner DL, Grames EM, Forister ML, et al. Insect decline in the Anthropocene: Death by a thousand cuts. Proc Natl Acad Sci U S A. 2021;118(2):e2023989118. 10.1073/pnas.2023989118.
2. Goulson D. The insect apocalypse, and why it matters. Curr Biol. 2019;29:R967–71. 10.1016/j.cub.2019.06.069.
3. Klein A-M, Vaissière BE, Cane JH, et al. Importance of pollinators in changing landscapes for world crops. Proc R Soc Lond B Biol Sci. 2007;274:303–13. 10.1098/rspb.2006.3721.
4. Potts SG, Imperatriz-Fonseca V, Ngo HT, et al. Safeguarding pollinators and their values to human well-being. Nature. 2016;540:220–9. 10.1038/nature20588.
5. Ascher J, Pickering J. Discover Life bee species guide and world checklist (Hymenoptera: Apoidea: Anthophila). 2025.
6. Leclercq N, Marshall L, Caruso G, et al. European bee diversity: taxonomic and phylogenetic patterns. J Biogeogr. 2023;50:1244–56. 10.1111/jbi.14614.
7. Orr MC, Hughes AC, Chesters D, et al. Global Patterns and Drivers of Bee Distribution. Curr Biol. 2021;31:451-458.e4. 10.1016/j.cub.2020.10.053.
8. Schneider L, Lossouarn C, Geslin B, et al. Bees of the Mediterranean basin: biodiversity insights from specimens in the IMBE collection (Marseille, France). Biodivers Data J. 2024;12:e141734. 10.3897/BDJ.12.e141734.
9. Wood T, Gaspar H, Le Divelec R, et al. The InBIO barcoding initiative database: DNA barcodes of Iberian bees. Biodivers Data J. 2024;12:e117172. 10.3897/BDJ.12.e117172.
10. Reverté S, Miličić M, Ačanski J, et al. National records of 3000 European bee and hoverfly species: a contribution to pollinator conservation. Insect Conserv Divers. 2023;16:758–75. 10.1111/icad.12680.
11. Ropars L, Aubert M, Genoud D, et al. Mise à jour de la liste des abeilles de France métropolitaine (Hymenoptera : Apocrita : Apoidea). Osmia. 2025;13:1–48. 10.47446/OSMIA13.1.
12. Ghisbain G, Thiery W, Massonnet F, et al. Projected decline in European bumblebee populations in the twenty-first century. Nature. 2024;628:337–41. 10.1038/s41586-023-06471-0.
13. Singh AP, De K, Uniyal VP, Sathyakumar S. Unveiling of climate change-driven decline of suitable habitat for Himalayan bumblebees. Sci Rep. 2024;14:4983. 10.1038/s41598-024-52340-9.
14. White SA, Dillon ME. Climate warming and bumble bee declines: the need to consider sub-lethal heat, carry-over effects, and colony compensation. Front Physiol. 2023;14:1251235. 10.3389/fphys.2023.1251235.
15. LeBuhn G, Vargas Luna J. Pollinator decline: what do we know about the drivers of solitary bee declines? Curr Opin Insect Sci. 2021;46:106–11. 10.1016/j.cois.2021.05.004.
16. Carrié R, Lopes M, Ouin A, Andrieu E. Bee diversity in crop fields is influenced by remotely-sensed nesting resources in surrounding permanent grasslands. Ecol Ind. 2018;90:606–14. 10.1016/j.ecolind.2018.03.054.
17. Goulson D, Nicholls E, Botías C, Rotheray EL. Bee declines driven by combined stress from parasites, pesticides, and lack of flowers. Science. 2015;347:1255957. 10.1126/science.1255957.
18. Carrié R, Andrieu E, Ouin A, Steffan-Dewenter I. Interactive effects of landscape-wide intensity of farming practices and landscape complexity on wild bee diversity. Landscape Ecol. 2017;32:1631–42. 10.1007/s10980-017-0530-y.
19. Park MG, Blitzer EJ, Gibbs J, et al. Negative effects of pesticides on wild bee communities can be buffered by landscape context. Proc R Soc Lond B Biol Sci. 2015;282:20150299. 10.1098/rspb.2015.0299.
20. Rivers-Moore J, Andrieu E, Vialatte A, Ouin A. Wooded semi-natural habitats complement permanent grasslands in supporting wild bee diversity in agricultural landscapes. Insects. 2020;11:812. 10.3390/insects11110812.
21. Potts S, Dauber J, Hochkirch A, et al. Proposal for an EU Pollinator Monitoring Scheme. In: JRC Publications Repository. 2020. https://publications.jrc.ec.europa.eu/repository/handle/JRC122225. Accessed 25 Jan 2025.
22. Ghisbain G, Rosa P, Bogusch P, et al. The new annotated checklist of the wild bees of Europe (Hymenoptera: Anthophila). Zootaxa. 2023;5327:1–147. 10.11646/zootaxa.5327.1.1.
23. Le Divelec R. Four new species of Megachilidae from Corsica and Sardinia (Hymenoptera: Apoidea). Annales de la Société entomologique de France (NS). 2024;60:601–24. 10.1080/00379271.2024.2419083.
24. Wood TJ. Two new overlooked bee species from Spain (Hymenoptera: Anthophila: Andrenidae, Apidae). Osmia. 2022;10:1–12. 10.47446/OSMIA10.1.
25. Hebert PDN, Cywinska A, Ball SL, deWaard JR. Biological identifications through DNA barcodes. Proc R Soc Lond B Biol Sci. 2003;270:313–21. 10.1098/rspb.2002.2218.
26. Miralles A, Puillandre N, Vences M. DNA barcoding in species delimitation: from genetic distances to integrative taxonomy. Methods Mol Biol. 2024;2744:77–104. 10.1007/978-1-0716-3581-0_4.
27. Pauly A, Devalez J, Sonet G, et al. DNA barcoding and male genital morphology reveal five new cryptic species in the West Palearctic bee Seladonia smaragdula (Vachal, 1895) (Hymenoptera: Apoidea: Halictidae). Zootaxa. 2015;4034:257–90. 10.11646/zootaxa.4034.2.2.
28. Williams PH, Brown MJF, Carolan JC, et al. Unveiling cryptic species of the bumblebee subgenus Bombus s. str. worldwide with COI barcodes (Hymenoptera: Apidae). 2012.
29. Praz C, Genoud D, Vaucher K, et al. Unexpected levels of cryptic diversity in European bees of the genus Andrena subgenus Taeniandrena (Hymenoptera, Andrenidae): implications for conservation. J Hymenopt Res. 2022;91:375–428. 10.3897/jhr.91.82761.
30. Rondeau S, Gervais A, Leboeuf A, et al. Combining community science and taxonomist expertise for large-scale monitoring of insect pollinators: perspective and insights from Abeilles citoyennes. Conserv Sci Pract. 2023;5:e13015. 10.1111/csp2.13015.
31. Chua PYS, Bourlat SJ, Ferguson C, et al. Future of DNA-based insect monitoring. Trends Genet. 2023;39:531–44. 10.1016/j.tig.2023.02.012.
32. Gueuning M, Ganser D, Blaser S, et al. Evaluating next-generation sequencing (NGS) methods for routine monitoring of wild bees: metabarcoding, mitogenomics or NGS barcoding. Mol Ecol Resour. 2019;19:847–62. 10.1111/1755-0998.13013.
33. Taberlet P, Coissac E, Pompanon F, et al. Towards next-generation biodiversity assessment using DNA metabarcoding. Mol Ecol. 2012;21:2045–50. 10.1111/j.1365-294X.2012.05470.x.
34. Phillips JD, Gillis DJ, Hanner RH. Incomplete estimates of genetic diversity within species: implications for DNA barcoding. Ecol Evol. 2019;9:2996–3010. 10.1002/ece3.4757.
35. Packer L, Ruz L. DNA barcoding the bees (Hymenoptera: Apoidea) of Chile: species discovery in a reasonably well known bee fauna with the description of a new species of Lonchopria (Colletidae). Genome. 2017;60:414–30. 10.1139/gen-2016-0071.
36. Sheffield CS, Heron J, Gibbs J, et al. Contribution of DNA barcoding to the study of the bees (Hymenoptera: Apoidea) of Canada: progress to date. Can Entomol. 2017;149:736–54. 10.4039/tce.2017.49.
37. Magnacca KN, Brown MJF. DNA barcoding a regional fauna: Irish solitary bees. Mol Ecol Resour. 2012;12:990–8. 10.1111/1755-0998.12001.
38. Creedy TJ, Norman H, Tang CQ, et al. A validated workflow for rapid taxonomic assignment and monitoring of a national fauna of bees (Apiformes) using high throughput DNA barcoding. Mol Ecol Resour. 2020;20:40–53. 10.1111/1755-0998.13056.
39. Herrera-Mesías F, Ep Jarboui IK, Weigand AM. A metabarcoding framework for wild bee assessment in Luxembourg. J Hymenopt Res. 2022;94:215–46. 10.3897/jhr.94.84617.
40. Janko Š, Rok Š, Blaž K, et al. DNA barcoding insufficiently identifies European wild bees (Hymenoptera, Anthophila) due to undefined species diversity, genus-specific barcoding gaps and database errors. Mol Ecol Resour. 2024;24:e13953. 10.1111/1755-0998.13953.
41. Villalta I, Ledet R, Baude M, et al. A DNA barcode-based survey of wild urban bees in the Loire Valley, France. Sci Rep. 2021;11:4770. 10.1038/s41598-021-83631-0.
42. Schmidt S, Schmid-Egger C, Morinière J, et al. DNA barcoding largely supports 250 years of classical taxonomy: identifications for Central European bees (Hymenoptera, Apoidea partim). Mol Ecol Resour. 2015;15:985–1000. 10.1111/1755-0998.12363.
43. Praz C, Müller A, Genoud D. Hidden diversity in European bees: Andrena amieti sp. n., a new Alpine bee species related to Andrena bicolor (Fabricius, 1775) (Hymenoptera, Apoidea, Andrenidae). Alpine Entomol. 2019;3:11–38. 10.3897/alpento.3.29675.
44. Makiola A, Compson ZG, Baird DJ, et al. Key questions for next-generation biomonitoring. Front Environ Sci. 2020;7:197. 10.3389/fenvs.2019.00197.
45. Thomsen PF, Sigsgaard EE. Environmental DNA metabarcoding of wild flowers reveals diverse communities of terrestrial arthropods. Ecol Evol. 2019;9:1665–79. 10.1002/ece3.4809.
46. Allen MC, Lockwood JL, Kwait R, et al. Using surface environmental DNA to assess arthropod biodiversity within a forested ecosystem. Environ DNA. 2023;5:1652–66. 10.1002/edn3.487.
47. Elbrecht V, Taberlet P, Dejean T, et al. Testing the potential of a ribosomal 16S marker for DNA metabarcoding of insects. PeerJ. 2016;4:e1966. 10.7717/peerj.1966.
48. Marquina D, Andersson AF, Ronquist F. New mitochondrial primers for metabarcoding of insects, designed and evaluated using in silico methods. Mol Ecol Resour. 2019;19:90–104. 10.1111/1755-0998.12942.
49. Marquisseau A, Canale-Tabet K, Labarthe E, et al. Building a reliable 16s mini-barcode library of wild bees from Occitania, south-west of France. Biodivers Data J. 2025;13:e137540. 10.3897/BDJ.12.e137540.
50. Kotrba M. The DNA barcoding project on German Diptera: an appreciative and critical analysis with four suggestions for improving the development and reliability of DNA-based identification. Eur J Entomol. 2020;117:315–27. 10.14411/eje.2020.037.
51. Gargominy O, Tercerie S, Régnier C, et al. TAXREF v15.0, référentiel taxonomique pour la France. 2021.
52. Ollivier M, Cilia G, Cejas D. Molecular Identification of Wild Bees. In: Cilia G, Ranalli R, Zavatta L, Flaminio S, editors. Hidden and Wild: An Integrated Study of European Wild Bees. Cham: Springer Nature Switzerland; 2025. p. 151–85.
53. Folmer O, Black M, Hoeh W, et al. DNA primers for amplification of mitochondrial cytochrome c oxidase subunit I from diverse metazoan invertebrates. Mol Marine Biol Biotechnol. 1994;3:294–9.
54. Hebert PDN, Penton EH, Burns JM, et al. Ten species in one: DNA barcoding reveals cryptic species in the neotropical skipper butterfly Astraptes fulgerator. Proc Natl Acad Sci U S A. 2004;101:14812–7. 10.1073/pnas.0406166101.
55. Prosser SWJ, de Waard JR, Miller SE, Hebert PDN. DNA barcodes from century-old type specimens using next-generation sequencing. Mol Ecol Resour. 2016;16:487–97. 10.1111/1755-0998.12474.
56. D’Ercole J, Prosser SWJ, Hebert PDN. A SMRT approach for targeted amplicon sequencing of museum specimens (Lepidoptera)-patterns of nucleotide misincorporation. PeerJ. 2021;9:e10420. 10.7717/peerj.10420.
57. Hebert PDN, Braukmann TWA, Prosser SWJ, et al. A sequel to Sanger: amplicon sequencing that scales. BMC Genomics. 2018;19:219. 10.1186/s12864-018-4611-3.
58. Tamura K, Stecher G, Kumar S. MEGA11: molecular evolutionary genetics analysis version 11. Mol Biol Evol. 2021;38:3022–7. 10.1093/molbev/msab120.
59. Kimura M. A simple method for estimating evolutionary rates of base substitutions through comparative studies of nucleotide sequences. J Mol Evol. 1980;16:111–20. 10.1007/BF01731581.
60. Ratnasingham S, Hebert PDN. A DNA-based registry for all animal species: the barcode index number (BIN) system. PLoS One. 2013;8:e66213. 10.1371/journal.pone.0066213.
61. Nakahama N. Museum specimens: an overlooked and valuable material for conservation genetics. Ecol Res. 2021;36:13–23. 10.1111/1440-1703.12181.
62. Magnacca KN, Brown MJ. Mitochondrial heteroplasmy and DNA barcoding in Hawaiian Hylaeus (Nesoprosopis) bees (Hymenoptera: Colletidae). BMC Evol Biol. 2010;10:174. 10.1186/1471-2148-10-174.
63. Aubert M, Leclercq V. Melitta seitzi Alfken, 1927, une nouvelle mélitte pour la faune française (Hymenoptera, Melittidae). Bull Soc Entomol Fr. 2024;129:369–79. 10.32475/bsef_2366.
64. Aubert M, Müller A, Praz C. A new osmiine bee with a spectacular geographic disjunction: Hoplitis (Hoplitis) onosmaevae sp. nov. (Hymenoptera, Anthophila, Megachilidae). Alpine Entomol. 2024;8:65–79. 10.3897/alpento.8.118039.
65. Dorchin A, Michez D. Species of the Western Palaearctic genus Tetralonia Spinola, 1838 (Hymenoptera, Apidae) with atypical pollen hosts, with a key to the pollinosa-group, description of new species, and neotype designation for Apis malvae Rossi, 1790. Taxonomy. 2024;4:126–49. 10.3390/taxonomy4010007.
66. Dufrêne É. Description d’une nouvelle espèce de Nomada Scopoli, 1770, de France (Corse) (Hymenoptera, Apidae). Bull Soc Entomol Fr. 2021;126:437–43. 10.32475/bsef_2184.
67. Le Divelec R, Cornuel-Willermoz A, Aubert M, Perrard A. Annotated checklist of the megachilid bees of Corsica (Hymenoptera, Megachilidae). J Hymenopt Res. 2024;97:127–89. 10.3897/jhr.97.114614.
68. Müller A. Palaearctic Osmia bees of the subgenera Hemiosmia, Tergosmia and Erythrosmia (Megachilidae, Osmiini): biology, taxonomy and key to species. Zootaxa. 2020;4778:zootaxa4778.2.1. 10.11646/zootaxa.4778.2.1.
69. Rasmont P, Wood TJ. An enigmatic Anthophorine bee from the south of France revealed as a new species: Anthophora (Paramegilla) ahlamae n. sp. (Hymenoptera: Apidae). Annales de la Société entomologique de France (NS). 2024;60:151–65. 10.1080/00379271.2024.2325688.
70. Wood TJ, Ghisbain G, Michez D, Praz CJ. Revisions to the faunas of Andrena of the Iberian Peninsula and Morocco with the descriptions of four new species (Hymenoptera: Andrenidae). Eur J Taxon. 2021;758:147–93. 10.5852/ejt.2021.758.1431.
71. Flaminio S, Pauly A, Cilia G, et al. Lasioglossum inexpectatum sp. nov., a new species from Sardinia and Corsica (Hymenoptera: Apoidea: Halictidae). Osmia. 2024;12:23–32. 10.47446/OSMIA12.4.
72. Engel MS, Rasmussen C, Gonzalez VH. Bees, Phylogeny and Classification. In: Encyclopedia of Social Insects. Cham: Springer; 2020.
73. Meier R, Zhang G, Ali F. The Use of Mean Instead of Smallest Interspecific Distances Exaggerates the Size of the “Barcoding Gap” and Leads to Misidentification. Syst Biol. 2008;57:809–13. 10.1080/10635150802406343.
74. Gibbs J. DNA barcoding a nightmare taxon: assessing barcode index numbers and barcode gaps for sweat bees. Genome. 2018;61:21–31. 10.1139/gen-2017-0096.
75. Čandek K, Kuntner M. DNA barcoding gap: reliable species identification over morphological and geographical scales. Mol Ecol Resour. 2015;15:268–77. 10.1111/1755-0998.12304.
76. Gonçalves LT, Françoso E, Deprá M. Shorter, better, faster, stronger? Comparing the identification performance of full-length and mini-DNA barcodes for apid bees (Hymenoptera: Apidae). Apidologie. 2022;53:55. 10.1007/s13592-022-00958-x.
77. Meyer G, Clare R, Weber E. An experimental test of the evolution of increased competitive ability hypothesis in goldenrod, Solidago gigantea. Oecologia. 2005;144:299–307. 10.1007/s00442-005-0046-z.
78. Wiemers M, Fiedler K. Does the DNA barcoding gap exist? – a case study in blue butterflies (Lepidoptera: Lycaenidae). Front Zool. 2007;4:8. 10.1186/1742-9994-4-8.
79. Cardinal S. Bee (Hymenoptera: Apoidea: Anthophila) Diversity Through Time. In: Insect Biodiversity. Wiley; 2018. p. 851–67.
80. Wood TJ. The genus Andrena Fabricius, 1775 in the Iberian Peninsula (Hymenoptera, Andrenidae). J Hymenopt Res. 2023;96:241–484. 10.3897/jhr.96.101873.
81. Bossert S, Wood TJ, Patiny S, et al. Phylogeny, biogeography and diversification of the mining bee family Andrenidae. Syst Entomol. 2022;47:283–302. 10.1111/syen.12530.
82. McLaughlin G, Gueuning M, Genoud D, et al. Why are there so many species of mining bees (Hymenoptera, Andrenidae)? The possible roles of phenology and Wolbachia incompatibility in maintaining species boundaries in the Andrena proxima-complex. Syst Entomol. 2023;48:127–41. 10.1111/syen.12566.
83. Mandery K, Kosuch J, Schuberth J. Untersuchungsergebnisse zum Artstatus von Andrena decipiens SCHENCK, 1861, Andrena flavilabris SCHENCK, 1874, und ihrem gemeinsamen Brutparasiten Nomada stigma FABRICIUS, 1804 Apidae Hymenoptera. 2008.
84. Mignot M. Précision sur le statut taxonomique du complexe Nomada alboguttata Herrich-Schäffer, 1839 (Hymenoptera Université de Bourgogne Apidée). 2020.
85. Schwarz M, Gusenleitner FJ, Westrich P, Dathe HH. Katalog der Bienen Österreichs, Deutschlands und der Schweiz (Hymenoptera, Apidae). Entomofauna. 1996;Suppl S8:1–398.
86. Smit J. Identification key to the European species of the bee genus Nomada SCOPOLI, 1770 (Hymenoptera: Apidae), including 23 new species. 2018. p. 1–253.
87. Alexander BA. Species-groups and cladistic analysis of the cleptoparastic [sic] bee genus Nomada (Hymenoptera: Apoidea). Univ Kans Sci Bull. 1994;55:175–236. 10.5962/bhl.part.776.
88. Alexander BA, Schwarz M. A catalog of the species of Nomada (Hymenoptera: Apoidea) of the world. Univ Kans Sci Bull. 1994;55:239–69.
89. Straka J, Benda D, Policarová J, et al. A phylogenomic monograph of West-Palearctic Nomada (Hymenoptera: Apidae). Insect Syst Divers. 2024;8:1. 10.1093/isd/ixad024.
90. Falk S. The story behind Kirby’s Nomad Bee Nomada subcornuta (Kirby, 1802). 2017.
91. Levesque-Beaudin V, Miller M, Dikow T, et al. A workflow for expanding DNA barcode reference libraries through ‘museum harvesting’ of natural history collections. Biodivers Data J. 2023;11:e100677. 10.3897/BDJ.11.e100677.
92. Hajibabaei M, Janzen DH, Burns JM, et al. DNA barcodes distinguish species of tropical Lepidoptera. Proc Natl Acad Sci U S A. 2006;103:968–71. 10.1073/pnas.0510466103.
93. Hebert PDN, deWaard JR, Zakharov EV, et al. A DNA ‘Barcode Blitz’: Rapid Digitization and Sequencing of a Natural History Collection. PLoS One. 2013;8:e68535. 10.1371/journal.pone.0068535.
94. Gueuning M, Frey JE, Praz C. Ultraconserved yet informative for species delimitation: Ultraconserved elements resolve long-standing systematic enigma in Central European bees. Mol Ecol. 2020;29:4203–20. 10.1111/mec.15629.
95. Falk S, Monks J, University of Oxford and Wytham Woods Genome Acquisition Lab, et al. The genome sequence of the Coppice Mining Bee, Andrena helvola (Linnaeus, 1758). Wellcome Open Res. 2025;10:102. 10.12688/wellcomeopenres.23746.1.
96. Falk S, Mulley JF. The genome sequence of the short-fringed mining bee, Andrena dorsata (Kirby, 1802). Wellcome Open Res. 2023;8:373. 10.12688/wellcomeopenres.19756.1.
97. Jones BM, Rubin BER, Dudchenko O, et al. Convergent and complementary selection shaped gains and losses of eusociality in sweat bees. Nat Ecol Evol. 2023;7:557–69. 10.1038/s41559-023-02001-3.
98. Pauly A, Noël G, Sonet G, et al. Integrative taxonomy resuscitates two species in the Lasioglossum villosulum complex (Kirby, 1802) (Hymenoptera: Apoidea: Halictidae). Eur J Taxon. 2019. 10.5852/ejt.2019.541.
99. Locatelli NS, McIntyre PB, Therkildsen NO, Baetscher DS. GenBank’s reliability is uncertain for biodiversity researchers seeking species-level assignment for eDNA. Proc Natl Acad Sci U S A. 2020;117:32211–2. 10.1073/pnas.2007421117.
100. Lin X, Stur E, Ekrem T. Exploring genetic divergence in a species-rich insect genus using 2790 DNA barcodes. PLoS One. 2015;10:e0138993. 10.1371/journal.pone.0138993.
101. Ranasinghe UGSL, Eberle J, Thormann J, et al. Multiple species delimitation approaches with COI barcodes poorly fit each other and morphospecies – an integrative taxonomy case of Sri Lankan Sericini chafers (Coleoptera: Scarabaeidae). Ecol Evol. 2022;12:e8942. 10.1002/ece3.8942.
102. Vuataz L, Reding J-P, Reding A, et al. A comprehensive DNA barcoding reference database for Plecoptera of Switzerland. Sci Rep. 2024;14:6322. 10.1038/s41598-024-56930-5.
103. Puillandre N, Lambert A, Brouillet S, Achaz G. ABGD, Automatic barcode gap discovery for primary species delimitation. Mol Ecol. 2012;21:1864–77. 10.1111/j.1365-294X.2011.05239.x.
104. Kapli P, Lutteropp S, Zhang J, et al. Multi-rate Poisson tree processes for single-locus species delimitation under maximum likelihood and Markov chain Monte Carlo. Bioinformatics. 2017;33:1630–8. 10.1093/bioinformatics/btx025.
105. Puillandre N, Brouillet S, Achaz G. ASAP: assemble species by automatic partitioning. Mol Ecol Resour. 2021;21:609–20. 10.1111/1755-0998.13281.
106. Piper AM, Batovska J, Cogan NOI, et al. Prospects and challenges of implementing DNA metabarcoding for high-throughput insect surveillance. Gigascience. 2019;8:giz092. 10.1093/gigascience/giz092.
107. Steinke D, deWaard SL, Sones JE, et al. Message in a bottle – metabarcoding enables biodiversity comparisons across ecoregions. Gigascience. 2022;11:giac040. 10.1093/gigascience/giac040.
108. Roger F, Ghanavi HR, Danielsson N, et al. Airborne environmental DNA metabarcoding for the monitoring of terrestrial insects – a proof of concept from the field. Environ DNA. 2022;4:790–807. 10.1002/edn3.290.
109. Avalos G, Trott R, Ballas J, et al. Prospects of pollinator community surveillance using terrestrial environmental DNA metagenetics. Environ DNA. 2024;6:e492. 10.1002/edn3.492.
110. Newton JP, Bateman PW, Heydenrych MJ, et al. Monitoring the birds and the bees: Environmental DNA metabarcoding of flowers detects plant–animal interactions. Environ DNA. 2023;5:488–502. 10.1002/edn3.399.

Articles from BMC Ecology and Evolution are provided here courtesy of BMC
