Abstract
Earthworms are an important soil taxon as ecosystem engineers, providing a variety of crucial ecosystem functions and services. Little is known about their diversity and distribution at large spatial scales, despite the availability of considerable amounts of local-scale data. Earthworm diversity data, obtained from the primary literature or provided directly by authors, were collated with information on site locations, including coordinates, habitat cover, and soil properties. Datasets were required, at a minimum, to include abundance or biomass of earthworms at a site. Where possible, site-level species lists were included, as well as the abundance and biomass of individual species and ecological groups. This global dataset contains 10,840 sites, with 184 species, from 60 countries and all continents except Antarctica. The data were obtained from 182 published articles, published between 1973 and 2017, and 17 unpublished datasets. Amalgamating data into a single global database will assist researchers in investigating and answering a wide variety of pressing questions, for example, jointly assessing aboveground and belowground biodiversity distributions and drivers of biodiversity change.
Subject terms: Biodiversity, Community ecology, Biogeography
Measurement(s) | earthworm communities • Abundance • organic material • Diversity • environmental properties |
Technology Type(s) | digital curation |
Factor Type(s) | location |
Sample Characteristic - Organism | Lumbricina |
Sample Characteristic - Environment | soil |
Sample Characteristic - Location | global |
Machine-accessible metadata file describing the reported data: 10.6084/m9.figshare.13399118
Background & Summary
Soils are considered to be one of the most biodiverse terrestrial habitats1–3. Despite this, very little is known about the biodiversity that resides there compared to aboveground biodiversity, especially at the global scale1,4,5. This is surprising given the large number of local-scale biodiversity datasets available in the published literature. A number of studies have amalgamated local scale datasets, primarily for aboveground or marine organisms e.g.6,7, which can then be used for large-scale analyses e.g.8,9. Belowground biodiversity data are often overlooked in these large biodiversity databases4, and thus separate efforts to collate data are just now starting to emerge for certain belowground taxa, particularly microbes e.g.10,11.
Earthworms are involved in a large number of ecosystem functions and services, such as decomposition12, nutrient cycling13 and climate regulation14, amongst others13. In addition, they are often used as bioindicators of soil biodiversity and health15. Earthworms are relatively easy to sample; thus, a large amount of data are available16. Nevertheless, previous attempts to collate earthworm datasets have been geographically restricted17,18 or focused on country or regional species lists (e.g., DriloBASE; http://taxo.drilobase.org). By collating site-level diversity measures, we can also collect information on factors that might determine community composition, for example, measurements of soil properties or land use and cover.
Here, we describe a global database of local earthworm diversity and associated site-level characteristics from 10,840 sites in 60 countries (Fig. 1)19. Site-level information includes at least one sampled soil property, land use, and habitat cover for just over 58% of sites. Measurements of earthworm species richness (including species lists where available), total abundance, and biomass were collected at the site level and, for some sites, as species occurrences (i.e., the abundance and biomass of each species recorded at a site). In addition, using expert opinion and details given by data providers, we classified each earthworm species into ecological groups based on their feeding and burrowing behaviours (epigeics, endogeics, anecics, epi-endogeics; more details below20).
The compilation of this dataset is timely. It can be used to answer long-standing questions in ecology in relation to this important belowground faunal group (e.g., global diversity patterns16). In light of the IPBES Global Assessment21 and the loss of biodiversity, the dataset also has the potential to be used to address the pressing issue of the consequences of environmental change for soil biodiversity. These data are suitable for linking with other soil databases, such as BETSI (http://betsi.cesab.org/), a database of soil organism traits22. Linking trait information with site-level diversity would then allow analyses of functional diversity. In addition, as nearly all sites have geographic coordinates, other environmental data layers (e.g., related to climate variables, land use or soil abiotic factors) could be linked to the site-level diversity measures (e.g.16). Belowground diversity measures could also be linked to similar diversity measurements aboveground, thus enabling investigations across ecosystems to identify patterns of diversity and biodiversity changes23.
Methods
This work was conceptualised and discussed during two ‘sWorm’ workshops in 2016 and 2017, funded by sDiv, the synthesis centre of the German Centre for Integrative Biodiversity Research (iDiv) Halle-Jena-Leipzig. More than 20 international scientists with expertise in earthworms, soil science, and/or data management met at each of the workshops.
On 18th December 2016, Web of Science was used to search the available literature for articles that had sampled the earthworm community. Keywords were used that captured measurements of diversity of all taxa within Oligochaetes: ((Earthworm* OR Oligochaeta OR Megadril* OR Haplotaxida OR Annelid* OR Lumbric* OR Clitellat* OR Acanthodrili* OR Ailoscoleci* OR Almid* OR Benhamiin* OR riodrilid* OR Diplocard* OR Enchytraeid* OR Eudrilid* OR Exxid* OR Glossoscolecid* OR Haplotaxid* OR Hormogastrid* OR Kynotid* OR Lutodrilid* OR Megascolecid* OR Microchaetid* OR Moniligastrid* OR Ocnerodrilid* OR Octochaet* OR Sparganophilid* OR Tumakid*) AND (Diversity OR “Species richness” OR “OTU” OR Abundance OR individual* OR Density OR “tax* richness” OR “Number” OR Richness OR Biomass))
This search returned 7,783 papers. All titles and abstracts of papers published post-2000 were screened (6,140 papers), and papers were excluded if they did not make reference to data suitable for the analysis. As it was most likely that raw data would need to be requested, papers published before 2000 were excluded without screening, because available author contact details were unlikely to be up to date. After this initial screening, PDFs of all remaining papers (n = 986) were manually screened to determine whether data were suitable (see below). In total, 477 papers made reference to suitable data.
In addition, to find unpublished data or to target underrepresented regions, inquiries were made to specific earthworm researchers regarding suitable datasets (e.g., by directly contacting researchers, giving presentations at the Second Global Soil Biodiversity Conference and the International Symposium of Earthworm Ecology). No date restrictions were placed on such datasets, and thus, some were published prior to 2000.
In order to be included in the database, the individual article was required to have sampled earthworm diversity using an appropriate quantitative methodology (such as hand-sorting of a soil quadrat e.g.24, or chemical expulsion e.g.25) at two or more sites that varied in their land-use/habitat cover or soil properties. At a minimum, we required data on the total abundance or fresh biomass of earthworms at each site, and if possible, the number of species (ideally with species binomials), and the abundance and biomass of each species. In addition, geographic coordinates of the sites were required, and at each site, data collectors ideally had sampled at least one of the following soil properties: soil pH (in H2O, KCl, CaCl2), soil organic carbon (%), soil organic matter (%), sand/silt/clay content (%), soil texture (USDA classification26), Cation Exchange Capacity (CEC), Base Saturation (%), Carbon:Nitrogen ratio, soil moisture (%), and soil type (WRB/FAO classification27).
Where possible, available data were extracted from the suitable articles. For each suitable article, the meta-data (e.g., the article title and DOI) was compiled (Online-only Table 1). Data were extracted from the article text, tables, figures, or supplementary material (e.g., using ImageJ28). Where data were not given but were required (Online-only Table 2), authors of the articles were contacted and the raw data (or missing information) were requested. If the authors did not respond, and the required information could not be obtained using an alternate method, the data were not entered into the database. All data were extracted into online data templates, with data from one article (i.e., a dataset) being entered into an individual template, referred to as a ‘file’. Each file was given a unique ID, and in total 199 files were created and made open-access.
Online-only Table 1.
Field | Format | Information | Required field (*) |
---|---|---|---|
File | Text | Unique ID for each article | * |
Article_Title | Text | Title of the article the data was published in. If unpublished then NA or “Unpublished” | |
Article_Year | Integer | The year the article was published (NA if unpublished) | |
Article_FirstAuthorSurname | Text | The surname of the first author of the article | * |
PaperContact_Surname | Text | The surname of the corresponding author of the article | |
Article_Journal | Text | The journal the article was published in (NA if unpublished) | |
Article_DOI | Text | The DOI of the article (NA if not available or unpublished) | |
Data_DOI | Text | The DOI of the data (if different from the article, NA if not available) | |
Number_of_Studies | Integer | The total number of studies in the file | * |
Total_Number_ofSites | Integer | The total number of sites across all studies in the file | * |
Total_Number_ofSpecies | Integer | The total number of species in the dataset (if given) | |
Entire.Community | Yes/no | Was the entire earthworm community sampled, or just selected species. If this information is unknown or unclear NA is given | * |
Data.From.Paper | Yes/no | Was the data taken from the paper, or did the author provide raw values | * |
Other.soil.organisms.sampled | Yes/no | Were any other soil organisms sampled at the same sites | * |
Online-only Table 2.
Field | Format | Possible Values | Information | Required field |
---|---|---|---|---|
File | Text | | Unique ID for each article. Assigned to all studies within a single publication | * |
Study_Name | Text | | Unique ID for each study. Assigned to all sites within a single study | * |
Site_Name | Text | | Unique ID given to each site. A site that is sampled in different studies will have the same ID | * |
Observational | Multiple choice | Observation; Experimental | Was the data from an observational study or an experimental study. Experimental studies may be unrealistic in their treatments, and are often over a smaller area (resulting in similar/identical coordinates) | * |
Latitude (decimal degrees) | Numerical | | The latitude of the site (in decimal degrees only) | * |
Longitude (decimal degrees) | Numerical | | The longitude of the site (in decimal degrees only) | |
Altitude (m) | Numerical | | The altitude of the site (in metres only) | |
Country | Text | | The country the site was in (as given by data collector) | * |
Sample_StartDate_Month | Integer | 1–12 | The month the sampling started | |
Sample_StartDate_Year | Integer | Less than 2018 | The year the sampling started | * |
Sample_EndDate_Month | Integer | 1–12 | The month the sampling ended | |
Sample_EndDate_Year | Integer | Less than 2018 | The year the sampling ended | |
ExtractionMethod | Multiple choice | Visual search, Hand sorting, Chemical extraction (Mustard), Chemical extraction (Formalin), Octet Method (electric shock), Hand sorting + Chemical extraction (Mustard), Hand sorting + Chemical Extraction (Formalin), Other, Other Multiple, Unknown | The methodology used to sample the earthworms | * |
Sampled Area | Numerical | | The area over which sample(s) were taken. Typically the size of the quadrat or soil block | * |
Sampled Area Unit | Multiple choice | cm2, cm3, m2, m3, Unknown, Other | The unit of the sampled area | * |
Sampling Effort | Numerical | | The number of times the site was sampled to obtain the earthworm community metric(s) provided | * |
pH | Numerical | | Sampled pH of the site | At least one required |
pH Collection Method | Multiple choice | H2O, KCl, CaCl2, Other, Unknown | The suspension solvent used in the measurement | |
pH_mean | Multiple choice | Yes/no | Denotes whether the value given for the pH is a mean from individual values across a depth profile | |
CEC | Numerical | | The Cation Exchange Capacity of the site | |
CEC_unit | Text | | The unit of the CEC measurement | |
CEC_mean | Multiple choice | Yes/no | Denotes whether the value given for the CEC is a mean from individual values across a depth profile | |
Base Saturation(%) | Numerical | | The base saturation of the site | |
BaseSaturation_mean | Multiple choice | Yes/no | Denotes whether the value given for the base saturation is a mean from individual values across a depth profile | |
Organic Carbon (%) | Numerical | | The organic carbon content of the soil at the site | |
OC_mean | Multiple choice | Yes/no | Denotes whether the value given for the organic carbon is a mean from individual values across a depth profile | |
Soil Organic Matter (%) | Numerical | | The soil organic matter content at the site | |
SOM_mean | Multiple choice | Yes/no | Denotes whether the value given for the soil organic matter is a mean from individual values across a depth profile | |
C/N ratio | Numerical | | The carbon to nitrogen ratio at the site | |
CN_mean | Multiple choice | Yes/no | Denotes whether the value given for the carbon:nitrogen ratio is a mean from individual values across a depth profile | |
Sand (%)/Silt (%)/Clay (%) | Numerical | | The percentage of sand, silt and clay at the site | |
Sand_silt_clay_mean | Multiple choice | Yes/no | Denotes whether the value given for the percentage of sand, silt and clay is a mean from individual values across a depth profile | |
USDA_SoilTexture | Multiple choice | clay, sandy clay, sandy clay loam, clay loam, loam, sandy loam, loamy sand, sand, silt clay, silty clay loam, silt loam, silt | The texture of the soil at the site | |
Soil Moisture(%) | Numerical | | The soil moisture at the site | |
WRB/FAO_SoilType | Multiple choice | Acrisols, Albeluvisols, Alisols, Andosols, Anthrosols, Arenosols, Calcisols, Cambisols, Chernozem, Cryosols, Durisol, Ferralsols, Fluvisol, Gleysols, Gypsisols, Histosols, Kastanozem, Leptosols, Lixisols, Luvisols, Nitisols, Phaeozem, Planosols, Plinthosols, Podzols, Regosols, Retisols, Solonchaks, Solonetz, Technosols, Umbrisols, Vertisols | The type of soil at the site. Using the WRB/FAO classification26, but only classified when given by the data providers in the same system | |
LandUse | Multiple choice | Primary vegetation, Secondary vegetation, Production - Arable, Production - Crop plantations, Production - Wood plantation, Pasture, Urban, Unknown | The category of land use29,30 that the site was classified as. Classification was based on descriptions of the site in the text of the original publication or subsequent correspondence with the data provider | * |
HabitatCover | Multiple choice | Broadleaf evergreen forest, Broadleaf deciduous forest, Needleleaf evergreen forest, Needleleaf deciduous forest, Mixed forest, Tree open, Shrub, Herbaceous, Herbaceous with spare tree/shrub, Sparse vegetation, Cropland, Paddy field, Cropland/Other vegetation mosaic, Mangrove, Wetland, Bare area (consolidated, e.g. rock), Bare area (unconsolidated, e.g. sand), Urban, Snow/Ice, Water bodies, Unknown | The category of habitat cover (ESA CCI-LC 300 m; https://www.esa-landcover-cci.org/) that the site was classified as. Classification was based on descriptions of the site in the text of the original publication or subsequent correspondence with the data provider | * |
Management System | Multiple choice | Annual crop, Integrated systems, Perennial crops, Pastures (grazed lands), Tree plantations, Unknown, NA | The management system at the site. Sites with no management, i.e. pristine or recovering sites, were categorised as ‘NA’. Classification system based on expert opinion. | |
Tillage/Pesticide/Fertilizer/Selectively harvested/Clear cut/Fire/Grazing all year/Rotation/Monoculture/Planted | Boolean | | The presence or absence of each pressure at the site. The applicability of these depended on the ‘management system’. Thus, they were left empty or filled as ‘NA’ when not applicable (see Supplementary Material 1) | |
Habitat as described | Text | | Free text field for a description of the site based on the original article or on emails from the data provider | * |
SpeciesRichness | Numerical | | The species richness at the site (if available) | At least one required |
SpeciesRichnessUnit | Multiple choice | Number of species, Species per cm2, Species per cm3, Species per m2, Species per m3, Other | The units of the species richness value | |
Site_WetBiomass | Numerical | | The total wet biomass of the site (if available) | |
Site_WetBiomassUnits | Multiple choice | g, g/m2 | The units of the biomass value | |
Site_Abundance | Numerical | | The total abundance of the site (if available) | |
Site_Abundance Units | Multiple choice | Number of individuals, Individuals per cm2, Individuals per cm3, Individuals per m2, Individuals per m3 | The units of the abundance value | |
A file could contain multiple ‘studies’, where each study was a different sampling event (i.e., one of multiple samples taken at the same site over time) and/or a different sampling methodology. Each study was assigned a unique study ID. Sampled diversity of earthworms is highly dependent on the extraction method used29. If a dataset did not contain consistent sampling methodologies across all sites (e.g., some sites sampled with hand sorting and others with hand sorting + chemical extraction), thus making it inappropriate to compare earthworm communities, the dataset was split into a separate study for each consistent methodology. If sites had been sampled multiple times, either across multiple years or within years, and the data were available for each sampling period, then only data from the first and the last sampling period were used. Each sampling period was entered as a study, which can help account for temporal autocorrelation during analysis, e.g., when using a mixed-effects modelling approach.
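The splitting rule described above can be sketched in a few lines. This is an illustrative Python sketch, not the workflow used to build the database: the record keys (`site`, `method`, `period`), the helper name `assign_studies`, and the study-ID format are all hypothetical, and the first and last sampling periods are determined per extraction method as a simplifying assumption.

```python
from collections import defaultdict


def assign_studies(file_id, records):
    """Group site records into studies (hypothetical record layout).

    Records that share an extraction method and a sampling period form one
    study. Only the first and last sampling period per method are kept.
    """
    # Collect the sampling periods observed for each extraction method.
    periods_by_method = defaultdict(set)
    for rec in records:
        periods_by_method[rec["method"]].add(rec["period"])

    studies = defaultdict(list)
    for rec in records:
        periods = periods_by_method[rec["method"]]
        # Drop intermediate sampling periods (assumption: per method).
        if rec["period"] not in (min(periods), max(periods)):
            continue
        # Hypothetical study-ID format: file, method and period.
        study_id = f"{file_id}_{rec['method']}_{rec['period']}"
        studies[study_id].append(rec["site"])
    return dict(studies)


records = [
    {"site": "A", "method": "Hand sorting", "period": 2001},
    {"site": "A", "method": "Hand sorting", "period": 2002},
    {"site": "A", "method": "Hand sorting", "period": 2003},
    {"site": "B", "method": "Hand sorting + Chemical extraction (Mustard)", "period": 2001},
]
studies = assign_studies("file001", records)
```

In this made-up example the 2002 sampling of site A is dropped, and the two extraction methods end up in separate studies.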
A site was defined as a single location where the earthworm community was sampled using an appropriate quantitative methodology. Within each study, each site was given a unique ID (usually based on an ID given in the original source). For each site, information on the sampling methodology, soil properties, and land-use/habitat cover, along with the diversity measurements (site-level species richness, abundance and/or biomass) were entered into the data template (see Online-only Table 2 for full list of variables and the format that was required for the data template). Where possible, data were entered into the data template in the same format as given in the original source. To help enable this, columns often had separate fields to record the units. However, for some fields, values needed to be standardised prior to data entry, such as for the site coordinates and some soil properties (e.g., sand/silt/clay content).
All available and required soil properties for each site were entered into the template. Where a site had soil properties sampled at different depths (e.g., at 0–15, 15–30, and 30–40 cm), the weighted average of the values was entered into the templates. The value was then indicated as being a mean (Online-only Table 2).
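As a worked illustration of the depth averaging, the sketch below weights each layer by its thickness. The text does not state which weights were used, so thickness weighting is an assumption here, and the function name `depth_weighted_mean` and the pH values are hypothetical.

```python
def depth_weighted_mean(layers):
    """Return the thickness-weighted mean of a soil property.

    `layers` is a list of (top_cm, bottom_cm, value) tuples. Weighting by
    layer thickness is an assumption, not a documented rule of the database.
    """
    total_thickness = sum(bottom - top for top, bottom, _ in layers)
    if total_thickness <= 0:
        raise ValueError("layers must have positive total thickness")
    weighted_sum = sum((bottom - top) * value for top, bottom, value in layers)
    return weighted_sum / total_thickness


# Hypothetical pH values for the depth intervals used as an example in the text.
ph_layers = [(0, 15, 6.0), (15, 30, 6.5), (30, 40, 7.0)]
ph_site = depth_weighted_mean(ph_layers)  # (90 + 97.5 + 70) / 40 = 6.4375
```

A value obtained this way would be entered in the pH field, with the matching `pH_mean` field set to "Yes".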
The fields for habitat cover, land-use, and management system were predefined categories based on ESA CCI-LC (https://www.esa-landcover-cci.org/), the Land-use Harmonization dataset30,31 (Fig. 2), and expert opinion (during the sWorm workshops), respectively. These classification systems were chosen based on knowledge of what external pressures might be important for explaining earthworm communities, whilst also ensuring consistency across all regions of the globe. Based on information given within the published article, or from the data providers directly, every site was classified into one of the categories for each of these fields. When information was missing, sites were classified as “unknown”. Additional information on the land use and management system classification definitions is shown in Tables 1 and 2, respectively.
Table 1.
Land use category | Definition |
---|---|
Primary | Relatively undisturbed natural habitat |
Secondary | Recovering, previously disturbed natural habitat |
Pasture | Land used for the grazing of livestock |
Production - Arable | Land used for crop production (e.g., wheat, rice, corn) |
Production - Plantations crops | Land used for plantation crops (e.g., coffee, vineyards, oil palm) |
Production - Wood plantations | Land used for timber production (e.g., teak) |
Urban | Land converted to dense urban settlement |
Unknown | If the land use is not given or is not clear |
Table 2.
Management Intensity measure | Annual crops | Integrated systems | Perennial crops | Pastures (grazed lands) | Tree plantations |
---|---|---|---|---|---|
Tillage | × | × | |||
Pesticide | × | × | × | × | × |
Fertilizer | × | × | × | × | × |
Selectively harvested | × | × | |||
Clear cut | × | × | |||
Fire | × | × | × | × | × |
Stocking rate | × | ||||
Grazing all-year | × | ||||
Rotation | × | × | × | ||
Monoculture | × | × | × | × | × |
Planted | × |
For each managed site (i.e., not natural vegetation), the management system could also be identified (table headers), and additional management intensity variables could also be captured (table rows). However, not every management intensity variable was applicable to each management system, so restrictions were placed on which variables could be recorded. ‘×’ indicates which management intensity variable was applicable to each management system.
As sampling effort also impacts diversity measurements32, the sampling effort at each site was recorded. Effort was recorded in two ways:
The area that was sampled, e.g., the area of a single quadrat or soil block, or the total area across all sampling units (e.g., all quadrats). This depended on how the data were presented.
The number of times a site was sampled, either temporally or spatially. If a site was sampled over multiple time periods, the sampling effort was the number of occasions on which the site was sampled. If the site had multiple samples (e.g., multiple quadrats) and the diversity measure was an average, the sampling effort was 1. If the diversity measure was a total (e.g., the total number of species across all quadrats), the sampling effort was the total number of sampling units (e.g., quadrats).
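The second rule can be written as a small decision function. This is a hedged Python sketch of one reading of the rule; the function name `sampling_effort` and its parameters are hypothetical and do not correspond to fields in the data template.

```python
def sampling_effort(n_units, replication, measure):
    """Return the sampling effort for a site (one reading of the rule above).

    n_units:     number of sampling occasions or sampling units (e.g., quadrats)
    replication: "temporal" or "spatial" (hypothetical labels)
    measure:     "average" or "total" (hypothetical labels)
    """
    if replication not in ("temporal", "spatial"):
        raise ValueError("replication must be 'temporal' or 'spatial'")
    if measure not in ("average", "total"):
        raise ValueError("measure must be 'average' or 'total'")
    if replication == "temporal":
        # Sampled on several occasions: effort is the number of occasions.
        return n_units
    # Spatial replication: an averaged measure counts as one unit of effort,
    # a total across units counts every unit.
    return 1 if measure == "average" else n_units


effort_mean = sampling_effort(5, "spatial", "average")   # five quadrats, mean reported
effort_total = sampling_effort(5, "spatial", "total")    # five quadrats, total reported
effort_time = sampling_effort(3, "temporal", "total")    # three sampling occasions
```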
When datasets contained information at a higher resolution than total abundance or biomass of earthworms at a site (i.e., at ecological group, genus, or species level), this information was entered into the species occurrence table (Online-only Table 3). Each row contained a measurement of an observation (e.g. species, morphospecies, genus, life stage or ecological group) at a single site. The measurement could be the presence only, abundance, or fresh biomass of the record. Where possible, for each row we also included the life stage (adult or juvenile), whether the species was native to the location or not, and the ecological group (epigeic, endogeic, anecic, epi-endogeic). Thus, if the diversity measure was for all the juveniles at the site regardless of species, columns such as the species binomial and genus would be empty, but the life stage would be completed. All species binomials and ecological group assignments were checked using DriloBASE and by earthworm taxonomists (GB, MJIB, MLCB, PL); see ‘Technical Validation’.
Online-only Table 3.
Field | Format | Possible Values | Information | Required field |
---|---|---|---|---|
File | Text | | Unique ID for each article. Assigned to all studies within a single publication | * |
Study_Name | Text | | Unique ID for each study. Assigned to all sites within a single study | * |
Site_Name | Text | | Unique ID given to each site. A site that is sampled in different studies will have the same ID. | * |
OriginalSpeciesBinomial | Text | | The species binomial of the observation as given by the data collector, prior to revision by earthworm experts | |
SpeciesBinomial | Text | | The species binomial of the observation (following revision by earthworm experts) | At least one required |
MorphospeciesID | Text | | An indicator (i.e., number or letter) of an observation that has been identified only to morphospecies | |
Genus | Text | | The genus of the observation | |
Family | Text | | The family of the observation | |
Ecological_group | Multiple choice | Epigeic, Anecic, Endogeic, Epi-Endogeic, Unknown | The ecological group of the observation (following revision by earthworm experts) | |
LifeStage | Multiple choice | Adult, Juvenile, Unknown | The life stage of the observation | |
Native/Non-native | Multiple choice | Native, Non-native, Unknown | Whether the observation is native or non-native in the sampled region | |
Abundance | Numerical | | The total abundance of the observation at a specific site | |
Abundance Unit | Multiple choice | Number of individuals, Individuals per cm2, Individuals per cm3, Individuals per m2, Individuals per m3 | The units of the abundance value | |
WetBiomass | Numerical | | The total biomass of the observation at a specific site | |
WetBiomassUnits | Multiple choice | g, g/m2 | The units of the biomass value | |
For each dataset, this datasheet was only used if species occurrence data were available.
Where site-level diversity measures were given by the data provider, these were entered into the site-level sheet. Where site-level diversity measures were not given, but could be calculated from the species occurrence information, that was done in R33, following data entry and prior to subsequent analyses. The species present at each site, as given in the species occurrence data, were used for calculating species richness; this included taxa identified to sub-species. If data collectors identified a specimen as a morphospecies (i.e., a species delineation based solely on morphological characteristics, typically identified to genus level with a unique ID differentiating it from other species of the same genus, as determined by the original data collector), it was included in the species richness estimate as an additional species. Unidentified species grouped as ‘unknown’ were excluded (Fig. 3). As juveniles of many earthworm species are hard to identify to species level29,34, juveniles were excluded from the calculation (even if identified to family level). All earthworms (including juveniles) found at a site were included in the total biomass and abundance calculations.
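The richness and abundance rules above can be illustrated as follows. The original calculations were done in R; this Python version is only a sketch, the function names are hypothetical, and the rows are made-up records that use field names from Online-only Table 3.

```python
def site_species_richness(rows):
    """Count distinct species and morphospecies at a site.

    Juveniles and rows without a binomial or morphospecies ID are skipped.
    """
    taxa = set()
    for row in rows:
        if row.get("LifeStage") == "Juvenile":
            continue  # juveniles are excluded from richness
        binomial = row.get("SpeciesBinomial")
        morpho_id = row.get("MorphospeciesID")
        if binomial:
            taxa.add(("binomial", binomial))
        elif morpho_id:
            # A morphospecies counts as an additional species within its genus.
            taxa.add(("morphospecies", row.get("Genus"), morpho_id))
        # Rows with neither identifier are unidentified and not counted.
    return len(taxa)


def site_total_abundance(rows):
    """Sum abundance over all rows, including juveniles and unidentified records."""
    return sum(row.get("Abundance") or 0 for row in rows)


# Made-up occurrence rows for a single site.
rows = [
    {"SpeciesBinomial": "Lumbricus terrestris", "LifeStage": "Adult", "Abundance": 4},
    {"SpeciesBinomial": "Aporrectodea caliginosa", "LifeStage": "Adult", "Abundance": 10},
    {"SpeciesBinomial": "Aporrectodea caliginosa", "LifeStage": "Adult", "Abundance": 2},
    {"Genus": "Amynthas", "MorphospeciesID": "sp1", "LifeStage": "Adult", "Abundance": 3},
    {"SpeciesBinomial": "Lumbricus rubellus", "LifeStage": "Juvenile", "Abundance": 7},
    {"LifeStage": "Adult", "Abundance": 1},  # unidentified record
]
richness = site_species_richness(rows)    # 3: two binomials plus one morphospecies
abundance = site_total_abundance(rows)    # 27: every record counts
```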
After the ecological grouping (epigeic, endogeic, anecic, and epi-endogeic) of each species had been assigned and/or checked by the earthworm taxonomists, diversity measures within each ecological group at a site were also calculated. As with the site-level metrics, the species richness within each ecological group was calculated using only species with binomials or morphospecies. Biomass and abundance of each ecological group at a site were calculated regardless of species identity. The total number of ecological groups at each site was calculated regardless of the abundance, biomass, life stage or native status of the species included (maximum ecological group richness = 4).
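A sketch of the per-group calculations is given below. It is written in Python rather than the R used for the original calculations, the function name `ecological_group_summary` is hypothetical, and excluding juveniles from within-group species richness is an assumption made for consistency with the site-level rule.

```python
from collections import defaultdict

# The four ecological groups named in the text.
GROUPS = {"Epigeic", "Endogeic", "Anecic", "Epi-Endogeic"}


def ecological_group_summary(rows):
    """Summarise occurrence rows from one site by ecological group."""
    present = set()                 # groups recorded at the site
    abundance = defaultdict(float)  # abundance per group, any identity
    taxa = defaultdict(set)         # identified taxa per group
    for row in rows:
        group = row.get("Ecological_group")
        if group not in GROUPS:
            continue  # 'Unknown' or missing group
        present.add(group)
        abundance[group] += row.get("Abundance") or 0
        if row.get("LifeStage") == "Juvenile":
            continue  # assumption: juveniles do not add to species richness
        name = row.get("SpeciesBinomial") or row.get("MorphospeciesID")
        if name:
            taxa[group].add(name)
    return {
        "group_richness": len(present),  # at most 4
        "abundance": dict(abundance),
        "species_richness": {g: len(taxa[g]) for g in present},
    }


# Made-up rows for a single site.
rows = [
    {"SpeciesBinomial": "Lumbricus terrestris", "Ecological_group": "Anecic",
     "LifeStage": "Adult", "Abundance": 4},
    {"SpeciesBinomial": "Aporrectodea caliginosa", "Ecological_group": "Endogeic",
     "LifeStage": "Adult", "Abundance": 10},
    {"Ecological_group": "Endogeic", "LifeStage": "Juvenile", "Abundance": 6},
    {"Ecological_group": "Unknown", "LifeStage": "Adult", "Abundance": 2},
]
summary = ecological_group_summary(rows)
```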
Data Records
The data presented here are available in the iDiv data portal (10.25829/idiv.1880-17-3189. Dataset ID: 1880)19 in a static form. In addition, the full dataset will be hosted by Edaphobase (www.portal.edaphobase). In the future, the version in Edaphobase might change (i.e., with species names revisions, or requests from the data providers) and will hopefully be added to with additional earthworm records (or other soil taxa).
The data are stored in three tables: meta-data (Online-only Table 1), site-level (Online-only Table 2), and species occurrence (Online-only Table 3). The file ID links the meta-data to the site-level data, and the Study ID and the Site ID link the site-level data to the species occurrence table.
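The linkage between the three tables can be sketched as a pair of key lookups. This Python sketch uses the key fields from the Online-only Tables (`File`, `Study_Name`, `Site_Name`); the helper name `link_tables` and all example values are hypothetical.

```python
from collections import defaultdict


def link_tables(metadata, sites, occurrences):
    """Attach meta-data and species occurrences to each site row."""
    # The file ID links the meta-data to the site-level data.
    meta_by_file = {m["File"]: m for m in metadata}
    # Study ID plus site ID link the site-level data to the occurrences.
    occ_by_site = defaultdict(list)
    for occ in occurrences:
        occ_by_site[(occ["Study_Name"], occ["Site_Name"])].append(occ)

    linked = []
    for site in sites:
        key = (site["Study_Name"], site["Site_Name"])
        linked.append({
            "site": site,
            "meta": meta_by_file.get(site["File"]),
            "occurrences": occ_by_site.get(key, []),
        })
    return linked


# Made-up rows for illustration.
metadata = [{"File": "f1", "Article_Year": 2010}]
sites = [
    {"File": "f1", "Study_Name": "f1_s1", "Site_Name": "A", "LandUse": "Pasture"},
    {"File": "f1", "Study_Name": "f1_s1", "Site_Name": "B", "LandUse": "Urban"},
]
occurrences = [
    {"File": "f1", "Study_Name": "f1_s1", "Site_Name": "A",
     "SpeciesBinomial": "Lumbricus terrestris", "Abundance": 4},
]
linked = link_tables(metadata, sites, occurrences)
```

Sites without occurrence rows (site B here) simply carry an empty list, which mirrors datasets where only site-level totals were available.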
For all suitable datasets, the meta-data information was completed. The meta-data contains bibliographic information on the original paper which analysed, or published, the data, as well as contact information of the person who provided the raw data (not included in the release of the database for privacy reasons). The meta-data also included the number of sites and studies within the file, so that validation checks could be completed. Online-only Table 1 shows all fields within the meta-data that have been made available.
Information on all sampled sites within each dataset was recorded in the site-level table (Online-only Table 2). Each row represents a single site within a study, with information on the sampling methodology, soil properties, and how the land was used, managed, and covered. The site-level earthworm community metrics (species richness, abundance and biomass) are also included if available.
Site-level species lists, or abundance, and/or biomass measures for individual records are given in the species occurrence table (Online-only Table 3). Each row is a measurement of an observation at a site (22,690 non-zero observations in total). An observation could relate to a species (with a scientific binomial, e.g., the abundance of Lumbricus terrestris at a site, or a morphospecies identification), a genus, life stage, ecological group, or native/non-native group (e.g., the abundance of all non-native species at a site). Details of the native/non-native status of a species were only available when provided by the original data collector.
Technical Validation
Templates used to enter the individual datasets were designed so that fields were only allowed certain values and formats where possible. This helped to reduce spelling errors, slight inconsistencies, and incorrect values being entered. Data providers were contacted if details within their raw data were unclear. As multiple people entered data into the templates, detailed documentation was created at the start of the project to ensure consistency amongst those involved. In addition, a subset of datasets was checked by several curators.
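The kind of field restriction described above can be illustrated with a small validator. This is a Python sketch only; the actual templates and their checks are not described in detail, so the function name `validate_site` and the selection of checks are assumptions. The allowed values are taken from Online-only Table 2.

```python
# Allowed land-use categories, as listed in Online-only Table 2.
LAND_USE = {
    "Primary vegetation", "Secondary vegetation", "Production - Arable",
    "Production - Crop plantations", "Production - Wood plantation",
    "Pasture", "Urban", "Unknown",
}


def validate_site(row):
    """Return a list of problems found in one site-level row."""
    errors = []
    month = row.get("Sample_StartDate_Month")
    if month is not None and not (isinstance(month, int) and 1 <= month <= 12):
        errors.append("Sample_StartDate_Month must be an integer from 1 to 12")
    year = row.get("Sample_StartDate_Year")
    if not isinstance(year, int) or year >= 2018:
        errors.append("Sample_StartDate_Year is required and must be less than 2018")
    if row.get("LandUse") not in LAND_USE:
        errors.append("LandUse must be one of the predefined categories")
    lat = row.get("Latitude (decimal degrees)")
    if not isinstance(lat, (int, float)) or not -90 <= lat <= 90:
        errors.append("Latitude must be in decimal degrees between -90 and 90")
    return errors


# Made-up rows: one valid, one with four problems.
good_row = {"Sample_StartDate_Month": 5, "Sample_StartDate_Year": 2010,
            "LandUse": "Pasture", "Latitude (decimal degrees)": 51.3}
bad_row = {"Sample_StartDate_Month": 13, "Sample_StartDate_Year": 2019,
           "LandUse": "Forest", "Latitude (decimal degrees)": 95.0}
good_errors = validate_site(good_row)
bad_errors = validate_site(bad_row)
```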
All earthworm species names were checked against DriloBASE (http://taxo.drilobase.org) to identify potential synonyms and spelling mistakes. Following that, earthworm specialists and taxonomists (GB, MJIB, MLCB and PL) checked the scientific names, removed synonyms and updated names if taxonomies had changed. Where ecological groupings were missing, the earthworm taxonomists also added them where possible, based on the available literature.
Usage Notes
Land-use fields were based on classification schemes, and may not be the most suitable for the analysis of earthworms. We included a free-text field (“Habitat as described”) that could be used by future researchers to define their own classification scheme for land-use or habitat cover.
As diversity measures are highly influenced by sampling methodology, we included information on sampling methods in the database (Fig. 4). In addition, we would expect that variation in diversity would differ between the individual datasets due to, for example, inter-observer variability. We highly recommend that statistical methods used on this database take these between-dataset variations into account.
Despite our efforts to obtain a global dataset, there is a geographic bias (Fig. 1), such that sites are highly clustered in certain regions (e.g., Europe), sparse in others (e.g., South America), or lacking (e.g., southern Africa, northern Russia). To reduce such biases, we attempted to contact as many researchers as possible in such areas to acquire data. Although this helped to improve the data coverage, it did not remove the gaps. We hope to address these gaps in the future, but in the meantime, researchers should be aware of the influence these biases might have on their analyses35,36.
Acknowledgements
This database and paper are a product of two sWorm workshops at sDiv, the synthesis center at iDiv. We thank M. Winter and the sDiv team for their help in organizing the sWorm workshops, and the Biodiversity Informatics Unit (BDU) at iDiv for their assistance in making the data open access. H.R.P.P., B.K-R., and the sWorm workshops were supported by the sDiv [Synthesis Centre of the German Centre for Integrative Biodiversity Research (iDiv) Halle-Jena-Leipzig (DFG FZT 118)]. H.R.P.P., O.F. and N.E. acknowledge funding by the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (grant agreement no. 677232 to N.E.). K.S.R. and W.H.v.d.P. were supported by ERC-ADV grant 323020 to W.H.v.d.P. Additional support was provided by iDiv (DFG FZT118) Flexpool proposal 34600850 (C.A.G. and N.E.); the Academy of Finland (285882) and the Natural Sciences and Engineering Research Council of Canada (postdoctoral fellowship and RGPIN-2019-05758) (E.K.C.); German Federal Ministry of Education and Research (01LO0901A) (D.J.R.); ERC-AdG 694368 (M.R.); the TULIP Laboratory of Excellence (ANR-10-LABX-41) (M.L.); and the BBSRC David Phillips Fellowship to F.T.d.V. (BB/L02456X/1). In addition, data collection was funded by the Russian Foundation for Basic Research (12-04-01538-а, 12-04-01734-a, 14-44-03666-r_center_a, 15-29-02724-ofi_m, 16-04-01878-a, 19-05-00245, 19-04-00-609-a); Tarbiat Modares University; Aurora Organic Dairy; UGC(NERO) (F. 1-6/Acctt./NERO/2007-08/1485); Natural Sciences and Engineering Research Council (RGPIN-2017-05391); Slovak Research and Development Agency (APVV-0098-12); Science for Global Development through Wageningen University; Norman Borlaug LEAP Programme and International Atomic Energy Agency (IAEA); São Paulo Research Foundation - FAPESP (12/22510-8); Oklahoma Agricultural Experiment Station; INIA - Spanish Agency (SUM 2006-00012-00-0); Royal Canadian Geographical Society; Environmental Protection Agency (Ireland) (2005-S-LS-8); University of Hawai’i at Mānoa (HAW01127H; HAW01123M); European Union FP7 (FunDivEurope, 265171; ROUTES 265156); U.S. Department of the Navy, Commander Pacific Fleet (W9126G-13-2-0047); Science and Engineering Research Board (SB/SO/AS-030/2013) Department of Science and Technology, New Delhi, India; Strategic Environmental Research and Development Program (SERDP) of the U.S. Department of Defense (RC-1542); Maranhão State Research Foundation (FAPEMA 03135/13, 02471/17); Coordination for the Improvement of Higher Education Personnel (CAPES 3281/2013); Ministry of Education, Youth and Sports of the Czech Republic (LTT17033); Colorado Wheat Research Foundation; Zone Atelier Alpes, French National Research Agency (ANR-11-BSV7-020-01, ANR-09-STRA-02-01, ANR 06 BIODIV 009-01); Austrian Science Fund (P16027, T441); Landwirtschaftliche Rentenbank Frankfurt am Main; Welsh Government and the European Agricultural Fund for Rural Development (Project Ref. A AAB 62 03 qA731606); SÉPAQ, Ministry of Agriculture and Forestry of Finland; Science Foundation Ireland (EEB0061); University of Toronto (Faculty of Forestry); National Science and Engineering Research Council of Canada; Haliburton Forest & Wildlife Reserve; NKU College of Arts & Sciences Grant; Österreichische Forschungsförderungsgesellschaft (837393 and 837426); Mountain Agriculture Research Unit of the University of Innsbruck; Higher Education Commission of Pakistan; Kerala Forest Research Institute, Peechi, Kerala; UNEP/GEF/TSBF-CIAT Project on Conservation and Sustainable Management of Belowground Biodiversity; Ministry of Agriculture and Forestry of Finland; Complutense University of Madrid/European Union FP7 project BioBio (FPU UCM 613520); GRDC; AWI; LWRRDC; DRDC; CONICET (National Scientific and Technical Research Council) and FONCyT (National Agency of Scientific and Technological Promotion) (PICT, PAE, PIP), Universidad Nacional de Luján y FONCyT (PICT 2293 (2006)); Fonds de recherche sur la nature et les technologies du Québec (131894); Deutsche Forschungsgemeinschaft (SCHR1000/3-1, SCHR1000/6-1, 6-2 (FOR 1598), WO 670/7-1, WO 670/7-2, & SCHA 1719/1-2); CONACYT (FONDOS MIXTOS TABASCO/PROYECTO11316); NSF (DGE-0549245, DGE-0549245, DEB-BE-0909452, NSF1241932, LTER Program DEB-97–14835); Institute for Environmental Science and Policy at the University of Illinois at Chicago; Dean’s Scholar Program at UIC; Garden Club of America Zone VI Fellowship in Urban Forestry from the Casey Tree Endowment Fund; J.E. Weaver Competitive Grant from the Nebraska Chapter of The Nature Conservancy; The College of Liberal Arts and Sciences at DePaul University; Elmore Hadley Award for Research in Ecology and Evolution from the UIC Dept. of Biological Sciences; Spanish CICYT (AMB96-1161; REN2000-0783/GLO; REN2003-05553/GLO; REN2003-03989/GLO; CGL2007-60661/BOS); Yokohama National University; MEXT KAKENHI (25220104); Japan Society for the Promotion of Science KAKENHI (25281053, 17KT0074, 25252026); ADEME (0775C0035); Ministry of Science, Innovation and Universities of Spain (CGL2017-86926-P); Syngenta Philippines; UPSTREAM; LTSER (Val Mazia/Matschertal); Marie Sklodowska Curie Postdoctoral Fellowship (747607); National Science & Technology Base Resource Survey Project of China (2018FY100306); McKnight Foundation (14–168); Program of Fundamental Researches of Presidium of Russian Academy of Sciences (AААА-A18–118021490070–5); Brazilian National Council for Scientific and Technological Development (CNPq 310690/2017–0, 404191/2019–3, 307486/2013–3); French Ministry of Foreign and European Affairs; Bavarian Ministry for Food, Agriculture and Forestry (Project No B62); INRA AIDY project; MIUR PRIN 2008; Idaho Agricultural Experiment Station; Estonian Science Foundation; Ontario Ministry of the Environment, Canada; Russian Science Foundation (16-17-10284); National Natural Science Foundation of China (41371270); Australian Research Council (FT120100463); USDA Forest Service-IITF.
The authors would like to thank all supervisors, students, collaborators, technicians, data analysts, land owners/managers, and anyone else involved with the collection, processing, and/or publication of the primary datasets, both for this manuscript and16. Namely: Peter M. Kotanen, Jessica G. Davis, S.N. Ramanujam, J.M. Julka, Csaba Csuzdi, P. Bescansa, M. Moriones, C. González, Creighton Litton, Danielle Celentano, Sandriel Sousa, Samuel James, C. Hakseth, C. Mills, Hirohi Takeda, Sandriel Sousa Costa, Kyungsoo Yoo, Sebastien De Danieli, Philippe Choler, Pierre Taberlet, Lauric Cecillon, Erwin Meyer, Felix Gerlach, Doris Beutler, Christina Marley, Rhun Fychan, Ruth Sanderson, Mervi Nieminen, Taisto Sirén, Mariana Alem, Carlos Regalsky, Tara Sackett, Erin Bayne, Sarah Hamilton, Alexander Rief, Catarina Praxedes, Rosana Sandler, Juliane Palm, Anne Zangerlé, Anne-Kathrin Schneider, Erwin Zehe, David H. Wise, Liam Heneghan, Yoshikazu Kawaguchi, Irene L. López-Sañudo, Almudena Mateos, Pilar Meléndez, Raquel Santos, Marta Yebra, Tamara Vsevolodova-Perel, Maxim Bobrovsky, Natalya Ivanova, Eufemio Rasco Jr., Robert W. Mysłajek, Jianxiong Li, Jiangping Qiu, A. Barne, Antonio Gómez-Sal, Tanya Handa, Mark Vellend, Hans de Wandeler, Sarah Placella, Lee Frelich, Peter Reich. Open Access funding enabled and organized by Projekt DEAL.
Author contributions
The sWorm workshops were organised by N.E., E.K.C. and H.R.P.P., with funding acquired by N.E., E.K.C. and M.P.T. Data collation and formatting were led by H.R.P.P., with assistance from J.K., M.J.I.B., G.B., K.B.G. and B.S. Harmonisation of earthworm species names was completed by G.B., M.J.I.B., M.L.C.B. and P.L. Advice and feedback on data collation protocols were provided by E.M.B., M.J.I.B., G.B., O.F., C.A.G., B.K.R., A.O., D.R., and D.H.W. Writing of the manuscript was led by H.R.P.P. All authors provided input and comments on the manuscript. The majority of authors provided data to the database.
Code availability
All code used to format and clean the dataset for publication is available on GitHub (www.github.com/helenphillips).
Competing interests
The authors declare no competing interests.
Footnotes
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
These authors contributed equally: Erin K. Cameron, Nico Eisenhauer.
References
- 1. Giller PS. The diversity of soil communities, the poor man’s tropical rainforest? Biodivers. Conserv. 1996;5:135–168. doi: 10.1007/BF00055827.
- 2. Decaëns T, Jiménez JJ, Gioia C, Measey GJ, Lavelle P. The values of soil animals for conservation biology. Eur. J. Soil Biol. 2006;42:S23–S38. doi: 10.1016/j.ejsobi.2006.07.001.
- 3. Bardgett RD, van der Putten WH. Belowground biodiversity and ecosystem functioning. Nature. 2014;515:505–511. doi: 10.1038/nature13855.
- 4. Phillips HRP, et al. Red list of a black box. Nat. Ecol. Evol. 2017;1:0103. doi: 10.1038/s41559-017-0103.
- 5. Orgiazzi, A. et al. Global Soil Biodiversity Atlas. European Commission, Publications (2016).
- 6. Dornelas M, et al. BioTIME: A database of biodiversity time series for the Anthropocene. Glob. Ecol. Biogeogr. 2018;27:760–786. doi: 10.1111/geb.12729.
- 7. Hudson LN, et al. The database of the PREDICTS (Projecting Responses of Ecological Diversity In Changing Terrestrial Systems) project. Ecol. Evol. 2017;7:145–188. doi: 10.1002/ece3.2579.
- 8. Dornelas M, et al. Assemblage time series reveal biodiversity change but not systematic loss. Science. 2014;344:296–299. doi: 10.1126/science.1248484.
- 9. Newbold T, et al. Global effects of land use on local terrestrial biodiversity. Nature. 2015;520:45–50. doi: 10.1038/nature14324.
- 10. Ramirez KS, et al. Detecting macroecological patterns in bacterial communities across independent studies of global soils. Nat. Microbiol. 2018;3:189–196. doi: 10.1038/s41564-017-0062-x.
- 11. Delgado-Baquerizo M, et al. A global atlas of the dominant bacteria found in soil. Science. 2018;359:320–325. doi: 10.1126/science.aap9516.
- 12. Milcu A, Partsch S, Scherber C, Weisser WW, Scheu S. Earthworms and legumes control litter decomposition in a plant diversity gradient. Ecology. 2008;89:1872–1882. doi: 10.1890/07-1377.1.
- 13. Blouin M, et al. A review of earthworm impact on soil function and ecosystem services. Eur. J. Soil Sci. 2013;64:161–182. doi: 10.1111/ejss.12025.
- 14. Zhang W, et al. Earthworms facilitate carbon sequestration through unequal amplification of carbon stabilization compared with mineralization. Nat. Commun. 2013;4:2576. doi: 10.1038/ncomms3576.
- 15. Paoletti MG. The role of earthworms for assessment of sustainability and as bioindicators. Agric. Ecosyst. Environ. 1999;74:137–155. doi: 10.1016/S0167-8809(99)00034-1.
- 16. Phillips HRP, et al. Global distribution of earthworm diversity. Science. 2019;366:480–485. doi: 10.1126/science.aax4851.
- 17. Rutgers M, et al. Mapping earthworm communities in Europe. Appl. Soil Ecol. 2016;97:98–111. doi: 10.1016/j.apsoil.2015.08.015.
- 18. Burkhardt U, et al. The Edaphobase project of GBIF-Germany-A new online soil-zoological data warehouse. Appl. Soil Ecol. 2014;83:3–12. doi: 10.1016/j.apsoil.2014.03.021.
- 19. Phillips HRP, 2020. Global data on earthworm abundance, biomass, diversity and corresponding environmental properties. (iDiv Data Repository) German Centre for Integrative Biodiversity Research (iDiv) Halle-Jena-Leipzig.
- 20. Bouché, M. B. Strategies lombriciennes. Ecol. Bull. 122–132 (1977).
- 21. IPBES. Summary for policymakers of the global assessment report on biodiversity and ecosystem services of the Intergovernmental Science-Policy Platform on Biodiversity and Ecosystem Services. 56 (2019).
- 22. Pey B, et al. Current use of and future needs for soil invertebrate functional traits in community ecology. Basic Appl. Ecol. 2014;15:194–206. doi: 10.1016/j.baae.2014.03.007.
- 23. Cameron, E. K. et al. Global mismatches in aboveground and belowground biodiversity. Conserv. Biol. 430 (2019)
- 24. Anderson, J. M. & Ingram, J. S. I. Tropical Soil Biology and Fertility: A handbook of methods. Trop. Soil Biol. Fertil. A Handb. methods 2 Ed., 88–91 (1993).
- 25. ISO. Soil quality – Sampling of soil invertebrates – Part 1: Hand-sorting and extraction of earthworms (ISO/FDIS 23611-1:2012). (2012).
- 26. USDA. Soil Survey Manual Agriculture. Handbook 18. USDA, Nat. Resour. Conserv. Serv. (2017)
- 27. FAO/WRB. World reference base for soil resources 2014. World Soil Resources Reports No. 106 (2014).
- 28. Schindelin J, et al. Fiji: an open-source platform for biological-image analysis. Nat. Methods. 2012;9:676–682. doi: 10.1038/nmeth.2019.
- 29. Bartlett MD, et al. A critical review of current methods in earthworm ecology: From individuals to populations. Eur. J. Soil Biol. 2010;46:67–73. doi: 10.1016/j.ejsobi.2009.11.006.
- 30. Hoskins AJ, et al. Downscaling land-use data to provide global 30” estimates of five land-use classes. Ecol. Evol. 2016;6:3040–3055. doi: 10.1002/ece3.2104.
- 31. Hurtt GC, et al. Harmonization of land-use scenarios for the period 1500–2100: 600 years of global gridded annual land-use transitions, wood harvest, and resulting secondary lands. Clim. Change. 2011;109:117–161. doi: 10.1007/s10584-011-0153-2.
- 32. Magurran, A. E. Measuring biological diversity. (John Wiley & Sons, 2004).
- 33. R Core Team. R: A language and environment for statistical computing. (2016).
- 34. Sims RW, Gerard BM. Earthworms. Keys and notes for the identification and study of the species. New Zeal. J. Zool. 1988;15:447–448. doi: 10.1080/03014223.1988.10422974.
- 35. Gonzalez A, et al. Estimating local biodiversity change: a critique of papers claiming no net loss of local diversity. Ecology. 2016;97:1949–1960. doi: 10.1890/15-1759.1.
- 36. Cameron EK, et al. Global gaps in soil biodiversity data. Nat. Ecol. Evol. 2018;2:1042–1043. doi: 10.1038/s41559-018-0573-8.