Scientific Data. 2021 May 21;8:136. doi: 10.1038/s41597-021-00912-z

Global data on earthworm abundance, biomass, diversity and corresponding environmental properties

Helen R P Phillips 1,2,3,, Elizabeth M Bach 4,5, Marie L C Bartz 6,7, Joanne M Bennett 1,8,9, Rémy Beugnon 1,2, Maria J I Briones 10, George G Brown 11, Olga Ferlian 1,2, Konstantin B Gongalsky 12,13, Carlos A Guerra 1,8, Birgitta König-Ries 1,14, Julia J Krebs 1,2, Alberto Orgiazzi 15, Kelly S Ramirez 16, David J Russell 17, Benjamin Schwarz 18, Diana H Wall 4,5, Ulrich Brose 1,19, Thibaud Decaëns 20, Patrick Lavelle 21, Michel Loreau 22, Jérôme Mathieu 23,24, Christian Mulder 25, Wim H van der Putten 16,26, Matthias C Rillig 27, Madhav P Thakur 16, Franciska T de Vries 28, David A Wardle 29, Christian Ammer 30,31, Sabine Ammer 32, Miwa Arai 33, Fredrick O Ayuke 34,35, Geoff H Baker 36, Dilmar Baretta 37, Dietmar Barkusky 38, Robin Beauséjour 39, Jose C Bedano 40, Klaus Birkhofer 41, Eric Blanchart 42, Bernd Blossey 43, Thomas Bolger 44,45, Robert L Bradley 39, Michel Brossard 42, James C Burtis 46, Yvan Capowiez 47, Timothy R Cavagnaro 48, Amy Choi 49, Julia Clause 50, Daniel Cluzeau 51, Anja Coors 52, Felicity V Crotty 53,54, Jasmine M Crumsey 55, Andrea Dávalos 56, Darío J Díaz Cosín 57, Annise M Dobson 58, Anahí Domínguez 40, Andrés Esteban Duhour 59, Nick van Eekeren 60, Christoph Emmerling 61, Liliana B Falco 62, Rosa Fernández 63, Steven J Fonte 64, Carlos Fragoso 65, André L C Franco 66, Abegail Fusilero 67,68, Anna P Geraskina 69, Shaieste Gholami 70, Grizelle González 71, Michael J Gundale 72, Mónica Gutiérrez López 57, Branimir K Hackenberger 73, Davorka K Hackenberger 73, Luis M Hernández 74, Jeff R Hirth 75, Takuo Hishi 76, Andrew R Holdsworth 77, Martin Holmstrup 78, Kristine N Hopfensperger 79, Esperanza Huerta Lwanga 80,81, Veikko Huhta 82, Tunsisa T Hurisso 64,83, Basil V Iannone III 84, Madalina Iordache 85, Ulrich Irmler 86, Mari Ivask 87, Juan B Jesús 57, Jodi L Johnson-Maynard 88, Monika Joschko 38, Nobuhiro Kaneko 89, Radoslava Kanianska 90, Aidan M Keith 91, Maria L Kernecker 92, Armand W Koné 93, Yahya Kooch 94, Sanna T Kukkonen 95, H 
Lalthanzara 96, Daniel R Lammel 27, Iurii M Lebedev 12,13,97, Edith Le Cadre 98, Noa K Lincoln 99, Danilo López-Hernández 100, Scott R Loss 101, Raphael Marichal 102, Radim Matula 103, Yukio Minamiya 104, Jan Hendrik Moos 105,106, Gerardo Moreno 107, Alejandro Morón-Ríos 108, Hasegawa Motohiro 109, Bart Muys 110, Johan Neirynck 111, Lindsey Norgrove 112, Marta Novo 57, Visa Nuutinen 113, Victoria Nuzzo 114, P Mujeeb Rahman 115, Johan Pansu 116,117, Shishir Paudel 101,118, Guénola Pérès 51,119, Lorenzo Pérez-Camacho 120, Jean-François Ponge 121, Jörg Prietzel 122, Irina B Rapoport 123, Muhammad Imtiaz Rashid 124, Salvador Rebollo 120, Miguel Á Rodríguez 125, Alexander M Roth 126,127, Guillaume X Rousseau 74,128, Anna Rozen 129, Ehsan Sayad 70, Loes van Schaik 81, Bryant Scharenbroch 130,131, Michael Schirrmann 132, Olaf Schmidt 133,134, Boris Schröder 135, Julia Seeber 136,137, Maxim P Shashkov 138,139, Jaswinder Singh 140, Sandy M Smith 49, Michael Steinwandter 137, Katalin Szlavecz 141, José Antonio Talavera 142, Dolores Trigo 57, Jiro Tsukamoto 143, Sheila Uribe-López 144, Anne W de Valença 145, Iñigo Virto 146, Adrian A Wackett 147, Matthew W Warren 148, Emily R Webster 149, Nathaniel H Wehr 150, Joann K Whalen 151, Michael B Wironen 152, Volkmar Wolters 153, Pengfei Wu 154, Irina V Zenkova 155, Weixin Zhang 156, Erin K Cameron 3,157,#, Nico Eisenhauer 1,2,#
PMCID: PMC8140120  PMID: 34021166

Abstract

Earthworms are an important soil taxon as ecosystem engineers, providing a variety of crucial ecosystem functions and services. Little is known about their diversity and distribution at large spatial scales, despite the availability of considerable amounts of local-scale data. Earthworm diversity data, obtained from the primary literature or provided directly by authors, were collated with information on site locations, including coordinates, habitat cover, and soil properties. Datasets were required, at a minimum, to include abundance or biomass of earthworms at a site. Where possible, site-level species lists were included, as well as the abundance and biomass of individual species and ecological groups. This global dataset contains 10,840 sites, with 184 species, from 60 countries and all continents except Antarctica. The data were obtained from 182 published articles, published between 1973 and 2017, and 17 unpublished datasets. Amalgamating data into a single global database will assist researchers in investigating and answering a wide variety of pressing questions, for example, jointly assessing aboveground and belowground biodiversity distributions and drivers of biodiversity change.

Subject terms: Biodiversity, Community ecology, Biogeography


Measurement(s): earthworm communities • Abundance • organic material • Diversity • environmental properties
Technology Type(s): digital curation
Factor Type(s): location
Sample Characteristic - Organism: Lumbricina
Sample Characteristic - Environment: soil
Sample Characteristic - Location: global

Machine-accessible metadata file describing the reported data: 10.6084/m9.figshare.13399118

Background & Summary

Soils are considered to be one of the most biodiverse terrestrial habitats1–3. Despite this, very little is known about the biodiversity that resides there compared to aboveground biodiversity, especially at the global scale1,4,5. This is surprising given the large number of local-scale biodiversity datasets available in the published literature. A number of studies have amalgamated local-scale datasets, primarily for aboveground or marine organisms (e.g.6,7), which can then be used for large-scale analyses (e.g.8,9). Belowground biodiversity data are often overlooked in these large biodiversity databases4, and thus separate efforts to collate data are only now starting to emerge for certain belowground taxa, particularly microbes (e.g.10,11).

Earthworms are involved in a large number of ecosystem functions and services, such as decomposition12, nutrient cycling13 and climate regulation14, amongst others13. In addition, they are often used as bioindicators of soil biodiversity and health15. Earthworms are relatively easy to sample; thus, a large amount of data are available16. Nevertheless, previous attempts to collate earthworm datasets have been geographically restricted17,18 or focused on country or regional species lists (e.g., DriloBASE; http://taxo.drilobase.org). By collating site-level diversity measures, we can also collect information on factors that might determine community composition, for example, measurements of soil properties or land use and cover.

Here, we describe a global database of local earthworm diversity and associated site-level characteristics from 10,840 sites in 60 countries (Fig. 1)19. Site-level information includes at least one sampled soil property, land use, and habitat cover for just over 58% of sites. Measurements of earthworm species richness (including species lists where available), total abundance, and biomass were collected at the site level and, for some sites, for individual species occurrences (i.e., the abundance and biomass of each species recorded at a site). In addition, using expert opinion and details given by data providers, we classified each earthworm species into ecological groups based on their feeding and burrowing behaviours (epigeics, endogeics, anecics, epi-endogeics; more details below20).

Fig. 1.


Locations of the 276 studies included in the database. Each circle represents the centre of a study (a collection of sites where earthworms were sampled with a consistent method). The size of the circle indicates the number of sites within the study. Transparency is used only for aiding visualisation.

The compilation of this dataset is timely. It can be used to answer long-standing questions in ecology in relation to this important belowground faunal group (e.g., global diversity patterns16). In light of the IPBES Global Assessment21 and the loss of biodiversity, the dataset also has the potential to be used to address the pressing issue of the consequences of environmental change on soil biodiversity. These data are suitable for linking with other soil databases, such as BETSI (http://betsi.cesab.org/), a database of soil organism traits22. Linking trait information with site-level diversity would then allow analyses of functional diversity. In addition, as nearly all sites have geographic coordinates, other environmental data layers (e.g., related to climate variables, land use or soil abiotic factors) could be linked to the site-level diversity measures (e.g.16). Belowground diversity measures could also be linked to similar diversity measurements aboveground, thus enabling investigations across ecosystems to identify patterns of diversity and biodiversity changes23.

Methods

This work was conceptualised and discussed during two ‘sWorm’ workshops in 2016 and 2017, funded by sDiv, the synthesis centre of the German Centre for Integrative Biodiversity Research (iDiv) Halle-Jena-Leipzig. More than 20 international scientists with expertise in earthworms, soil science, and/or data management met at each of the workshops.

On 18th December 2016, Web of Science was used to search the available literature for articles that had sampled the earthworm community. Keywords were used that captured measurements of diversity of all taxa within Oligochaetes: ((Earthworm* OR Oligochaeta OR Megadril* OR Haplotaxida OR Annelid* OR Lumbric* OR Clitellat* OR Acanthodrili* OR Ailoscoleci* OR Almid* OR Benhamiin* OR Criodrilid* OR Diplocard* OR Enchytraeid* OR Eudrilid* OR Exxid* OR Glossoscolecid* OR Haplotaxid* OR Hormogastrid* OR Kynotid* OR Lutodrilid* OR Megascolecid* OR Microchaetid* OR Moniligastrid* OR Ocnerodrilid* OR Octochaet* OR Sparganophilid* OR Tumakid*) AND (Diversity OR “Species richness” OR “OTU” OR Abundance OR individual* OR Density OR “tax* richness” OR “Number” OR Richness OR Biomass))

This search returned 7,783 papers. All titles and abstracts of papers published post-2000 were screened (6,140 papers), and papers were excluded if they did not make reference to data suitable for the analysis. As it was most likely that raw data would need to be requested, papers published before 2000 were excluded without screening, as it was unlikely that available author contact details were up to date. After this initial screening, PDFs of all remaining papers (n = 986) were manually screened to determine whether data were suitable (see below). In total, 477 papers made reference to suitable data.

In addition, to find unpublished data or to target underrepresented regions, inquiries were made to specific earthworm researchers regarding suitable datasets (e.g., by directly contacting researchers, giving presentations at the Second Global Soil Biodiversity Conference and the International Symposium of Earthworm Ecology). No date restrictions were placed on such datasets, and thus, some were published prior to 2000.

In order to be included in the database, the individual article was required to have sampled earthworm diversity using an appropriate quantitative methodology (such as hand-sorting of a soil quadrat e.g.24, or chemical expulsion e.g.25) at two or more sites that varied in their land-use/habitat cover or soil properties. At a minimum, we required data on the total abundance or fresh biomass of earthworms at each site, and if possible, the number of species (ideally with species binomials), and the abundance and biomass of each species. In addition, geographic coordinates of the sites were required, and at each site, data collectors ideally had sampled at least one of the following soil properties: soil pH (in H2O, KCl, CaCl2), soil organic carbon (%), soil organic matter (%), sand/silt/clay content (%), soil texture (USDA classification26), Cation Exchange Capacity (CEC), Base Saturation (%), Carbon:Nitrogen ratio, soil moisture (%), and soil type (WRB/FAO classification27).

Where possible, available data were extracted from the suitable articles. For each suitable article, the meta-data (e.g., the article title and DOI) was compiled (Online-only Table 1). Data were extracted from the article text, tables, figures, or supplementary material (e.g., using ImageJ28). Where data were not given but were required (Online-only Table 2), authors of the articles were contacted and the raw data (or missing information) were requested. If the authors did not respond, and the required information could not be obtained using an alternate method, the data were not entered into the database. All data were extracted into online data templates, with data from one article (i.e., a dataset) being entered into an individual template, referred to as a ‘file’. Each file was given a unique ID, and in total 199 files were created and made open-access.

Online-only Table 1.

Information captured in the data template relating to meta-data.

| Field | Format | Information | Required field (*) |
|---|---|---|---|
| File | Text | Unique ID for each article | * |
| Article_Title | Text | Title of the article the data was published in. If unpublished then NA or “Unpublished” | |
| Article_Year | Integer | The year the article was published (NA if unpublished) | |
| Article_FirstAuthorSurname | Text | The surname of the first author of the article | * |
| PaperContact_Surname | Text | The surname of the corresponding author of the article | |
| Article_Journal | Text | The journal the article was published in (NA if unpublished) | |
| Article_DOI | Text | The DOI of the article (NA if not available or unpublished) | |
| Data_DOI | Text | The DOI of the data (if different from the article, NA if not available) | |
| Number_of_Studies | Integer | The total number of studies in the file | * |
| Total_Number_ofSites | Integer | The total number of sites across all studies in the file | * |
| Total_Number_ofSpecies | Integer | The total number of species in the dataset (if given) | |
| Entire.Community | Yes/no | Was the entire earthworm community sampled, or just selected species. If this information is unknown or unclear NA is given | * |
| Data.From.Paper | Yes/no | Was the data taken from the paper, or did the author provide raw values | * |
| Other.soil.organisms.sampled | Yes/no | Were any other soil organisms sampled at the same sites | * |

Online-only Table 2.

Information captured in the data template, relating to site-level information.

| Field | Format | Possible Values | Information | Required field |
|---|---|---|---|---|
| File | Text | | Unique ID for each article. Assigned to all studies within a single publication | * |
| Study_Name | Text | | Unique ID for each study. Assigned to all sites within a single study | * |
| Site_Name | Text | | Unique ID given to each site. A site that is sampled in different studies will have the same ID | * |
| Observational | Multiple choice | Observation; Experimental | Was the data from an observational study or an experimental study. Experimental studies may be unrealistic in their treatments, and are often over a smaller area (resulting in similar/identical coordinates) | * |
| Latitude (decimal degrees) | Numerical | | The latitude of the site (in decimal degrees only) | * |
| Longitude (decimal degrees) | Numerical | | The longitude of the site (in decimal degrees only) | |
| Altitude (m) | Numerical | | The altitude of the site (in metres only) | |
| Country | Text | | The country the site was in (as given by data collector) | * |
| Sample_StartDate_Month | Integer | 1–12 | The month the sampling started | |
| Sample_StartDate_Year | Integer | Less than 2018 | The year the sampling started | * |
| Sample_EndDate_Month | Integer | 1–12 | The month the sampling ended | |
| Sample_EndDate_Year | Integer | Less than 2018 | The year the sampling ended | |
| ExtractionMethod | Multiple choice | Visual search, Hand sorting, Chemical extraction (Mustard), Chemical extraction (Formalin), Octet Method (electric shock), Hand sorting + Chemical extraction (Mustard), Hand sorting + Chemical Extraction (Formalin), Other, Other Multiple, Unknown | The methodology used to sample the earthworms | * |
| Sampled Area | Numerical | | The area over which sample(s) were taken. Typically the size of the quadrat or soil block | * |
| Sampled Area Unit | Multiple choice | cm2, cm3, m2, m3, Unknown, Other | The unit of the sampled area | * |
| Sampling Effort | Numerical | | The number of times the site was sampled to obtain the earthworm community metric(s) provided | * |
| pH | Numerical | | Sampled pH of the site | At least one required |
| pH Collection Method | Multiple choice | H2O, KCl, CaCl2, Other, Unknown | The suspension solvent used in the measurement | |
| pH_mean | Multiple choice | Yes/no | Denotes whether the value given for the pH is a mean from individual values across a depth profile | |
| CEC | Numerical | | The Cation Exchange Capacity of the site | |
| CEC_unit | Text | | The unit of the CEC measurement | |
| CEC_mean | Multiple choice | Yes/no | Denotes whether the value given for the CEC is a mean from individual values across a depth profile | |
| Base Saturation (%) | Numerical | | The base saturation of the site | |
| BaseSaturation_mean | Multiple choice | Yes/no | Denotes whether the value given for the base saturation is a mean from individual values across a depth profile | |
| Organic Carbon (%) | Numerical | | The organic carbon content of the soil at the site | |
| OC_mean | Multiple choice | Yes/no | Denotes whether the value given for the organic carbon is a mean from individual values across a depth profile | |
| Soil Organic Matter (%) | Numerical | | The soil organic matter content at the site | |
| SOM_mean | Multiple choice | Yes/no | Denotes whether the value given for the soil organic matter is a mean from individual values across a depth profile | |
| C/N ratio | Numerical | | The carbon to nitrogen ratio at the site | |
| CN_mean | Multiple choice | Yes/no | Denotes whether the value given for the carbon:nitrogen ratio is a mean from individual values across a depth profile | |
| Sand (%)/Silt (%)/Clay (%) | Numerical | | The percentage of sand, silt and clay at the site | |
| Sand_silt_clay_mean | Multiple choice | Yes/no | Denotes whether the value given for the percentage of sand, silt and clay is a mean from individual values across a depth profile | |
| USDA_SoilTexture | Multiple choice | clay, sandy clay, sandy clay loam, clay loam, loam, sandy loam, loamy sand, sand, silty clay, silty clay loam, silt loam, silt | The texture of the soil at the site | |
| Soil Moisture (%) | Numerical | | The soil moisture at the site | |
| WRB/FAO_SoilType | Multiple choice | Acrisols, Albeluvisols, Alisols, Andosols, Anthrosols, Arenosols, Calcisols, Cambisols, Chernozem, Cryosols, Durisol, Ferralsols, Fluvisol, Gleysols, Gypsisols, Histosols, Kastanozem, Leptosols, Lixisols, Luvisols, Nitisols, Phaeozem, Planosols, Plinthosols, Podzols, Regosols, Retisols, Solonchaks, Solonetz, Technosols, Umbrisols, Vertisols | The type of soil at the site. Using the WRB/FAO classification27, but only classified when given by the data providers in the same system | |
| LandUse | Multiple choice | Primary vegetation, Secondary vegetation, Production - Arable, Production - Crop plantations, Production - Wood plantation, Pasture, Urban, Unknown | The category of land use30,31 that the site was classified as. Classification was based on descriptions of the site in the text of the original publication or subsequent correspondence with the data provider | * |
| HabitatCover | Multiple choice | Broadleaf evergreen forest, Broadleaf deciduous forest, Needleleaf evergreen forest, Needleleaf deciduous forest, Mixed forest, Tree open, Shrub, Herbaceous, Herbaceous with sparse tree/shrub, Sparse vegetation, Cropland, Paddy field, Cropland/Other vegetation mosaic, Mangrove, Wetland, Bare area (consolidated, e.g. rock), Bare area (unconsolidated, e.g. sand), Urban, Snow/Ice, Water bodies, Unknown | The category of habitat cover (ESA CCI-LC 300 m; https://www.esa-landcover-cci.org/) that the site was classified as. Classification was based on descriptions of the site in the text of the original publication or subsequent correspondence with the data provider | * |
| Management System | Multiple choice | Annual crop, Integrated systems, Perennial crops, Pastures (grazed lands), Tree plantations, Unknown, NA | The management system at the site. Sites with no management, i.e. pristine or recovering sites, were categorised as ‘NA’. Classification system based on expert opinion. | |
| Tillage/Pesticide/Fertilizer/Selectively harvested/Clear cut/Fire/Grazing all year/Rotation/Monoculture/Planted | Boolean | | The presence or absence of each pressure at the site. The applicability of these depended on the ‘management system’. Thus, they were left empty or filled as ‘NA’ when not applicable (see Supplementary Material 1) | |
| Habitat as described | Text | | Free text field for a description of the site based on the original article or on emails from the data provider | * |
| SpeciesRichness | Numerical | | The species richness at the site (if available) | At least one required |
| SpeciesRichnessUnit | Multiple choice | Number of species, Species per cm2, Species per cm3, Species per m2, Species per m3, Other | The units of the species richness value | |
| Site_WetBiomass | Numerical | | The total wet biomass of the site (if available) | |
| Site_WetBiomassUnits | Multiple choice | g, g/m2 | The units of the biomass value | |
| Site_Abundance | Numerical | | The total abundance of the site (if available) | |
| Site_Abundance Units | Multiple choice | Number of individuals, Individuals per cm2, Individuals per cm3, Individuals per m2, Individuals per m3 | The units of the abundance value | |

A file could contain multiple ‘studies’, where each study was a different sampling event (i.e., multiple samples taken at the same site over time) and/or a different sampling methodology. Each study was assigned a unique study ID. Sampled diversity of earthworms is highly dependent on the extraction method used29. If a dataset did not contain consistent sampling methodologies across all sites (e.g., some sites sampled with hand sorting and others with hand sorting + chemical extraction), thus making it inappropriate to compare earthworm communities, the dataset was split into a separate study for each consistent methodology. If sites had been sampled multiple times, either across multiple years or within years, and the data were available for each sampling period, then only data from the first and the last sampling period were used. Each sampling period was entered as a separate study, which can help prevent temporal autocorrelation during analysis, e.g., when using a mixed-effects modelling approach.
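The rule of keeping only the first and last sampling period for a repeatedly sampled site can be illustrated with a minimal Python sketch. The record keys (`site`, `year`, `month`) and the function name are hypothetical and are not taken from the data template; the original workflow may have been carried out by hand during data entry.

```python
from collections import defaultdict


def first_and_last_periods(records):
    """Keep only the earliest and latest sampling period for each site.

    `records` is a list of dicts with hypothetical keys 'site', 'year'
    and 'month'. A site sampled only once is returned once.
    """
    by_site = defaultdict(list)
    for rec in records:
        by_site[rec["site"]].append(rec)

    kept = []
    for site_records in by_site.values():
        # Order the sampling periods chronologically.
        ordered = sorted(site_records, key=lambda r: (r["year"], r["month"]))
        kept.append(ordered[0])
        if len(ordered) > 1:
            kept.append(ordered[-1])
    return kept


# Illustrative records only; these are not values from the database.
samples = [
    {"site": "A", "year": 2001, "month": 5},
    {"site": "A", "year": 2003, "month": 5},
    {"site": "A", "year": 2005, "month": 9},
    {"site": "B", "year": 2002, "month": 4},
]
kept = first_and_last_periods(samples)
```

Each retained period would then be entered as its own study, so that site A in this example contributes two studies and site B one.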

A site was defined as a single location where the earthworm community was sampled using an appropriate quantitative methodology. Within each study, each site was given a unique ID (usually based on an ID given in the original source). For each site, information on the sampling methodology, soil properties, and land-use/habitat cover, along with the diversity measurements (site-level species richness, abundance and/or biomass) were entered into the data template (see Online-only Table 2 for full list of variables and the format that was required for the data template). Where possible, data were entered into the data template in the same format as given in the original source. To help enable this, columns often had separate fields to record the units. However, for some fields, values needed to be standardised prior to data entry, such as for the site coordinates and some soil properties (e.g., sand/silt/clay content).

All available and required soil properties for each site were entered into the template. Where a site had soil properties sampled at different depths (e.g., at 0–15, 15–30, and 30–40 cm), the weighted average of the values was entered into the templates. The value was then indicated as being a mean (Online-only Table 2).
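As a worked illustration of the depth averaging, the sketch below computes a mean weighted by layer thickness. The text does not state which weights were used, so thickness weighting is an assumption here, and the function name and pH values are hypothetical.

```python
def depth_weighted_mean(layers):
    """Average a soil property over a depth profile.

    `layers` is a list of (top_cm, bottom_cm, value) tuples. Each value is
    weighted by the thickness of its layer (an assumed weighting scheme).
    """
    total_thickness = 0.0
    weighted_sum = 0.0
    for top_cm, bottom_cm, value in layers:
        thickness = bottom_cm - top_cm
        if thickness <= 0:
            raise ValueError("each layer must have bottom_cm > top_cm")
        total_thickness += thickness
        weighted_sum += thickness * value
    if total_thickness == 0:
        raise ValueError("no layers supplied")
    return weighted_sum / total_thickness


# Hypothetical pH values for the depth intervals mentioned in the text.
profile = [(0, 15, 6.0), (15, 30, 5.5), (30, 40, 5.0)]
ph_mean = depth_weighted_mean(profile)  # (15*6.0 + 15*5.5 + 10*5.0) / 40
```

A value obtained this way would be entered with the corresponding `_mean` field set to "Yes".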

The fields for habitat cover, land-use, and management system were predefined categories based on ESA CCI-LC (https://www.esa-landcover-cci.org/), the Land-use Harmonization dataset30,31 (Fig. 2), and expert opinion (during the sWorm workshops), respectively. These classification systems were chosen based on knowledge of what external pressures might be important for explaining earthworm communities, whilst also ensuring consistency across all regions of the globe. Based on information given within the published article, or from the data providers directly, every site was classified into one of the categories for each of these fields. When information was missing, sites were classified as “unknown”. Additional information on the land use and management system classification definitions is shown in Tables 1 and 2, respectively.

Fig. 2.


The number of sites (grey bars) and the number of studies (red dots) for each category in (a) the land-use system, and (b) the habitat-cover system. Sites could only be categorised within one category, but studies do contain sites that span multiple categories.

Table 1.

Definitions for the land use category.

| Land use category | Definition |
|---|---|
| Primary | Relatively undisturbed natural habitat |
| Secondary | Recovering, previously disturbed natural habitat |
| Pasture | Land used for the grazing of livestock |
| Production – Arable | Land used for crop production (e.g., wheat, rice, corn) |
| Production – Plantation crops | Land used for plantation crops (e.g., coffee, vineyards, oil palm) |
| Production – Wood plantations | Land used for timber production (e.g., teak) |
| Urban | Land converted to dense urban settlement |
| Unknown | If the land use is not given or is not clear |

The land use classification was based on the Land-use Harmonization dataset30,31. To map to the original classification system, ‘Production – Wood plantations’ and ‘Production – Plantation crops’ would be ‘Secondary’, and ‘Production – Arable’ would be ‘Cropland’.

Table 2.

A management classification system was created during the sWorm workshops.

Management Intensity measure Annual crops Integrated systems Perennial crops Pastures (grazed lands) Tree plantations
Tillage × ×
Pesticide × × × × ×
Fertilizer × × × × ×
Selectively harvested × ×
Clear cut × ×
Fire × × × × ×
Stocking rate ×
Grazing all-year ×
Rotation × × ×
Monoculture × × × × ×
Planted ×

For each managed site (i.e., not natural vegetation), the management system could also be identified (table headers), and additional management intensity variables could also be captured (table rows). However, not every management intensity variable was applicable to each management system, so restrictions were placed. ‘×’ indicates which management intensity variable was applicable to each management system.

As sampling effort also impacts diversity measurements32, the sampling effort at each site was recorded. Effort was recorded in two ways:

  1. The area that was sampled, e.g., of a single quadrat or soil block, or the total area across all samples (e.g., all quadrats). This depended on how the data were presented.

  2. The number of times a site was sampled, either temporally or spatially. If a site was sampled over multiple time periods, it would be the number of occasions the site was sampled. If the site had multiple samples (e.g., multiple quadrats) and the diversity measure is an average, the sampling effort would be 1. If the diversity is a total measure (e.g., the total number of species across all quadrats), the sampling effort would be the total number of samples (e.g., quadrats).
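The spatial part of the second rule can be written as a small Python function. This is a sketch only: the function and argument names are hypothetical, and it does not cover how temporal and spatial replication would be combined, which the text does not specify.

```python
def spatial_sampling_effort(n_samples, metric_is_average):
    """Sampling effort for a site with several spatial samples (e.g., quadrats).

    If the reported diversity metric is an average across samples, the effort
    is 1; if it is a total across samples, the effort is the number of samples.
    """
    if n_samples < 1:
        raise ValueError("a site must have at least one sample")
    return 1 if metric_is_average else n_samples


# Five quadrats, mean abundance reported per quadrat.
effort_mean = spatial_sampling_effort(5, metric_is_average=True)
# Five quadrats, total species count reported across all quadrats.
effort_total = spatial_sampling_effort(5, metric_is_average=False)
```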

When datasets contained information at a higher resolution than total abundance or biomass of earthworms at a site (i.e., at ecological group, genus, or species level), this information was entered into the species occurrence table (Online-only Table 3). Each row contained a measurement of an observation (e.g., species, morphospecies, genus, life stage or ecological group) at a single site. The measurement could be the presence only, abundance, or fresh biomass of the record. Where possible, for each row we also included the life stage (adult or juvenile), whether the species was native to the location or not, and the ecological group (epigeic, endogeic, anecic, epi-endogeic). Thus, if the diversity measure was for all the juveniles at the site regardless of species, columns such as the species binomial and genus would be empty, but the life stage would be completed. Every species binomial and ecological group assignment was checked using DriloBASE and by earthworm taxonomists (GB, MJIB, MLCB, PL); see ‘Technical Validation’.

Online-only Table 3.

The information captured in the “species occurrence” data sheet. An observation could refer to a species (either with a scientific binomial or a morphospecies identification), or a genus, life stage, ecological group, or native/non-native group.

| Field | Format | Possible Values | Information | Required field |
|---|---|---|---|---|
| File | Text | | Unique ID for each article. Assigned to all studies within a single publication | * |
| Study_Name | Text | | Unique ID for each study. Assigned to all sites within a single study | * |
| Site_Name | Text | | Unique ID given to each site. A site that is sampled in different studies will have the same ID. | * |
| OriginalSpeciesBinomial | Text | | The species binomial of the observation as given by the data collector, prior to revision by earthworm experts | |
| SpeciesBinomial | Text | | The species binomial of the observation (following revision by earthworm experts) | At least one required |
| MorphospeciesID | Text | | An indicator (i.e., number or letter) of an observation that has been only identified to morphospecies | |
| Genus | Text | | The genus of the observation | |
| Family | Text | | The family of the observation | |
| Ecological_group | Multiple choice | Epigeic, Anecic, Endogeic, Epi-Endogeic, Unknown | The ecological group of the observation (following revision by earthworm experts) | |
| LifeStage | Multiple choice | Adult, Juvenile, Unknown | The life stage of the observation | |
| Native/Non-native | Multiple choice | Native, Non-native, Unknown | Whether the observation is native or non-native in the sampled region | |
| Abundance | Numerical | | The total abundance of the observation at a specific site | |
| Abundance Unit | Multiple choice | Number of individuals, Individuals per cm2, Individuals per cm3, Individuals per m2, Individuals per m3 | The units of the abundance value | |
| WetBiomass | Numerical | | The total biomass of the observation at a specific site | |
| WetBiomassUnits | Multiple choice | g, g/m2 | The units of the biomass value | |

For each dataset, this datasheet was only used if species occurrence data were available.

Where site-level diversity measures were given by the data provider, these were entered into the site-level sheet. Where site-level diversity measures were not given, but could be calculated from the species occurrence information, that was done in R33, following data entry and prior to subsequent analyses. The species present at each site, as given in the species occurrence data, were used for calculating species richness; this included species identified as sub-species. If data collectors identified a specimen as a morphospecies (i.e., a species delineation based solely on morphological characteristics, typically identified to genus level with a unique ID differentiating it from other species of the same genus, as determined by the original data collector), it was included in the species richness estimate as an additional species. Unidentified species grouped as ‘unknown’ were excluded (Fig. 3). As juveniles of many earthworm species are hard to identify to species level29,34, juveniles were excluded from the calculation (even if identified at family level). All earthworms (including juveniles) found at a site were included in the total biomass and abundance calculations.
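The richness rules above can be sketched in Python as follows. The authors performed this step in R; this is not their code. Field names follow Online-only Table 3, the function name is hypothetical, and treating a morphospecies as unique within its genus is an assumption.

```python
def species_richness(occurrences):
    """Count taxa at one site from species occurrence records.

    Counts distinct species binomials plus distinct morphospecies.
    Juvenile records and records with neither a binomial nor a
    morphospecies ID (unidentified) do not contribute.
    """
    taxa = set()
    for rec in occurrences:
        if rec.get("LifeStage") == "Juvenile":
            continue  # juveniles are excluded from richness
        binomial = rec.get("SpeciesBinomial")
        morpho = rec.get("MorphospeciesID")
        if binomial:
            taxa.add(("binomial", binomial))
        elif morpho:
            # Assumed: a morphospecies ID is unique within its genus.
            taxa.add(("morphospecies", rec.get("Genus"), morpho))
    return len(taxa)


# Illustrative records for a single site; not values from the database.
site_records = [
    {"SpeciesBinomial": "Lumbricus terrestris", "LifeStage": "Adult"},
    {"SpeciesBinomial": "Aporrectodea caliginosa", "LifeStage": "Adult"},
    {"SpeciesBinomial": "Aporrectodea caliginosa", "LifeStage": "Adult"},
    {"MorphospeciesID": "sp1", "Genus": "Amynthas", "LifeStage": "Adult"},
    {"Genus": "Lumbricus", "LifeStage": "Juvenile"},
    {"LifeStage": "Adult"},  # unidentified record
]
richness = species_richness(site_records)
```

All six records would still count towards total abundance and biomass at the site; only the richness calculation filters them.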

Fig. 3.


The number of (a) studies and (b) sites that measured each of the three community metrics. The points at the vertices indicate the number of studies or sites with only one community metric. The points on the edges indicate the number of studies or sites with the community metrics represented at the two connecting vertices. Finally, the point in the centre indicates the number of studies or sites with all three community metrics. For example, in (a), 145 studies measured biomass, shown in the blue polygon: four studies measured only biomass, 7 measured biomass and species richness, 44 measured biomass and abundance, and 90 measured all three metrics.

After the ecological grouping (epigeic, endogeic, anecic, and epi-endogeic) of each species had been assigned and/or checked by the earthworm taxonomists, diversity measures within each ecological group at a site were also calculated. As with the site-level metrics, the species richness within each ecological group was calculated using only species with binomials or morphospecies. Biomass and abundance of each ecological group at a site were calculated regardless of species identity. The total number of ecological groups at each site was calculated regardless of abundance, biomass, life stage or native status of the species included (maximum ecological group richness = 4).
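A minimal Python sketch of the per-group metrics is given below. Field names follow Online-only Table 3; the function name and return structure are hypothetical, and excluding juveniles from the within-group richness mirrors the site-level rule as an assumption.

```python
from collections import defaultdict

GROUPS = ("Epigeic", "Endogeic", "Anecic", "Epi-Endogeic")


def ecological_group_summary(occurrences):
    """Summarise one site's records by ecological group.

    Returns (abundance per group, richness per group, number of groups).
    Abundance is summed regardless of species identity; richness counts
    only non-juvenile records with a binomial or morphospecies ID.
    """
    abundance = defaultdict(float)
    taxa = defaultdict(set)
    for rec in occurrences:
        group = rec.get("Ecological_group")
        if group not in GROUPS:
            continue  # 'Unknown' or missing group
        # A presence-only record (no abundance) still registers the group.
        abundance[group] += rec.get("Abundance") or 0
        if rec.get("LifeStage") == "Juvenile":
            continue
        if rec.get("SpeciesBinomial"):
            taxa[group].add(rec["SpeciesBinomial"])
        elif rec.get("MorphospeciesID"):
            taxa[group].add((rec.get("Genus"), rec["MorphospeciesID"]))
    richness = {group: len(taxa[group]) for group in abundance}
    return dict(abundance), richness, len(abundance)


# Illustrative records for a single site; not values from the database.
records = [
    {"SpeciesBinomial": "Lumbricus terrestris", "Ecological_group": "Anecic",
     "LifeStage": "Adult", "Abundance": 4},
    {"SpeciesBinomial": "Aporrectodea caliginosa", "Ecological_group": "Endogeic",
     "LifeStage": "Adult", "Abundance": 10},
    {"Ecological_group": "Endogeic", "LifeStage": "Juvenile", "Abundance": 6},
    {"Ecological_group": "Unknown", "LifeStage": "Adult", "Abundance": 3},
]
group_abundance, group_richness, n_groups = ecological_group_summary(records)
```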

Data Records

The data presented here are available in the iDiv data portal (10.25829/idiv.1880-17-3189; Dataset ID: 1880)19 in a static form. In addition, the full dataset will be hosted by Edaphobase (www.portal.edaphobase). In the future, the version in Edaphobase might change (e.g., through species name revisions or requests from the data providers), and we hope it will be extended with additional earthworm records (or records of other soil taxa).

The data are stored in three tables: meta-data (Online-only Table 1), site-level (Online-only Table 2), and species occurrence (Online-only Table 3). The file ID links the meta-data to the site-level data, and the Study ID and the Site ID link the site-level data to the species occurrence table.
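As an illustration of how these keys connect the three tables, the following Python sketch joins toy versions of them. The column names (`file_id`, `study_id`, `site_id`) and all values are assumptions made for the example; the published field names are listed in the Online-only Tables.

```python
# Toy versions of the three tables; column names and values are hypothetical.
meta = [
    {"file_id": "F1", "citation": "Hypothetical et al. 2010"},
]
sites = [
    {"file_id": "F1", "study_id": "F1_st1", "site_id": "A", "richness": 3},
    {"file_id": "F1", "study_id": "F1_st1", "site_id": "B", "richness": 1},
]
occurrences = [
    {"study_id": "F1_st1", "site_id": "A", "name": "Lumbricus terrestris", "abundance": 12},
    {"study_id": "F1_st1", "site_id": "A", "name": "Aporrectodea rosea", "abundance": 4},
]

# Index the parent tables by their linking keys.
meta_by_file = {m["file_id"]: m for m in meta}
site_by_key = {(s["study_id"], s["site_id"]): s for s in sites}


def joined_rows(occurrences):
    """Attach site-level and meta-data fields to each occurrence record."""
    rows = []
    for occ in occurrences:
        site = site_by_key[(occ["study_id"], occ["site_id"])]
        source = meta_by_file[site["file_id"]]
        rows.append({**source, **site, **occ})
    return rows


rows = joined_rows(occurrences)
```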

For all suitable datasets, the meta-data information was completed. The meta-data contains bibliographic information on the original paper that analysed, or published, the data, as well as contact information of the person who provided the raw data (not included in the release of the database for privacy reasons). The meta-data also includes the number of sites and studies within the file, so that validation checks could be completed. Online-only Table 1 shows all fields within the meta-data; personal information of data providers has not been made available.
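One plausible form of such a validation check is to compare the counts declared in the meta-data with the counts found in the site-level table. The Python sketch below assumes hypothetical column names (`n_studies`, `n_sites`, `file_id`, `study_id`, `site_id`) and is not taken from the authors' code.

```python
# Hypothetical tables: F2 declares three sites but only two were entered.
meta = [
    {"file_id": "F1", "n_studies": 1, "n_sites": 2},
    {"file_id": "F2", "n_studies": 1, "n_sites": 3},
]
sites = [
    {"file_id": "F1", "study_id": "st1", "site_id": "A"},
    {"file_id": "F1", "study_id": "st1", "site_id": "B"},
    {"file_id": "F2", "study_id": "st1", "site_id": "A"},
    {"file_id": "F2", "study_id": "st1", "site_id": "B"},
]


def count_mismatches(meta, sites):
    """List (file_id, what, declared, found) where the counts disagree."""
    studies_seen, sites_seen = {}, {}
    for s in sites:
        studies_seen.setdefault(s["file_id"], set()).add(s["study_id"])
        sites_seen.setdefault(s["file_id"], set()).add((s["study_id"], s["site_id"]))
    problems = []
    for m in meta:
        fid = m["file_id"]
        n_studies = len(studies_seen.get(fid, ()))
        n_sites = len(sites_seen.get(fid, ()))
        if n_studies != m["n_studies"]:
            problems.append((fid, "studies", m["n_studies"], n_studies))
        if n_sites != m["n_sites"]:
            problems.append((fid, "sites", m["n_sites"], n_sites))
    return problems


problems = count_mismatches(meta, sites)
```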

Information on all sampled sites within each dataset was recorded in the site-level table (Online-only Table 2). Each row represents a single site within a study, with information on the sampling methodology, soil properties, and how the land was used, managed, and covered. The site-level earthworm community metrics (species richness, abundance and biomass) are also included if available.

Site-level species lists, and abundance and/or biomass measures for individual records, are given in the species occurrence table (Online-only Table 3). Each row is a measurement of an observation at a site (22,690 non-zero observations in total). An observation could relate to a species (with a scientific binomial, e.g., the abundance of Lumbricus terrestris at a site, or a morphospecies identification), a genus, life stage, ecological group, or native/non-native group (e.g., the abundance of all non-native species at a site). Details of the native/non-native status of a species were only available when provided by the original data collector.

Technical Validation

Templates used to enter the individual datasets were designed so that fields were only allowed certain values and formats where possible. This helped to reduce spelling errors, slight inconsistencies, and incorrect values being entered. Data providers were contacted if details within their raw data were unclear. As multiple people entered data into the templates, detailed documentation was created at the start of the project to ensure consistency amongst those involved. In addition, a subset of datasets was checked by several curators.
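The kind of constraint such a template enforces can be sketched in Python as a check on allowed values and formats. The field names, the list of allowed sampling methods, and the latitude range check are all assumptions chosen for illustration; they do not reproduce the actual templates.

```python
# Illustrative controlled vocabulary; not the list used in the real templates.
ALLOWED_METHODS = {
    "hand sorting",
    "chemical extraction",
    "hand sorting + chemical extraction",
}


def validate_row(row):
    """Return the names of fields whose values fail the template rules."""
    errors = []
    # Categorical field: only values from the controlled vocabulary pass.
    if row.get("sampling_method") not in ALLOWED_METHODS:
        errors.append("sampling_method")
    # Numeric field: must parse as a number and fall in a valid range.
    try:
        lat = float(row.get("latitude"))
    except (TypeError, ValueError):
        errors.append("latitude")
    else:
        if not -90 <= lat <= 90:
            errors.append("latitude")
    return errors


ok = validate_row({"sampling_method": "hand sorting", "latitude": "51.3"})
bad = validate_row({"sampling_method": "Hand-sorting", "latitude": "123"})
```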

All earthworm species names were checked against DriloBASE (http://taxo.drilobase.org) to identify potential synonyms and spelling mistakes. Following that, earthworm specialists and taxonomists (GB, MJIB, MLCB and PL) checked the scientific names, removed synonyms and updated names if taxonomies had changed. Where ecological groupings were missing, the earthworm taxonomists also added them where possible, based on the available literature.
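A first-pass screen of this kind can be approximated with fuzzy string matching. The sketch below is a stand-in, not the authors' procedure: it uses Python's standard `difflib` against a small local list instead of querying DriloBASE, and the reference names, the synonym table, and the 0.85 cutoff are illustrative assumptions. Flagged names would still need review by a taxonomist.

```python
import difflib

# Illustrative reference list and synonym table; a real check would draw
# these from a taxonomic database.
REFERENCE_NAMES = [
    "Lumbricus terrestris",
    "Lumbricus rubellus",
    "Aporrectodea caliginosa",
]
SYNONYMS = {"Allolobophora caliginosa": "Aporrectodea caliginosa"}


def check_name(name):
    """Classify a submitted name and suggest the accepted name if any."""
    if name in REFERENCE_NAMES:
        return ("accepted", name)
    if name in SYNONYMS:
        return ("synonym", SYNONYMS[name])
    # Near matches are flagged as possible spelling mistakes.
    close = difflib.get_close_matches(name, REFERENCE_NAMES, n=1, cutoff=0.85)
    if close:
        return ("possible misspelling", close[0])
    return ("unmatched", None)


result = check_name("Lumbricus terestris")
```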

Usage Notes

Land-use fields were based on classification schemes, and may not be the most suitable for the analysis of earthworms. We included a free-text field (“Habitat as described”) that could be used by future researchers to define their own classification scheme for land-use or habitat cover.

As diversity measures are highly influenced by sampling methodology, we included information on sampling methods in the database (Fig. 4). In addition, we would expect that variation in diversity would differ between the individual datasets due to, for example, inter-observer variability. We highly recommend that statistical methods used on this database take these between-dataset variations into account.
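One common way to account for such between-dataset variation is a mixed-effects model with a random intercept for each study. The formulation below is offered only as an illustration; it is not prescribed by the database, and all symbols are assumptions: $y_{ij}$ is a community metric at site $i$ in study $j$, $x_{ij}$ is a site-level predictor, and $u_j$ is the study-level deviation.

```latex
y_{ij} = \beta_0 + \beta_1 x_{ij} + u_j + \varepsilon_{ij},
\qquad u_j \sim \mathcal{N}(0, \sigma_u^2),
\quad \varepsilon_{ij} \sim \mathcal{N}(0, \sigma_\varepsilon^2)
```

Here $\sigma_u^2$ absorbs differences between studies, such as sampling method or observer, so that $\beta_1$ is estimated mainly from within-study contrasts.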

Fig. 4.

The number of sites sampled with each sampling method across the different earthworm studies.

Despite our efforts to obtain a global dataset, there is a geographic bias (Fig. 1), such that sites are highly clustered in certain regions (e.g., Europe), sparse in others (e.g., South America), or lacking (e.g., southern Africa, northern Russia). To reduce such biases, we attempted to contact as many researchers as possible in such areas to acquire data. Although this helped to improve the data coverage, it did not remove the gaps. We hope to address these gaps in the future, but in the meantime, researchers should be aware of the influence these biases might have on their analyses35,36.

Acknowledgements

This database and paper are a product of two sWorm workshops at sDiv, the synthesis center at iDiv. We thank M. Winter and the sDiv team for their help in organizing the sWorm workshops, and the Biodiversity Informatics Unit (BDU) at iDiv for their assistance in making the data open access. H.R.P.P., B.K-R., and the sWorm workshops were supported by the sDiv [Synthesis Centre of the German Centre for Integrative Biodiversity Research (iDiv) Halle-Jena-Leipzig (DFG FZT 118)]. H.R.P.P., O.F. and N.E. acknowledge funding by the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (grant agreement no. 677232 to NE). K.S.R. and W.H.v.d.P. were supported by ERC-ADV grant 323020 to W.H.v.d.P. Also supported by iDiv (DFG FZT118) Flexpool proposal 34600850 (C.A.G. and N.E.); the Academy of Finland (285882) and the Natural Sciences and Engineering Research Council of Canada (postdoctoral fellowship and RGPIN-2019-05758) (E.K.C.); German Federal Ministry of Education and Research (01LO0901A) (D.J.R.); ERC-AdG 694368 (M.R.); the TULIP Laboratory of Excellence (ANR-10-LABX-41) (M.L.); and the BBSRC David Phillips Fellowship to F.T.d.V. (BB/L02456X/1). In addition, data collection was funded by the Russian Foundation for Basic Research (12-04-01538-а, 12-04-01734-a, 14-44-03666-r_center_a, 15-29-02724-ofi_m, 16-04-01878-a, 19-05-00245, 19-04-00-609-a); Tarbiat Modares University; Aurora Organic Dairy; UGC(NERO) (F. 1-6/Acctt./NERO/2007-08/1485); Natural Sciences and Engineering Research Council (RGPIN-2017-05391); Slovak Research and Development Agency (APVV-0098-12); Science for Global Development through Wageningen University; Norman Borlaug LEAP Programme and International Atomic Energy Agency (IAEA); São Paulo Research Foundation - FAPESP (12/22510-8); Oklahoma Agricultural Experiment Station; INIA - Spanish Agency (SUM 2006-00012-00-0); Royal Canadian Geographical Society; Environmental Protection Agency (Ireland) (2005-S-LS-8); University of Hawai’i at Mānoa (HAW01127H; HAW01123M); European Union FP7 (FunDivEurope, 265171; ROUTES 265156); U.S. Department of the Navy, Commander Pacific Fleet (W9126G-13-2-0047); Science and Engineering Research Board (SB/SO/AS-030/2013) Department of Science and Technology, New Delhi, India; Strategic Environmental Research and Development Program (SERDP) of the U.S. Department of Defense (RC-1542); Maranhão State Research Foundation (FAPEMA 03135/13, 02471/17); Coordination for the Improvement of Higher Education Personnel (CAPES 3281/2013); Ministry of Education, Youth and Sports of the Czech Republic (LTT17033); Colorado Wheat Research Foundation; Zone Atelier Alpes, French National Research Agency (ANR-11-BSV7-020-01, ANR-09-STRA-02-01, ANR 06 BIODIV 009-01); Austrian Science Fund (P16027, T441); Landwirtschaftliche Rentenbank Frankfurt am Main; Welsh Government and the European Agricultural Fund for Rural Development (Project Ref. A AAB 62 03 qA731606); SÉPAQ, Ministry of Agriculture and Forestry of Finland; Science Foundation Ireland (EEB0061); University of Toronto (Faculty of Forestry); National Science and Engineering Research Council of Canada; Haliburton Forest & Wildlife Reserve; NKU College of Arts & Sciences Grant; Österreichische Forschungsförderungsgesellschaft (837393 and 837426); Mountain Agriculture Research Unit of the University of Innsbruck; Higher Education Commission of Pakistan; Kerala Forest Research Institute, Peechi, Kerala; UNEP/GEF/TSBF-CIAT Project on Conservation and Sustainable Management of Belowground Biodiversity; Ministry of Agriculture and Forestry of Finland; Complutense University of Madrid/European Union FP7 project BioBio (FPU UCM 613520); GRDC; AWI; LWRRDC; DRDC; CONICET (National Scientific and Technical Research Council) and FONCyT (National Agency of Scientific and Technological Promotion) (PICT, PAE, PIP), Universidad Nacional de Luján y FONCyT (PICT 2293 (2006)); Fonds de recherche sur la nature et les technologies du Québec (131894); Deutsche Forschungsgemeinschaft (SCHR1000/3-1, SCHR1000/6-1, 6-2 (FOR 1598), WO 670/7-1, WO 670/7-2, & SCHA 1719/1-2), CONACYT (FONDOS MIXTOS TABASCO/PROYECTO11316); NSF (DGE-0549245, DGE-0549245, DEB-BE-0909452, NSF1241932, LTER Program DEB-97–14835); Institute for Environmental Science and Policy at the University of Illinois at Chicago; Dean’s Scholar Program at UIC; Garden Club of America Zone VI Fellowship in Urban Forestry from the Casey Tree Endowment Fund; J.E. Weaver Competitive Grant from the Nebraska Chapter of The Nature Conservancy; The College of Liberal Arts and Sciences at Depaul University; Elmore Hadley Award for Research in Ecology and Evolution from the UIC Dept. of Biological Sciences, Spanish CICYT (AMB96-1161; REN2000-0783/GLO; REN2003-05553/GLO; REN2003-03989/GLO; CGL2007-60661/BOS); Yokohama National University; MEXT KAKENHI (25220104); Japan Society for the Promotion of Science KAKENHI (25281053, 17KT0074, 25252026); ADEME (0775C0035); Ministry of Science, Innovation and Universities of Spain (CGL2017-86926-P); Syngenta Philippines; UPSTREAM; LTSER (Val Mazia/Matschertal); Marie Sklodowska Curie Postdoctoral Fellowship (747607); National Science & Technology Base Resource Survey Project of China (2018FY100306); McKnight Foundation (14–168); Program of Fundamental Researches of Presidium of Russian Academy of Sciences (AААА-A18–118021490070–5); Brazilian National Council for Scientific and Technological Development (CNPq 310690/2017–0, 404191/2019–3, 307486/2013–3); French Ministry of Foreign and European Affairs; Bavarian Ministry for Food, Agriculture and Forestry (Project No B62); INRA AIDY project; MIUR PRIN 2008; Idaho Agricultural Experiment Station; Estonian Science Foundation; Ontario Ministry of the Environment, Canada; Russian Science Foundation (16-17-10284); National Natural Science Foundation of China (41371270); Australian Research Council (FT120100463); USDA Forest Service-IITF. The authors would like to thank all supervisors, students, collaborators, technicians, data analysts, land owners/managers, and anyone else involved with the collection, processing, and/or publication of the primary datasets, both for this manuscript and ref. 16. Namely: Peter M. Kotanen, Jessica G. Davis, S.N. Ramanujam, J.M. Julka, Csaba Csuzdi, P. Bescansa, M. Moriones, C. González, Creighton Litton, Danielle Celentano, Sandriel Sousa, Samuel James, C. Hakseth, C. Mills, Hirohi Takeda, Sandriel Sousa Costa, Kyungsoo Yoo, Sebastien De Danieli, Philippe Choler, Pierre Taberlet, Lauric Cecillon, Erwin Meyer, Felix Gerlach, Doris Beutler, Christina Marley, Rhun Fychan, Ruth Sanderson, Mervi Nieminen, Taisto Sirén, Mariana Alem, Carlos Regalsky, Tara Sackett, Erin Bayne, Sarah Hamilton, Alexander Rief, Catarina Praxedes, Rosana Sandler, Juliane Palm, Anne Zangerlé, Anne-Kathrin Schneider, Erwin Zehe, David H. Wise, Liam Heneghan, Yoshikazu Kawaguchi, Irene L. López-Sañudo, Almudena Mateos, Pilar Meléndez, Raquel Santos, Marta Yebra, Tamara Vsevolodova-Perel, Maxim Bobrovsky, Natalya Ivanova, Eufemio Rasco Jr., Robert W. Mysłajek, Jianxiong Li, Jiangping Qiu, A. Barne, Antonio Gómez-Sal, Tanya Handa, Mark Vellend, Hans de Wandeler, Sarah Placella, Lee Frelich, Peter Reich. Open Access funding enabled and organized by Projekt DEAL.

Online-only Tables

Author contributions

The sWorm workshops were organised by N.E., E.K.C. and H.R.P.P., with funding acquired by N.E., E.K.C. and M.P.T. Data collation and formatting was led by H.R.P.P., with assistance from J.K., M.J.I.B., G.B., K.B.G. and B.S. Harmonisation of earthworm species names was completed by G.B., M.J.I.B., M.L.C.B. and P.L. Advice and feedback on data collation protocols was provided by E.M.B., M.J.I.B., G.B., O.F., C.A.G., B.K.R., A.O., D.R., and D.H.W. Writing of the manuscript was led by H.R.P.P. All authors provided input and comments on the manuscript. The majority of authors provided data to the database.

Code availability

All code used to format and clean the dataset for publication is available on GitHub (www.github.com/helenphillips).

Competing interests

The authors declare no competing interests.

Footnotes

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

These authors contributed equally: Erin K. Cameron, Nico Eisenhauer.

References

  • 1. Giller PS. The diversity of soil communities, the poor man’s tropical rainforest? Biodivers. Conserv. 1996;5:135–168. doi: 10.1007/BF00055827.
  • 2. Decaëns T, Jiménez JJ, Gioia C, Measey GJ, Lavelle P. The values of soil animals for conservation biology. Eur. J. Soil Biol. 2006;42:S23–S38. doi: 10.1016/j.ejsobi.2006.07.001.
  • 3. Bardgett RD, van der Putten WH. Belowground biodiversity and ecosystem functioning. Nature. 2014;515:505–511. doi: 10.1038/nature13855.
  • 4. Phillips HRP, et al. Red list of a black box. Nat. Ecol. Evol. 2017;1:0103. doi: 10.1038/s41559-017-0103.
  • 5. Orgiazzi, A. et al. Global Soil Biodiversity Atlas. European Commission, Publications (2016).
  • 6. Dornelas M, et al. BioTIME: A database of biodiversity time series for the Anthropocene. Glob. Ecol. Biogeogr. 2018;27:760–786. doi: 10.1111/geb.12729.
  • 7. Hudson LN, et al. The database of the PREDICTS (Projecting Responses of Ecological Diversity In Changing Terrestrial Systems) project. Ecol. Evol. 2017;7:145–188. doi: 10.1002/ece3.2579.
  • 8. Dornelas M, et al. Assemblage time series reveal biodiversity change but not systematic loss. Science. 2014;344:296–299. doi: 10.1126/science.1248484.
  • 9. Newbold T, et al. Global effects of land use on local terrestrial biodiversity. Nature. 2015;520:45–50. doi: 10.1038/nature14324.
  • 10. Ramirez KS, et al. Detecting macroecological patterns in bacterial communities across independent studies of global soils. Nat. Microbiol. 2018;3:189–196. doi: 10.1038/s41564-017-0062-x.
  • 11. Delgado-Baquerizo M, et al. A global atlas of the dominant bacteria found in soil. Science. 2018;359:320–325. doi: 10.1126/science.aap9516.
  • 12. Milcu A, Partsch S, Scherber C, Weisser WW, Scheu S. Earthworms and legumes control litter decomposition in a plant diversity gradient. Ecology. 2008;89:1872–1882. doi: 10.1890/07-1377.1.
  • 13. Blouin M, et al. A review of earthworm impact on soil function and ecosystem services. Eur. J. Soil Sci. 2013;64:161–182. doi: 10.1111/ejss.12025.
  • 14. Zhang W, et al. Earthworms facilitate carbon sequestration through unequal amplification of carbon stabilization compared with mineralization. Nat. Commun. 2013;4:2576. doi: 10.1038/ncomms3576.
  • 15. Paoletti MG. The role of earthworms for assessment of sustainability and as bioindicators. Agric. Ecosyst. Environ. 1999;74:137–155. doi: 10.1016/S0167-8809(99)00034-1.
  • 16. Phillips HRP, et al. Global distribution of earthworm diversity. Science. 2019;366:480–485. doi: 10.1126/science.aax4851.
  • 17. Rutgers M, et al. Mapping earthworm communities in Europe. Appl. Soil Ecol. 2016;97:98–111. doi: 10.1016/j.apsoil.2015.08.015.
  • 18. Burkhardt U, et al. The Edaphobase project of GBIF-Germany-A new online soil-zoological data warehouse. Appl. Soil Ecol. 2014;83:3–12. doi: 10.1016/j.apsoil.2014.03.021.
  • 19. Phillips HRP, 2020. Global data on earthworm abundance, biomass, diversity and corresponding environmental properties. (iDiv Data Repository) German Centre for Integrative Biodiversity Research (iDiv) Halle-Jena-Leipzig. doi: 10.25829/idiv.1880-17-3189.
  • 20. Bouché, M. B. Strategies lombriciennes. Ecol. Bull. 122–132 (1977).
  • 21. IPBES. Summary for policymakers of the global assessment report on biodiversity and ecosystem services of the Intergovernmental Science-Policy Platform on Biodiversity and Ecosystem Services. 56 (2019).
  • 22. Pey B, et al. Current use of and future needs for soil invertebrate functional traits in community ecology. Basic Appl. Ecol. 2014;15:194–206. doi: 10.1016/j.baae.2014.03.007.
  • 23. Cameron, E. K. et al. Global mismatches in aboveground and belowground biodiversity. Conserv. Biol. 430 (2019).
  • 24. Anderson, J. M. & Ingram, J. S. I. Tropical Soil Biology and Fertility: A handbook of methods. Trop. Soil Biol. Fertil. A Handb. methods 2 Ed., 88–91 (1993).
  • 25. ISO. Soil quality – Sampling of soil invertebrates – Part 1: Hand-sorting and extraction of earthworms (ISO/FDIS 23611-1:2012). (2012).
  • 26. USDA. Soil Survey Manual Agriculture. Handbook 18. USDA, Nat. Resour. Conserv. Serv. (2017).
  • 27. FAO/WRB. World reference base for soil resources 2014. World Soil Resources Reports No. 106 (2014).
  • 28. Schindelin J, et al. Fiji: an open-source platform for biological-image analysis. Nat. Methods. 2012;9:676–682. doi: 10.1038/nmeth.2019.
  • 29. Bartlett MD, et al. A critical review of current methods in earthworm ecology: From individuals to populations. Eur. J. Soil Biol. 2010;46:67–73. doi: 10.1016/j.ejsobi.2009.11.006.
  • 30. Hoskins AJ, et al. Downscaling land-use data to provide global 30” estimates of five land-use classes. Ecol. Evol. 2016;6:3040–3055. doi: 10.1002/ece3.2104.
  • 31. Hurtt GC, et al. Harmonization of land-use scenarios for the period 1500–2100: 600 years of global gridded annual land-use transitions, wood harvest, and resulting secondary lands. Clim. Change. 2011;109:117–161. doi: 10.1007/s10584-011-0153-2.
  • 32. Magurran, A. E. Measuring biological diversity. (John Wiley & Sons, 2004).
  • 33. R Core Team. R: A language and environment for statistical computing. (2016).
  • 34. Sims RW, Gerard BM. Earthworms. Keys and notes for the identification and study of the species. New Zeal. J. Zool. 1988;15:447–448. doi: 10.1080/03014223.1988.10422974.
  • 35. Gonzalez A, et al. Estimating local biodiversity change: a critique of papers claiming no net loss of local diversity. Ecology. 2016;97:1949–1960. doi: 10.1890/15-1759.1.
  • 36. Cameron EK, et al. Global gaps in soil biodiversity data. Nat. Ecol. Evol. 2018;2:1042–1043. doi: 10.1038/s41559-018-0573-8.

Articles from Scientific Data are provided here courtesy of Nature Publishing Group
