PLOS Biology. 2021 Aug 12;19(8):e3001336. doi: 10.1371/journal.pbio.3001336

Global and national trends, gaps, and opportunities in documenting and monitoring species distributions

Ruth Y Oliver 1,2,*, Carsten Meyer 3,4,5, Ajay Ranipeta 1,2, Kevin Winner 1,2, Walter Jetz 1,2,*
Editor: Craig Moritz 6
PMCID: PMC8360587  PMID: 34383738

Abstract

Conserving and managing biodiversity in the face of ongoing global change requires sufficient evidence to assess status and trends of species distributions. Here, we propose novel indicators of biodiversity data coverage and sampling effectiveness and analyze national trajectories in closing spatiotemporal knowledge gaps for terrestrial vertebrates (1950 to 2019). Despite a rapid rise in data coverage, particularly in the last 2 decades, strong geographic and taxonomic biases persist. For some taxa and regions, a tremendous growth in records failed to directly translate into newfound knowledge due to a sharp decline in sampling effectiveness. However, we found that a nation’s coverage was stronger for species for which it holds greater stewardship. As countries under the post-2020 Global Biodiversity Framework renew their commitments to an improved, rigorous biodiversity knowledge base, our findings highlight opportunities for international collaboration to close critical information gaps.


Conserving and managing biodiversity in the face of ongoing global change requires sufficient evidence to assess status and trends of species distributions. This study analyzes national trajectories in closing spatiotemporal knowledge gaps for terrestrial vertebrates (1950-2019) based on novel indicators of data coverage and sampling effectiveness.

Introduction

Detection, understanding, and management of global biodiversity change and its manifold consequences [1,2] in a rapidly transforming world rely on comprehensive evidence to establish baselines and assess changes. As discussions of the post-2020 Global Biodiversity Framework of the Convention on Biological Diversity (CBD) enter their final stage, the availability of data and metrics to assess progress toward agreed-upon targets has taken a central role [3–7]. The fundamental need for an improved and shared knowledge base of global biodiversity is recognized in the proposed Target 19, which requires the availability of reliable information on biodiversity status and trends [8].

Descriptions of species’ geographical ranges and their temporal dynamics are fundamental biodiversity measures [9], as captured in the species distribution Essential Biodiversity Variable [10]. The status and trends of species’ geographic distributions are directly related to species’ ecological relevance, population size, and extinction risk, and are thus central to the conservation and management of species and their ecological functions [11–13]. Ambitions to limit threats to species and ensure the integrity of ecosystems, which are central goals of the post-2020 Global Biodiversity Framework under discussion [8], critically rely on effective documentation and monitoring of species distributions and changes over time [6,7,14].

Thanks to significant advances in data collection, mobilization, and aggregation [15–17], publicly accessible occurrence data are growing rapidly [9,14,18], with over 1.6 billion occurrence records across sources and taxa available in the Global Biodiversity Information Facility (GBIF). These data represent an increasing array of sources, including museum specimens, field observations, acoustic and visual sensors, and citizen science efforts [19]. Digital platforms such as Map of Life (MOL) have begun to integrate these data through models to bolster a multitude of research and conservation applications [10,20].

Increases in data quantity alone, however, provide little information about overall progress toward an effective spatial biodiversity knowledge base, as records may be highly redundant and cover a limited set of species and regions [21]. Indeed, prior work has revealed significant taxonomic and geographic gaps in the existing data [9,21–26] and highlighted the importance of accounting for expected diversity and scale sensitivity in data coverage assessments [19,21,27–29]. Scientists have identified a range of socioeconomic, linguistic, and ecological drivers for gaps and biases in the current data and identified geographical access, availability of local funding resources, and participation in data-sharing networks as key correlates of data gaps [21,30].

The aforementioned gaps in knowledge highlight the importance of a more informed and coordinated approach to developing an effective spatial biodiversity evidence base. Developing such an evidence base requires metrics that allow changes in biodiversity data coverage over time to inform decision-making. As political units responsible for coordination and stewards of their biodiversity, nations hold the key to incentivizing an improved information base and stand to gain the greatest benefits from broadly improved biodiversity information by enabling monitoring and robust management decisions. For example, the activities of the Mexican National Commission on Biodiversity (CONABIO), a permanent commission of the Mexican federal government, have led to strongly increased biodiversity information in that country that supports conservation decisions in the region [31]. Despite the urgent need to meet international targets and numerous documentations of growing data [32,33], published work has yet to provide quantitative metrics to track nations’ progress in closing spatiotemporal biodiversity data gaps [27,34–36].

Here, we provide 2 national indicators in support of the global assessment, monitoring, and decision-support around annual trends in spatiotemporal biodiversity information. These metrics are integrated within a flexible, updatable analytical framework. Specifically, we present and globally implement the MOL Species Status Information Index (SSII), which was developed under the auspices of the GEO Biodiversity Observation Network [37] (https://mol.org/indicators/coverage) in support of IPBES reporting (https://ipbes.net/core-indicators) and global assessment processes [8], as well as the Species Sampling Effectiveness Index (SSEI). We use the indicator framework to compare global and national trends in spatiotemporal biodiversity knowledge since 1950 for over 31,000 terrestrial vertebrate species and over 450 million verified and taxonomically harmonized occurrence records at the level of species, nations, and the globe. We provide a first global assessment for trends in data coverage and sampling effectiveness for terrestrial vertebrates as well as infrastructure to continuously track these indices into the future at MOL (https://mol.org/indicators/coverage).

The SSII quantifies spatiotemporal biodiversity data coverage for a particular grid resolution and species geographic range expectation (Fig 1A). The Global SSII tracks the proportion of expected range cells with records, either for a single species or averaged across multiple species (Fig 1B). The National SSII is calculated using the same method as the Global SSII but is restricted to the range cells inside a particular country (Fig 1B). Steward’s SSII follows the National SSII calculation but additionally applies a species-level weight to account for different national stewardships of species (Fig 1B). Nations’ varying responsibilities are determined by the portion of a species’ global range they hold (e.g., 1 for country endemics; see Fig 1A for illustration and Text A in S1 File for formal description). For a given species, SSII quantifies the proportion of the range with data but not how effectively these data are distributed across the proportion of the range it covers. We characterize sampling effectiveness by relating the realized spatial distribution of records to the ideal uniform distribution based on Shannon’s entropy (Fig 1C, Text A in S1 File) normalized to vary between 0 and 1, a metric we call the SSEI. The SSEI is similar to other information theoretic evenness metrics, such as Pielou’s index of species evenness, which is also based on normalized entropy [38]. SSEI has the same properties as SSII and can be calculated at the species, national, or global level and additionally can be adjusted by national stewardship for species.
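The entropy-based idea behind the SSEI can be illustrated with a minimal sketch. The formal definition is given in Text A in S1 File, which is not reproduced here, so the code below is an assumption: it uses a Pielou-style normalization (Shannon entropy of the record distribution divided by the log of the number of sampled cells). The function name `ssei`, the handling of the single-cell case, and the example counts are all hypothetical.

```python
import math
from typing import Sequence


def ssei(records_per_cell: Sequence[int]) -> float:
    """Pielou-style evenness of records across sampled cells.

    Assumption: normalization is over cells that hold at least one record;
    the paper's exact formulation (Text A in S1 File) may differ.
    """
    counts = [n for n in records_per_cell if n > 0]  # keep sampled cells only
    if len(counts) <= 1:
        return 1.0  # convention chosen for this sketch only (log(1) == 0)
    total = sum(counts)
    # Shannon entropy of the realized distribution of records
    entropy = -sum((n / total) * math.log(n / total) for n in counts)
    # Divide by the entropy of the ideal uniform distribution over the same cells
    return entropy / math.log(len(counts))


even = ssei([10, 10, 10, 10])    # records spread uniformly
skewed = ssei([1, 1, 1, 1000])   # records concentrated in one cell
```

Under this assumed normalization, uniform sampling scores 1 regardless of how many records each cell holds, while concentrating records in a few cells pushes the score toward 0.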

Fig 1. SSII and SSEI metrics of biodiversity data coverage and effectiveness.

The metrics are illustrated for 2 hypothetical species with geographic range delineated by binary (e.g., expert range) maps and are assessed for an example 110-km equal-area grid. (a) National stewardship of species is calculated based on the relative portion of species’ ranges falling inside a country. (b) At the species level, the SSII is given as the proportion of expected occupied cells that have records in a given year. In this hypothetical example, coverage is 0.83 and 0.67 for species where 5 out of 6 and 2 out of 3 expected grid cells have data, respectively. Steward’s SSII adjusts this coverage by their respective national stewardship (0.83 and 0.2). Species-level SSII can be aggregated to the national level via 2 formulations. National SSII for a given taxonomic group takes the mean coverage across all species expected in a country (0.75). Steward’s SSII adjusts the mean coverage across species by their respective national stewardship (0.8). (c) SSEI compares the entropy of the realized distribution of records to that of the ideal distribution (see Text A in S1 File), where uneven sampling (lower SSEI) is considered less effective than more even sampling (higher SSEI). National SSEI takes the mean across all species expected in a country. (d) Glossary of relevant terms. Artwork from phylopics.org (see Text A in S1 File). GBIF, Global Biodiversity Information Facility; MOL, Map of Life; SSEI, Species Sampling Effectiveness Index; SSII, Species Status Information Index.
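The arithmetic in the caption can be retraced with a short sketch. The species-level and National SSII values follow directly from the cell counts stated above. The stewardship-weighted aggregation is an assumption (a weighted mean normalized by the sum of weights), and the weights used here are hypothetical values chosen for illustration, not read from the figure; the formal definition is in Text A in S1 File.

```python
def ssii(cells_with_records: int, cells_expected: int) -> float:
    """Proportion of expected range cells that have records in a given year."""
    return cells_with_records / cells_expected


# Two hypothetical species from the caption: 5 of 6 and 2 of 3 cells with data
species_ssii = [ssii(5, 6), ssii(2, 3)]

# National SSII: unweighted mean across species expected in the country
national_ssii = sum(species_ssii) / len(species_ssii)

# Steward's SSII (assumed form): stewardship-weighted mean across species.
# The weights are hypothetical; 1.0 would correspond to a country endemic.
stewardship = [1.0, 0.3]
stewards_ssii = sum(w * s for w, s in zip(stewardship, species_ssii)) / sum(stewardship)

print([round(s, 2) for s in species_ssii], round(national_ssii, 2))
```

Because the endemic species carries the larger weight in this sketch, its higher coverage pulls the weighted value above the unweighted national mean.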

We illustrate the SSII and SSEI for the years 2000 to 2019 for the jaguar (Panthera onca) and collared peccary (Pecari tajacu), 2 widely distributed species with heterogeneous sampling (Fig 2, Table A in S1 File). The number of records collected annually for the peccary was substantially higher than for the jaguar, ranging from 2- to 10-fold higher data collection (Fig 2A–2C). Consequently, Global SSII was consistently higher for the peccary than the jaguar, but the difference in values was narrower than the difference in data collection would suggest (Fig 2D).

Fig 2. Species and national example patterns and trends.

SSII and SSEI trends illustrated for 2 species, the jaguar (Panthera onca) and collared peccary (Pecari tajacu). (a, b) The expected occupied cells are shown in dark gray, and total number of records collected 2010–2019 in color. (c–e) Species-level time series of the total number of records (c), Global SSII for the whole species range (i.e., all countries with expected range) (d), and Global SSEI (e) across their expected range. (f, g) Resulting National and Steward’s SSII (f) and SSEI (g) for 4 countries. Photographs from Wikimedia (see Text A in S1 File). National boundaries from gadm.org. Numerical values available in Tables A and B in S1 File. The data underlying this figure may be found in https://mol.org/indicators/coverage and https://github.com/MapofLife/biodiversity-data-gaps. SSEI, Species Sampling Effectiveness Index; SSII, Species Status Information Index.

Such results suggest a much lower sampling effectiveness, as indexed by the SSEI, for the peccary compared to the jaguar, indicating that many peccary records were concentrated in the same regions. SSII improved markedly for the peccary in recent years, reaching 0.03 (i.e., 3% of global range cells with annual records). This increase was associated with increasing SSEI, as the number of records collected was only slightly elevated (Fig 2E). National and Steward’s SSII calculated for these 2 species was highest in Costa Rica and lowest in Brazil (Fig 2F, Table B in S1 File). National SSEI was generally highest in Brazil and lowest in Colombia (Fig 2G).

Global and national trends in data coverage and sampling effectiveness

Biodiversity data collection has rapidly proliferated, particularly over the last 2 decades (Fig 3A). However, the proliferation of species records and their translation into biodiversity knowledge has played out along substantially different trajectories among taxa. For example, bird species consistently had the largest number of records, with approximately 1,000-fold greater number of records collected annually and 3-fold greater percentage of expected species recorded compared to other terrestrial vertebrates (Fig 3B). Yet SSII for birds exceeded that of the 3 other groups only after 1980, and has since shown near-linear growth (Fig 3C). Although data collection in terms of number of records for mammal species consistently outpaced that for amphibians and reptiles, data coverage for mammals was lowest in recent years (Fig 3C). Coincident with this rapid rise in data collection and coverage for bird species, however, was a rapid decline in sampling effectiveness (Fig 3D–3F).

Fig 3. Global trends in data coverage and sampling effectiveness across 4 terrestrial vertebrate groups.

(a–c) Trends in total annual record counts (a), percentage of expected species recorded (b), and the Global SSII (c). Global SSII is based on data coverage across species’ ranges without consideration of national boundaries. Alternatively, Global SSII for a species is the sum of Steward’s SSII across the nations where it is expected to occur. (d) Relationship between annual total record counts and Global SSII. (e) Trends in Global SSEI. (f) Relationship between percentage of expected species recorded and Global SSEI. (c, e) Lines and shading represent means and 95% confidence intervals across species within classes. (d, f) Relationships are shown over the past 70 years (1950–2019). Colors in a–f indicate birds (blue), mammals (orange), amphibians (purple), and reptiles (green). Artwork from phylopics.org (see Text A in S1 File). The data underlying this figure may be found in https://mol.org/indicators/coverage and https://github.com/MapofLife/biodiversity-data-gaps. SSEI, Species Sampling Effectiveness Index; SSII, Species Status Information Index.

Biodiversity knowledge continued to be highly geographically biased over the previous decade (2010 to 2019), with the most complete data coverage found primarily within the United States, Europe, South Africa, and Australia (Fig 4A). Globally, fewer than half of nations (42%) showed significant increasing trends (p < 0.01) in coverage averaged across taxa over the previous decade (Fig 4B). For those nations showing increases, trends are driven primarily by the rapid increase in avian distribution data (Fig 4C, Fig A and Table C in S1 File). Nearly half of nations (47%) showed significantly increasing data coverage for birds, whereas less than 20% of nations had increasing trends for other taxa (Fig 4C). This suggests that despite increasing data availability for all taxa, a majority of nations are not making progress in closing information gaps for mammals, amphibians, and reptiles.

Fig 4. National patterns and trends in spatial biodiversity data coverage and sampling effectiveness.

(a, d) Mean Steward’s SSII (a) and National SSEI (d) over the previous decade (2010–2019) averaged across terrestrial vertebrates; the relationship between data coverage and sampling effectiveness is shown as inset. (b, e) Change rate in Steward’s SSII (b) and National SSEI (e) over the previous decade. Maximum values for each color bin are labeled below each map. (c, f) Percentage of nations with no significant (p < 0.01) trends (beige) and significant decreasing (blue) or increasing (red) trends in Steward’s SSII (c) and SSEI (f) over the previous decade for birds, mammals, amphibians, and reptiles. Artwork from phylopics.org (see Text A in S1 File). National boundaries from gadm.org. The data underlying this figure may be found in https://mol.org/indicators/coverage and https://github.com/MapofLife/biodiversity-data-gaps. SSEI, Species Sampling Effectiveness Index; SSII, Species Status Information Index.

Interestingly, several world regions that historically most comprehensively sampled the full suite of local species across their geographic ranges are no longer continuing along increasing trajectories (Fig 4B). For example, Western Europe, South Africa, and Australia appear to have slowed in their coverage progress (i.e., SSII across taxa), possibly reflecting challenges in the continued mobilization of existing datasets or a lack of impetus to engage in new initiatives [39]. However, we anticipate that even under constant effort, nations’ coverage may asymptote as marginal gains become more challenging to achieve. Therefore, asymptoting trajectories in data coverage may suggest that nations are operating at maximum capacity. Thus, nations with slowing trends may best contribute to CBD goals by partially shifting their investments in national biodiversity data creation toward supporting targeted data mobilization and capacity-building in nations that have so far lagged behind through direct partnerships [40]. By contrast, much of Asia, South America, and Western and Northern Africa had increasing coverage over the previous decade from initially low values, suggesting encouraging information prospects if trends continue (Fig 4B). Our results underscore the importance of regionally targeted capacity-building and data mobilization initiatives that support regions with historically limited data coverage. Such efforts currently underway include GBIF’s Biodiversity Information Fund for Asia and Biodiversity Information for Development program focused in sub-Saharan Africa, the Caribbean, and the Pacific.

Despite the astounding accumulation of biodiversity records, not all data have translated to new knowledge of species distributions [41]. While these data are potentially useful for other ecological applications, their sampling effectiveness as indexed by the SSEI varied considerably among nations over the previous decade, with lowest effectiveness typically within Western Europe, North America, and Australia (Fig 4D). National trends in effectiveness also appear to be largely driven by the trends in sampling effectiveness for bird species, which has declined rapidly over the past 2 decades, constraining the direct conversion of the immense accumulation of data into data coverage (Fig 4E and 4F).

Differences and trade-offs in data coverage and effectiveness among taxa appear to be largely driven by the way in which data are collected for different taxonomic groups. As of 2016, nearly all records for birds in GBIF (>90%) came from direct observations, as opposed to museum specimens that constituted the primary source of records for amphibians, reptiles, and, to a lesser degree, mammals [26]. However, these differences in sources are likely to narrow as citizen scientist programs not restricted to birds continue to grow in popularity (e.g., iNaturalist). SSII for birds did not surpass that for other classes until the 1980s, despite having an order of magnitude greater number of records. Further, for the same number of records, birds had the lowest SSII among terrestrial vertebrates and appear to only have achieved the highest SSII through sheer volume of records, as opposed to strategic sampling (Fig 3D). Although data coverage for birds increased throughout much of the 20th century, the launch of citizen science platforms such as eBird [42] in the early 2000s undoubtedly played a large role in the expeditious increase in coverage [19]. However, this onslaught of observations has not been maximally leveraged to enhance the global biodiversity information base, as seen in the coincident decline in avian sampling effectiveness (Fig 3E).

The accelerated pace of data coverage for birds compared to other vertebrate groups points to the tremendous role that non-museum–based data collection can play in closing knowledge gaps [43–45]. However, the rapid decline in sampling effectiveness we found for bird species, coincident with the growth in citizen science platforms, suggests that these data have not been collected in a way that optimally supports closing knowledge gaps. While the contributions of citizen science have been invaluable, expanding the impact of citizen science initiatives for information growth will likely benefit from initiatives and guidance addressing the most effective and complementary contributions (i.e., those that address undersampled species or regions) [46,47]. The rapidly changing landscape of citizen science initiatives will need to be complemented by further supporting and growing coordinated programs through international organizations or government agencies that ensure improved data coverage. Citizen science platforms could shift incentives from numbers of records collected or species identified to the value of records contributed. Quantifying and identifying particularly important data contributions through products such as the SSII and SSEI, which can be updated and delivered through the MOL infrastructure, can support naturalists and initiatives to fill key geographic and taxonomic gaps.

Typologies of national monitoring efforts

National biodiversity monitoring is influenced by a myriad of social, political, economic, and geographic factors [21,22,30,48,49]. We categorized nations into the following 4 main types based on Steward’s SSII status and trends over the previous decade: (1) coverage less than the global mean with no or decreasing trend (2010 to 2019) (42% of nations); (2) coverage less than the global mean with an increasing trend (24%); (3) coverage greater than the global mean with no or decreasing trend (17%); and (4) coverage greater than the global mean with an increasing trend (17%) (Fig 5A). We highlight national trajectory examples from each group (Fig 5B). Status and trends in Steward’s SSII differed strongly among continents (Fig 5C).
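The 4-way typology described above can be expressed as a small decision rule. This is a minimal sketch rather than the authors' code: the function name `nation_type` and its parameters are hypothetical, the trend slope and p-value are assumed to be computed elsewhere (the trend-fitting method is not described in this section), and a value exactly equal to the global mean is arbitrarily treated as not above it.

```python
def nation_type(mean_ssii: float, global_mean: float,
                slope: float, p_value: float, alpha: float = 0.01) -> int:
    """Assign a nation to types 1-4 from Steward's SSII status and trend.

    Assumption: a trend counts as increasing only if the slope is positive
    and significant at the stated threshold (p < 0.01).
    """
    increasing = slope > 0 and p_value < alpha
    above_mean = mean_ssii > global_mean
    if not above_mean:
        return 2 if increasing else 1  # below the global mean
    return 4 if increasing else 3      # above the global mean


# Hypothetical inputs, for illustration only
print(nation_type(mean_ssii=0.02, global_mean=0.05, slope=0.001, p_value=0.2))    # type 1
print(nation_type(mean_ssii=0.10, global_mean=0.05, slope=0.004, p_value=0.001))  # type 4
```

Note that a significant decreasing trend and a non-significant trend fall into the same type, matching the "no or decreasing trend" grouping in the text.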

Fig 5. Typologies of nations’ data coverage and trends.

(a) Mean values and change rates in Steward’s SSII over the previous decade (2010–2019). Horizontal dashed line represents the global mean of Steward’s SSII. Left panels show nations with no significant or decreasing trends in coverage. Right panels show nations with significant (p < 0.01) increasing trends in coverage. We categorized nations into the following 4 main types based on Steward’s SSII status and trends over the previous decade: (1) coverage less than the global mean with no or decreasing trend (2010–2019) (42% of nations); (2) coverage less than the global mean with an increasing trend (24%); (3) coverage greater than the global mean with no or decreasing trend (17%); and (4) coverage greater than the global mean with an increasing trend (17%). (b) Example time series for nations within each type. (c) National assignment to quadrants. Bar plot shows percentages of nations within each quadrant. National boundaries from gadm.org. The data underlying this figure may be found in https://mol.org/indicators/coverage and https://github.com/MapofLife/biodiversity-data-gaps. SSII, Species Status Information Index.

Biodiversity data coverage within Mexico has followed a strong, and increasing, trajectory in both the 20th and 21st centuries. Despite lower coverage through periods of the 20th century, South Africa has had similarly strong and increasing data coverage over the previous decade. Many nations that had historically limited data coverage showed recent increases in coverage, for example, Brazil. These trajectories in data coverage may be due to political decisions and national infrastructure that support biodiversity data collection and mobilization. Examples include the establishment of a national biodiversity program (CONABIO) [31] and large-scale atlasing efforts, such as the Southern African Bird Atlas Project [50,51].

Through their national commitment to the CBD targets to decrease species extinctions, nations are asked to monitor the species for which they hold greatest responsibility, or, in the case of endemic species, full responsibility. By comparing National and Steward’s SSII, we found that a majority of nations (50%) preferentially survey species for which they hold a high proportion of the global ranges (Fig B in S1 File). This may reflect a tendency for endemic biodiversity to confer special cultural importance and for societal interests to influence research agendas [49] or simply reflect the preferences of citizen scientists aiming to boost their life lists. Selective monitoring based on nations’ stewardship of species may beneficially promote conservation agendas within nations that have primary control of habitats that species rely on. With this goal in mind, our analysis highlights when nations fall behind on sampling species for which they have high stewardship and thus play a particularly large role in species’ conservation (e.g., Indonesia and Costa Rica).

Future directions for tracking global biodiversity knowledge

We recognize the limitations of SSII, or any single metric of biodiversity data coverage, to address the range of research and monitoring needs. Our formulation assumes a specific set of taxonomic, spatial, and temporal units and places a burden on nations with particularly high diversity or large national areas to achieve high scores. Furthermore, the annual time units and a relatively coarse grid for the SSII patterns presented here are insensitive to spatiotemporally dense data that could reveal seasonal dynamics and additional insights offered by repeat samples (e.g., in occupancy modeling frameworks) [52]. Similarly, by penalizing uneven sampling, the SSEI ignores applications that require repeat sampling. In its current form, the SSII also does not account for coverage in environmental space (e.g., as relevant for model-based inference and Essential Biodiversity Variable production) [10]. Further, because the SSII is currently based on static representations of species ranges, it does not capture range dynamics, such as in new invasions or range shifts [9]. This could be addressed through timely updates of range expectations or other invasion-specific information [53]. Dynamically tracking species distributions will be particularly important in cases where species ranges shift across national boundaries, resulting in new monitoring responsibilities. Our methodology and analysis infrastructure are capable of flexibly accommodating different spatial resolutions (Fig C and Text A in S1 File) as more precise information on species’ ranges becomes available (e.g., through species distribution modeling) for a broader range of taxa.

A group of alternative approaches to quantify sampling completeness rely on parametric or nonparametric richness estimates based on extrapolation of assemblage species accumulation curves [27,34–36]. These approaches provide an important complementary contribution especially for extremely undersampled or underdescribed taxa where globally comprehensive species range expectations, which are necessary for SSII calculation, remain unavailable. However, richness estimates from extrapolation approaches can vary dramatically with the specific methodology used and structure of input data. As such, there are competing recommendations for their development and use [54–56]. The SSII avoids potential pitfalls and the limited transparency of extrapolation approaches by relating record collection directly to best-possible species-level expectations. Therefore, the SSII allows for decision support at the species level, which is not possible with extrapolation approaches. While this study is limited in scope to terrestrial vertebrates, the framework is easily extended to address other taxonomic groups and realms with ongoing, comprehensive distribution mapping efforts, such as plants and certain marine and invertebrate groups [18,57]. The SSII offers an effective initial characterization of biodiversity information at the species, national, and global scales, with the potential to extend the metric to account for different spatiotemporal grains (Fig C in S1 File), taxa, and data types.

Conclusions

The framework and indicators presented here offer a quantitative and comparable characterization of species, national, and global trajectories in closing biodiversity information gaps. The need for more comprehensive quantitative and standardized biodiversity information to support policy and action not only underpins improved Essential Biodiversity Variables [10] but is also recognized as critically needed in recent assessments of IPBES and the post-2020 Global Biodiversity Framework. Our findings suggest that trends in data coverage fundamentally differ by taxa and region and highlight the need to complement and reassess biodiversity sampling strategies to most effectively translate data collection into biodiversity knowledge useful for management and decision-making.

Supporting information

S1 File

Text A. Methods, Supplementary Text, and Supplementary Acknowledgments to support main text.

Fig A. National patterns in data collection, coverage, and sampling effectiveness (2010–2019). (a) Change rates in Steward’s SSII and National SSEI. Dashed lines represent zero slopes. (b, c) Relationship and mismatch between Steward’s SSII and total spatiotemporal records collected nationally (b) and the percentage of expected species nationally recorded (c). (d) Relationship between the percentage of expected species nationally recorded and mean National SSEI. The data underlying this figure may be found in https://mol.org/indicators/coverage and https://github.com/MapofLife/biodiversity-data-gaps.

Fig B. National stewardship in data coverage. (a) National and Steward’s SSII over the previous decade (2010–2019). Points are colored by the percent difference between National and Steward’s SSII. Dashed line represents the 1:1 line between variables. (b) Relative stewardship of nations, as estimated by percent difference, over the previous decade. Color scale matches that in panel (a). National boundaries from gadm.org. The data underlying this figure may be found in https://mol.org/indicators/coverage and https://github.com/MapofLife/biodiversity-data-gaps.

Fig C. Empirical demonstration of the effects of spatial resolution on the SSII and SSEI. (a, b) Thresholded species distribution model output (Ellis-Soto and colleagues, 2021) rescaled to 3 spatial resolutions (110, 55, and 27.5 km) for 2 hummingbird species, (a) the Glowing puffleg (Eriocnemis vestita) and (b) White-sided hillstar (Oreotrochilus leucopleurus). Grid cells are colored by the number of records collected between 2000 and 2019. (c) Annual SSII (solid lines) and SSEI (dashed lines) computed at 3 spatial resolutions. (d–i) Comparison of SSII (d–f) and SSEI (g–i) values among spatial resolutions (d, g: 110 vs. 55 km; e, h: 55 vs. 27.5 km; f, i: 110 vs. 27.5 km). Gray shading shows 95% confidence interval. Colored text displays slope estimates and 95% confidence intervals for each species (blue: Eriocnemis vestita; green: Oreotrochilus leucopleurus). The data underlying this figure may be found in https://mol.org/indicators/coverage and https://github.com/MapofLife/biodiversity-data-gaps.

Fig D. Theoretical examples of the Species Sampling Effectiveness Index (SSEI). Each line corresponds to theoretical cases with different levels of evenness of the distribution of biodiversity records for an idealized species with the same range size. In these examples, the proportion of the sampled range with a single record vs. alternate values (1, 2, 10, 100, and 1,000) is adjusted from 0 to 1. SSEI is highest in cases with uniform or near-uniform sampling (i.e., all grid cells contain either 1 or 2 records). SSEI is lowest in cases with highly uneven sampling (i.e., a mixture of grid cells with either a single record or 100–1,000 records). These examples also highlight that SSEI is identical in the cases where redundant sampling is uniform (i.e., values are the same if all cells have 1, 10, or 1,000 records). Additionally, SSEI approaches the maximum value when only a small minority of cells contain more than a single record (i.e., the proportion of cells with a single record >90%).

Table A. Species example coverage and sampling effectiveness values. Values presented for the jaguar (Panthera onca) and collared peccary (Pecari tajacu) as demonstrated in Fig 2C–2E. The data underlying this table may be found in https://mol.org/indicators/coverage and https://github.com/MapofLife/biodiversity-data-gaps.

Table B. National example data coverage and sampling effectiveness values. Values presented for the jaguar (Panthera onca) and collared peccary (Pecari tajacu) as demonstrated in Fig 2F and 2G. The data underlying this table may be found in https://mol.org/indicators/coverage and https://github.com/MapofLife/biodiversity-data-gaps.

Table C. National data coverage and sampling effectiveness values over the previous decade (2010–2019). ISO3 codes and mean values for National and Steward’s SSII and SSEI for nations. The data underlying this table may be found in https://mol.org/indicators/coverage and https://github.com/MapofLife/biodiversity-data-gaps.
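The qualitative behavior described for Fig D can be illustrated with a minimal sketch. It uses Pielou's evenness (normalized Shannon entropy) as a stand-in for an evenness-based index; this is an assumption made for illustration only. The study's exact SSEI formula is given in its Methods and may differ, so the values below should not be expected to match Fig D. The function names `pielou_evenness` and `mixed_cells` are hypothetical.

```python
import math
from typing import List, Sequence


def pielou_evenness(records_per_cell: Sequence[int]) -> float:
    """Normalized Shannon entropy of records across sampled grid cells.

    Illustrative stand-in only, not the study's SSEI definition.
    Cells with zero records are ignored (only the sampled range counts).
    """
    counts = [c for c in records_per_cell if c > 0]
    n_cells = len(counts)
    if n_cells == 0:
        return float("nan")  # no sampled cells, evenness undefined
    if n_cells == 1:
        return 1.0  # convention chosen here: a single cell is trivially even
    total = sum(counts)
    entropy = -sum((c / total) * math.log(c / total) for c in counts)
    return entropy / math.log(n_cells)  # divide by the maximum possible entropy


def mixed_cells(n_cells: int, prop_single: float, alt_value: int) -> List[int]:
    """Idealized range: a share of cells has 1 record, the rest has alt_value."""
    n_single = round(n_cells * prop_single)
    return [1] * n_single + [alt_value] * (n_cells - n_single)


if __name__ == "__main__":
    # Uniform sampling scores the same whether every cell has 1 or 1,000 records.
    print(pielou_evenness([1] * 100), pielou_evenness([1000] * 100))
    # A mix of 1 and 2 records is nearly even; a mix of 1 and 1,000 is not.
    for alt in (2, 10, 100, 1000):
        print(alt, round(pielou_evenness(mixed_cells(100, 0.5, alt)), 3))
```

Under this stand-in, uniformly redundant sampling is indistinguishable from single-record sampling, and evenness falls as the contrast between sparsely and heavily sampled cells grows, which is the pattern the Fig D caption describes.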

(PDF)

Acknowledgments

We thank the Map of Life team for their support and expertise, particularly Vijay Barve, Yanina Sica, and Michelle Duong. We are also grateful to the GBIF team, specifically Joe Miller, Tim Hirsch, and Tim Robertson for their help with data and manuscript feedback.

Abbreviations

CBD, Convention on Biological Diversity

GBIF, Global Biodiversity Information Facility

MOL, Map of Life

SSEI, Species Sampling Effectiveness Index

SSII, Species Status Information Index

Data Availability

All supporting data and scripts are available for download at https://github.com/MapofLife/biodiversity-data-gaps associated with the following DOI: https://doi.org/10.48600/MOL-3Y3Z-DW77. National indicator values are directly accessible for download at mol.org/indicators/coverage.

Funding Statement

This study is supported by the EO Wilson Biodiversity Foundation, National Science Foundation grant DEB-1441737 and National Aeronautics and Space Administration grants 80NSSC17K0282 and 80NSSC18K0435 to W.J. C.M. acknowledges funding by the Volkswagen Foundation through a Freigeist Fellowship (A118199), and additional support by iDiv, funded by the German Research Foundation (DFG–FZT 118, 202548816). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

References

  • 1. Bonebrake TC, Brown CJ, Bell JD, Blanchard JL, Chauvenet A, Champion C, et al. Managing consequences of climate-driven species redistribution requires integration of ecology, conservation and social science. Biol Rev. 2018;93(1):284–305. doi: 10.1111/brv.12344
  • 2. Pecl GT, Araújo MB, Bell JD, Blanchard J, Bonebrake TC, Chen I-C, et al. Biodiversity redistribution under climate change: Impacts on ecosystems and human well-being. Science. 2017 Mar 31;355(6332):eaai9214. doi: 10.1126/science.aai9214
  • 3. Díaz S, Zafra-Calvo N, Purvis A, Verburg PH, Obura D, Leadley P, et al. Set ambitious goals for biodiversity and sustainability. Science. 2020 Oct 23;370(6515):411–3. doi: 10.1126/science.abe1530
  • 4. Jung M, Arnell A, de Lamo X, García-Rangel S, Lewis M, Mark J, et al. Areas of global importance for terrestrial biodiversity, carbon, and water [Internet]. Ecology; 2020 Apr [cited 2020 Oct 22]. Available from: http://biorxiv.org/lookup/doi/10.1101/2020.04.16.021444
  • 5. Leclère D, Obersteiner M, Barrett M, Butchart SHM, Chaudhary A, De Palma A, et al. Bending the curve of terrestrial biodiversity needs an integrated strategy. Nature. 2020 Sep;585(7826):551–6. doi: 10.1038/s41586-020-2705-y
  • 6. Rounsevell MDA, Harfoot M, Harrison PA, Newbold T, Gregory RD, Mace GM. A biodiversity target based on species extinctions. Science. 2020 Jun 12;368(6496):1193–5. doi: 10.1126/science.aba6592
  • 7. Visconti P, Butchart SHM, Brooks TM, Langhammer PF, Marnewick D, Vergara S, et al. Protected area targets post-2020. Science. 2019 Apr 11;eaav6886. doi: 10.1126/science.aav6886
  • 8. Convention on Biological Diversity. Update of the zero draft of the post-2020 global biodiversity framework. 2020.
  • 9. Lenoir J, Bertrand R, Comte L, Bourgeaud L, Hattab T, Murienne J, et al. Species better track climate warming in the oceans than on land. Nat Ecol Evol. 2020 Aug;4(8):1044–59. doi: 10.1038/s41559-020-1198-2
  • 10. Jetz W, McGeoch MA, Guralnick R, Ferrier S, Beck J, Costello MJ, et al. Essential biodiversity variables for mapping and monitoring species populations. Nat Ecol Evol. 2019 Apr;3(4):539–51. doi: 10.1038/s41559-019-0826-1
  • 11. Bland LM, Collen B, Orme CDL, Bielby J. Predicting the conservation status of data-deficient species. Conserv Biol. 2015;29(1):250–9. doi: 10.1111/cobi.12372
  • 12. Boitani L, Maiorano L, Baisero D, Falcucci A, Visconti P, Rondinini C. What spatial data do we need to develop global mammal conservation strategies? Philos Trans R Soc Lond B Biol Sci. 2011 Sep 27;366(1578):2623–32. doi: 10.1098/rstb.2011.0117
  • 13. Guisan A, Tingley R, Baumgartner JB, Naujokaitis-Lewis I, Sutcliffe PR, Tulloch AIT, et al. Predicting species distributions for conservation decisions. Ecol Lett. 2013;16(12):1424–35. doi: 10.1111/ele.12189
  • 14. Dornelas M, Antão LH, Moyes F, Bates AE, Magurran AE, Adam D, et al. BioTIME: A database of biodiversity time series for the Anthropocene. Glob Ecol Biogeogr. 2018;27(7):760–86. doi: 10.1111/geb.12729
  • 15. Edwards JL, Lane MA, Nielsen ES. Interoperability of Biodiversity Databases: Biodiversity Information on Every Desktop. Science. 2000 Sep 29;289(5488):2312–4. doi: 10.1126/science.289.5488.2312
  • 16. Graham CH, Ferrier S, Huettman F, Moritz C, Peterson AT. New developments in museum-based informatics and applications in biodiversity analysis. Trends Ecol Evol. 2004 Sep 1;19(9):497–503. doi: 10.1016/j.tree.2004.07.006
  • 17. Guralnick R, Walls R, Jetz W. Humboldt Core–toward a standardized capture of biological inventories for biodiversity monitoring, modeling and assessment. Ecography. 2018;41(5):713–25.
  • 18. Bruelheide H, Dengler J, Jiménez-Alfaro B, Purschke O, Hennekens SM, Chytrý M, et al. sPlot–A new tool for global vegetation analyses. J Veg Sci. 2019;30(2):161–86.
  • 19. Amano T, Lamming JDL, Sutherland WJ. Spatial Gaps in Global Biodiversity Information and the Role of Citizen Science. Bioscience. 2016 May 1;66(5):393–400.
  • 20. Jetz W, McPherson JM, Guralnick RP. Integrating biodiversity distribution knowledge: toward a global map of life. Trends Ecol Evol. 2012 Mar 1;27(3):151–9. doi: 10.1016/j.tree.2011.09.007
  • 21. Meyer C, Kreft H, Guralnick R, Jetz W. Global priorities for an effective information basis of biodiversity distributions. Nat Commun. 2015 Sep 8;6(1):8221. doi: 10.1038/ncomms9221
  • 22. Boakes EH, McGowan PJK, Fuller RA, Chang-qing D, Clark NE, O’Connor K, et al. Distorted Views of Biodiversity: Spatial and Temporal Bias in Species Occurrence Data. PLoS Biol. 2010 Jun 1;8(6):e1000385. doi: 10.1371/journal.pbio.1000385
  • 23. Feeley KJ, Stroud JT, Perez TM. Most ‘global’ reviews of species’ responses to climate change are not truly global. Divers Distrib. 2017;23(3):231–4.
  • 24. Lenoir J, Svenning J-C. Climate-related range shifts–a global multidimensional synthesis and new research directions. Ecography. 2015;38(1):15–28.
  • 25. Meyer C, Jetz W, Guralnick RP, Fritz SA, Kreft H. Range geometry and socio-economics dominate species-level biases in occurrence information. Glob Ecol Biogeogr. 2016;25(10):1181–93.
  • 26. Troudet J, Grandcolas P, Blin A, Vignes-Lebbe R, Legendre F. Taxonomic bias in biodiversity data and societal preferences. Sci Rep. 2017 Aug 22;7(1):9132. doi: 10.1038/s41598-017-09084-6
  • 27. Mora C, Tittensor DP, Myers RA. The completeness of taxonomic inventories for describing the global diversity and distribution of marine fishes. Proc R Soc B Biol Sci. 2008 Jan 22;275(1631):149–55. doi: 10.1098/rspb.2007.1315
  • 28. Sorte FAL, Somveille M. Survey completeness of a global citizen-science database of bird occurrence. Ecography. 2020;43(1):34–43.
  • 29. Troia MJ, McManamay RA. Filling in the GAPS: evaluating completeness and coverage of open-access biodiversity databases in the United States. Ecol Evol. 2016;6(14):4654–69. doi: 10.1002/ece3.2225
  • 30. Amano T, González-Varo JP, Sutherland WJ. Languages Are Still a Major Barrier to Global Science. PLoS Biol. 2016 Dec 29;14(12):e2000933. doi: 10.1371/journal.pbio.2000933
  • 31. Sarukhán J, Urquiza-Haas T, Koleff P, Carabias J, Dirzo R, Ezcurra E, et al. Strategic Actions to Value, Conserve, and Restore the Natural Capital of Megadiversity Countries: The Case of Mexico. Bioscience. 2015 Feb 1;65(2):164–73. doi: 10.1093/biosci/biu195
  • 32. Farley SS, Dawson A, Goring SJ, Williams JW. Situating Ecology as a Big-Data Science: Current Advances, Challenges, and Solutions. Bioscience. 2018 Aug 1;68(8):563–76.
  • 33. Kays R, McShea WJ, Wikelski M. Born-digital biodiversity data: Millions and billions. Divers Distrib. 2020;26(5):644–8.
  • 34. Freeman B, Peterson AT. Completeness of digital accessible knowledge of the birds of western Africa: Priorities for survey. Condor [Internet]. 2019 Aug 26 [cited 2020 Sep 10];121(3). Available from: https://academic.oup.com/condor/article/121/3/duz035/5538066
  • 35. Soberón J, Jiménez R, Golubov J, Koleff P. Assessing completeness of biodiversity databases at different spatial scales. Ecography. 2007;30(1):152–60.
  • 36. Yang W, Ma K, Kreft H. Geographical sampling bias in a large distributional database and its effects on species richness–environment models. J Biogeogr. 2013;40(8):1415–26.
  • 37. Pereira HM, Jorg F, Simon F, Jetz W. Global Biodiversity Change Indicators. GEO Biodiversity Network; 2015.
  • 38. Pielou EC. The measurement of diversity in different types of biological collections. J Theor Biol. 1966 Dec 1;13:131–44.
  • 39. Peterson AT, Soberón J, Krishtalka L. A global perspective on decadal challenges and priorities in biodiversity informatics. BMC Ecol. 2015 May 29;15(1):15. doi: 10.1186/s12898-015-0046-8
  • 40. Convention on Biological Diversity. Overview of the outcomes of the study to inform the preparation of a long-term strategic framework for capacity-building beyond 2020. 2020.
  • 41. Bayraktarov E, Ehmke G, O’Connor J, Burns EL, Nguyen HA, McRae L, et al. Do Big Unstructured Biodiversity Data Mean More Knowledge? Front Ecol Evol [Internet]. 2019 [cited 2020 Sep 11];6. Available from: https://www.frontiersin.org/articles/10.3389/fevo.2018.00239/full
  • 42. Sullivan BL, Wood CL, Iliff MJ, Bonney RE, Fink D, Kelling S. eBird: A citizen-based bird observation network in the biological sciences. Biol Conserv. 2009 Oct 1;142(10):2282–92.
  • 43. Pocock MJO, Chandler M, Bonney R, Thornhill I, Albin A, August T, et al. Chapter Six—A Vision for Global Biodiversity Monitoring With Citizen Science. In: Bohan DA, Dumbrell AJ, Woodward G, Jackson M, editors. Advances in Ecological Research [Internet]. Academic Press; 2018 [cited 2020 Sep 1]. p. 169–223. (Next Generation Biomonitoring: Part 2; vol. 59). Available from: http://www.sciencedirect.com/science/article/pii/S0065250418300230
  • 44. Pocock MJO, Roy HE, August T, Kuria A, Barasa F, Bett J, et al. Developing the global potential of citizen science: Assessing opportunities that benefit people, society and the environment in East Africa. J Appl Ecol. 2019;56(2):274–81.
  • 45. Theobald EJ, Ettinger AK, Burgess HK, DeBey LB, Schmidt NR, Froehlich HE, et al. Global change and local solutions: Tapping the unrealized potential of citizen science for biodiversity research. Biol Conserv. 2015 Jan 1;181:236–44.
  • 46. Callaghan CT, Rowley JJL, Cornwell WK, Poore AGB, Major RE. Improving big citizen science data: Moving beyond haphazard sampling. PLoS Biol. 2019 Jun 27;17(6):e3000357. doi: 10.1371/journal.pbio.3000357
  • 47. Xue Y, Davies I, Fink D, Wood C, Gomes CP. Avicaching: A Two Stage Game for Bias Reduction in Citizen Science. Proceedings for the 2016 International Conference on Autonomous Agents & Multiagent Systems. 2016:10.
  • 48. Amano T, Sutherland WJ. Four barriers to the global understanding of biodiversity conservation: wealth, language, geographical location and security. Proc R Soc B Biol Sci. 2013 Apr 7;280(1756):20122649. doi: 10.1098/rspb.2012.2649
  • 49. Wilson JR, Procheş Ş, Braschler B, Dixon ES, Richardson DM. The (bio)diversity of science reflects the interests of society. Front Ecol Environ. 2007;5(8):409–14.
  • 50. Cherry M. South Africa—Serious about Biodiversity Science. PLoS Biol. 2005 May 17;3(5):e145. doi: 10.1371/journal.pbio.0030145
  • 51. Harrison JA, Underhill LG, Barnard P. The seminal legacy of the Southern African Bird Atlas Project. S Afr J Sci. 2008 Apr;104(3–4):82–4.
  • 52. Mackenzie DI, Nichols JD, Lachman GB, Droege S, Royle JA, Langtimm CA. Estimating site occupancy rates when detection probabilities are less than one. Ecology. 2002;83(8):2248–55.
  • 53. McGeoch M, Jetz W. Measure and Reduce the Harm Caused by Biological Invasions. One Earth. 2019 Oct 25;1(2):171–4.
  • 54. Cam E, Nichols JD, Sauer JR, Hines JE. On the estimation of species richness based on the accumulation of previously unrecorded species. Ecography. 2002;25(1):102–8.
  • 55. Colwell RK, Coddington JA, Hawksworth DL. Estimating terrestrial biodiversity through extrapolation. Philos Trans R Soc Lond B Biol Sci. 1994 Jul 29;345(1311):101–18. doi: 10.1098/rstb.1994.0091
  • 56. Lobo JM, Hortal J, Yela JL, Millán A, Sánchez-Fernández D, García-Roselló E, et al. KnowBR: An application to map the geographical variation of survey effort and identify well-surveyed areas from biodiversity databases. Ecol Indic. 2018 Aug 1;91:241–8.
  • 57. Kaschner K, Kesner-Reyes K, Garilao C, Segschneider J, Rius-Barile J, Rees T, et al. AquaMaps: Predicted range maps for aquatic species. 2019; Version 10/2019. Available from: www.aquamaps.org

Decision Letter 0

Roland G Roberts

13 Nov 2020

Dear Dr Oliver,

Thank you for submitting your manuscript entitled "Global and national trends in documenting and monitoring species distributions" for consideration as a Research Article by PLOS Biology.

Your manuscript has now been evaluated by the PLOS Biology editorial staff, as well as by an academic editor with relevant expertise, and I'm writing to let you know that we would like to send your submission out for external peer review.

However, before we can send your manuscript to reviewers, we need you to complete your submission by providing the metadata that is required for full assessment. To this end, please login to Editorial Manager where you will find the paper in the 'Submissions Needing Revisions' folder on your homepage. Please click 'Revise Submission' from the Action Links and complete all additional questions in the submission questionnaire.

Please re-submit your manuscript within two working days, i.e. by Nov 17 2020 11:59PM.

Login to Editorial Manager here: https://www.editorialmanager.com/pbiology

During resubmission, you will be invited to opt-in to posting your pre-review manuscript as a bioRxiv preprint. Visit http://journals.plos.org/plosbiology/s/preprints for full details. If you consent to posting your current manuscript as a preprint, please upload a single Preprint PDF when you re-submit.

Once your full submission is complete, your paper will undergo a series of checks in preparation for peer review. Once your manuscript has passed all checks it will be sent out for review.

Given the disruptions resulting from the ongoing COVID-19 pandemic, please expect delays in the editorial process. We apologise in advance for any inconvenience caused and will do our best to minimize impact as far as possible.

Feel free to email us at plosbiology@plos.org if you have any queries relating to your submission.

Kind regards,

Roli Roberts

Roland G Roberts, PhD,

Senior Editor

PLOS Biology

Decision Letter 1

Roland G Roberts

19 Jan 2021

Dear Dr Oliver,

Thank you very much for submitting your manuscript "Global and national trends in documenting and monitoring species distributions" for consideration as a Research Article at PLOS Biology. Your manuscript has been evaluated by the PLOS Biology editors, an Academic Editor with relevant expertise, and by four independent reviewers. Many thanks for your patience over the holiday period.

You’ll see that the assessments are broadly very positive, but there are multiple requests for you to justify and/or conduct sensitivity analysis on the choice of grid size. There are several further substantial requests, regarding treatment of empty cells, treatment of migrant species, data and code availability (please see PLOS' Data Policy, which is quite stringent), methodological clarifications and statements of limitations. Reviewer #4 also feels that the SSEI metric may be flawed and not truly independent from the SSII, making it redundant over quite a swathe of parameter space; you should address or rebut this. The Academic Editor asked me to draw your attention to points 1 and 2 from reviewer #1, and the multiple calls for data availability (which we will enforce).

In light of the reviews (below), we will not be able to accept the current version of the manuscript, but we would welcome re-submission of a much-revised version that takes into account the reviewers' comments. We cannot make any decision about publication until we have seen the revised manuscript and your response to the reviewers' comments. Your revised manuscript is also likely to be sent for further evaluation by the reviewers.

We expect to receive your revised manuscript within 3 months.

Please email us (plosbiology@plos.org) if you have any questions or concerns, or would like to request an extension. At this stage, your manuscript remains formally under active consideration at our journal; please notify us by email if you do not intend to submit a revision so that we may end consideration of the manuscript at PLOS Biology.

**IMPORTANT - SUBMITTING YOUR REVISION**

Your revisions should address the specific points made by each reviewer. Please submit the following files along with your revised manuscript:

1. A 'Response to Reviewers' file - this should detail your responses to the editorial requests, present a point-by-point response to all of the reviewers' comments, and indicate the changes made to the manuscript.

*NOTE: In your point by point response to the reviewers, please provide the full context of each review. Do not selectively quote paragraphs or sentences to reply to. The entire set of reviewer comments should be present in full and each specific point should be responded to individually, point by point.

You should also cite any additional relevant literature that has been published since the original submission and mention any additional citations in your response.

2. In addition to a clean copy of the manuscript, please also upload a 'track-changes' version of your manuscript that specifies the edits made. This should be uploaded as a "Related" file type.

*Re-submission Checklist*

When you are ready to resubmit your revised manuscript, please refer to this re-submission checklist: https://plos.io/Biology_Checklist

To submit a revised version of your manuscript, please go to https://www.editorialmanager.com/pbiology/ and log in as an Author. Click the link labelled 'Submissions Needing Revision' where you will find your submission record.

Please make sure to read the following important policies and guidelines while preparing your revision:

*Published Peer Review*

Please note while forming your response, if your article is accepted, you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out. Please see here for more details:

https://blogs.plos.org/plos/2019/05/plos-journals-now-open-for-published-peer-review/

*PLOS Data Policy*

Please note that as a condition of publication PLOS' data policy (http://journals.plos.org/plosbiology/s/data-availability) requires that you make available all data used to draw the conclusions arrived at in your manuscript. If you have not already done so, you must include any data used in your manuscript either in appropriate repositories, within the body of the manuscript, or as supporting information (N.B. this includes any numerical values that were used to generate graphs, histograms etc.). For an example see here: http://www.plosbiology.org/article/info%3Adoi%2F10.1371%2Fjournal.pbio.1001908#s5

*Blot and Gel Data Policy*

We require the original, uncropped and minimally adjusted images supporting all blot and gel results reported in an article's figures or Supporting Information files. We will require these files before a manuscript can be accepted so please prepare them now, if you have not already uploaded them. Please carefully read our guidelines for how to prepare and upload this data: https://journals.plos.org/plosbiology/s/figures#loc-blot-and-gel-reporting-requirements

*Protocols deposition*

To enhance the reproducibility of your results, we recommend that if applicable you deposit your laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosbiology/s/submission-guidelines#loc-materials-and-methods

Thank you again for your submission to our journal. We hope that our editorial process has been constructive thus far, and we welcome your feedback at any time. Please don't hesitate to contact us if you have any questions or comments.

Sincerely,

Roli Roberts

Roland G Roberts, PhD,

Senior Editor,

rroberts@plos.org,

PLOS Biology

*****************************************************

REVIEWERS' COMMENTS:

Reviewer #1:

It was a pleasure to read the manuscript titled "Global and national trends in documenting and monitoring species distributions". I actually saw this posted as a preprint, so the opportunity to review it allowed me to take the time to dive in more deeply than I would have while it sat in my 'to read' pile.

Overall, I fully support the notion behind what the authors are highlighting here. This is best captured by the idea that 'more data' does not necessarily mean more informed biodiversity decisions and policies (line 61). We are getting more data now than ever before, but are we any better off for it? Who knows! I agree that the general approach to provide quantitative metrics to track a given nation's progress is important. I also think that this could be a method for countries to hold one another accountable in terms of their contributions to biodiversity sampling. I think this paper is suitable for PLoS Biology, and will generate broad interest for an international audience no doubt. However, I think there are a few concerns that if addressed, could significantly strengthen this manuscript. Here I have a few 'major concerns' and then some minor concerns after this. I admit these are somewhat substantial concerns/comments/requests. But in all honesty, I think this has major potential, so would hate to see it published without highlighting the full potential of what you are doing. It will be interesting to see what other Reviewers think about this one, but regardless, I hope these comments are helpful for you!

1) Although I fully support the notion here and behind this paper, I am on the fence about this paper as it currently stands. This is because it currently reads as 50% a methods paper and 50% interpretation of the results. I think the SSII and SSEI are interesting and novel, but the derivation is buried in the methods and not given proper thought at times (e.g., in regard to potential biases that are mentioned in passing). And because of this, I think the manuscript suffers because I am left wanting more of the method development and showing that it truly will work, but also left wanting more of the interpretation and what countries are doing better or worse etc. etc. This is highlighted most in line 240. I really think this would be much stronger if the authors could show that indeed, the metric can be extended at different spatial grains. I would recommend a case study of a few species and going down by the size of the grid, as my current worry is that the 110 km is just far too large for a lot of species. (See next point).

2) I kind of see why the ideal distribution is equal sampling among grid cells. But then this opens up a lot of questions about EOO and AOO, and how the SSII performs in relation to these. The assumption that if a grid cell is empty then the species doesn't exist there is not great, and is potentially a fundamental flaw as the manuscript currently stands. But I understand you don't get 'absence' information. I think future work would be able to improve these indices by inferring absences based on number of records etc. - especially at large spatial scales. The authors mention this limitation towards the end of the manuscript. But I think that if the authors can demonstrate that this isn't a big issue, maybe with the jaguar and peccary, then this would really strengthen the manuscript. It would also be informative to know when this method works for some species as opposed to others.

3) What about migrants? This is mostly related to birds. But as I understand, you take all records regardless of time of year and integrate them spatially with grids. But, this could influence the country's reporting as their 'range' only applies to a certain time of year. E.g., neotropical migrants. In the methods the authors argue that by taking the average over many species, some idiosyncrasies among species likely are minimal. But it would be really worth showing this empirically, instead of just making the conceptual argument. Perhaps you could use migrants as a case study.

4) Data availability: According to the version of the manuscript I received, the data availability statement is as follows: "All data is available at the Map of Life (www.mol.org).". I'm a big fan of MOL, but I don't know how to get data out of there. I don't see any type of 'download data' button or tabs. Do I need an account? I just spent about 20 minutes trying to download some datasets, but didn't get anywhere. Making an account is prohibitive, and given the push for open access data, this shouldn't be a necessity. I even tried tracking down the GitHub repo (https://github.com/MapofLife/indicators), but didn't see any actual data here. Given the requirement to publish in PLoS of all data underlying the findings being "fully available", I would hope that the authors can make these data available properly. Even if you can actually get them from MOL and I'm just dense in my ability to do so, I still think as a standalone paper these data should be available so that the results and findings are reproducible. That is, the data from 1950-2019 for each species, and the SSII and SSEI for each species, as well as probably separate datasets for the country-level averages of these, should be made available in a repository that is permanently archived before acceptance of this manuscript.

Minor comments:

If PLoS allows it, I would suggest using some subheadings in the main text. There are a few jumps to new paragraphs where the flow is not great, and I think subheadings could help alleviate this.

Lines 60-61: Totally agree!

Lines 69-74: I don't doubt these statements at all. I'm just wondering if there are any references/qualitative research surrounding this? Mainly, because if there are, it would be cool to highlight them. I suspect such references would be hard to come by, however. But perhaps at the very least, the authors could provide an example of a few nations where they incentivized an improved information base and this was led from a national planning coordination? This is discussed a little bit when the results are interpreted, but an example here would help sell this important point a bit more.

Line 81: "advance and globally implement". So is this method used previously? Has it been published before? Or is what you are doing now, the first time it is being described? This is related to my main point above. If it was published through geo bon, to what extent?

Lines 99-100: Here you say, "not just by the number of records, but also by how effectively the records cover a species' full geographic range". But, then what is the actual difference between SSII and SSEI in words? Or is SSEI just a component of the SSII? Perhaps removing this word 'effectively' and rephrasing will help show that there is indeed a difference between SSII and SSEI?

Line 138: This, to me, is the most likely scenario for Western Europe, South Africa, and Australia. Surely they haven't slowed in their coverage progress, necessarily? But just they are operating at maximum ability?

Lines 140-142: This is interesting! And I agree. But incredibly political. If I understand correctly, the authors are highlighting that, for example, Australia should begin to work on capacity-building in the Pacific Islands and/or Southeast Asia? This is way easier said than done, but I get the point by the authors.

Line 162: It is funny to think of this reference as 'outdated', but they downloaded data from GBIF in 2016, and only used 649 million occurrences (< half of what is in GBIF now). I would bet that the proportion of data from museum specimens has drastically decreased by now in 2020, mainly due to programs like iSPOT and iNaturalist growing in popularity and being integrated into GBIF.

Lines 178-181: I think this is great and I agree!

Lines 203-207: Interesting!!

Line 565: Are these range maps available somewhere in a repository? Can they all be downloaded from MOL for instance? This is core to your analysis, so it would be good to highlight this, although I wouldn't suggest you have to make these available.

Line 578: What about duplicates in GBIF? There are a lot, often in GBIF. https://recology.info/2016/03/scrubr/

Line 617-624: See general comments above.

Line 626-642: ++1 for using geohashes. The field of ecology rarely uses these! Some information here about where your analyses were performed would be handy. Was it using BigQuery (I'm guessing)? Or qGIS?

Lines 712-715: I am not fully following this part. Could you elaborate please.

Lines 830-849: Presumably these will be made available in csv format or some other format for readers? Also, see comments on data availability above.

Reviewer #2:

I enjoyed the introduction; it contained a lot of good, relevant background information that will be easy to digest for non-experts.

Given the dearth of data at a global scale, it is very difficult to identify indicators that will be meaningful at a large scale. The SSII and SSEI show promise. SSII quantifies spatiotemporal data while SSEI adds an adjustment for sampling effectiveness. A strength of SSII is indicated by the best-sampled group, birds, where SSII has increased but sampling effectiveness has decreased due to the redundant sampling of citizen science efforts.

Most of the discussion is about the avian results which makes sense due to the higher level of data available for birds. It would be useful to have more text about other taxa. From figure 4 there is less applicability for non-bird data. Is it useful at all for those taxa, specifically reptiles and amphibians?

The limitations section is appreciated. To strengthen your argument for use of these metrics, I would like to hear clear recommendations from the authors to make the metrics more valuable. There clearly are not sufficient data for global recommendations to track some specific taxa to give global coverage, maybe someday. It appears that focused, periodic monitoring of endemic taxa would be the best use of resources, especially at the national level. Since data are lacking for most groups, can the authors make any recommendations of a subset of taxa that would be useful for nations to track?

Fig 1b: It is unclear to me where the numerators of the aggregate country cover calculations come from.

Starting at line 121: This starts out as a discussion of birds but it is unclear from the second half of the paragraph if it is still just birds or all data in the figures. This needs clarification and separate paragraphs to help the reader.

Line 130: Please clarify whether the 42% refers only to birds. If so, please add "birds" to make this explicit.

Lines 134-136: Which figure does this refer to? Any comments on the non-avian portion of the figure?

Figure 5 legend error? Quadrant definitions 1 and 3 as well as 2 and 4 are identical. It is correct in the text, line 185. Please consider changing the color of Fig 5a: light blue for background for quadrant is confusing. It printed out nearly identical blues in my printout.

Line 192: I am not convinced by the discussion correlating trajectories with political decisions and changes in national infrastructure.

Paragraph 201 is strong and will be useful for the community to hear.

Would a sliding window view of the data be feasible? The decade level analysis is necessary due to data but would the sliding window show trends better?
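To make the sliding-window idea concrete, here is a minimal sketch. It assumes a simplified coverage measure, namely the fraction of a species' expected cells that have at least one record within a window of consecutive years. This is only analogous to SSII and is not the authors' exact calculation. The function name, inputs, and toy data are hypothetical.

```python
def sliding_window_coverage(cells_by_year, expected_cells, start, end, width):
    """Return {first_year_of_window: coverage} for every full window in [start, end].

    cells_by_year maps a year to the set of cell IDs with records in that year.
    Coverage is the share of expected cells sampled at least once in the window.
    """
    coverage = {}
    for first in range(start, end - width + 2):
        sampled = set()
        for year in range(first, first + width):
            sampled |= cells_by_year.get(year, set())
        coverage[first] = len(sampled & expected_cells) / len(expected_cells)
    return coverage


# Toy data: four expected cells, records scattered over five years.
expected = {"a", "b", "c", "d"}
cells_by_year = {
    2000: {"a"},
    2001: {"a", "b"},
    2002: set(),
    2003: {"c"},
    2004: {"d", "x"},  # "x" lies outside the expected range and is ignored
}
trend = sliding_window_coverage(cells_by_year, expected, 2000, 2004, width=3)
```

Setting `width=10` and stepping by one year would give a smoothed counterpart to a decade-level analysis, at the cost of overlapping and therefore non-independent windows.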

Reviewer #3:

[identifies himself as Dominique G Roche]

I was pleased to review PBIOLOGY-D-20-03303_R1 "Global and national trends in documenting and monitoring species distributions". Overall, I think that the research has considerable value. Reducing biodiversity information gaps is a global priority and the proposed framework appears sound. That being said, I am no expert in biodiversity conservation and the relevance/novelty of the indices proposed should be assessed by reviewers with greater expertise in this field. My main criticism of the manuscript is that the text and some of the concepts were difficult to understand at times. Many sentences would benefit from being more clearly written and the figure/table captions were often confusing and/or lacked sufficient information. The figures are critical for readers to understand the content of this manuscript - greater effort should be invested in ensuring that the captions clearly walk the readers through the results.

It would be helpful if the authors included a glossary as a text box to prominently define/explain key terms such as GBIF, MOL, Global/National/Stewards' SSI, SSEI. I think this addition would greatly enhance the ease with which the paper could be read and its accessibility to a broad readership.

The authors mention the importance of scale in the introduction (L64) but there is no justification for the grid size of 110 x 110 km used in the study other than it was "the finest spatial grain appropriate" (L569). There is also no discussion of the potential consequences of using a different scale in their analysis.

The explanation of how 'expected occupied cells' were determined is limited (L587-590).

Few of the test statistics mentioned in the methods (L683-688) are reported in the manuscript. P-values alone are often presented (L722-769) or relationships and differences are only referred to as 'significant' or 'non-significant'.

The manuscript lacks a data availability statement. It is not sufficient to state (in the metadata) that the data are "available at the Map of Life (http://www.mol.org)". The exact link to the specific data used in the study should be provided for reproducibility purposes. Ideally, the data would be downloaded from mol.org and archived in a trusted repository, unless the authors can demonstrate that MOL is funded for the next 50 years (as is the case for trusted repositories). This is because the data will disappear if MOL is no longer supported. If the data are available via the GBIF, the authors could share the script used to access them via the API.

I would like to see the authors share the R script used in this study (e.g., via the OSF, Figshare, Zenodo, or some other trusted repository). Given that the aim of the study is to reduce information gaps, it would be nice to see the authors lead by example and readily share their data and code.

Minor comments:

L21. I suggest rephrasing this sentence as: "Here, we propose novel indicators of biodiversity data coverage and sampling effectiveness, and analyze national trajectories in closing spatiotemporal knowledge gaps for terrestrial vertebrates (1950-2019)."

L27. I suggest rephrasing this sentence as: "However, we found that a nation's coverage was stronger for species for which it holds greater stewardship."

L40. Missing reference

L45-47. Grammar

L64. It is unclear what you mean by "expectation". This only becomes evident later in the manuscript.

L67. Grammar. Key correlates of what? This sentence is unclear.

L71-72. This sentence is unclear… do you mean "allow changes in biodiversity data coverage over time to inform decision-making"?

L73. "nations […] stand to gain the greatest benefits from broadly improved biodiversity information" - why? It's unclear to me that this is the case for all nations.

L76-77. Can you cite one or multiple sources in support of this statement? What about the references cited on L229?

L79-81. This sentence is difficult to read. I suggest removing "an updatable framework and". Perhaps write a second sentence after this first one explaining what you mean by 'updatable framework'.

L88-90. I suggest modifying this sentence as follows: "We provide a first global assessment of… [expand] … for terrestrial vertebrates as well as infrastructure to continue tracking these indices at Map of Life (https://mol.org/indicators/coverage)."

L95. Grammar

L93-97. I suggest directing the reader to the relevant panel(s) of Fig. 1 after each term is explained. The current explanation of Steward's SSI is fairly limited in my opinion. If you cannot expand on it in the main text due to the word limit, I would direct the reader to the methods by adding "(see Methods)" here rather than in the following sentence.

L102. Is the 'ideal uniform distribution based on Shannon's entropy' really the optimal sampling strategy? Again, I'm no expert in this field but I would have liked to see the rationale for this choice, perhaps in the methods.

L110-112. This sentence is unclear.

L108-119. Please refer the reader to specific panels in Fig. 1 throughout this paragraph.

L121. I suggest using a word other than 'exploded'.

L134. This sentence should be supported by a reference to a figure or paper.

L143. Ref needed.

L144-147. The rationale for this statement needs to be better explained.

L149-150. This sentence is unclear.

L155. Unclear.

L163. What is meant by "for the same number of records". Can you point the reader to a figure?

L170. Ref to figure needed.

L175. For clarity, please expand (perhaps in parenthesis) on what you mean by "effective and complementary".

L180. Given the importance of MOL in the context of this study, it would be helpful to have a box clearly explaining what it is.

L185-188. This is repeated twice, on L690-694 and L754-758. I have a really hard time making sense of Fig 5a based on this description and the figure caption…

L232-233. Grammar

L234. What are "limited transparencies of extrapolation approaches"? This sentence would benefit from being shortened. Perhaps split it into two?

I hope my comments are helpful.

Best regards,

Dom Roche

Reviewer #4:

[identifies himself as Jonathan Lenoir]

General comments

I read the work from Ruth Y. Oliver et al. with great interest and I think the authors are tackling an important topic which echoes the recent literature highlighting gaps and biases in our knowledge of global biodiversity distribution (and redistribution). In this manuscript, the authors focused on terrestrial vertebrates to assess both the global and national trends in the sampling effort to collect reliable data on species distribution. By doing so, the authors aim at highlighting, for each country separately, spatiotemporal trajectories in closing knowledge gaps in the distribution of terrestrial vertebrates. To achieve this goal, the authors built two species-specific metrics, namely (i) the species status information index (SSII) and (ii) the species sampling effectiveness index (SSEI), while accounting for country-specific stewardship of species (i.e. how responsible a country is for a given species, which is determined by the portion of the focal species global range that is occurring in the focal country) when aggregating across species the information per country. Using these two metrics, the authors found a rapid global increase, during the last decades, in the relative sampling coverage of each species distribution (SSII), especially so for some taxonomic groups, like birds. However, this rapid global increase in SSII further amplified the strong and existing geographic bias among countries, leading to country-specific SSII trajectories. Noteworthy, the authors suggested that the tremendous growth of species occurrence records in some countries failed to directly translate into newfound knowledge due to a sharp decline in sampling effectiveness (SSEI).

The manuscript is very well written and the figures are really helpful and informative, not only for displaying the main findings (Figs. 3-5) but also for helping the reader to understand the metrics used by the authors (especially Figs. 1 and 2). I really liked it as these figures are very intuitive, thus I would like to congratulate the authors for this effort. This said, I do have several important concerns and reservations that, I think, preclude publication of the manuscript as is. My first and main concern is about the SSEI metric which I think is somewhat flawed and inherently related to SSII. Indeed, the SSEI metric works such that the more occurrence records, the more likely it is that the distribution is uneven among the sampled grid cells. Let's for instance take the most extreme case of one single occurrence record for a given species. This means that only one grid cell among all the possible grid cells that are expected to be occupied is actually occupied and thus SSII is very low for that species, either globally or at the national level (except if the focal country is of the size of the occupied grid cell). For the SSEI metric, then the value is 1 just because the distribution is completely even across the occupied cells (1 occurrence record occurring in one grid cell: perfectly even). Hence, there is a mathematical relationship between SSEI and SSII, which are not fully independent from each other, such that the correlation can only be negative and the data constrained within an upper triangle when relating SSII (x-axis) against SSEI (y-axis). This is very well illustrated by the inset plot in Fig. 4d. As it is now, in my opinion, the SSEI metric is a bit useless, except towards very large values of SSII (when reaching completeness in the species distribution), because only then the SSEI metric brings new insights. However, for low SSII values, the SSEI has no meaning. 
A more meaningful and useful, I think, version of the SSEI metric should account for the total number of empty cells to further penalize the SSEI metric when only a few cells are occupied among the number of expected cells supposed to be occupied (cf. low SSII values).
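The reviewer's argument can be illustrated numerically. The sketch below assumes that an SSEI-like index is the Shannon entropy of record counts across occupied cells divided by the entropy of a uniform distribution over those same cells, which is one reading of the description in this review and not necessarily the authors' exact formula. The second function is a hypothetical penalized variant that normalizes by the number of expected cells instead. The single-cell value of 1 follows the reviewer's example and is a convention, since the ratio is otherwise undefined.

```python
import math


def shannon_entropy(counts):
    """Shannon entropy (natural log) of a list of non-negative record counts."""
    total = sum(counts)
    if total == 0:
        raise ValueError("at least one record is required")
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def evenness_realized(counts):
    """Evenness relative to a uniform spread over the occupied cells only."""
    occupied = [c for c in counts if c > 0]
    if not occupied:
        raise ValueError("at least one record is required")
    if len(occupied) == 1:
        return 1.0  # convention taken from the reviewer's single-record example
    return shannon_entropy(occupied) / math.log(len(occupied))


def evenness_expected(counts, n_expected):
    """Hypothetical penalized variant: uniform spread over all expected cells."""
    if n_expected < 2:
        return 1.0
    return shannon_entropy(counts) / math.log(n_expected)


# One record in one cell, for a species expected in 10 cells.
single_realized = evenness_realized([1])
single_penalized = evenness_expected([1], n_expected=10)

# Many records concentrated in one of four occupied cells.
skewed_realized = evenness_realized([97, 1, 1, 1])
```

Under these assumptions a single record scores 1 on the realized version but 0 on the penalized version, which is the behavior the reviewer asks for at low SSII values.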

My second major concern is about the very strong assumption that terrestrial vertebrate species have completely static distribution throughout the 70 years of the study period (1950-2019) and that any species range shifts would have little impact on the authors' analyses (see lines 570-572). This is especially problematic for the last three decades (1990-2019), during which evidence of species range shifts as climate warms has exploded (see Lenoir et al. 2020). Hence, I suggest that the authors carefully consider this important and recognized fact (species redistribution as global climate warms) in the scientific literature (Parmesan et al. 2003; Chen et al. 2011; Lenoir & Svenning 2015; Pecl et al. 2017; Lenoir et al. 2020) and at the very minimum discuss the implication of trans-boundary shifts in species distribution (e.g. species range expansions into new countries), which should alter the authors' findings. Even better, but this involves quite some work I have to admit, would be to account for species range shifts by adjusting each species range map by means of species distribution models (SDMs), for instance. Alternatively, one cheaper solution involving less work for the authors would be to split the analyses into two periods to distinguish a period during which the authors' assumption is likely to hold (cf. a 30-yr period prior to climate warming: 1950-1979) and a second 30-yr period (1990-2019) during which species range shifts may alter the authors' findings, thus requiring to at least discuss the potential implications of trans-boundary shifts during this second time period. As for the authors' defence, terrestrial vertebrates are among the taxonomic groups showing the slowest (most often non-significant but not always: e.g. significant latitudinal range shifts for reptiles) velocities in species latitudinal range shifts, as opposed to marine species (see Fig. 3a in Lenoir et al. 2020). 
This said, terrestrial vertebrate species showed significant upslope range shifts (especially amphibians, but also birds and mammals: see Fig. 3b in Lenoir et al. 2020) during the last three decades. But I assume elevational range shifts will have relatively more minor impacts on trans-boundary shifts, except maybe for some highly mountainous countries like Bhutan, Nepal, Lesotho, Andorra, Chile or Switzerland where new species may arrive from the neighbouring lowland countries. Hence, I really urge the authors to at least discuss those important implications of trans-boundary shifts if they do not deem necessary (or if they think it is impossible) to account for species range shifts in their analyses, especially during the most recent (1990-2019) period.

Finally, as a relatively more minor concern, I think the authors should also better acknowledge the recent scientific literature highlighting the strong geographic, taxonomic but also methodological biases in species distribution and redistribution (as climate warms) which altogether suggests that no global biodiversity dataset or meta-analysis on biodiversity changes so far is truly global as it is most often claimed (see Brown et al. 2016; Feeley et al. 2017; Lenoir et al. 2020; Nunez & Amano 2021). See my specific comments below for more detailed suggestions which I hope the authors will find useful for their work.

Specific comments to the authors

Line 1: In the title, I think it is important to specify that this work focuses on terrestrial vertebrates only. Indeed, and although the same approach could be applied to other taxonomic groups as you mentioned in the text, this approach strongly relies on expert knowledge for species range maps, which is a strong limitation for a lot of very important taxonomic groups like plants or insects.

Line 26: The sharp decline in sampling effectiveness is mathematically expected as SSII increases (see my general comments).

Lines 34-35: Here, when mentioning the manifold consequences of biodiversity changes, you could cite some references like Pecl et al. (2017) or Bonebrake et al. (2018).

Lines 43-44: Indeed and databases on species range shifts already exist, like the BioShifts database (see Lenoir et al. 2020) which is freely available (https://doi.org/10.6084/m9.figshare.7413365.v1).

Line 50: What about the need for long-term time series of monitoring data in order to assess biodiversity changes (see the BioTIME database from Dornelas et al. 2018)?

Lines 60-64: About taxonomic and geographic biases, the exact same pattern applies for data on species redistribution (see Feeley et al. 2017; Lenoir & Svenning 2015; Lenoir et al. 2020). Besides these biases, there are also important methodological biases in the way data are recorded and then used in subsequent quantitative analyses (see Brown et al. 2016; Lenoir et al. 2020). I think these are also worth pointing out in the list of biases that are currently acknowledged in the scientific literature.

Line 65: Not only socio-economic and ecological drivers to explain data gaps but also linguistic drivers. Indeed languages have also been highlighted as important barriers for global science in general (see Amano et al. 2016).

Line 75: About growing data for plant species distribution at the global extent, are you aware of the sPlot database (Bruelheide et al. 2019)? This one also suffers from the same geographic biases.

Line 85: See my general comment about the SSEI metric.

Line 92: About the optimal spatial resolution of your grid (110 km * 110 km at the equator), how did you set it exactly? Why do you consider this spatial resolution the most appropriate and finest spatial resolution you can use? Why not trying a finer spatial resolution? Did you run a sensitivity analysis varying the size of the spatial resolution? This was not clear in the methods section (cf. lines 569-570).

Lines 97-98: What about trans-boundary shifts under contemporary climate change? Some species are shifting poleward in latitude and upward in elevation, thus changing and potentially reshuffling the national stewardships you used in your analysis. This is an important matter (see my general comments), no?

Lines 101-102: Why constrain the SSEI metric to the realized spatial distribution of records? This is a bit misleading as you do not account for absence data in the other grid cells that are expected to be occupied. Yet, it is what matters in the end to get a uniform spatial distribution of records across all grid cells that are expected to be occupied, right? Why not penalize the SSEI metric by the total number of grid cells supposed to be occupied but empty? Is it because you suppose that a grid cell expected to be occupied but empty could mean that the species is truly absent? To distinguish between that case (a true absence) and the other case of a false absence due to less sampling effort, you could use information from other species occurrence records during the same time period. I mean, if there is a high density of occurrence records for other terrestrial vertebrates in a grid cell where the focal species is absent but supposed to be occurring, then it is likely to be a true absence, right?
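The heuristic suggested in this comment can be sketched as follows. This is an illustration of the reviewer's idea only, not a method from the manuscript: the function name, the labels, and the threshold of 50 records from other species are all hypothetical choices, and a real analysis would need a defensible measure of sampling effort.

```python
def classify_empty_cells(expected_cells, focal_counts, other_counts,
                         min_other_records=50):
    """Label expected-but-empty cells using records of other species as effort.

    focal_counts: {cell_id: number of records of the focal species}
    other_counts: {cell_id: number of records of all other species, same period}
    A cell with no focal records but many records of other species is labelled
    a likely true absence; otherwise it is labelled likely undersampled.
    """
    labels = {}
    for cell in expected_cells:
        if focal_counts.get(cell, 0) > 0:
            continue  # the focal species was recorded here, nothing to classify
        effort = other_counts.get(cell, 0)
        if effort >= min_other_records:
            labels[cell] = "likely true absence"
        else:
            labels[cell] = "likely undersampled"
    return labels


# Toy example with three expected cells.
labels = classify_empty_cells(
    expected_cells={"c1", "c2", "c3"},
    focal_counts={"c1": 4},
    other_counts={"c1": 300, "c2": 120, "c3": 2},
)
```

In this toy example, `c2` is heavily sampled for other species yet has no focal records, so it is flagged as a likely true absence, whereas `c3` is flagged as likely undersampled.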

Lines 112-113: Indeed, but this is expected since the more occurrence records you sample, the more likely it is to be unevenly distributed throughout a relatively larger number of occupied grid cells (higher SSII values).

Line 121: Indeed, see also the recent release of: sPlot, the global vegetation plot database (Bruelheide et al. 2019) providing data on plant co-occurrence; BioTIME, the global database on biodiversity time series (Dornelas et al. 2018); or BioShifts, a database on species range shifts (Lenoir et al. 2020).

Lines 128-129: Again, this is very much expected and not surprising given how SSEI works (see my general comments).

Line 135: What do you mean exactly by "slowed in their coverage progress"? Do you mean according to SSII or according to SSEI? Or both?

Lines 150-153: According to the realized distribution yes, it is true that sampling effectiveness has decreased in countries with increased SSII (expected pattern), but according to the expected distribution, this is not necessarily the case, right? Again, I think the way the SSEI is constructed is problematic as it is specifically focused on the realized distribution of records and it is completely blind to the expected distribution of records (cf. it does not account for the distribution of cells expected to be occupied but empty).

Lines 165-170: This pattern for birds (increasing SSII but decreasing SSEI) is likely due to the fact that citizen science data are everything except strategic sampling and thus the increase in spatial coverage in occurrence records automatically comes with a decrease in the spatial evenness of those records across the sampled area, this is well expected (see my general comments). The more data we collect, the more it is likely to be unevenly distributed in space because humans never collect data randomly in the field. Once a good spatial coverage in occurrence records is achieved, one needs to invest in strategic sampling, which cannot really be achieved by opportunistic data, except under strong guidance by the scientific community to orient citizen science data towards a strategic sampling. I think this is something important to discuss. The rapid increase in citizen science data in ecology is a great thing but it is definitely not the most efficient approach to reach a strategic sampling design if citizen science is not undertaken under the guidance of the scientific community. Not even mentioning the issue of data quality, this type of opportunistic and chaotic data (in terms of spatial distribution) comes with costs as it is then very difficult to use and analyse such data when monitoring biodiversity changes over time. Opportunistic data as collected by GBIF cannot really replace a strategic sampling that is designed for the purpose to assess biodiversity changes. Maybe this could be discussed and highlighted to better balance the discussion around citizen science data.

Lines 172-174: Ok, but this comes with costs (see my comment just above) and it should be discussed. I have the impression that the discussion is only oriented towards the benefits of citizen science data without discussing its drawbacks (e.g. uneven distribution of data).

Lines 176-178: Indeed, this is very important and you could expand a bit this part of the discussion (see my comments above).

Lines 183-184: Linguistic factors as well (Amano et al. 2016).

Lines 184-188: Why did you only use the SSII metric when building your categories? I find it a bit strange that you did not also incorporate the SSEI metric in your categorization. Maybe this is somewhat related to my comment on the fact that SSEI is a bit meaningless under low values of SSII.

Lines 210-212: Can you provide examples of countries reflecting this situation?

Lines 214-225: Here, you discuss the drawbacks of the SSII metric, which is nice, but what about the drawbacks of the SSEI metric (cf. my general comments on the issue that SSEI is not really informative for low values of SSII)? Besides, when discussing the issue of invasive species for the SSII metric, you should also remind the reader about the static nature of SSII (cf. you do not account for potential species range shift under climate change) and the fact that trans-boundary shifts of native species under anthropogenic climate change may completely change the pattern you found for SSII, and especially so for steward's SSII.

Line 235: Species-level expectations should account for species redistribution under contemporary climate change, this is not a minor issue (see my general comments).

Line 238: Please provide citations to illustrate this increase in mapping efforts for other taxonomic groups and systems such as plants (see Bruelheide et al. 2019) and marine taxa (cf. OBIS data).

Lines 322-323: Shading is not visible in panel c. Is it because of very low 95%CI?

I sincerely hope that my comments and suggestions will help both the authors to improve their work as well as the editorial board to take the right decision on the present manuscript. It was a real pleasure to read and review this inspiring work.

Jonathan Lenoir

References

Amano et al. (2016) Languages Are Still a Major Barrier to Global Science. PLoS Biology, 14, e2000933

Bonebrake et al. (2018) Managing consequences of climate‐driven species redistribution requires integration of ecology, conservation and social science. Biological Reviews, 93, 284-305

Brown et al. (2016) Ecological and methodological drivers of species' distribution and phenology responses to climate change. Global Change Biology, 22, 1548-1560

Bruelheide et al. (2019) sPlot - A new tool for global vegetation analyses. Journal of Vegetation Science, 30, 161-186

Chen et al. (2011) Rapid range shifts of species associated with high levels of climate warming. Science, 333, 1024-1026

Dornelas et al. (2018) BioTIME: A database of biodiversity time series for the Anthropocene. Global Ecology and Biogeography, 27, 760-786

Feeley et al. (2017) Most "global" reviews of species' responses to climate change are not truly global. Diversity and Distributions, 23, 231-234

Lenoir & Svenning (2015) Climate-related range shifts - a global multidimensional synthesis and new research directions. Ecography, 38, 15-28

Lenoir et al. (2020) Species better track climate warming in the oceans than on land. Nature Ecology & Evolution, 4, 1044-1059

Parmesan et al. (2003) A globally coherent fingerprint of climate change impacts across natural systems. Nature, 421, 37-42

Pecl et al. (2017) Biodiversity redistribution under climate change: Impacts on ecosystems and human well-being. Science, 355, eaai9214

Decision Letter 2

Roland G Roberts

18 May 2021

Dear Dr Oliver,

Thank you for submitting your revised Research Article entitled "Global and national trends in documenting and monitoring species distributions" for publication in PLOS Biology. I have now obtained advice from three of the original reviewers and have discussed their comments with the Academic Editor. 

Based on the reviews, we will probably accept this manuscript for publication, provided you satisfactorily address the remaining points raised by the reviewers. Please also make sure to address the following data and other policy-related requests.

IMPORTANT:

a) Your findings are somewhat nuanced, but we are wondering whether they could be indicated in some way in your Title to make it more informative (something like "Global and national trends in documenting and monitoring species distributions identify opportunities to close critical information gaps").

b) Please attend to the remaining requests from revs #3 and #4. The Academic Editor particularly wanted me to stress the need to ensure that the precedence for SSEI is made clear, as requested by rev #4 ("I would like to see the authors acknowledge the equivalence of SSEI with preceding indices (Lenoir review) and tone down claims of novelty. The results are important without this claim.").

c) You'll see that rev #1 raises strong concerns about data availability (as did rev #3 in the previous round). This is clearly an important and emotive issue in the community, and we encourage you to do your utmost on this front. Specifically, we remind you of your commitment previously expressed in the Response to Reviewers: "MOL has submitted an application to be formally recognized as a trusted data repository. If that status is not in place by the time of publication, all information necessary to replicate the results will in addition to on MOL also be made available on a formally already recognized repository."

d) You should also familiarise yourself with our data policy. This does have exemptions for third party data, but please could you provide information on the license or a letter/email from the third party (IUCN), explicitly stating that the data cannot be shared even with appropriate credits? If the data can be shared, then you should do so.

e) Please attend to my Data Policy requests below. Specifically, we will require any code needed to reproduce your results to be made available as supplementary files or in e.g. Github. We'll also need the numerical values underlying Figs 2CDEFG, 3ABCDEF, 4CF, 5ABC, S1ABCD, S2AB, S3CDEF. In addition, please cite the location of the data clearly in each relevant Fig legend.

As you address these items, please take this last chance to review your reference list to ensure that it is complete and correct. If you have cited papers that have been retracted, please include the rationale for doing so in the manuscript text, or remove these references and replace them with relevant current references. Any changes to the reference list should be mentioned in the cover letter that accompanies your revised manuscript.

We expect to receive your revised manuscript within two weeks.

To submit your revision, please go to https://www.editorialmanager.com/pbiology/ and log in as an Author. Click the link labelled 'Submissions Needing Revision' to find your submission record. Your revised submission must include the following:

-  a cover letter that should detail your responses to any editorial requests, if applicable, and whether changes have been made to the reference list

-  a Response to Reviewers file that provides a detailed response to the reviewers' comments (if applicable)

-  a track-changes file indicating any changes that you have made to the manuscript. 

NOTE: If Supporting Information files are included with your article, note that these are not copyedited and will be published as they are submitted. Please ensure that these files are legible and of high quality (at least 300 dpi) in an easily accessible file format. For this reason, please be aware that any references listed in an SI file will not be indexed. For more information, see our Supporting Information guidelines:

https://journals.plos.org/plosbiology/s/supporting-information  

*Published Peer Review History*

Please note that you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out. Please see here for more details:

https://blogs.plos.org/plos/2019/05/plos-journals-now-open-for-published-peer-review/

*Early Version*

Please note that an uncorrected proof of your manuscript will be published online ahead of the final version, unless you opted out when submitting your manuscript. If, for any reason, you do not want an earlier version of your manuscript published online, uncheck the box. Should you, your institution's press office or the journal office choose to press release your paper, you will automatically be opted out of early publication. We ask that you notify us as soon as possible if you or your institution is planning to press release the article.

*Protocols deposition*

To enhance the reproducibility of your results, we recommend that if applicable you deposit your laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols

Please do not hesitate to contact me should you have any questions.

Sincerely,

Roli Roberts

Roland G Roberts, PhD,

Senior Editor,

rroberts@plos.org,

PLOS Biology

------------------------------------------------------------------------

DATA POLICY:

You may be aware of the PLOS Data Policy, which requires that all data be made available without restriction: http://journals.plos.org/plosbiology/s/data-availability. For more information, please also see this editorial: http://dx.doi.org/10.1371/journal.pbio.1001797 

Please make all code required to reproduce your results available, either as a supplementary file or in a repository such as GitHub. We also require that all the numerical values underlying the figures and results of your paper be made available in one of the following forms:

1) Supplementary files (e.g., Excel). Please ensure that all data files are uploaded as 'Supporting Information' and are invariably referred to (in the manuscript, figure legends, and the Description field when uploading your files) using the following format verbatim: S1 Data, S2 Data, etc. Multiple panels of a single or even several figures can be included as multiple sheets in one Excel file that is saved using exactly the following convention: S1_Data.xlsx (using an underscore).

2) Deposition in a publicly available repository. Please also provide the accession code or a reviewer link so that we may view your data before publication. 

Regardless of the method selected, please ensure that you provide the individual numerical values that underlie the summary data displayed in the following figure panels as they are essential for readers to assess your analysis and to reproduce it: Figs 2CDEFG, 3ABCDEF, 4CF, 5ABC, S1ABCD, S2AB, S3CDEF. NOTE: the numerical data provided should include all replicates AND the way in which the plotted mean and errors were derived (it should not present only the mean/average values).

IMPORTANT: Please also ensure that figure legends in your manuscript include information on where the underlying data can be found, and ensure your supplemental data file/s has a legend.

Please ensure that your Data Statement in the submission system accurately describes where your data can be found.

------------------------------------------------------------------------

DATA NOT SHOWN?

- Please note that per journal policy, we do not allow the mention of "data not shown", "personal communication", "manuscript in preparation" or other references to data that is not publicly available or contained within this manuscript. Please either remove mention of these data or provide figures presenting the results and the data underlying the figure(s).

------------------------------------------------------------------------

REVIEWERS' COMMENTS:

Reviewer #1:

I sincerely appreciate the work put forth by the authors here. After seeing the decision come through, I thought "wow - four massive reviews - that is a lot of work". I'm glad, however, that the authors decided to undertake this work. I believe they struck a nice balance of appeasing Reviewers and also strengthening their manuscript, without going down too many rabbit holes. Well done.

All in all, I believe the text is sufficiently revised for publication.

BUT, I have one sticking point. And ultimately, this is up to the Editorial Board at PLoS Biology. This was also highlighted by Reviewer 3 in the original round of review. That is 'data availability'.

I mean no offense, but I strongly disagree with the notion of "visually accessible" (the wording in the current data availability statement). To be blunt, I have not seen this wording before, and I honestly don't know what it means; to me, it does not meet the standard of data accessibility/availability that we as a conservation biology community should be striving for. Even for the indicator information, I still don't know how to get these data. I went to https://mol.org/indicators/ as suggested by the data availability statement, but I still don't see any 'download' button or any other way to get the data. All I see is 'explore'. I believe that creating an 'account' should not be required to access these data (I'm partially assuming that is how I can download the entire dataset). In my opinion, these data need to be made available with the paper as a current record of downloadable and reproducible research: all the data, not simply one country at a time, especially for such a prestigious outlet as PLoS Biology. I understand that IUCN does not make range data available (which is a sticky problem in and of itself) and that your hands are tied, so to speak. Regardless, the authors should strive to make as much data available as possible to reproduce these results. I apologize if this sounds harsh, but I think that if real change is ever going to happen in our (the scientific community's, specifically ecology/biodiversity research) push for available data and reproducibility, then it starts with big/prestigious labs (e.g., Walter's group) publishing papers in big outlets (e.g., PLoS Biology) and going the extra mile to ensure the data are available and the results reproducible beyond "visually accessible".

Best of luck with your future work and I look forward to seeing this manuscript online soon!

Reviewer #3:

[identifies himself as Dominique Roche]

The authors have put considerable work into addressing the reviewer comments and . The only comment I have pertains to the need to present test statistics (and effect sizes, when possible) alongside p values in the main text and the ESM. P values should also be reported to three decimal places. Although not necessary, it would be helpful if the authors also included the sample size or degrees of freedom as appropriate. I cannot indicate line numbers as there are none in the manuscript. Examples include:

Main text:

"Globally, only approximately half of nations (42%) showed increasing, significant trends (p < 0.01) in coverage averaged across taxa over the previous decade (Fig. 4b)."

Captions for Figs 4 and 5. The term 'significant' is used repeatedly without reference to the ESM where the methods and statistical results are presented.

ESM:

"The temporal patterns in SSII are different, with birds only exceeding the three other groups after 1980, but since then showing near linear-growth in taxon-wide SSII and exceeding other classes in 2019 by nearly 10-fold (Fig. 3c)." What statistics were used to infer these results? If it was a comparison of confidence intervals, perhaps re-state here for clarity.

"Steward's SSII has recently increased in a majority of nations (84%), particularly in North America and southern and eastern Europe with nearly half of nations (42%) showing significant (p < 0.001) increasing trends (Fig. 3b). Of the minority (13%) with decreasing rates, Finland had the most rapid decrease (-0.021 SSII/year). Despite mostly positive trends, much of Africa and Asia saw only negligible increases in indicator values over the last decade, with the exceptions of India, Sri Lanka, and South Korea which showed large increases in data coverage. Nations were nearly evenly split between either non-significant and significantly increasing Steward's SSII for resident bird species (52.8% and 47.2%, respectively, none decreasing; Fig. 3c). Most nations did not have significant trends in data coverage for mammals (85.8%), amphibians (89.9%), and reptiles (81%)."

"Recent National SSEI differed strongly among nations (Fig. 4d, Supplementary Table 3). National SSEI was generally lower within western Europe, North America, and Australia. National SSEI and Steward's SSII were weakly, negatively correlated (Spearman's rho = -0.52, p < 0.001). A majority of nations (51%) had decreasing SSEI across terrestrial vertebrates, however only 11% of nations globally had significant (p < 0.01), decreasing trends (Fig. 3e). These nations included the United States, Canada, Italy, and South Africa. Decreasing trends in SSEI were most common for bird species (27.5%) (Fig. 4f)."

etc.
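One way to apply the reporting request above consistently is a small formatting helper. The following is only a sketch: the function name `format_result`, the convention of printing very small values as "p < 0.001", and the example numbers are assumptions made for illustration and are not taken from the manuscript.

```python
def format_result(stat_name, stat, p, n=None):
    """Return a test statistic, optional sample size, and p value as one string."""
    # Assumed convention: p values to three decimals, with a floor at 0.001.
    p_text = "p < 0.001" if p < 0.001 else f"p = {p:.3f}"
    n_text = f", n = {n}" if n is not None else ""
    return f"{stat_name} = {stat:.2f}{n_text}, {p_text}"


# Purely illustrative values, not results from the study.
example = format_result("Spearman's rho", 0.31, 0.0123, n=50)
```

Used this way, every reported trend or correlation carries its statistic and sample size alongside the p value.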

Reviewer #4:

[identifies himself as Jonathan Lenoir]

General comments

I was one of the four reviewers (reviewer #4) during the former round of review. I read the authors' responses to my comments (as well as their responses to the comments from the other three reviewers) and I particularly appreciate the efforts made by the authors to address most of the concerns collectively arising from the four reviewers. For instance, the new Supplementary Fig. 3 on the impact of varying grain sizes on the SSII metric is a great addition in terms of sensitivity analysis, as requested by reviewers #1 and #3. But then, why not also assess the impact of varying spatial grains on the other metric, SSEI? Or is it because the SSEI metric is unaffected by variation in the spatial grain? If so, it would be good to state this explicitly and show it.

About SSEI, I am also very grateful to the authors for considering my main initial concern on the potential link between SSII and SSEI. The additional explanations provided by the authors and the new panel in Supplementary Fig. 4 are really helpful in that respect. Also, the important clarification on the special case of just a single grid cell where there is one or several records of occurrence for the focal species is indeed important. In fact, such cases would lead to a maximum entropy (H*) being equal to zero (log(1)=0) and thus the SSEI index would equal Inf value. So, I am grateful to the authors for also clarifying this issue.

This said, I am still convinced that the SSEI metric is only useful when N, the total number of records (i.e. total number of occurrence records for a given species), is one to two orders of magnitude larger (i.e. at least 10 to 100 times) than G, the total number of grid cells where there is actual information on sampling effort. Indeed, both panels (a) and (b) in Supplementary Fig. 4 illustrate this very well, since SSEI can only reach very low values and thus be highly variable (which is a very important feature for discriminating different sampling effectiveness situations) when the total number of records is far larger than the total number of grid cells sampled. So, I would at least recommend discussing this inherent property of the SSEI metric and warning the reader that the SSEI metric will be especially relevant under high sampling effort (in terms of total number of occurrence records) relative to the number of grid cells that are sampled.
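The dependence on N relative to G can be sketched numerically. The example below assumes that SSEI is computed as Shannon entropy of records per sampled cell divided by log(G), that every sampled cell holds at least one record, and that the most lopsided case puts all surplus records in a single cell. The value G = 100 and the function names are hypothetical.

```python
import math


def evenness(counts):
    """Shannon entropy of records per cell divided by log(number of cells)."""
    n = sum(counts)
    h = -sum((c / n) * math.log(c / n) for c in counts)
    return h / math.log(len(counts))


def most_uneven(n_records, n_cells):
    """One record in every sampled cell, all remaining records in a single cell."""
    return [n_records - n_cells + 1] + [1] * (n_cells - 1)


G = 100  # hypothetical number of sampled grid cells
# Lowest evenness this allocation can produce when N = 1, 2, 10, and 100 times G.
floors = {ratio: evenness(most_uneven(ratio * G, G)) for ratio in (1, 2, 10, 100)}
```

Under these assumptions the attainable floor stays high when N is only slightly larger than G and approaches zero only when N exceeds G by a large factor, which is the property described in the paragraph above.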

Still about the SSEI metric and information theory (IT), I would like to mention here that this metric is, simply put, Pielou's index, also called the equitability or evenness index (cf. the empirical entropy measured by Shannon's index divided by the maximum entropy given by Shannon's index). The analogy with the way Shannon's index is used in ecology to measure species diversity is that maximum entropy is given by H*=log(S), where S is the total number of species, N is the total number of individuals, and ni is the number of individuals for species i. Here, the authors did not consider S but G, the total number of grid cells sampled, with N being the total number of occurrence records and not the total number of individuals. Hence, there is nothing new or novel for me in the SSEI index itself, because it is a metric that already exists in IT and that is widely used in ecology. The authors' claim in the abstract that "we propose novel indicators of biodiversity data coverage and sampling effectiveness" is therefore an overstatement to me. Yet, the authors' idea of borrowing Shannon's and Pielou's indices from IT to assess the equitability of the sampling of occurrence records over the grid cells where the species is known to occur is indeed a novel application of Pielou's index, and an interesting one. This is why I think this study definitely has great potential, not because of the SSEI metric itself (nothing new there, because it is Pielou's index) but because of its application to assess sampling effectiveness across the set of grid cells where data are available. So, maybe the authors could tone down this claim of "novelty" for the SSEI metric and refer to Pielou's index when mentioning the SSEI metric, just to relate it to existing metrics from IT.
By the way, I think there is a mistake in the text explaining the SSEI metric, as the formula written in the Supplementary Materials, Methods section (SSEI subsection), for H* (cf. H*=log(N)) is wrong. Indeed, N is the total number of records here, while the formula for maximum entropy should be H*=log(G), where G is the total number of sampled grid cells. I assume that only the formula for H* in the text of the Supplementary Materials is wrong and that the authors used the right formula in their computations and analyses, but it would be good if the authors could confirm that this is the case. Note that the authors sometimes mix up G and N in their responses to my former comments. For instance, when mentioning the special case of a single grid cell (G), the authors mentioned a single occurrence record (N), which is different. Indeed, the same issue applies with several occurrence records if they all fall inside the same grid cell Gi, so it is the total number of sampled grid cells G that matters here and which should be strictly greater than 1.
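The difference between the two normalizers can be made concrete with a short sketch. It assumes the index is Shannon entropy of records per sampled grid cell divided by a maximum entropy term. The function names and the `normalizer` argument are hypothetical, and this is not the authors' code.

```python
import math


def shannon_entropy(counts):
    """Empirical entropy H of the records-per-cell distribution."""
    n = sum(counts)
    return -sum((c / n) * math.log(c / n) for c in counts if c > 0)


def evenness(counts, normalizer="cells"):
    """H divided by log(G) ("cells") or by log(N) ("records")."""
    cells = [c for c in counts if c > 0]
    g, n = len(cells), sum(cells)  # G sampled cells, N total records
    size = g if normalizer == "cells" else n
    if size < 2:
        # log(1) = 0, so the ratio is undefined; return NaN instead of dividing.
        return float("nan")
    return shannon_entropy(cells) / math.log(size)


even_sampling = [5, 5, 5, 5]  # hypothetical: 4 cells, 5 records each
by_cells = evenness(even_sampling, "cells")
by_records = evenness(even_sampling, "records")
single_cell = evenness([7], "cells")
```

With H*=log(G), perfectly even sampling scores 1, whereas normalizing by log(N) gives a lower score for the same even sample whenever N exceeds G. The single-cell case is undefined whether it holds one record or many.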

Sorry to insist on the metrics used here, but these are quite central to the whole study. In that respect, I invite the authors to also consider Simpson's index for assessing the evenness and equitability of the sampling across the sampled grid cells (cf. SSEI). Indeed, Simpson's index has the advantage of accounting for the total number of grid cells that are actually sampled, which is not the case for the Pielou's index used by the authors. Under perfectly even sampling effort, Pielou's index will give the exact same value of 1, or perfect evenness, whether just 2 or 1000 grid cells are sampled, while Simpson's index will give a higher value for the situation in which more grid cells are sampled.
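The contrast can be checked with a minimal sketch. The comment does not say which form of Simpson's index is meant, so the Gini-Simpson form (1 minus the sum of squared proportions) is assumed here. The record counts are hypothetical.

```python
import math


def pielou(counts):
    """Shannon entropy divided by log(G), for G cells with at least one record."""
    n = sum(counts)
    h = -sum((c / n) * math.log(c / n) for c in counts)
    return h / math.log(len(counts))


def gini_simpson(counts):
    """One minus the sum of squared proportions of records per cell."""
    n = sum(counts)
    return 1.0 - sum((c / n) ** 2 for c in counts)


few_cells = [10] * 2      # perfectly even sampling over 2 cells
many_cells = [10] * 1000  # perfectly even sampling over 1000 cells

pielou_values = (pielou(few_cells), pielou(many_cells))
simpson_values = (gini_simpson(few_cells), gini_simpson(many_cells))
```

Both Pielou values equal 1, while the Gini-Simpson values are 0.5 for 2 cells and 0.999 for 1000 cells, so only the latter index distinguishes the two situations.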

About my second major concern regarding species range shifts over time, the authors did a good job of addressing this point and acknowledging the existing scientific literature on the matter. I have nothing else to add at this stage regarding this second concern. I agree with the authors that integrating the temporal dynamics of species range shifts would be too much for this study, but it is good that the authors discuss their approach in light of this and note that it can be implemented in the future given the increasing amount of biodiversity time series.

Again, I would like to thank the authors for addressing and answering my initial comments and concerns. I hope these new suggestions in light of the revised version of their work will further help them.

Yours sincerely,

Jonathan Lenoir

Decision Letter 3

Roland G Roberts

22 Jun 2021

Dear Ruth,

On behalf of my colleagues and the Academic Editor, Craig Moritz, I'm pleased to say that we can in principle offer to publish your Research Article "Global and national trends, gaps, and opportunities in documenting and monitoring species distributions" in PLOS Biology, provided you address any remaining formatting and reporting issues. These will be detailed in an email that will follow this letter and that you will usually receive within 2-3 business days, during which time no action is required from you. Please note that we will not be able to formally accept your manuscript and schedule it for publication until you have made the required changes.

IMPORTANT: Many thanks for clarifying the data provision. Please could you also include clear mentions of its location in each relevant main and supplementary Figure legend? e.g. "The data underlying this Figure may be found in https://github.com/MapofLife/biodiversity-data-gaps". This may look repetitive, but we want each Figure (and its data) to be standalone. I've flagged to my colleagues that I've requested this change.

Please take a minute to log into Editorial Manager at http://www.editorialmanager.com/pbiology/, click the "Update My Information" link at the top of the page, and update your user information to ensure an efficient production process.

PRESS: We frequently collaborate with press offices. If your institution or institutions have a press office, please notify them about your upcoming paper at this point, to enable them to help maximise its impact. If the press office is planning to promote your findings, we would be grateful if they could coordinate with biologypress@plos.org. If you have not yet opted out of the early version process, we ask that you notify us immediately of any press plans so that we may do so on your behalf.

We also ask that you take this opportunity to read our Embargo Policy regarding the discussion, promotion and media coverage of work that is yet to be published by PLOS. As your manuscript is not yet published, it is bound by the conditions of our Embargo Policy. Please be aware that this policy is in place both to ensure that any press coverage of your article is fully substantiated and to provide a direct link between such coverage and the published work. For full details of our Embargo Policy, please visit http://www.plos.org/about/media-inquiries/embargo-policy/.

Thank you again for choosing PLOS Biology for publication and supporting Open Access publishing. We look forward to publishing your study. 

Sincerely, 

Roli

Roland G Roberts, PhD 

Senior Editor 

PLOS Biology

rroberts@plos.org

Associated Data

    This section collects any data citations, data availability statements, or supplementary materials included in this article.

    Supplementary Materials

    S1 File

    Text A. Methods, Supplementary Text, and Supplementary Acknowledgments to support main text.

    Fig A. National patterns in data collection, coverage, and sampling effectiveness (2010–2019). (a) Change rates in Steward’s SSII and National SSEI. Dashed lines represent zero slopes. (b, c) Relationship and mismatch between Steward’s SSII and total spatiotemporal records collected nationally (b) and the percentage of expected species nationally recorded (c). (d) Relationship between the percentage of expected species nationally recorded and mean National SSEI. The data underlying this figure may be found in https://mol.org/indicators/coverage and https://github.com/MapofLife/biodiversity-data-gaps.

    Fig B. National stewardship in data coverage. (a) National and Steward’s SSII over the previous decade (2010–2019). Points are colored by the percent difference between National and Steward’s SSII. Dashed line represents the 1:1 line between variables. (b) Relative stewardship of nations, as estimated by percent difference, over the previous decade. Color scale matches that in panel (a). National boundaries from gadm.org. The data underlying this figure may be found in https://mol.org/indicators/coverage and https://github.com/MapofLife/biodiversity-data-gaps.

    Fig C. Empirical demonstration of the effects of spatial resolution on the SSII and SSEI. (a, b) Thresholded species distribution model output (Ellis-Soto and colleagues, 2021) rescaled to 3 spatial resolutions (110, 55, and 27.5 km) for 2 hummingbird species, (a) the Glowing puffleg (Eriocnemis vestita) and (b) White-sided hillstar (Oreotrochilus leucopleurus). Grid cells are colored by the number of records collected between 2000–2019. (c) Annual SSII (solid lines) and SSEI (dashed lines) computed at 3 spatial resolutions. (d–i) Comparison of SSII (d–f) and SSEI (g–i) values among spatial resolutions (d, g: 110 vs. 55 km; e, h: 55 vs. 27.5 km; f, i: 110 vs. 27.5 km). Gray shading shows 95% confidence interval. Colored text displays slope estimates and 95% confidence intervals for each species (blue: Eriocnemis vestita; green: Oreotrochilus leucopleurus). The data underlying this figure may be found in https://mol.org/indicators/coverage and https://github.com/MapofLife/biodiversity-data-gaps.

    Fig D. Theoretical examples of the Species Sampling Effectiveness Index (SSEI). Each line corresponds to theoretical cases with different levels of evenness of the distribution of biodiversity records for an idealized species with the same range size. In these examples, the proportion of the sampled range with a single record vs. alternate values (1, 2, 10, 100, and 1,000) is adjusted from 0 to 1. SSEI is highest in cases with uniform or near-uniform sampling (i.e., all grid cells either contain 1 or 2 records). SSEI is lowest in cases with highly uneven sampling (i.e., a mixture of grid cells with either a single record or 100–1,000 records). These examples also highlight that SSEI is identical in the cases where redundant sampling is uniform (i.e., values are the same if all cells have 1, 10, or 1,000 records). Additionally, SSEI approaches the maximum value when only a small minority of cells contain more than a single record (i.e., the proportion of cells with a single record >90%).

    Table A. Species example coverage and sampling effectiveness values. Values presented for the jaguar (Panthera onca) and collared peccary (Pecari tajacu) as demonstrated in Fig 2C–2E. The data underlying this table may be found in https://mol.org/indicators/coverage and https://github.com/MapofLife/biodiversity-data-gaps.

    Table B. National example data coverage and sampling effectiveness values. Values presented for the jaguar (Panthera onca) and collared peccary (Pecari tajacu) as demonstrated in Fig 2F and 2G. The data underlying this table may be found in https://mol.org/indicators/coverage and https://github.com/MapofLife/biodiversity-data-gaps.

    Table C. National data coverage and sampling effectiveness values over the previous decade (2010–2019). ISO3 codes and mean values for National and Steward’s SSII and SSEI for nations. The data underlying this table may be found in https://mol.org/indicators/coverage and https://github.com/MapofLife/biodiversity-data-gaps.

    (PDF)

    Attachment

    Submitted filename: Oliver_etal_SSII_reviews.pdf

    Attachment

    Submitted filename: Oliver_etal_SSII_reviews_6-17-2021.pdf

    Data Availability Statement

    All supporting data and scripts are available for download at https://github.com/MapofLife/biodiversity-data-gaps associated with the following DOI: https://doi.org/10.48600/MOL-3Y3Z-DW77. National indicator values are directly accessible for download at mol.org/indicators/coverage.


    Articles from PLoS Biology are provided here courtesy of PLOS
