Automated assessment reveals that the extinction risk of reptiles is widely underestimated across space and phylogeny

Gabriel Henrique de Oliveira Caetano; David G Chapple; Richard Grenyer; Tal Raz; Jonathan Rosenblatt; Reid Tingley; Monika Böhm; Shai Meiri; Uri Roll

doi:10.1371/journal.pbio.3001544

2022 May 26;20(5):e3001544. doi: 10.1371/journal.pbio.3001544

Editor: Pedro Jordano

PMCID: PMC9135251 PMID: 35617356

Abstract

The Red List of Threatened Species, published by the International Union for Conservation of Nature (IUCN), is a crucial tool for conservation decision-making. However, despite substantial effort, numerous species remain unassessed or have insufficient data available to be assigned a Red List extinction risk category. Moreover, the Red Listing process is subject to various sources of uncertainty and bias. The development of robust automated assessment methods could serve as an efficient and highly useful tool to accelerate the assessment process and offer provisional assessments. Here, we aimed to (1) present a machine learning–based automated extinction risk assessment method that can be used on less known species; (2) offer provisional assessments for all reptiles—the only major tetrapod group without a comprehensive Red List assessment; and (3) evaluate potential effects of human decision biases on the outcome of assessments. We use the method presented here to assess 4,369 reptile species that are currently unassessed or classified as Data Deficient by the IUCN. The models used in our predictions were 90% accurate in classifying species as threatened/nonthreatened, and 84% accurate in predicting specific extinction risk categories. Unassessed and Data Deficient reptiles were considerably more likely to be threatened than assessed species, adding to mounting evidence that these species warrant more conservation attention. The overall proportion of threatened species greatly increased when we included our provisional assessments. Assessor identities strongly affected prediction outcomes, suggesting that assessor effects need to be carefully considered in extinction risk assessments. Regions and taxa we identified as likely to be more threatened should be given increased attention in new assessments and conservation planning. Lastly, the method we present here can be easily implemented to help bridge the assessment gap for other less known taxa.

The Red List of Threatened Species, published by the IUCN, is a crucial tool for conservation decision making, but is subject to various sources of uncertainty and bias. Modelling the threat status of all global reptiles identifies increased threat to many groups of reptiles across many regions of the world, beyond those currently recognized; moreover, it highlights the effects of the IUCN assessment procedure on eventual threat categories.

Introduction

The International Union for Conservation of Nature’s (IUCN) Red List of Threatened Species [1] is the most comprehensive assessment of the extinction risk of species worldwide [2]. Since its inception in 1964, the Red List has been instrumental in “generating scientific knowledge, raising awareness among stakeholders, designating priority conservation sites, allocating funding and resources, influencing development of legislation and policy, and guiding targeted conservation action” [3]. For example, the IUCN’s Global Amphibian Assessment, completed in 2004, documented the dire global state of amphibians [4] and led to the creation of organizations dedicated to amphibian conservation and to increased funding for research and conservation policy focused on amphibians [3]. Additionally, the IUCN’s Red List forms a basis for the designation of priority areas for conservation, such as Key Biodiversity Areas [5]. For example, the Alliance for Zero Extinction [6] uses Red List data and works directly with decision-makers to establish protected areas for threatened species represented by a single population.

The Red List assigns evaluated species to categories based on their distribution, population trends, and specific threats [7]. The categories Least Concern (LC) and Near Threatened (NT) are deemed not threatened, while Vulnerable (VU), Endangered (EN), and Critically Endangered (CR) species are deemed threatened. Other species are assessed as Extinct in the Wild (EW), Extinct (EX), or Data Deficient (DD). The DD category is assigned to species for which information is insufficient to assign them to any of the above categories. Still, most of global biodiversity remains Not Evaluated (NE) by the Red List. This is predominantly due to the laborious nature of Red List assessments, which are based on voluntary expert participation, usually through multiparticipant in-person meetings [7]. Importantly, NE and DD species are generally not prioritized for conservation decision-making, although Red List guidelines specifically state that they “should not be treated as if they were not threatened” [7], and even though DD species have been shown to be comparable to CR ones with respect to their levels of overlap with human impact [8]. These assessment gaps [9,10] led to the use of several automated methods to provisionally assess species [11,12]. These methods employ algorithms including phylogenetic regression models [13–15], structural equation models [16], random forests [17,18], deep learning [19,20], Bayesian networks [21,22], and even linguistic analysis of Wikipedia pages [23]. Most previous attempts (e.g., [13,17,18]) employed a binary classification of threatened (categories CR, EN, and VU) versus nonthreatened (NT and LC). Few studies attempted to predict specific categories (e.g., [19,20,24]), which are more useful to decision makers as they enable prioritizing among threatened species. A more comprehensive review of these methods [25] argues that a major obstacle to their implementation in the assessment process is the lack of communication between conservation researchers developing such methods and IUCN personnel.
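The grouping of Red List categories into the binary classes described above can be written down as a small lookup. The sketch below is illustrative only: the function name `to_binary` is hypothetical, and returning `None` for DD, NE, EW, and EX is an assumption made here for convenience, not a rule stated in the text.

```python
# Red List category codes, grouped as described in the paragraph above.
THREATENED = {"CR", "EN", "VU"}
NONTHREATENED = {"NT", "LC"}
# Assumption: categories without a threatened/nonthreatened label are
# treated as unlabelled in this sketch.
UNLABELLED = {"DD", "NE", "EW", "EX"}


def to_binary(category):
    """Map a category code to 'threatened', 'nonthreatened', or None."""
    code = category.strip().upper()
    if code in THREATENED:
        return "threatened"
    if code in NONTHREATENED:
        return "nonthreatened"
    if code in UNLABELLED:
        return None
    raise ValueError("unknown Red List category: %r" % category)


if __name__ == "__main__":
    for code in ("CR", "NT", "DD"):
        print(code, "->", to_binary(code))
```

Predicting the five specific categories keeps the distinction among CR, EN, and VU that this binary mapping discards.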

A challenge that remains unaddressed in automated assessment is human decision bias. Biases are introduced by ambiguities in the interpretation of IUCN guidelines by assessors and reviewers, heterogeneity in assessor expertise levels, and personal agendas [26]. The IUCN tries to decrease reliance on subjective expert opinions [2], even employing automated assistance for generating and verifying assessments [12]. However, expert input (and guidance from the IUCN personnel who lead each workshop) remains an important part of the assessment process. Automated methods that ignore such biases in their training data risk reproducing or even amplifying them in their predictions [27].

Reptiles remain the only tetrapod group without comprehensive IUCN assessment. As of July 2021, approximately 28% of 11,570 reptile species remain unassessed and approximately 14% of those assessed have been classified as DD [1]. Moreover, many of the reptile assessments are more than 10 years old, rendering them outdated as per IUCN guidelines [1]. This assessment gap is not random: smaller species with narrow distributions in the tropics are less likely to have been assessed [9]. Bland and Böhm [28] and Miles [19] automatically assessed some reptile species. Their models predicted that approximately 20% of NE and DD species are threatened, a proportion similar to that among assessed species (excluding DD). However, in both studies, models were trained and validated using a small set of species with a wealth of morphological, ecological, and life history data (which are rare for DD species). Such exercises might provide important information on the mechanisms underlying extinction risk. However, these data-hungry methods are greatly limited in their utility because such data are unavailable for the vast majority of DD and NE species (e.g., DD and newly described reptiles, most invertebrate taxa). Ultimately, we need methods that enable precise automated extinction risk assessments of species while acknowledging different biases and data gaps.

Here, we use robust machine learning to automatically predict IUCN extinction risk categories for all reptile species globally, to (1) present a new automated assessment framework and (2) provisionally fill the reptile assessment gap. Our methods rely only on readily available data (mostly geographic ranges, phylogenetic structure, and body mass) and estimate potential effects of assessor or reviewer identities. We use these methods to assign provisional extinction risk categories to 4,369 reptile species, of which 3,286 are currently unassessed and 1,083 are currently classified as DD. We further explore global trends in extinction risk across all reptiles and highlight the effects of our new provisional categories on overall patterns in this class. Lastly, we highlight potential sources of biases and incongruences in the assessment process.

Results

General model results

We implemented a novel automated assessment method, using the XGBoost algorithm [29], and provided provisional assessment to 4,369 reptile species that were previously NE or assessed as DD (S1 Data). Of these 4,369 species, we assessed 1,161 (27%) as threatened (244 as CR, 467 as EN, and 450 as VU), and 3,208 as non-threatened (3,021 as LC and 187 as NT). This is compared to 21% threatened species in the assessed/training dataset (1,375 of 6,520, χ²: 26.947, p-value: <0.001).
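The counts and percentages in the paragraph above can be checked with a few lines of arithmetic. This is a minimal sketch using only the numbers quoted in the text; the variable names are illustrative, and the χ² test itself is not reproduced because its exact configuration is not described here.

```python
# Provisional assessments for previously NE or DD species (from the text).
predicted = {"CR": 244, "EN": 467, "VU": 450, "NT": 187, "LC": 3021}

threatened = predicted["CR"] + predicted["EN"] + predicted["VU"]
nonthreatened = predicted["NT"] + predicted["LC"]
total = threatened + nonthreatened

# Assessed/training dataset (from the text).
assessed_threatened = 1375
assessed_total = 6520

share_new = threatened / total
share_assessed = assessed_threatened / assessed_total

if __name__ == "__main__":
    print("threatened:", threatened, "nonthreatened:", nonthreatened, "total:", total)
    print("share threatened (NE + DD): %.1f%%" % (100 * share_new))
    print("share threatened (assessed): %.1f%%" % (100 * share_assessed))
```

The sums recover the 1,161 threatened, 3,208 nonthreatened, and 4,369 total species, and the two shares round to the 27% and 21% reported.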

The model we used to predict extinction risk for DD and NE species, which included spatial and phylogenetic autocorrelation and excluded assessor/reviewer effects, achieved 90% validated accuracy for the binary threatened/nonthreatened classification and 84% accuracy for predicting specific categories (Area Under Curve [AUC] for the binary task: 0.83; Tables 1 and 2). The complete model, including spatial and phylogenetic autocorrelation and assessor/reviewer effects, achieved similar results, as did the model excluding spatial and phylogenetic autocorrelation but retaining assessor/reviewer effects (Table 1). The model excluding both autocorrelations and assessor/reviewer effects, and the models including only spatial or only phylogenetic autocorrelation, were less accurate (Table 1). The highest accuracies were obtained when threatened species classified under criteria other than B were excluded from the training dataset (Table 1; details below). We predicted extinction risk categories for DD and NE species using the model that excluded assessor/reviewer effects but retained spatial and phylogenetic data, since we cannot know the identity of assessors who will evaluate currently unassessed species. For analyses regarding potential assessor/reviewer effects, we used the complete model. Detailed accuracy metrics are presented in Table 2. The lowest accuracy across models was in separating the NT and LC categories (Table 2).

Table 1. Comparison of accuracy metrics of 8 automated assessment models for classifying reptile species into IUCN extinction risk categories.

Model	Task	Species sampling	Predictors	Accuracy	AUC
Complete	Binary	all	Environmental + body mass + PEM + MEM + assessor/reviewer effects	0.904	0.833
	Specific	all	Environmental + body mass + PEM + MEM + assessor/reviewer effects	0.852	0.812
Environment and body mass	Binary	all	Environmental + body mass	0.877	0.784
	Specific	all	Environmental + body mass	0.821	0.777
Assessor/reviewer effects	Binary	all	Environmental + body mass + assessor/reviewer effects	0.890	0.805
	Specific	all	Environmental + body mass + assessor/reviewer effects	0.835	0.802
Spatial	Binary	all	Environmental + body mass + MEM	0.889	0.807
	Specific	all	Environmental + body mass + MEM	0.825	0.791
Phylogenetic	Binary	all	Environmental + body mass + PEM	0.884	0.800
	Specific	all	Environmental + body mass + PEM	0.826	0.781
Spatial-phylogenetic (used for prediction)	Binary	all	Environmental + body mass + PEM + MEM	0.900	0.828
	Specific	all	Environmental + body mass + PEM + MEM	0.837	0.801
Complete—Criterion B	Binary	Criterion B + NT, LC	Environmental + body mass + PEM + MEM + assessor/reviewer effects	0.926	0.838
	Specific	Criterion B + NT, LC	Environmental + body mass + PEM + MEM + assessor/reviewer effects	0.875	0.803
Spatial-phylogenetic—Criterion B	Binary	Criterion B + NT, LC	Environmental + body mass + PEM + MEM	0.915	0.800
	Specific	Criterion B + NT, LC	Environmental + body mass + PEM + MEM	0.858	0.782

The “complete” model includes environmental predictors, body mass, spatial and phylogenetic autocorrelations, and assessor/reviewer effects. The model used to predict extinction risk for DD and NE species—“Spatial-phylogenetic” model—includes environmental predictors, body mass, and spatial and phylogenetic autocorrelations but excludes assessor/reviewer effects, as this information is not available for unassessed species. The species sampling column indicates which species were used in the training of each model, in regard to their extinction risk category and criteria used by IUCN on their assessment. The “Binary” task represents the separation of threatened (CR, EN, and VU) from nonthreatened categories (NT and LC). The “Specific” task represents classification into IUCN extinction risk categories. MEM and PEM represent spatial and phylogenetic autocorrelations, respectively. More detailed metrics are presented in Table 2.

AUC, Area Under Curve; CR, Critically Endangered; DD, Data Deficient; EN, Endangered; IUCN, International Union for Conservation of Nature; LC, Least Concern; MEM, Moran’s Eigenvector Maps; NE, Not Evaluated; NT, Near Threatened; PEM, Phylogenetic Eigenvector Maps; VU, Vulnerable.

Table 2. Accuracy metrics of automated assessment models classifying reptile species into IUCN extinction risk categories, under 2 different approaches: (1) complete model, accounting for spatial and phylogenetic autocorrelation and assessor/reviewer effects; (2) accounting for spatial and phylogenetic autocorrelation (this was the model used for predictions).

	Binary	CR	EN	VU	NT	LC
Complete
Sensitivity	0.955	0.773	0.699	0.532	0.278	0.964
Specificity	0.711	0.997	0.976	0.977	0.983	0.691
AUC	0.833	0.885	0.837	0.755	0.631	0.828
Precision	0.925	0.927	0.731	0.649	0.556	0.890
Recall	0.955	0.773	0.699	0.532	0.278	0.964
F1	0.940	0.843	0.715	0.585	0.370	0.925
Spatial-phylogenetic (used for predictions)
Sensitivity	0.952	0.621	0.726	0.278	0.532	0.950
Specificity	0.703	0.995	0.969	0.976	0.979	0.683
AUC	0.828	0.808	0.847	0.627	0.756	0.816
Precision	0.923	0.872	0.689	0.463	0.667	0.886
Recall	0.952	0.621	0.726	0.278	0.532	0.950
F1	0.938	0.726	0.707	0.347	0.592	0.917

“Binary” represents the separation of threatened (CR, EN, and VU) from nonthreatened categories (NT and LC). Remaining columns represent the predictive accuracy for assigning species to the 5 extinction risk categories: CR, Critically Endangered; EN, Endangered; LC, Least Concern; NT, Near Threatened; VU, Vulnerable. See S1 Table for remaining models.
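The metrics in Table 2 are linked by standard identities, which can be used as a consistency check on the tabulated values. The sketch below copies the values from Table 2 and recomputes F1 as the harmonic mean of precision and recall. It also compares each AUC with the mean of sensitivity and specificity, which is the AUC of a ROC curve built from a single operating point. That the tabulated AUC values match this quantity is an observation from the table, not a statement of how the authors computed them. The function names are illustrative.

```python
# Values copied from Table 2: (sensitivity, specificity, AUC, precision, F1).
# Recall equals sensitivity in the table, so it is not repeated here.
TABLE2 = {
    "Complete": {
        "Binary": (0.955, 0.711, 0.833, 0.925, 0.940),
        "CR": (0.773, 0.997, 0.885, 0.927, 0.843),
        "EN": (0.699, 0.976, 0.837, 0.731, 0.715),
        "VU": (0.532, 0.977, 0.755, 0.649, 0.585),
        "NT": (0.278, 0.983, 0.631, 0.556, 0.370),
        "LC": (0.964, 0.691, 0.828, 0.890, 0.925),
    },
    "Spatial-phylogenetic": {
        "Binary": (0.952, 0.703, 0.828, 0.923, 0.938),
        "CR": (0.621, 0.995, 0.808, 0.872, 0.726),
        "EN": (0.726, 0.969, 0.847, 0.689, 0.707),
        "VU": (0.278, 0.976, 0.627, 0.463, 0.347),
        "NT": (0.532, 0.979, 0.756, 0.667, 0.592),
        "LC": (0.950, 0.683, 0.816, 0.886, 0.917),
    },
}


def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def single_point_auc(sensitivity, specificity):
    """AUC of a ROC curve with one operating point (balanced accuracy)."""
    return (sensitivity + specificity) / 2


def max_discrepancy(table):
    """Largest absolute gap between recomputed and tabulated F1 and AUC."""
    worst_f1 = 0.0
    worst_auc = 0.0
    for columns in table.values():
        for sens, spec, auc, prec, f1 in columns.values():
            worst_f1 = max(worst_f1, abs(f1_score(prec, sens) - f1))
            worst_auc = max(worst_auc, abs(single_point_auc(sens, spec) - auc))
    return worst_f1, worst_auc


if __name__ == "__main__":
    worst_f1, worst_auc = max_discrepancy(TABLE2)
    print("largest F1 gap: %.4f" % worst_f1)
    print("largest AUC gap: %.4f" % worst_auc)
```

Both gaps stay below 0.001, which is the size expected from rounding the inputs to three decimals.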

Averaged across classification tasks and extent of occurrence classes, the most important feature classes in the complete model were, in rank order: (1) spatial autocorrelation; (2) assessor effects; (3) phylogenetic autocorrelation; (4) climate; and (5) human encroachment. In the model excluding assessor/reviewer effects, the ranking was: (1) spatial autocorrelation; (2) phylogenetic autocorrelation; (3) climate; (4) human encroachment; and (5) insularity (for full details on feature importance across models, see S1 Fig and S2 Table; for a list of variables in each category, see S1 Data). The hyperparameter configuration for the model chosen for predictions is summarized in S3 Table. The features selected for each combination of range size (calculated as extent of occurrence) class and classification task are provided in S1 Data. The contribution of each feature class to predictive performance for each combination of range size class and classification task is presented in S1 Fig.

Criterion B for IUCN extinction risk assessments, which is predominantly based on species range sizes [7], is the most widely used criterion for assigning a threatened status in reptile assessments (74% of species assessed under any criteria). The model trained only on species assessed as threatened under criterion B, together with NT and LC species, was more accurate for both binary (93%, AUC: 0.84, Table 1) and specific categorizations (87%, AUC: 0.80, Table 1). Further, excluding assessor/reviewer effects resulted in similar accuracy (binary classification: 92% accuracy, 0.80 AUC; specific classification: 86% accuracy, 0.78 AUC; Table 1). Despite their higher accuracy, these models tended to misclassify non-criterion B–threatened species, assigning them to lower extinction risk categories than observed (S4 Table). This is probably because species are only classified under non-B criteria if such criteria assign them to a similar, or higher, extinction risk category. Thus, we proceeded with models trained on all species for the remaining analyses. Our model correctly classified 93.8% of previously assessed species (6,112 of 6,520 species). The 6.2% misclassified species (408 of 6,520 species) were nearly twice as likely to shift from threatened to nonthreatened categories as to shift in the opposite direction, and generally shifted to less threatened specific categories (S2 Fig). This was consistent in most biogeographical realms, except in the Nearctic and Neotropical realms, in which the numbers were similar for the binary classification (S2 Fig).

Comparison with previous methods

We compared our method to similar past endeavors. Our simplest model (“Environment and body mass”; Table 1) obtained higher accuracy (88%) than methods based on Random Forest (85%) and Neural Networks (79%), using the same predictors (S5 Table). The extreme class imbalance in the dataset greatly hindered both methods, especially Neural Networks (S5 Table), despite the use of supersampling to account for uneven class distributions. In fact, Neural Networks are known to be sensitive to such imbalances [30], while XGBoost is considered more robust to them [29]. While previous methods have incorporated similar predictors to ours, and have separately incorporated features such as tolerating missing values, identifying specific IUCN categories, and accounting for spatial and phylogenetic autocorrelation, none did so in combination, as our method did (S6 Table). Our method is also the first to account for assessor bias (as an exploratory tool, not for prediction; S6 Table).
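To illustrate what resampling for class imbalance involves, the following is a minimal sketch of random oversampling, in which minority classes are resampled with replacement until every class matches the largest one. This is one common reading of the supersampling mentioned above; the exact procedure used in the comparison is not described in this section, and the function name, toy species labels, and seed are all hypothetical.

```python
import random
from collections import Counter


def random_oversample(rows, label_of, seed=0):
    """Duplicate minority-class rows at random until all classes are equal in size."""
    rng = random.Random(seed)
    by_class = {}
    for row in rows:
        by_class.setdefault(label_of(row), []).append(row)
    target = max(len(members) for members in by_class.values())
    balanced = []
    for members in by_class.values():
        balanced.extend(members)
        # Draw extra copies with replacement to reach the target size.
        balanced.extend(rng.choice(members) for _ in range(target - len(members)))
    rng.shuffle(balanced)
    return balanced


# Toy data (hypothetical): eight LC species, one EN, one CR.
toy = [("sp%d" % i, "LC") for i in range(8)] + [("sp8", "EN"), ("sp9", "CR")]
balanced = random_oversample(toy, label_of=lambda row: row[1])
counts = Counter(label for _, label in balanced)

if __name__ == "__main__":
    print(dict(counts))
```

Because the added rows are exact duplicates, oversampling adds no new information about rare categories, which is one reason a method's sensitivity to imbalance can persist after resampling.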

Predictions for data deficient and not evaluated species

DD and NE species were significantly more likely to be assigned threatened categories than assessed species (DD: 29%, NE: 26%, assessed non-DD: 21% threatened; Fig 1A, S7 Table). DD species were more likely than assessed species to be predicted as VU, EN, or CR, and less likely to be predicted as NT or LC. NE species were more likely than assessed species to be predicted as VU or EN, and less likely to be predicted as NT or LC (Fig 1B, S7 and S8 Tables).

Fig 1. (A) Grouping categories into threatened and nonthreatened and (B) specific extinction risk categories: CR, Critically Endangered; EN, Endangered; LC, Least Concern; NT, Near Threatened; VU, Vulnerable. Number of species in each category is indicated above each bar. Significant differences in a Pearson’s χ² test are indicated by asterisks, colored according to which proportions are being compared (S7 Table). The data underlying this figure can be found in S2 Data.

Phylogenetic and spatial patterns

The proportion of threatened species increased overall for Squamata and Crocodylia, but decreased for Testudines (Fig 2, S9 Table), especially in the turtle families Chelidae, Chelydridae, and Kinosternidae. The proportion of threatened species among anguimorph lizards (except Varanidae) also decreased following our predictions. The 3 largest lizard clades (Iguania, Scincomorpha, and Gekkota), as well as Lacertoidea except Lacertidae, showed increased threat, as did the largest snake clades (Colubridae, Dipsadinae, Elapidae) and Serpentes as a whole (Fig 2, S9 Table). Including predictions for DD and NE species, the proportions of threatened species increased in ecoregions across most of South and North America, Australia, and Madagascar (Fig 3, S10 Table).

Fig 3. The spatial data are grouped by WWF terrestrial ecoregions. The shift between red and blue is proportional to the (symmetric log scale) increase/decrease in extinction risk per ecoregion when using our assessments. Bar plots indicate proportion of species in threatened categories for each biogeographical realm, before and after the inclusion of automated assessments. The data underlying this figure can be found in S2 Data. IUCN, International Union for Conservation of Nature; WWF, World Wide Fund for Nature.

Effect of assessor/reviewer identities on predictions

We permuted the identity of assessors and reviewers until we identified the group of assessors and reviewers that would assign each species to the least threatened category possible, while maintaining the other predictors’ values (optimistic scenario) and to the most threatened category possible (pessimistic scenario). Proportions of species predicted as threatened increased from optimistic to observed to pessimistic scenarios for all categories (Fig 4A, S11 Table) and across most biogeographical realms. In the Nearctic and Madagascar, the observed and pessimistic scenarios were similar, and in Oceania no differences were detected (Fig 4B, S12 Table). Species that changed category between the observed assessments and the optimistic scenario moved overwhelmingly to a single category (LC), while in the pessimistic scenario, species showed a more diverse distribution of new categories (S3 Fig).
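The scenario analysis above can be sketched as a search over assessor/reviewer groups for each species, holding all other predictors fixed. The code below is a simplified illustration under stated assumptions: it enumerates a fixed list of candidate groups rather than permuting identities, and `toy_predict`, the group names, and the `base_rank` feature are hypothetical stand-ins for the fitted complete model and its inputs.

```python
# Specific categories ordered from least to most threatened.
CATEGORY_ORDER = ["LC", "NT", "VU", "EN", "CR"]
RANK = {code: i for i, code in enumerate(CATEGORY_ORDER)}


def scenario_categories(features, assessor_groups, predict):
    """Return (optimistic, pessimistic) categories for one species.

    predict(features, group) must return a category code; only the
    assessor/reviewer group varies between calls.
    """
    outcomes = [predict(features, group) for group in assessor_groups]
    optimistic = min(outcomes, key=lambda code: RANK[code])
    pessimistic = max(outcomes, key=lambda code: RANK[code])
    return optimistic, pessimistic


def toy_predict(features, group):
    """Hypothetical model: each group shifts the category by a fixed step."""
    shift = {"group_a": -1, "group_b": 0, "group_c": 1}[group]
    index = min(max(features["base_rank"] + shift, 0), len(CATEGORY_ORDER) - 1)
    return CATEGORY_ORDER[index]


if __name__ == "__main__":
    groups = ["group_a", "group_b", "group_c"]
    print(scenario_categories({"base_rank": 2}, groups, toy_predict))
    print(scenario_categories({"base_rank": 0}, groups, toy_predict))
```

Applying such a search to every assessed species and tabulating the resulting categories gives the kind of optimistic and pessimistic distributions that are compared with the observed assessments.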

Fig 4. Analysis includes only species that have IUCN assessments (6,520 species). (A) Proportion of reptile species assigned to each extinction risk category for the actual IUCN assessments (Observed); proportion expected if the most optimistic group of assessors assessed every species (Optimistic); proportion expected if the most pessimistic group assessed every species (Pessimistic). (B) Proportion of threatened species in each biogeographical realm for Observed, Optimistic, and Pessimistic assessments. Significant differences in a Pearson’s χ² test are indicated by asterisks, colored according to which proportions are being compared (S11 Table). The data underlying this figure can be found in S2 Data. AA, Australasian; AT, Afrotropical; CR, Critically Endangered; EN, Endangered; IM, Indomalayan; LC, Least Concern; MA, Madagascan; NA, Nearctic; NT, Near Threatened; NT, Neotropical; OC, Oceanian; PA, Palearctic; VU, Vulnerable.

Discussion

Our model assigned IUCN extinction risk categories to the 40% of the world’s reptiles that currently lack published assessments or are classified as DD. Our novel modeling approach enabled classifying specific extinction risk categories with high accuracy using only readily available data (ranges and body sizes). Our methods also achieved higher accuracy than previously explored methods (S5 Table). We predicted that the prevalence of threatened reptile species is significantly higher than currently depicted by IUCN assessments. This pattern is widespread across space and phylogeny. Our results show that, while high prediction accuracy can be achieved without explicitly accounting for assessor/reviewer identities, the identities of assessors and reviewers greatly affect predictions.