PLOS One. 2023 May 4;18(5):e0276951. doi: 10.1371/journal.pone.0276951

Broad-scale factors shaping the ecological niche and geographic distribution of Spirodela polyrhiza

Marlon E Cobos 1,*, A Townsend Peterson 1
Editor: Mirko Di Febbraro 2
PMCID: PMC10159170  PMID: 37141194

Abstract

The choice of appropriate independent variables to create models characterizing ecological niches of species is of critical importance in distributional ecology. This set of dimensions in which a niche is defined can inform about what factors limit the distributional potential of a species. We used a multistep approach to select relevant variables for modeling the ecological niche of the aquatic Spirodela polyrhiza, taking into account variability arising from using distinct algorithms, calibration areas, and spatial resolutions of variables. We found that, even after an initial selection of meaningful variables, the final set of variables selected based on statistical inference varied considerably depending on the combination of algorithm, calibration area, and spatial resolution used. However, variables representing extreme temperatures and dry periods were more consistently selected than others, regardless of the treatment used, highlighting their importance in shaping the distribution of this species. Other variables related to seasonality of solar radiation, summer solar radiation, and some soil proxies of nutrients in water, were selected commonly but not as frequently as the ones mentioned above. We suggest that these latter variables are also important to understanding the distributional potential of the species, but that their effects may be less pronounced at the scale at which they are represented for the needs of this type of modeling. Our results suggest that an informed definition of an initial set of variables, a series of statistical steps for filtering and exploring these predictors, and model selection exercises that consider multiple sets of predictors, can improve determination of variables that shape the niche and distribution of the species, despite differences derived from factors related to data or modeling algorithms.

Introduction

Ecological niche modeling (ENM) includes a diverse set of tools used to study potential distributions of species via characterizations of their environmental requirements [1,2]. In particular, correlational ENMs use distributional data (occurrence records) and sets of environmental variables to calibrate models that are used to predict environmental suitability for a species across a region of interest [3]. Variables appropriate to characterizing and understanding a species’ niche are those that allow identifying conditions that are favorable for the species, as well as detecting potential limits of what is or is not suitable for a species (e.g., temperatures that allow maximum growth rates, or maximum temperatures that can be tolerated). However, differences in the scale at which ecological processes occur and the grain and extent to which environmental variables are measured make it difficult to select predictors based on direct interpretations of their biological importance [4].

The challenge of selecting appropriate environmental variables when characterizing species’ ecological niches using correlative models is well-known in the field of distributional ecology [5–7]. In general, models can be constructed with two main goals: (1) to improve predictions of the geographic distribution of the species, and (2) to understand which environmental variables are important constraints on species’ niches. When the goal is to improve model predictive ability, variables can be selected based on how they improve predictions of independent testing data. In these cases, environmental data sets that efficiently summarize environmental variation across an area of interest (e.g., principal components) are commonly considered to be good choices [8]. However, when models are constructed to understand which and how environmental variables shape species’ ecological niches and geographic distributions, use of biologically meaningful and interpretable variables becomes more relevant [4].

Common procedures for selecting environmental predictors in ENM include reducing multicollinearity, testing contribution of variables to models, selecting variables based on their biological importance considering empirical evidence or the experience of researchers, or using a broad set of variables and letting the algorithm select important variables [9]. Other alternatives include selecting predictors by transforming original variables to summarize the variance explained by a set of principal components that are more information-rich and in general, are not correlated [10,11]. However, interpretation of the role of particular environmental factors in the characterization of species niches is complicated. More recently, different sets of variables have been used as part of the parameterizations to be tested during the process of model calibration [12]. After model selection, one or more sets of variables can be identified as more appropriate and powerful for use in creating final models.

The reality is that, regardless of the method used to determine the set of variables for modeling ecological niches, the decision is always difficult and the answer is rarely unique or unambiguous [9]. This complication exists because every step taken to define which variables are best may result in distinct sets of predictors at the end. For instance, when selecting one variable depending on correlation values and biological importance, the decision of which variable to keep depends on the researcher; many times, such an initial decision determines which other variables can or cannot be considered. Analyses like the variance inflation factor may end up identifying unique sets of variables; however, the set of variables selected depends on a predefined limit, which is not a biological consideration [13]. Distinct answers are obtained when different sets of variables are considered in model calibration, although such sets are usually subject to a priori processes of selection [14].

Implications of using one set or another set of environmental dimensions when creating models are not negligible, especially in applications in which model transfers to other geographic or temporal scenarios are required [15–17]. Other less explored complications are the effects that areas for model calibration, spatial resolution of raster layers used as predictors, and use of different modeling algorithms have on the variables that get selected. The area across which a model is calibrated has direct implications on predictions: models may be over- or under-fitted if such an area is poorly defined [18,19]. Little is known about how changes in calibration areas affect decisions related to variable selection during the modeling process. The spatial resolution of variables is known to affect model calibration and model transfers [6,20]; however, little has been said about its effects on the final set of variables selected (but see [21]). Distinct modeling algorithms may also perform better or worse depending on the sets of variables used, as all predictors influence the model and interact with other variables differently. Again, however, this factor has not been considered deeply (but see [22]), and the set of variables for modeling is usually fixed when using multiple algorithms (e.g., [23,24]).

Here we explore the challenges in defining sets of environmental variables in ENM for Spirodela polyrhiza (L.) Schleid (greater duckweed), a freshwater plant species with a broad, near-cosmopolitan distribution [25]. Specifically, we used distinct methods for variable selection in a multi-step approach. We performed analyses at two spatial resolutions, used two algorithms for model calibration, and considered different options regarding calibration areas, to explore the consequences of these factors on variable selection. We hypothesize that variables representing extreme conditions and environmental conditions during the active period of the species (see section Study organism) are better predictors for broad-scale characterizations of the species’ ecological niche and distribution. As little is known about macroecological factors driving the geographic distribution of greater duckweed, our explorations of environmental variables can help to understand the distributional potential of this plant, environmental dimensions limiting its potential for expansion to other areas, and how climate change might affect this species’ range.

Methods

Study organism

Spirodela polyrhiza is a species of duckweed that ranks among the smallest angiosperms known (sizes 0.5–18 mm). It is a free-floating aquatic plant that reproduces largely vegetatively [26]. To overcome unfavorable conditions (especially during the winter), this species produces a starch-rich tissue called a turion that is denser than normal fronds, and sinks to the bottom of water sources until conditions become favorable [27]. Similar to other duckweed species, under appropriate conditions, S. polyrhiza grows at high rates, which helps it to cover large portions of the surface of the water bodies where it is present [28].

Considering their high growth rates, small size, simple structures, and potential for industrial applications, duckweed species have been the subject of intensive and detailed research [29]. Among the most notable applications are possible utility in water treatment [30], bioenergy [31], animal feeding [27], human nutrition [32,33], and pharmaceutical applications [34]. Given the potential of duckweed species as model organisms [35,36], stock collections of these species have been established by several institutions around the world, which have aided substantially in promoting further research on these species [37]. As such, various aspects of S. polyrhiza physiology, genome, and potential industrial applications have been studied in detail [27,38–41]. A remarkable characteristic of the geographic distribution of S. polyrhiza is that it extends worldwide. According to Les et al. [42], duckweed species and other aquatic plants dispersed transoceanically in the recent past, which highlights the importance of external biotic dispersal for this species [43]. Duckweed dispersal is mainly via adhesion to aquatic birds and mammals [25,44,45]. However, little has been done to understand the macroecological factors that drive its distribution, and only general aspects of the regions occupied by S. polyrhiza have been characterized [29].

Occurrence data

We obtained geographic occurrence records for S. polyrhiza from the Global Biodiversity Information Facility (GBIF [46]) and the Botanical Information and Ecology Network (BIEN [47]). In all, a total of 85,923 georeferenced records were obtained (GBIF: 84,992; BIEN: 931). We cleaned data from each database independently to exclude records from before 1970, lacking coordinates, with zero values for longitude and latitude, or duplicates [48]. Records marked as absent or uncommon were also removed from the GBIF data. After this initial cleaning, we had 45,913 records (GBIF: 45,459; BIEN: 454). We combined records from the two sources and excluded duplicates again. Records that were outside of, but closer than ~5’ to the edge of environmental layers (i.e., that fell very close to informative areas for climate data) were moved to the nearest pixel with information; points falling farther outside layer borders were removed. To reduce bias from spatial autocorrelation, we thinned records using a minimum point-to-point distance of ~30’. We selected this value after testing the effect of increasing distances on the Moran’s I statistic for all environmental variables (see S1 and S2 Tables). The final number of records after these procedures was 964. Occurrence data download, cleaning, and thinning were accomplished using rgbif [49], spocc [50], BIEN [51], and ellipsenm [52] in R [53].
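The thinning step can be illustrated with a minimal sketch in Python. This is not the routine implemented in ellipsenm; it assumes a simple greedy rule and planar distances in decimal degrees (30’ = 0.5°), and the function name thin_records and the coordinates are hypothetical.

```python
import math

def thin_records(records, min_dist_deg=0.5):
    """Greedy thinning: keep a record only if it lies at least
    min_dist_deg (planar distance, in degrees) from every record kept so far."""
    kept = []
    for lon, lat in records:
        if all(math.hypot(lon - klon, lat - klat) >= min_dist_deg
               for klon, klat in kept):
            kept.append((lon, lat))
    return kept

# Hypothetical (longitude, latitude) pairs, including one exact duplicate.
records = [(10.0, 50.0), (10.1, 50.1), (10.6, 50.0), (10.0, 50.0), (12.0, 48.0)]
thinned = thin_records(records)
print(thinned)
```

The result of a greedy rule depends on the order of the records, so thinning is often repeated with shuffled input when the goal is to retain as many records as possible.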

Environmental variables

We obtained raster environmental data layers from three sources: (1) bioclimatic (BIO) and solar radiation (SR) layers from WorldClim v2.1, at 10’ resolution (available at www.worldclim.com [54]); (2) cation exchange capacity (CEC), organic carbon (OC), and pH, from the ISRIC–World Soil Information database, at 250 m resolution (available at www.soilgrids.org [55]); and (3) total phosphorus (TP), labile inorganic phosphorus (LIP), and organic phosphorus (OP) in soils, from Global Gridded Soil Phosphorus at 30’ resolution (available at www.daac.ornl.gov/SOILS/guides/Global_Phosphorus_Dist_Map.html [56]). We used bioclimatic variables to represent temperature (which could help to identify thermal limits), and precipitation (which can inform about water availability). Solar radiation layers provide information on solar energy levels across a region in our models; soil variables offer more indirect information relevant to nutrient availability. All of these variables have been proven to be relevant to the development of the study species in analyses on local extents and/or in laboratory experiments [25,26] (S3 Table).

Solar radiation layers were available as averages for the 12 months of the year. To create layers that better represented extremes and annual averages, we created the following “bioclimatic-like” layers: annual mean solar radiation (AMSR), maximum solar radiation of the month with maximum values (SRMax), minimum solar radiation of the month with minimum values (SRMin), range of solar radiation (RSR), average solar radiation of the quarter with highest values (ASRQH), and average solar radiation of the quarter with lowest values (ASRQL). We created these variables using the values for the 12 months obtained from WorldClim. Variable processing and calculations were done using the packages raster [57] and gdalUtilities [58], in R.
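As an illustration of how such summaries can be derived for a single pixel, the following Python sketch computes the six layers from 12 monthly values. It rests on two assumptions that are not stated above: that RSR is the maximum minus the minimum monthly value, and that a quarter is any run of three consecutive months, wrapping from December to January. The monthly values are invented for the example.

```python
def solar_summaries(monthly):
    """Bioclimatic-like summaries for one pixel from 12 monthly means."""
    if len(monthly) != 12:
        raise ValueError("expected 12 monthly values")
    # Mean of every run of three consecutive months (assumed to wrap Dec -> Jan).
    quarters = [sum(monthly[(i + j) % 12] for j in range(3)) / 3
                for i in range(12)]
    return {
        "AMSR": sum(monthly) / 12,           # annual mean
        "SRMax": max(monthly),               # month with maximum values
        "SRMin": min(monthly),               # month with minimum values
        "RSR": max(monthly) - min(monthly),  # range (assumed max - min)
        "ASRQH": max(quarters),              # quarter with highest values
        "ASRQL": min(quarters),              # quarter with lowest values
    }

# Invented monthly solar radiation values for one northern-hemisphere pixel.
monthly = [3000, 6000, 9000, 12000, 15000, 18000,
           18000, 15000, 12000, 9000, 6000, 3000]
summaries = solar_summaries(monthly)
print(summaries)
```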

To test the effect of spatial resolution on the outcome of variable selection processes, we created two groups of variables, at distinct spatial resolutions: (1) a group at 10’ resolution including BIO and SR variables, plus CEC, OC, and pH; and (2) a group at 30’ resolution including BIO and SR variables, plus TP, LIP, and OP. We performed raster aggregation procedures (average of values) on CEC, OC, and pH to match the resolution of BIO variables, and on BIO and SR variables to match the resolution of variables at 30’. One of the layers at 10’ and at 30’ resolution was used as a reference for the aggregation process to exactly match pixels among all layers at each resolution. The method of aggregation used was the nearest neighbor and the average value was used to represent environments aggregated. Although the set of variables representing soil conditions used at 10’ differs from the one at 30’, variable selection analyses will help to identify whether the variables present in the two sets, at distinct resolutions, are consistently selected.
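Aggregation by averaging can be sketched as follows for the simplest case, from 10’ to 30’ (a factor of 3). This is a plain-Python illustration, not the raster package routine; the function name aggregate_mean, the use of None for cells without data, and the grid values are assumptions made for the example.

```python
def aggregate_mean(grid, factor=3):
    """Average non-overlapping factor x factor blocks of a 2D grid,
    ignoring cells without data (None)."""
    rows, cols = len(grid), len(grid[0])
    out = []
    for r in range(0, rows, factor):
        out_row = []
        for c in range(0, cols, factor):
            block = [grid[i][j]
                     for i in range(r, min(r + factor, rows))
                     for j in range(c, min(c + factor, cols))
                     if grid[i][j] is not None]
            out_row.append(sum(block) / len(block) if block else None)
        out.append(out_row)
    return out

# A hypothetical 3 x 6 grid at 10' becomes a 1 x 2 grid at 30'.
fine = [
    [1, 2, 3, 10, 10, 10],
    [4, 5, 6, 10, None, 10],
    [7, 8, 9, 10, 10, 10],
]
coarse = aggregate_mean(fine)
print(coarse)
```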

Geographic areas for model calibration

To explore the effect of the areas across which models are calibrated on the set of variables selected, we explored four options for such areas in our analysis: (1) buffers of ~5° (~500 km at the Equator) around occurrence records, (2) concave-hull polygons with a buffer of ~5°, (3) ecoregions occupied by the species buffered by ~1° (~100 km), and (4) the result of intersecting the previous three areas. Buffer distances for the first two types of calibration areas were defined considering that the species can be dispersed by birds over relatively large distances. Distance for ecoregion buffering was selected to include a more diverse set of environments around occupied regions. We obtained the layer of world terrestrial ecoregions from the Harvard WorldMap database (available at https://worldmap.maps.arcgis.com). Although a new simulation-based approach has been recently suggested as a reliable tool to estimate calibration areas [59], the broad distribution of this species makes it difficult for that method to be applied. The types of calibration areas used in our study have been used in other studies [9] to define relevant environments for model development (e.g., [59,60]). Our chosen calibration areas are therefore reasonable options to calibrate models considering that such areas should reflect what regions could have been accessible to the species and present relevant environments for comparisons (Fig 1). The two groups of variables were masked to the four areas. We created these calibration areas using the packages ellipsenm, rgeos [61], and rgdal [62] in R.

Fig 1. Geographic representation of species occurrence data for Spirodela polyrhiza (upper panel) and areas for model calibration used to create ecological niche models.


Occurrence data represented are after filtering and spatial-thinning procedures. Buffer and concave areas are presented before masking them to land areas for purposes of representation.

Modeling algorithms

We used generalized linear models (GLMs) and Maxent [63,64] to estimate the ecological niche of the species. These algorithms are both used commonly in the literature and produce reliable and good-performing models [65,66]. For contrasts in model calibration, Maxent uses presences and a characterization of the background, whereas GLMs use presences and pseudo-absences; both background and the pseudo-absences were taken as a sample of available pixels across the calibration area. For purposes of comparison, the same points (20,000) were used in both algorithms, and were treated as both background and pseudo-absence data. The sample of 20,000 points was taken for each calibration area option independently. This number of points was used to achieve a good representation of the areas and corresponding environments over which presences will be compared, and to follow recommendations regarding amount of pseudo-absence data in ENM applications using regression techniques [67]. GLMs were performed as logistic models with a weight of 1 for presences and 10,000 for pseudo-absences (e.g., [14]). GLMs created in such a way are considered to be similar mathematically to Maxent models under certain conditions and assumptions [68].

Variable selection process

As variable selection can be done in multiple ways and at distinct points in the process of data preparation or modeling, we followed a multi-step approach that considers quantitative and qualitative characteristics of predictors (Fig 2). Our approach consisted of (1) initial inspection and processing of variables (see section Environmental variables); (2) assessing linear correlations among variables; (3) exploring variable values in occurrences and across calibration areas; (4) an initial selection based on the criteria (2) and (3) and the biological relevance of variables; (5) creating variable sets resulting from all combinations of two or more initially selected variables; and (6) including all sets of variables in model calibration exercises to identify which algorithm parameters and variable sets, in concert, result in the best-performing models. We performed exploration exercises 16 times: the combination of two environmental data sets, four calibration areas, and two algorithms.

Fig 2. Schematic representation of the process that was followed to select variables to model the ecological niche of the greater duckweed (Spirodela polyrhiza).


After the first two steps, initial selection of variables was done based on three considerations: (1) groups of variables with all variable-to-variable pairwise correlations |r| ≤ 0.8 (Fig 3); (2) biological relevance of variables; and (3) variables for which the calibration area had wider limits in environmental dimensions than the occurrences [69], based on histogram plots of values (Fig 4). The latter consideration assumes that using variables for which the entire spectrum of responses can be characterized (i.e., non-truncated responses [2,70,71]) makes for better models [72]. Biological relevance of variables was determined based on details about the species’ natural history [25,41], phenology [28,73], and physiology [38,74] in the literature, and our own experience with populations in the field and controlled environments. For simplicity, we selected the same initial set of variables based on the relevance criterion for the four calibration areas considered.
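The correlation criterion can be illustrated with a short Python sketch that flags pairs of variables with |r| above the limit; a group of variables meets the first consideration when no pair within it is flagged. The variable names and values are hypothetical, and the helper functions are written for the example only.

```python
import math
from itertools import combinations

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def correlated_pairs(layers, limit=0.8):
    """Return pairs of variable names whose |r| exceeds the limit."""
    return [(a, b) for a, b in combinations(sorted(layers), 2)
            if abs(pearson(layers[a], layers[b])) > limit]

# Hypothetical values of three variables sampled at the same five pixels.
layers = {
    "var_a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "var_b": [2.1, 3.9, 6.2, 8.0, 9.9],  # nearly linear in var_a
    "var_c": [5.0, 1.0, 4.0, 2.0, 3.0],  # weakly related to the others
}
flagged = correlated_pairs(layers)
print(flagged)
```

Which member of a flagged pair to keep is not decided by the correlation itself; as described above, that choice relied on biological relevance and on the histograms of values.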

Fig 3. Results from linear correlation tests for initial variables.


Values of correlation above |0.8| are magnified threefold. Results for variables at 10’ resolution are shown. Results for variables at 30’ are similar for most variables and can be found in S1 Fig.

Fig 4. Histograms of variable values in calibration areas (M) and occurrence records.


Results for variables at 10’ resolution and calibration areas resulting from intersection are shown. Results of analyses at 30’ resolution and for other calibration areas were similar, although minor differences can be observed in S2–S8 Figs.

Using the groups of variables remaining after the initial selection, we prepared subsets of variables representing all combinations of two or more variables [14]. Such sets of variables were then used as part of our process of model calibration in which other parameter settings were tested. For both Maxent and GLMs, we tested five response types (lq, lp, q, qp, lqp; l = linear, q = quadratic, and p = product). For Maxent, six regularization multiplier values were explored (0.1, 0.3, 0.6, 1.0, 2.5, 5.0). Performance of candidate models was evaluated based on three criteria: statistical significance of predictions (partial ROC; [75]), omission rate (allowing a 5% omission error; [76]), and model fit and complexity (based on the Akaike information criterion for GLMs, and the AICc proposed by Warren and Seifert [77] for Maxent).
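The omission-rate criterion can be sketched in Python as follows. The sketch assumes one common convention, which may differ in detail from the implementation used here: the threshold is the suitability value that leaves out at most 5% of the calibration occurrences, and the omission rate is the proportion of independent testing occurrences predicted below that threshold. All suitability values are invented.

```python
import math

def omission_threshold(train_suitability, allowed=0.05):
    """Suitability value below which at most `allowed` of the
    calibration occurrences fall."""
    ordered = sorted(train_suitability)
    n_omitted = math.floor(len(ordered) * allowed)
    return ordered[n_omitted]

def omission_rate(test_suitability, threshold):
    """Proportion of testing occurrences predicted below the threshold."""
    omitted = sum(1 for s in test_suitability if s < threshold)
    return omitted / len(test_suitability)

# Invented suitability values at 20 calibration and 10 testing occurrences.
train = [i / 20 for i in range(1, 21)]  # 0.05, 0.10, ..., 1.00
test = [0.02, 0.30, 0.50, 0.08, 0.90, 0.70, 0.40, 0.60, 0.20, 0.95]

threshold = omission_threshold(train)
rate = omission_rate(test, threshold)
print(threshold, rate)
```

In this invented case the omission rate on testing data (0.2) is well above the 5% allowed, so such a candidate would fail the criterion.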

In total, then, for each model calibration exercise, 10,180 and 5065 GLM models were tested at 10’ and 30’, respectively, and 61,080 and 30,390 Maxent models were tested at 10’ and 30’, respectively. Model calibration processes with Maxent were done using the kuenm R package [12], and model calibration using GLMs was done using stats and other base functions in R [53].
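These totals can be reproduced with a few lines of Python, taking the number of variables retained after the initial selection (11 at 10’ and 10 at 30’, as reported in the Results) and counting all sets of two or more variables. The helper name n_variable_sets is ours.

```python
from math import comb

def n_variable_sets(n_vars, min_size=2):
    """Number of variable sets containing at least min_size variables."""
    return sum(comb(n_vars, k) for k in range(min_size, n_vars + 1))

RESPONSE_TYPES = ["lq", "lp", "q", "qp", "lqp"]
REG_MULTIPLIERS = [0.1, 0.3, 0.6, 1.0, 2.5, 5.0]  # Maxent only

counts = {}
for label, n_vars in (("10'", 11), ("30'", 10)):
    sets = n_variable_sets(n_vars)
    counts[label] = {
        "sets": sets,
        "glm": sets * len(RESPONSE_TYPES),
        "maxent": sets * len(RESPONSE_TYPES) * len(REG_MULTIPLIERS),
    }

for label, c in counts.items():
    print(label, c)
```

With 11 variables there are 2,036 sets (10,180 GLM and 61,080 Maxent candidates), and with 10 variables there are 1,013 sets (5,065 and 30,390 candidates), which matches the figures above.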

Importance of variables and effects on models

To understand the effect of selected variables on models that could be used to represent species niches and/or potential distributions, we created a final model for each of the calibration areas at the two resolutions of variables, using parameters and sets of variables selected after model calibration. Then, we measured the effects of variables on such models: in Maxent, we used jackknife analysis to measure variable contributions [78], and for GLMs, we used an ANOVA to explore deviance explained by each of the predictors considered and whether deviance values were statistically significant (whether the deviance was larger than expected by chance).

We transferred all the models across the area comprising the union of the four calibration areas, and compared those models to assess whether patterns of suitability values differed as a result of using distinct variables, calibration areas, and algorithms. Model transfers in Maxent were done using free extrapolation, no replicates, and a cloglog output format. Model transfers for GLMs were scaled 0–1. As ecological niches exist simultaneously in both geographic and environmental spaces [79], we created 3-dimensional visualizations of resulting predictions in a space defined in terms of some of the environmental variables with larger effects on our models. Explorations in environmental space were used to detect how variation in suitability was associated with variable values.

Results

Results from the selection process

Graphical explorations of environmental conditions across calibration areas and occurrence records varied somewhat among the distinct options of areas for calibration (Figs 4 and S2–S8). Despite such variations, these explorations allowed us to identify variables that appeared better for detecting suitable and unsuitable conditions based on distributions of values and confidence limits. Variable correlations also varied slightly among the distinct options of calibration areas tested, although we consistently found more highly correlated variables in calibration areas derived from ecoregions (Figs 3 and S1). After considering the exploration of environmental conditions, correlation values, and biological importance, we retained 11 variables at 10’ resolution and 10 variables at 30’ resolution. The variables mean diurnal range (BIO 2), maximum temperature of warmest month (BIO 5), minimum temperature of coldest month (BIO 6), annual precipitation (BIO 12), precipitation of driest month (BIO 14), precipitation seasonality (BIO 15), range of solar radiation (RSR), and average solar radiation of the quarter with highest values (ASRQH), were in common between these sets; cation exchange capacity (CEC), organic carbon (OC), and pH were kept for the set at 10’, whereas labile inorganic phosphorus (LIP) and organic phosphorus (OP) were kept at 30’.

All model calibration exercises found at least one parameter setting that produced a model that met all criteria for selection (i.e., models with partial ROC p-values ≤0.05, omission rates ≤0.05, and delta AICc values ≤2; S4 and S5 Tables). Variables selected contrasted markedly among treatments that considered distinct calibration areas, spatial resolutions, and modeling algorithms (Fig 5). None of the final sets of variables selected during model calibration used all of the variables initially selected. In general, fewer variables were selected for models created with Maxent at 10’ resolution (2–4) than for the other algorithm/resolution combinations (6–7). Although the subsets of variables considered were not totally comparable between the tests at distinct resolutions, at least one variable representing soil conditions was consistently selected across all exercises using distinct calibration areas, using at least one of the modeling algorithms. Soil and solar radiation variables were more consistently selected at 10’ resolution, especially when using Maxent, whereas at 30’ resolution, bioclimatic and solar radiation variables were more consistently selected. Bioclimatic and solar radiation variables that represent extreme conditions or means of extreme periods appeared to be selected more consistently regardless of the differences in spatial resolution or algorithm (Fig 5).
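The three selection criteria can be combined as in the following Python sketch. It assumes that the criteria are applied in sequence and that delta AICc is computed only among candidates that pass the significance and omission-rate filters; the field names (proc_p, omission, aicc) and the candidate values are hypothetical.

```python
def select_models(candidates, alpha=0.05, max_omission=0.05, max_delta=2.0):
    """Keep candidates that are significant, omit few testing records,
    and lie within max_delta AICc units of the best remaining model."""
    passing = [m for m in candidates
               if m["proc_p"] <= alpha and m["omission"] <= max_omission]
    if not passing:
        return []
    best = min(m["aicc"] for m in passing)
    return [m for m in passing if m["aicc"] - best <= max_delta]

# Hypothetical evaluation results for five candidate models.
candidates = [
    {"name": "A", "proc_p": 0.00, "omission": 0.04, "aicc": 1000.0},
    {"name": "B", "proc_p": 0.00, "omission": 0.05, "aicc": 1001.5},
    {"name": "C", "proc_p": 0.00, "omission": 0.10, "aicc": 990.0},   # omits too much
    {"name": "D", "proc_p": 0.20, "omission": 0.02, "aicc": 995.0},   # not significant
    {"name": "E", "proc_p": 0.01, "omission": 0.03, "aicc": 1010.0},  # delta AICc of 10
]
selected = [m["name"] for m in select_models(candidates)]
print(selected)
```

Note that candidate C has the lowest AICc overall but is excluded before delta AICc is computed, so the order in which the criteria are applied affects which models are retained.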

Fig 5. Summary of variables retained after the multi-step approach for selection.


Results depending on spatial resolution of predictors, model calibration areas, and the algorithm used are shown. BIO2 = mean diurnal range of temperature; BIO5 = maximum temperature of warmest month; BIO6 = minimum temperature of coldest month; BIO12 = annual precipitation; BIO14 = precipitation of driest month; BIO15 = precipitation seasonality; RSR = range of solar radiation; ASRQH = average solar radiation of the quarter with highest values; CEC = cation exchange capacity; OC = organic carbon; LIP = labile inorganic phosphorus; OP = organic phosphorus.

Effects of variables on models

Bioclimatic and solar radiation variables had consistently larger effects than soil variables on Maxent models (S9 and S10 Figs), with the exception of CEC, which was the most important variable for the only model that selected this predictor (i.e., with variables at 10’ using calibration areas that intersected the other three options; S9 Fig). The most important predictor for Maxent models varied among BIO variables and CEC at 10’ resolution, whereas at 30’, BIO 6 was consistently selected as more important based on the contribution, permutation importance, and jackknife results. For GLMs, bioclimatic variables, a few quadratic versions, and products of such variables, as well as CEC, contributed most to the deviance in models at 10’ resolution (S6–S9 Tables). Solar radiation variables were not particularly relevant to explain deviance in these models. At 30’ resolution, deviance in models was mostly explained by bioclimatic and solar radiation variables, whereas soil variables did not explain large portions of the deviance (S10–S13 Tables).

Model projections

Geographic transfers of Maxent models at 10’ resolution showed higher variability across distinct calibration areas than GLM projections (Figs 6 and S11). Variation was greatest in northern and eastern Asia, central North America, eastern Australia, and northern and southern Africa. At 30’ resolution, geographic transfers showed lower variability for both GLM and Maxent models.

Fig 6. Geographic projections of suitability values deriving from final models created with the variables selected.


Results for variables at 10’ resolution are shown. Results at 30’ resolution are presented in S11 Fig.

Projections of suitability in environmental space showed higher variability in Maxent projections than in GLMs, considering distinct calibration areas at 10’ resolution (Figs 7 and S12–S26). That is, suitability values varied highly across the regions of the environment detected as suitable (above the 5% omission threshold). In most Maxent projections of suitability in environmental space, and for various environmental variables, extreme environments were predicted to have high suitability (i.e., we observed truncated responses [2] in our models). GLM projections were more stable in both aspects; in these projections, and considering most variables, regions of high suitability tended to be surrounded by regions with decreasingly lower suitability (i.e., extreme environments were only rarely detected as the most suitable ones). At 30’ resolution, projections of environmental space looked similar across distinct calibration areas and modeling algorithms. Perhaps the main difference is that Maxent constrained suitable environments a little more than did the GLMs.

Fig 7. Projections of suitability values in a three-dimensional environmental space.


Values of suitability derive from final models created with selected variables and parameters. GLM results for variables at 10’ resolution and calibration areas resulting from intersection are shown. Results at 30’ resolution and for other calibration areas are in S12–S26 Figs.

Discussion

Determining sets of appropriate environmental variables with which to model ecological niches or potential distributions of species remains a major challenge in distributional ecology. Our results showed that multiple factors generated distinct outcomes regarding which variables are better for model development. Even after an initial selection of relevant variables, when we let statistical methods select the best subset of variables, choice of spatial resolution of layers, area for model calibration, and modeling algorithm, all affected the final subset of predictors selected. However, despite the fact that distinct groups of variables were selected when these factors were changed, some predictors were more consistently selected than others, which hints at the relevance of such variables when understanding ecological limitations for the species.

Of particular interest is the recurring selection of variables representing maximum temperature (BIO 5) and minimum temperature (BIO 6), which concurs with our initial hypothesis and, in most cases, helped to identify the limits of what is suitable for the species in regions with high and low temperatures. The consistent selection of these two variables could be attributed to the importance of the temperature in the natural history of the species, especially in that low temperatures are responsible for triggering production of turions [25]. If temperatures in an area are consistently low, the species will only produce these dormant fronds, and populations will stop growing (i.e., the species will be outside of its thermal niche, at least for reproduction).

Obtaining different sets of key variables in modeling exercises based on distinct calibration areas is concerning. One of the main aspects to be defined when creating such models is the area over which models will be calibrated [19,59]. This dependency has implications from both statistical and ecological perspectives. From a modeling point of view, the environmental values of the points selected as background or pseudoabsences affect how models are fitted to the occurrence records, sometimes resulting in overfitted models, which complicates model transfers [80,81]. In regard to the ecological relevance of these areas, because models are fitted within these regions, the associations to be found are only relevant if a species has had access to those environmental conditions [18]. As distinct calibration areas affected the set of variables selected and the effects of such variables in the models, correct definition of these areas becomes an even more important challenge. New methods to define calibration areas are now available that account for ecological, historical, and dispersal factors, which may result in more properly calibrated models and more consistent sets of variables [59]. However, this challenge persists in cases in which limited information exists about a species, or the distribution of species is close to global, as in this example.

The other two factors explored (modeling algorithm and spatial resolution of variables) also affected the set of environmental variables selected for niche models. As in previous explorations [22,82,83], the effects of these two factors were seen clearly in the transfers of models, both geographic and environmental, and thus cannot be neglected. Spatial resolution of layers has been noted as a factor that can influence the sets of variables selected [21]. Depending on the spatial resolution of layers, the number of environmental combinations found in an area can change, with more numerous combinations at finer spatial resolutions. One of the complications deriving from these differences is that the ways in which variables are correlated can change at distinct resolutions due to changes in sample size [84], which can modify the initial selection of variables in ways that are not necessarily related to the biological importance of such variables. Our inclusion of distinct modeling algorithms showed that combining these factors certainly increases the complexity of the process of selecting variables. The relevance of distinct variables has been shown to change depending on the algorithm used [22]. Although it is not clear whether the set of variables should be changed depending on the algorithm (if the variables have been preselected in some way), the fact that distinct algorithms work differently and that distinct predictors have distinct effects should not be overlooked [85].
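The dependence of a correlation-based filter on spatial resolution can be illustrated with a small toy example. The sketch below is not the authors' analysis: the layer names (`var_a`, `var_b`), the 4 x 4 grids, and the aggregation factor are hypothetical, and the only value taken from the article is the |0.8| correlation threshold mentioned in the caption of S1 Fig. In this toy case the change in correlation comes from averaging out fine-scale variation, which is one of several ways in which resolution may alter correlations.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return sxy / sqrt(sxx * syy)


def aggregate(grid, factor):
    """Coarsen a 2D grid (list of rows) by averaging factor x factor blocks."""
    rows, cols = len(grid), len(grid[0])
    out = []
    for r in range(0, rows - rows % factor, factor):
        out_row = []
        for c in range(0, cols - cols % factor, factor):
            block = [grid[i][j]
                     for i in range(r, r + factor)
                     for j in range(c, c + factor)]
            out_row.append(sum(block) / len(block))
        out.append(out_row)
    return out


def correlated_pairs(layers, threshold=0.8):
    """Return (name, name, r) for layer pairs with |r| above the threshold."""
    names = sorted(layers)
    flagged = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            xa = [v for row in layers[a] for v in row]
            xb = [v for row in layers[b] for v in row]
            r = pearson(xa, xb)
            if abs(r) > threshold:
                flagged.append((a, b, round(r, 3)))
    return flagged


# Hypothetical layers: a shared north-south gradient plus fine-scale
# patterns (checkerboard vs. column stripes) that differ between layers.
size = 4
var_a = [[i + (2 if (i + j) % 2 == 0 else -2) for j in range(size)]
         for i in range(size)]
var_b = [[i + (2 if j % 2 == 0 else -2) for j in range(size)]
         for i in range(size)]

fine = {"var_a": var_a, "var_b": var_b}
coarse = {name: aggregate(grid, 2) for name, grid in fine.items()}

print("fine:", correlated_pairs(fine))
print("coarse:", correlated_pairs(coarse))
```

In this toy case the pair falls below the threshold at the fine resolution but exceeds it after aggregation, so a filter applied at the coarser resolution would drop one of the two layers while the same filter at the finer resolution would keep both.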

Exploring environmental conditions within calibration areas and in occurrence records beforehand helped to identify variables for which truncated responses could be found. Although this point may not be related to the biological role of a given environmental dimension, it is crucial in being able to transfer a model to other conditions with less ambiguity [69,86]. Maximum temperature of warmest month (BIO 5) and minimum temperature of coldest month (BIO 6) are examples of variables that contributed importantly to models, and, as expected, values of suitability were higher at intermediate values, with decreasing suitability towards extreme environmental values. Performing these explorations can help to select predictors appropriately when the goal is to understand why species are distributed the way they are. However, other variables should not be discarded based only on these graphical explorations, because they may be important environmental constraints despite the truncation. For instance, cation exchange capacity (CEC), a soil variable, showed truncation towards lower values in our examples (Fig 7), but was still selected in several of the experiments, and its contribution to models was not negligible. CEC provides information about nutrient availability; hence, these results underline the importance of making decisions based on ideas that combine ecological and statistical considerations.
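The exploration described above was graphical (histograms of values in calibration areas and occurrence records), but the same idea can be approximated numerically. The following is a minimal sketch under stated assumptions, not the procedure used in this study: the function name `truncation_flags`, the `edge_fraction` tolerance, and all values in the example are hypothetical. A side of a variable is flagged when occurrence records reach the edge of the conditions available in the calibration area, which suggests that the response may be cut off there.

```python
def truncation_flags(occurrences, calibration, edge_fraction=0.05):
    """Flag possible response truncation at either end of one variable.

    occurrences: variable values at occurrence records.
    calibration: variable values across the calibration area.
    edge_fraction: hypothetical tolerance, as a fraction of the
        calibration range, within which occurrences count as
        "at the edge" of available conditions.
    """
    lo, hi = min(calibration), max(calibration)
    span = hi - lo
    if span == 0:
        raise ValueError("calibration values have no range")
    margin = edge_fraction * span
    return {
        "lower": min(occurrences) <= lo + margin,
        "upper": max(occurrences) >= hi - margin,
    }


# Made-up values for illustration only.
cec_calibration = [5, 8, 12, 20, 35, 50, 60]
cec_occurrences = [6, 9, 15, 22, 40]

temp_calibration = [-30, -15, 0, 10, 20, 30]
temp_occurrences = [-5, 2, 12, 20]

print(truncation_flags(cec_occurrences, cec_calibration))
print(truncation_flags(temp_occurrences, temp_calibration))
```

A flag of this kind is only a prompt for closer inspection. As noted above, a variable with a truncated response may still be an important constraint, so the flag by itself is not a reason to discard it.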

In spite of the variability in the results, we found that variables related to temperature extremes were critical in characterizing the ecological niche of the greater duckweed, which concurs with findings from experimental work done with this species [25,87]. In fact, temperature may be the main factor shaping the distribution of this species, especially considering its distributional limits at high latitudes. Models created using precipitation variables (particularly those using precipitation of driest quarter; BIO 14) correctly discarded suitability in xeric regions, showing the importance of considering a factor that represents water availability [88]. Solar radiation of the quarter with higher values (ASRQH) and range of solar radiation (RSR) were also potentially helpful in limiting the distribution of the species towards higher latitudes, as solar radiation informs about a crucial resource for photosynthesis, and experimental work has confirmed the importance of this factor [31,38]. Soil variables that served as proxies for nutrient availability and water conditions also showed high importance in some of the results. Although nutrients are critical for the development of this species, the fact that soil variables are only indirect proxies for such information [89] and the complications of representing this type of information at the scale of our analyses may explain why these variables were not selected as consistently as others.

In sum, we showed that selecting relevant variables to characterize ecological niches and potential distributions becomes even more complicated when multiple factors related to data processing and model development are considered. However, if a series of criteria and approaches is applied in concert, certain variables are selected more consistently than others. Such variables may in effect be the ones that shape and constrain the species’ distribution from a macroecological point of view. Variables representing extreme temperatures, dry periods, seasonality of solar radiation, summer solar radiation, and some soil proxies of nutrients in water were among the factors that contributed the most to shaping the distribution of S. polyrhiza.

Supporting information

S1 Fig. Results from linear correlation tests for initial variables.

Values of correlation above |0.8| are magnified threefold. Results for variables at 30’ resolution are shown.

(TIF)

S2 Fig. Histograms of environmental variable values in calibration areas and occurrence records.

Results for variables at 10’ resolution and buffer calibration areas are shown.

(TIF)

S3 Fig. Histograms of environmental variable values in calibration areas and occurrence records.

Results for variables at 10’ resolution and concave calibration areas are shown.

(TIF)

S4 Fig. Histograms of environmental variable values in calibration areas and occurrence records.

Results for variables at 10’ resolution and calibration areas resulting from ecoregions are shown.

(TIF)

S5 Fig. Histograms of environmental variable values in calibration areas and occurrence records.

Results for variables at 30’ resolution and buffer calibration areas are shown.

(TIF)

S6 Fig. Histograms of environmental variable values in calibration areas and occurrence records.

Results for variables at 30’ resolution and concave calibration areas are shown.

(TIF)

S7 Fig. Histograms of environmental variable values in calibration areas and occurrence records.

Results for variables at 30’ resolution and calibration areas resulting from ecoregions are shown.

(TIF)

S8 Fig. Histograms of environmental variable values in calibration areas and occurrence records.

Results for variables at 30’ resolution and calibration areas resulting from intersection are shown.

(TIF)

S9 Fig. Predictor contribution to Maxent models created with variables and parameter settings selected after model calibration.

Results for variables at 10’ resolution are shown.

(TIF)

S10 Fig. Predictor contribution to Maxent models created with variables and parameter settings selected after model calibration.

Results for variables at 30’ resolution are shown.

(TIF)

S11 Fig. Geographic projections of suitability values deriving from final models created with the selected variables.

Results for variables at 30’ resolution are shown.

(TIF)

S12 Fig. Projections of suitability values in a three-dimensional environmental space.

Values of suitability derive from final models created with selected variables and parameters. GLM results for variables at 10’ resolution and calibration areas resulting from buffers are shown.

(TIF)

S13 Fig. Projections of suitability values in a three-dimensional environmental space.

Values of suitability derive from final models created with selected variables and parameters. GLM results for variables at 10’ resolution and calibration areas resulting from concave hulls are shown.

(TIF)

S14 Fig. Projections of suitability values in a three-dimensional environmental space.

Values of suitability derive from final models created with selected variables and parameters. GLM results for variables at 10’ resolution and calibration areas resulting from ecoregions are shown.

(TIF)

S15 Fig. Projections of suitability values in a three-dimensional environmental space.

Values of suitability derive from final models created with selected variables and parameters. GLM results for variables at 30’ resolution and calibration areas resulting from buffers are shown.

(TIF)

S16 Fig. Projections of suitability values in a three-dimensional environmental space.

Values of suitability derive from final models created with selected variables and parameters. GLM results for variables at 30’ resolution and calibration areas resulting from concave hulls are shown.

(TIF)

S17 Fig. Projections of suitability values in a three-dimensional environmental space.

Values of suitability derive from final models created with selected variables and parameters. GLM results for variables at 30’ resolution and calibration areas resulting from ecoregions are shown.

(TIF)

S18 Fig. Projections of suitability values in a three-dimensional environmental space.

Values of suitability derive from final models created with selected variables and parameters. GLM results for variables at 30’ resolution and calibration areas resulting from intersection are shown.

(TIF)

S19 Fig. Projections of suitability values in a three-dimensional environmental space.

Values of suitability derive from final models created with selected variables and parameters. Maxent results for variables at 10’ resolution and calibration areas resulting from buffers are shown.

(TIF)

S20 Fig. Projections of suitability values in a three-dimensional environmental space.

Values of suitability derive from final models created with selected variables and parameters. Maxent results for variables at 10’ resolution and calibration areas resulting from concave hulls are shown.

(TIF)

S21 Fig. Projections of suitability values in a three-dimensional environmental space.

Values of suitability derive from final models created with selected variables and parameters. Maxent results for variables at 10’ resolution and calibration areas resulting from ecoregions are shown.

(TIF)

S22 Fig. Projections of suitability values in a three-dimensional environmental space.

Values of suitability derive from final models created with selected variables and parameters. Maxent results for variables at 10’ resolution and calibration areas resulting from intersection are shown.

(TIF)

S23 Fig. Projections of suitability values in a three-dimensional environmental space.

Values of suitability derive from final models created with selected variables and parameters. Maxent results for variables at 30’ resolution and calibration areas resulting from buffers are shown.

(TIF)

S24 Fig. Projections of suitability values in a three-dimensional environmental space.

Values of suitability derive from final models created with selected variables and parameters. Maxent results for variables at 30’ resolution and calibration areas resulting from concave hulls are shown.

(TIF)

S25 Fig. Projections of suitability values in a three-dimensional environmental space.

Values of suitability derive from final models created with selected variables and parameters. Maxent results for variables at 30’ resolution and calibration areas resulting from ecoregions are shown.

(TIF)

S26 Fig. Projections of suitability values in a three-dimensional environmental space.

Values of suitability derive from final models created with selected variables and parameters. Maxent results for variables at 30’ resolution and calibration areas resulting from intersection are shown.

(TIF)

S1 Table. Spatial autocorrelation results for all environmental variables derived from spatial patterns of occurrence data after using distinct distances for spatial thinning.

Results presented here are for variables at 10’ resolution. Spatial autocorrelation was measured using the statistic Moran’s I.

(DOCX)

S2 Table. Spatial autocorrelation results for all environmental variables derived from spatial patterns of occurrence data after using distinct distances for spatial thinning.

Results presented here are for variables at 30’ resolution. Spatial autocorrelation was measured using the statistic Moran’s I.

(DOCX)

S3 Table. Description of ecological importance of variables used for ecological niche modeling exercises with Spirodela polyrhiza.

(DOCX)

S4 Table. Selected parameter settings and variables after model calibration for analyses with variables at 10’ resolution.

AIC/AICc values are not comparable across distinct calibration areas.

(DOCX)

S5 Table. Selected parameter settings and variables after model calibration for analyses with variables at 30’ resolution.

AIC/AICc values are not comparable across distinct calibration areas.

(DOCX)

S6 Table. Effects of predictors on GLMs produced using variables and parameter settings selected after model calibration.

Results for models created with variables at 10’ resolution, using buffer calibration areas are shown. Quadratic = “^2”; Product = “:”.

(DOCX)

S7 Table. Effects of predictors on GLMs produced using variables and parameter settings selected after model calibration.

Results for models created with variables at 10’ resolution, using concave calibration areas are shown. Quadratic = “^2”; Product = “:”.

(DOCX)

S8 Table. Effects of predictors on GLMs produced using variables and parameter settings selected after model calibration.

Results for models created with variables at 10’ resolution, using calibration areas from ecoregions are shown. Quadratic = “^2”; Product = “:”.

(DOCX)

S9 Table. Effects of predictors on GLMs produced using variables and parameter settings selected after model calibration.

Results for models created with variables at 10’ resolution, using calibration areas from intersection are shown. Quadratic = “^2”; Product = “:”.

(DOCX)

S10 Table. Effects of predictors on GLMs produced using variables and parameter settings selected after model calibration.

Results for models created with variables at 30’ resolution, using buffer calibration areas are shown. Quadratic = “^2”; Product = “:”.

(DOCX)

S11 Table. Effects of predictors on GLMs produced using variables and parameter settings selected after model calibration.

Results for models created with variables at 30’ resolution, using concave calibration areas are shown. Quadratic = “^2”; Product = “:”.

(DOCX)

S12 Table. Effects of predictors on GLMs produced using variables and parameter settings selected after model calibration.

Results for models created with variables at 30’ resolution, using calibration areas from ecoregions are shown. Quadratic = “^2”; Product = “:”.

(DOCX)

S13 Table. Effects of predictors on GLMs produced using variables and parameter settings selected after model calibration.

Results for models created with variables at 30’ resolution, using calibration areas from intersection are shown. Quadratic = “^2”; Product = “:”.

(DOCX)

Acknowledgments

MEC thanks his doctoral dissertation committee for the initial suggestion of this set of analyses. We thank the members of the KUENM working group in the University of Kansas Biodiversity Institute, for their thinking and work on these topics over the years.

Data Availability

All the data used in this project can be obtained using the code provided or downloaded from the source websites described in the Methods section. Filtered occurrence data and R code to reproduce all analyses are available from a GitHub repository (www.github.com/marlonecobos/Spirodelap_ENM).

Funding Statement

This research was supported in part by a grant from the National Science Foundation (OIA-1920946). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

References

  • 1. Franklin J. Mapping Species Distributions: Spatial Inference and Prediction. Cambridge University Press; 2010.
  • 2. Peterson AT, Soberón J, Pearson RG, Anderson RP, Martínez-Meyer E, Nakamura M, et al. Ecological Niches and Geographic Distributions. Princeton: Princeton University Press; 2011.
  • 3. Peterson AT, Papeş M, Soberón J. Mechanistic and correlative models of ecological niches. Eur J Ecol. 2015;1: 28–38. doi: 10.1515/eje-2015-0014
  • 4. Smith AB, Santos MJ. Testing the ability of species distribution models to infer variable importance. Ecography. 2020;43: 1801–1813. doi: 10.1111/ecog.05317
  • 5. Araújo MB, Guisan A. Five (or so) challenges for species distribution modelling. J Biogeogr. 2006;33: 1677–1688. doi: 10.1111/j.1365-2699.2006.01584.x
  • 6. Austin MP, Van Niel KP. Improving species distribution models for climate change studies: Variable selection and scale. J Biogeogr. 2011;38: 1–8. doi: 10.1111/j.1365-2699.2010.02416.x
  • 7. Bradie J, Leung B. A quantitative synthesis of the importance of variables used in MaxEnt species distribution models. J Biogeogr. 2017;44: 1344–1361. doi: 10.1111/jbi.12894
  • 8. Cruz-Cárdenas G, López-Mata L, Villaseñor JL, Ortiz E. Potential species distribution modeling and the use of principal component analysis as predictor variables. Rev Mex Biodivers. 2014;85: 189–199. doi: 10.7550/rmb.36723
  • 9. Simões M, Romero-Alvarez D, Nuñez-Penichet C, Jiménez L, Cobos ME. General theory and good practices in ecological niche modeling: A basic guide. Biodiv Inform. 2020;15: 67–68. doi: 10.17161/bi.v15i2.13376
  • 10. Araújo MB, Peterson AT. Uses and misuses of bioclimatic envelope modeling. Ecology. 2012;93: 1527–1539. doi: 10.1890/11-1930.1
  • 11. Kriticos DJ, Jarošik V, Ota N. Extending the suite of bioclim variables: A proposed registry system and case study using principal components analysis. Methods Ecol Evol. 2014;5: 956–960. doi: 10.1111/2041-210X.12244
  • 12. Cobos ME, Peterson AT, Barve N, Osorio-Olvera L. kuenm: An R package for detailed development of ecological niche models using Maxent. PeerJ. 2019;7: e6281. doi: 10.7717/peerj.6281
  • 13. O’Brien RM. A caution regarding rules of thumb for variance inflation factors. Qual Quant. 2007;41: 673–690. doi: 10.1007/s11135-006-9018-6
  • 14. Cobos ME, Peterson AT, Osorio-Olvera L, Jiménez-García D. An exhaustive analysis of heuristic methods for variable selection in ecological niche modeling and species distribution modeling. Ecol Inform. 2019;53: 100983. doi: 10.1016/j.ecoinf.2019.100983
  • 15. Petitpierre B, Broennimann O, Kueffer C, Daehler C, Guisan A. Selecting predictors to maximize the transferability of species distribution models: Lessons from cross-continental plant invasions. Global Ecol Biogeogr. 2017;26: 275–287. doi: 10.1111/geb.12530
  • 16. Fan JY, Zhao NX, Li M, Gao WF, Wang ML, Zhu GP. What are the best predictors for invasive potential of weeds? Transferability evaluations of model predictions based on diverse environmental data sets for Flaveria bidentis. Weed Res. 2018;58: 141–149. doi: 10.1111/wre.12292
  • 17. Feng X, Park DS, Liang Y, Pandey R, Papeş M. Collinearity in ecological niche modeling: Confusions and challenges. Ecol Evol. 2019;9: 10365–10376. doi: 10.1002/ece3.5555
  • 18. Barve N, Barve V, Jiménez-Valverde A, Lira-Noriega A, Maher SP, Peterson AT, et al. The crucial role of the accessible area in ecological niche modeling and species distribution modeling. Ecol Modell. 2011;222: 1810–1819. doi: 10.1016/j.ecolmodel.2011.02.011
  • 19. Anderson RP, Raza A. The effect of the extent of the study region on GIS models of species geographic distributions and estimates of niche evolution: Preliminary tests with montane rodents (genus Nephelomys) in Venezuela. J Biogeogr. 2010;37: 1378–1393. doi: 10.1111/j.1365-2699.2010.02290.x
  • 20. Harris RMB, Porfirio LL, Hugh S, Lee G, Bindoff NL, Mackey B, et al. To be or not to be? Variable selection can change the projected fate of a threatened species under future climate. Ecol Manag Restor. 2013;14: 230–234. doi: 10.1111/emr.12055
  • 21. Nyström Sandman A, Wikström SA, Blomqvist M, Kautsky H, Isaeus M. Scale-dependent influence of environmental variables on species distribution: A case study on five coastal benthic species in the Baltic Sea. Ecography. 2013;36: 354–363. doi: 10.1111/j.1600-0587.2012.07053.x
  • 22. Bosch S, Tyberghein L, Deneudt K, Hernandez F, De Clerck O. In search of relevant predictors for marine species distribution modelling using the MarineSPEED benchmark dataset. Divers Distrib. 2018;24: 144–157. doi: 10.1111/ddi.12668
  • 23. Guo C, Lek S, Ye S, Li W, Liu J, Li Z. Uncertainty in ensemble modelling of large-scale species distribution: Effects from species characteristics and model techniques. Ecol Modell. 2015;306: 67–75. doi: 10.1016/j.ecolmodel.2014.08.002
  • 24. Zhu G, Gutierrez Illan J, Looney C, Crowder DW. Assessing the ecological niche and invasion potential of the Asian giant hornet. Proc Natl Acad Sci USA. 2020;117: 24646–24648. doi: 10.1073/pnas.2011441117
  • 25. Jacobs DL. An ecological life-history of Spirodela polyrhiza (greater duckweed) with emphasis on the turion phase. Ecol Monogr. 1947;17: 437–469. doi: 10.2307/1948596
  • 26. Landolt E. Lemnaceae. In: Kubitzki K, editor. Flowering Plants. Monocotyledons: Alismatanae and Commelinanae (except Gramineae). Berlin, Heidelberg: Springer; 1998. pp. 264–270. doi: 10.1007/978-3-662-03531-3_28
  • 27. Landolt E, Kandeler R. The family of Lemnaceae—A monographic study, Volume 2. Biosystematic investigation in the family of duckweeds, Lemnaceae. Zürich: Geobotanischen Institutes der ETH; 1987.
  • 28. Lemon GD, Posluszny U, Husband BC. Potential and realized rates of vegetative reproduction in Spirodela polyrhiza, Lemna minor, and Wolffia borealis. Aquat Bot. 2001;70: 79–87. doi: 10.1016/S0304-3770(00)00131-5
  • 29. Tippery NP, Les DH. Tiny Plants with Enormous Potential: Phylogeny and Evolution of Duckweeds. In: Cao XH, Fourounjian P, Wang W, editors. The Duckweed Genomes. Cham: Springer International Publishing; 2020. pp. 19–38. doi: 10.1007/978-3-030-11045-1_15
  • 30. Ziegler P, Sree KS, Appenroth K-J. Duckweed biomarkers for identifying toxic water contaminants? Environ Sci Pollut Res. 2019;26: 14797–14822. doi: 10.1007/s11356-018-3427-7
  • 31. Ma YB, Zhu M, Yu CJ, Wang Y, Liu Y, Li ML, et al. Large-scale screening and characterisation of Lemna aequinoctialis and Spirodela polyrhiza strains for starch production. Plant Biol. 2018;20: 357–364. doi: 10.1111/plb.12679
  • 32. Appenroth K-J, Sree KS, Bog M, Ecker J, Seeliger C, Böhm V, et al. Nutritional value of the duckweed species of the genus Wolffia (Lemnaceae) as human food. Front Chem. 2018;6: 483. doi: 10.3389/fchem.2018.00483
  • 33. Appenroth K-J, Sree KS, Böhm V, Hammann S, Vetter W, Leiterer M, et al. Nutritional value of duckweeds (Lemnaceae) as human food. Food Chem. 2017;217: 266–273. doi: 10.1016/j.foodchem.2016.08.116
  • 34. Appenroth K-J, Sree KS, Fakhoorian T, Lam E. Resurgence of duckweed research and applications: Report from the 3rd International Duckweed Conference. Plant Mol Biol. 2015;89: 647–654. doi: 10.1007/s11103-015-0396-9
  • 35. Laird RA, Barks PM. Skimming the surface: Duckweed as a model system in ecology and evolution. Am J Bot. 2018;105: 1962–1966. doi: 10.1002/ajb2.1194
  • 36. Appenroth K-J, Ziegler P, Sree S. Duckweed as a model organism for investigating plant-microbe interactions in an aquatic environment and its applications. Endocytobiosis Cell Res. 2016;27: 94–106.
  • 37. Sree KS, Appenroth K-J. Worldwide Genetic Resources of Duckweed: Stock Collections. In: Cao XH, Fourounjian P, Wang W, editors. The Duckweed Genomes. Cham: Springer International Publishing; 2020. pp. 39–46. doi: 10.1007/978-3-030-11045-1_15
  • 38. Appenroth K-J, Teller S, Horn M. Photophysiology of turion formation and germination in Spirodela polyrhiza. Biol Plant. 1996;38: 95. doi: 10.1007/BF02879642
  • 39. Michael TP, Bryant D, Gutierrez R, Borisjuk N, Chu P, Zhang H, et al. Comprehensive definition of genome features in Spirodela polyrhiza by high-depth physical mapping and short-read DNA sequencing strategies. Plant J. 2017;89: 617–635. doi: 10.1111/tpj.13400
  • 40. Chen Y, Zhao X, Li G, Kumar S, Sun Z, Li Y, et al. Genome-wide identification of the Nramp gene family in Spirodela polyrhiza and expression analysis under cadmium stress. Int J Mol Sci. 2021;22: 6414. doi: 10.3390/ijms22126414
  • 41. Kuehdorf K, Jetschke G, Ballani L, Appenroth K-J. The clonal dependence of turion formation in the duckweed Spirodela polyrhiza—An ecogeographical approach. Physiol Plant. 2014;150: 46–54. doi: 10.1111/ppl.12065
  • 42. Les DH, Crawford DJ, Kimball RT, Moody ML, Landolt E. Biogeography of discontinuously distributed hydrophytes: A molecular appraisal of intercontinental disjunctions. Int J Plant Sci. 2003;164: 917–932. doi: 10.1086/378650
  • 43. Coughlan NE, Kelly TC, Jansen MAK. “Step by step”: High frequency short-distance epizoochorous dispersal of aquatic macrophytes. Biol Invasions. 2017;19: 625–634. doi: 10.1007/s10530-016-1293-0
  • 44. Coughlan NE, Kelly TC, Jansen MAK. Mallard duck (Anas platyrhynchos)-mediated dispersal of Lemnaceae: A contributing factor in the spread of invasive Lemna minuta? Plant Biol. 2015;17: 108–114. doi: 10.1111/plb.12182
  • 45. Kimball RT, Crawford DJ, Les DH, Landolt E. Out of Africa: Molecular phylogenetics and biogeography of Wolffiella (Lemnaceae). Biol J Linn Soc. 2003;79: 565–576. doi: 10.1046/j.1095-8312.2003.00210.x
  • 46. Derived dataset GBIF.org (25 October 2022) Filtered export of GBIF occurrence data doi:10.15468/dd.se4wfq.
  • 47. Enquist BJ, Condit R, Peet RK, Schildhauer M, Thiers BM. Cyberinfrastructure for an integrated botanical information network to investigate the ecological impacts of global climate change on plant biodiversity. PeerJ Preprints. 2016; e2615v2. doi: 10.7287/peerj.preprints.2615v2
  • 48. Cobos ME, Jiménez L, Nuñez-Penichet C, Romero-Alvarez D, Simões M. Sample data and training modules for cleaning biodiversity information. Biodiv Inform. 2018;13: 49–50. doi: 10.17161/bi.v13i0.7600
  • 49. Chamberlain S, Barve V, Mcglinn D, Oldoni D, Geffert L, Ram K. rgbif: Interface to the Global Biodiversity Information Facility API. R package. 2018. Available: https://CRAN.R-project.org/package=rgbif.
  • 50. Chamberlain S. spocc: Interface to species occurrence data sources. R package. 2021. Available: https://CRAN.R-project.org/package=spocc.
  • 51. Maitner B, Brad B, Nathan C, Rick C, John D, M DS, et al. The BIEN R package: A tool to access the botanical information and ecology network (BIEN) database. Methods Ecol Evol. 2017;9: 373–379. doi: 10.1111/2041-210X.12861
  • 52. Cobos ME, Osorio-Olvera L, Soberón J, Peterson AT. ellipsenm: Ecological niche’s characterizations using ellipsoids. R package. 2020. Available: https://github.com/marlonecobos/ellipsenm.
  • 53. R Core Team. R: A language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing; 2021. Available: https://www.R-project.org/.
  • 54. Fick SE, Hijmans RJ. WorldClim 2: New 1-km spatial resolution climate surfaces for global land areas. Int J Climatol. 2017;37: 4302–4315. doi: 10.1002/joc.5086
  • 55. Hengl T, Jesus JM de, Heuvelink GBM, Gonzalez MR, Kilibarda M, Blagotić A, et al. SoilGrids250m: Global gridded soil information based on machine learning. PLoS ONE. 2017;12: e0169748. doi: 10.1371/journal.pone.0169748
  • 56. Yang X, Post WM, Thornton PE, Jain A. The distribution of soil phosphorus for global biogeochemical modeling. Biogeosciences. 2013;10: 2525–2537. doi: 10.5194/bg-10-2525-2013
  • 57. Hijmans RJ. raster: Geographic data analysis and modeling. R package. 2019. Available: https://CRAN.R-project.org/package=raster.
  • 58. O’Brien J. gdalUtilities: Wrappers for “GDAL” utilities executables. R package. 2021. Available: https://CRAN.R-project.org/package=gdalUtilities.
  • 59. Machado-Stredel F, Cobos ME, Peterson AT. A simulation-based method for identifying accessible areas as calibration areas for ecological niche models and species distribution models. Front Biogeogr. 2021;13: e48814. doi: 10.21425/F5FBG48814
  • 60. Romero-Alvarez D, Escobar LE, Varela S, Larkin DJ, Phelps NBD. Forecasting distributions of an aquatic invasive species (Nitellopsis obtusa) under future climate scenarios. PLoS ONE. 2017;12: e0180930. doi: 10.1371/journal.pone.0180930
  • 61. Bivand R, Rundel C. rgeos: Interface to geometry engine-open source (’GEOS’). R package. 2019. Available: https://CRAN.R-project.org/package=rgeos.
  • 62. Bivand R, Keitt T, Rowlingson B. rgdal: Bindings for the “geospatial” data abstraction library. R package. 2019. Available: https://CRAN.R-project.org/package=rgdal.
  • 63. Phillips SJ, Anderson RP, Dudík M, Schapire RE, Blair ME. Opening the black box: An open-source release of Maxent. Ecography. 2017;40: 887–893. doi: 10.1111/ecog.03049
  • 64. Phillips SJ, Anderson RP, Schapire RE. Maximum entropy modeling of species geographic distributions. Ecol Modell. 2006;190: 231–259. doi: 10.1016/j.ecolmodel.2005.03.026
  • 65. Elith J, Graham CH, Anderson RP, Dudík M, Ferrier S, Guisan A, et al. Novel methods improve prediction of species’ distributions from occurrence data. Ecography. 2006;29: 129–151. doi: 10.1111/j.2006.0906-7590.04596.x
  • 66. Guisan A, Edwards TC, Hastie T. Generalized linear and generalized additive models in studies of species distributions: Setting the scene. Ecol Modell. 2002;157: 89–100. doi: 10.1016/S0304-3800(02)00204-1
  • 67. Barbet-Massin M, Jiguet F, Albert CH, Thuiller W. Selecting pseudo-absences for species distribution models: How, where and how many? Methods Ecol Evol. 2012;3: 327–338. doi: 10.1111/j.2041-210X.2011.00172.x
  • 68. Fithian W, Hastie T. Finite-sample equivalence in statistical models for presence-only data. Ann Appl Stat. 2013;7: 1917–1939. doi: 10.1214/13-AOAS667
  • 69. Owens HL, Campbell LP, Dornak LL, Saupe EE, Barve N, Soberón J, et al. Constraints on interpretation of ecological niche models by limited environmental ranges on calibration areas. Ecol Modell. 2013;263: 10–18. doi: 10.1016/j.ecolmodel.2013.04.011
  • 70. Owens HL, Campbell LP, Dornak LL, Saupe EE, Barve N, Soberón J, et al. Constraints on interpretation of ecological niche models by limited environmental ranges on calibration areas. Ecol Modell. 2013;263: 10–18. doi: 10.1016/j.ecolmodel.2013.04.011
  • 71. Thuiller W. Patterns and uncertainties of species’ range shifts under climate change. Glob Change Biol. 2004;10: 2020–2027. doi: 10.1111/j.1365-2486.2004.00859.x
  • 72. Chevalier M, Broennimann O, Cornuault J, Guisan A. Data integration methods to account for spatial niche truncation effects in regional projections of species distribution. Ecol Appl. 2021;31: e02427. doi: 10.1002/eap.2427
  • 73. Mejbel HS, Simons AM. Aberrant clones: Birth order generates life history diversity in Greater Duckweed, Spirodela polyrhiza. Ecology and Evolution. 2018;8: 2021–2031. doi: 10.1002/ece3.3822
  • 74. Hitsman HW, Simons AM. Latitudinal variation in norms of reaction of phenology in the greater duckweed Spirodela polyrhiza. J Evol Biol. 2020;33: 1405–1416. doi: 10.1111/jeb.13678
  • 75.Peterson AT, Papeş M, Soberón J. Rethinking receiver operating characteristic analysis applications in ecological niche modeling. Ecol Modell. 2008;213: 63–72. doi: 10.1016/j.ecolmodel.2007.11.008 [DOI] [Google Scholar]
  • 76.Anderson RP, Lew D, Peterson AT. Evaluating predictive models of species’ distributions: Criteria for selecting optimal models. Ecol Modell. 2003;162: 211–232. doi: 10.1016/S0304-3800(02)00349-6 [DOI] [Google Scholar]
  • 77.Warren DL, Seifert SN. Ecological niche modeling in Maxent: The importance of model complexity and the performance of model selection criteria. Ecol Appl. 2011;21: 335–342. doi: 10.1890/10-1171.1 [DOI] [PubMed] [Google Scholar]
  • 78.Peterson AT, Cohoon KP. Sensitivity of distributional prediction algorithms to geographic data completeness. Ecol Modell. 1999;117: 159–164. doi: 10.1016/S0304-3800(99)00023-X [DOI] [Google Scholar]
  • 79.Colwell RK, Rangel TF. Hutchinson’s duality: The once and future niche. Proc Natl Acad Sci USA. 2009;106: 19651–19658. doi: 10.1073/pnas.0901650106 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 80.Radosavljevic A, Anderson RP. Making better Maxent models of species distributions: Complexity, overfitting and evaluation. J Biogeogr. 2014;41: 629–643. doi: 10.1111/jbi.12227 [DOI] [Google Scholar]
  • 81.Yates KL, Bouchet PJ, Caley MJ, Mengersen K, Randin CF, Parnell S, et al. Outstanding challenges in the transferability of ecological models. Trends Ecol Evolut. 2018;33: 790–802. doi: 10.1016/j.tree.2018.08.001 [DOI] [PubMed] [Google Scholar]
  • 82.Suárez-Seoane S, Virgós E, Terroba O, Pardavila X, Barea-Azcón JM. Scaling of species distribution models across spatial resolutions and extents along a biogeographic gradient. The case of the Iberian mole Talpa occidentalis. Ecography. 2014;37: 279–292. doi: 10.1111/j.1600-0587.2013.00077.x [DOI] [Google Scholar]
  • 83.Connor T, Hull V, Viña A, Shortridge A, Tang Y, Zhang J, et al. Effects of grain size and niche breadth on species distribution modeling. Ecography. 2018;41: 1270–1282. doi: 10.1111/ecog.03416 [DOI] [Google Scholar]
  • 84.Sari BG, Lúcio AD, Olivoto T, Krysczun DK, Tischler AL, Drebes L. Interference of sample size on multicollinearity diagnosis in path analysis. Pesq Agropec Bras. 2018;53: 769–773. doi: 10.1590/S0100-204X2018000600014 [DOI] [Google Scholar]
  • 85.Qiao H, Soberón J, Peterson AT. No silver bullets in correlative ecological niche modelling: Insights from testing among many potential algorithms for niche estimation. Methods Ecol Evol. 2015;6: 1126–1136. doi: 10.1111/2041-210X.12397 [DOI] [Google Scholar]
  • 86.Elith J, Kearney M, Phillips S. The art of modelling range‐shifting species. Methods Ecol Evol. 2010;1: 330–342. doi: 10.1111/j.2041-210X.2010.00036.x [DOI] [Google Scholar]
  • 87.Hitsman HW, Simons AM. Latitudinal variation in norms of reaction of phenology in the greater duckweed Spirodela polyrhiza. J Evol Biol. 2020;33: 1405–1416. doi: 10.1111/jeb.13678 [DOI] [PubMed] [Google Scholar]
  • 88.Woleck J. Assessment of the possibility of exoornithochory of duckweeds (Lemnaceae) in the light of researches into the resistance of these plants to desiccation. Ekol Pol. 1982;29: 405–419. [Google Scholar]
  • 89.Austin MP. Spatial prediction of species distribution: An interface between ecological theory and statistical modelling. Ecol Modell. 2002;157: 101–118. doi: 10.1016/S0304-3800(02)00205-3 [DOI] [Google Scholar]

Decision Letter 0

Mirko Di Febbraro

31 Jan 2023

PONE-D-22-28513

Broad-scale factors shaping the ecological niche and geographic distribution of Spirodela polyrhiza

PLOS ONE

Dear Dr. Cobos,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process. Most of the referees found merit in this manuscript, especially pointing out the accurate framework implemented by the authors to select optimal environmental variables for model calibration. That said, some points need to be addressed before proceeding further with this manuscript. First, hypotheses and objectives must be clearly specified so that the significance of the study is properly evident. Also, important details about methodological choices are missing from the text (e.g., the rationale behind pseudo-absence placement and number, information on the modelling algorithms, etc.), as is ecological justification for the sizes of the calibration areas and for the distance used to reduce sampling bias. Regarding this latter point, the authors should also make sure that spatial autocorrelation in the models' residuals was actually absent or not significant.

Please submit your revised manuscript by Mar 17 2023 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols.

We look forward to receiving your revised manuscript.

Kind regards,

Mirko Di Febbraro

Academic Editor

PLOS ONE

Journal Requirements:

When submitting your revision, we need you to address these additional requirements.

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and 

https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf

2. Please upload a new copy of Figures 2 and 4 as the detail is not clear. Please follow the link for more information: https://blogs.plos.org/plos/2019/06/looking-good-tips-for-creating-your-plos-figures-graphics/

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes

Reviewer #2: No

Reviewer #3: Yes

Reviewer #4: Partly

**********

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #2: No

Reviewer #3: Yes

Reviewer #4: Yes

**********

3. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: Yes

Reviewer #4: Yes

**********

4. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: No

Reviewer #3: Yes

Reviewer #4: Yes

**********

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: The interesting paper entitled “Broad-scale factors shaping the ecological niche and geographic distribution of Spirodela polyrhiza” showed a multi-step approach to select relevant variables for modelling the ecological niche. A widely distributed aquatic plant species was used as a case study, but this approach can be widely replicated for different species, geographic regions and scales. The analysis led to interesting and reliable results on plant-specific ecology. Further thoughts on how underestimated factors, such as the calibration area and spatial resolution used, greatly influence the response of each model are also of interest.

Overall, I consider this manuscript suitable for publication in Plos One.

I have only a few concerns:

I suggest reporting the authority of the species when first mentioned.

L 136-137: the authors mention that to reduce autocorrelation bias, they thinned the records using a minimum point-to-point distance of ~30'. It would be useful to test and compare this autocorrelation to show and then discuss whether it was indeed reduced.
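
To make the thinning rule in this comment concrete, the following is a minimal Python sketch of distance-based thinning. It is not the authors' code: the greedy keep-or-drop rule, the planar distance measured directly in degrees (30’ = 0.5°), and the names `thin_records` and `records` are all assumptions chosen for illustration.

```python
import math

def thin_records(points, min_dist_deg=0.5):
    """Greedy thinning: keep a record only if it lies at least
    min_dist_deg from every record that has already been kept.
    Distance is a simple planar distance in degrees (an assumption)."""
    kept = []
    for lon, lat in points:
        if all(math.hypot(lon - klon, lat - klat) >= min_dist_deg
               for klon, klat in kept):
            kept.append((lon, lat))
    return kept

# Hypothetical (longitude, latitude) records, in decimal degrees.
records = [(0.0, 0.0), (0.1, 0.1), (0.6, 0.0), (0.6, 0.3), (2.0, 2.0)]
thinned = thin_records(records)
print(thinned)
```

Because the rule is greedy, the retained set depends on the order of the input records. Thinning alone does not demonstrate that autocorrelation was reduced; a spatial autocorrelation statistic computed before and after thinning would be needed for that.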

L 170: please cite the source from which the ecoregions were retrieved.

Attachments: for ease of reading, I suggest changing the format of the supporting figures (e.g. jpeg) or directly merging all this information (including descriptions) into one pdf file.

Reviewer #2: The main aim of the MS is to identify the abiotic factors that shape the distribution of Spirodela polyrhiza. This aim could be crucial to understand and predict the actual and future distribution of the target species, but in my opinion the methods used are completely wrong:

First of all, a raster resolution of 10’ or 30’ for the explanatory variables is not acceptable, especially for investigating the distribution of a species that lives in ponds.

Secondly, the authors try to disentangle some methodological issues in SDM/ENM (e.g., extent of the calibration areas, algorithms) that cannot be addressed using very coarse variables and considering only one species.

Specific methodological comment:

- Records that were outside of, but closer than ~5’ to the edge of environmental layers were moved to the nearest pixel with information – What? Do you expect to find similar environmental conditions within 100 km2?

- minimum point-to-point distance of ~30’ – How many different environmental conditions occur within ~3600 km2? Not acceptable.

- the broad distribution of this species makes it difficult for that method to be applied – Why? Maybe it is due to the coarse resolution of your approach?

- Modeling algorithms: How did you generate the pseudo-absences? How did you calibrate the models? Did you perform any cross-validation?

- “We transferred all the models across the area comprising the union of the four calibration areas and compared those models to assess whether patterns of suitability values differed as a result of using distinct variables, calibration areas, and algorithms.” WHY? Is it a methodological or ecological MS? Not clear.

- Have you considered the multicollinearity among explanatory variables?
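
One common way to screen for the multicollinearity raised in the last point is a pairwise correlation filter. The sketch below is a generic illustration, not the procedure used in the manuscript: the |r| > 0.8 cutoff, the keep-first ordering rule, and the toy variable names and values are all assumptions.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def drop_correlated(variables, threshold=0.8):
    """Walk through variables in order and keep one only if its |r|
    with every already kept variable does not exceed the threshold."""
    kept = []
    for name in variables:
        if all(abs(pearson(variables[name], variables[k])) <= threshold
               for k in kept):
            kept.append(name)
    return kept

# Hypothetical values sampled at five locations (illustration only).
toy = {
    "max_temp": [30.0, 25.0, 20.0, 15.0, 10.0],
    "mean_temp": [22.0, 18.0, 13.0, 9.0, 4.0],
    "dry_precip": [5.0, 40.0, 10.0, 60.0, 20.0],
}
selected = drop_correlated(toy)
print(selected)
```

With a keep-first rule the order of the variables matters, so in practice the ordering would reflect which variables are considered most biologically relevant.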

Please consider reading the many papers that explain how to perform SDM/ENM, how to create pseudo-absences, how to create ensemble models by combining different algorithms, how to perform cross-validation, and so on. Please do not use a single species to address methodological questions; consider using virtual species for that. Some useful papers are listed below:

Barbet‐Massin, M., Jiguet, F., Albert, C. H., Thuiller, W. (2012). Selecting pseudo‐absences for species distribution models: how, where and how many?. Methods in ecology and evolution, 3(2), 327-338.

Bucklin, D. N., Basille, M., Benscoter, A. M., Brandt, L. A., Mazzotti, F. J., Romanach, S. S., ... Watling, J. I. (2015). Comparing species distribution models constructed with different subsets of environmental predictors. Diversity and distributions, 21(1), 23-35.

Connor, T., Hull, V., Viña, A., Shortridge, A., Tang, Y., Zhang, J., ... Liu, J. (2018). Effects of grain size and niche breadth on species distribution modeling. Ecography, 41(8), 1270-1282.

Muscarella, R., Galante, P. J., Soley‐Guardia, M., Boria, R. A., Kass, J. M., Uriarte, M., Anderson, R. P. (2014). ENMeval: An R package for conducting spatially independent evaluations and estimating optimal model complexity for Maxent ecological niche models. Methods in ecology and evolution, 5(11), 1198-1205.

VanDerWal, J., Shoo, L. P., Graham, C., Williams, S. E. (2009). Selecting pseudo-absence data for presence-only distribution modeling: how far should you stray from what you know?. Ecological modelling, 220(4), 589-594.

Reviewer #3: The paper entitled "Broad-scale factors shaping the ecological niche and geographic distribution of Spirodela polyrhiza" investigates the effect of variable selection procedures for modeling the ecological niche of the aquatic Spirodela polyrhiza, taking into account variability arising from using distinct algorithms, calibration areas, and spatial resolutions of variables. The authors show that the final set of variables selected based on statistical inference varied considerably depending on the combination of algorithm, calibration area, and spatial resolution used.

The article is clearly written, polished, well-edited, and scientifically sound. It is characterized by good originality. I recommend publishing with minor changes. I have a few comments.

Abstract

The abstract accurately describes the main objective of the study. It explains how the study was done, including the model organism used, without excessive methodological detail. The most important results are summarized, but their significance has not been sufficiently emphasized, specifically regarding the variation in the final set of variables selected depending on the combination of algorithm, calibration area, and spatial resolution.

Introduction

The authors provide a careful overview of the challenge of selecting appropriate environmental variables when characterizing species' ecological niches, and what still ought to be done. The objectives of the study are clearly specified, but hypotheses are missing. The lack of clear hypotheses could prevent readers from fully understanding the significance and importance of the study.

Methods

The Materials and Methods section provides enough detail to allow suitably skilled investigators to replicate the main steps of the study. However, the lack of specific information (e.g., method of generating pseudo-absences, method details of GLMs' model calibration, R packages adopted to evaluate GLMs' performance of candidate models) does not allow a full understanding of the code provided. Specific information should be included in detail, citing articles you followed for the choices/methods applied.

In addition, I suggest the authors explain why they adopted that specific ratio of presence data to background points/pseudo-absence data to fit models, specifying a reference that supports the choice. This is always a very sore point because, according to some authors, an inadequate number of background points or pseudo-absences (in this study 20,000 points for 964 occurrences; lines 138 and 190-191) could reduce the accuracy of the models. Model accuracy is generally affected not only by this ratio but also by the method used to generate pseudo-absences (see for example

Barbet-Massin et al., 2012 -

https://besjournals.onlinelibrary.wiley.com/doi/full/10.1111/j.2041-210X.2011.00172.x).

More specific comments here:

Line 136 - 138: I suggest the authors provide details on the reference that supports the choice of using the minimum point-to-point distance of ~30’ for the spatial thinning of species occurrence records.

Line 160 - 162: "We performed raster aggregation procedures (average of values) on CEC, OC, and pH to match the resolution of BIO variables, and on BIO and SR variables to match the resolution of variables at 30’. [..]" I do not really understand this sentence. Have you used a BIO variable as a snap raster to ensure all cells were properly aligned, and all rasters have the same cell resolutions? If this is the case, please specify the method applied (e.g., nearest neighbour method etc.), if not rephrase the sentence.
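
For readers unfamiliar with the operation being questioned here, aggregation by averaging can be sketched as follows. This is a simplified Python illustration rather than the authors' workflow: it assumes the fine and coarse grids are already aligned, that the resolutions differ by an integer factor (3 when going from 10’ to 30’), and that missing cells (`None`) are ignored when averaging. The helper name `aggregate_mean` is hypothetical.

```python
def aggregate_mean(grid, factor):
    """Aggregate a 2D grid (list of rows) by averaging the non-missing
    cells in each factor x factor block. Blocks made up entirely of
    missing cells yield None."""
    rows, cols = len(grid), len(grid[0])
    out = []
    for r in range(0, rows, factor):
        out_row = []
        for c in range(0, cols, factor):
            block = [grid[i][j]
                     for i in range(r, min(r + factor, rows))
                     for j in range(c, min(c + factor, cols))
                     if grid[i][j] is not None]
            out_row.append(sum(block) / len(block) if block else None)
        out.append(out_row)
    return out

# Toy 3 x 6 grid at the finer resolution; None marks a missing cell.
fine = [
    [1.0, 2.0, 3.0, 10.0, 10.0, 10.0],
    [4.0, 5.0, 6.0, 10.0, None, 10.0],
    [7.0, 8.0, 9.0, 10.0, 10.0, 10.0],
]
coarse = aggregate_mean(fine, 3)
print(coarse)
```

Aggregating a 250 m layer to 10’ generally also requires bringing the two grids into alignment, which is the step this comment asks the authors to specify.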

Line 162 - 164: "Although the set of variables representing soil conditions used at 10’ differs from the one at 30’, variable selection analyses will help to identify whether the variables selected differ between the two resolutions. [..]"

Does any other author support this??

Line 171 - 176: "Although a new simulation-based approach has been recently suggested as a reliable tool to estimate calibration areas [59], the broad distribution of this species makes it difficult for that method to be applied. Our chosen calibration areas are therefore reasonable options to calibrate models, considering that such areas should reflect what regions could have been accessible to the species and present relevant environments for comparisons [..]"

Does any other author support this?? Do you have a specific reference to cite?

Line 216-217: "The latter consideration assumes that using variables for which the entire spectrum of responses can be characterized makes for better models. [..]" Please cite and provide a reference for this sentence.

Line 218-219: "Biological relevance of variables was determined based on details about the species' natural history, phenology, and physiology in the literature, and our own experience with populations in the field and controlled environments [..]" No bibliographic reference was cited to justify the selection of biologically relevant variables. Please add a citation.

Line 260: “see below”. Where? Please, specify.

Results

The results relate to the research question, and the language adopted to express results is clear and concise. The tables and figures are appropriate, but very fragmented across many appendix documents. To explore the results in detail, the reader must open as many as 11 documents (.docx format) and have vector graphics software to open 26 images attached in .eps format. I strongly suggest aggregating the appendices and exporting images in a simple file format (e.g., .jpg or .tif).

Discussion

The writing is very good and well-polished. I have no suggestion.

Reviewer #4: General comments

The authors of “Broad-scale factors shaping the ecological niche and geographic distribution of Spirodela polyrhiza” focused their research on defining an innovative methodological approach to select the most appropriate set of environmental variables in ecological niche modeling. This aim is the basis for producing efficient ecological niche models, and even today it is still not fully resolved. Currently, the traditional methods to establish the set of environmental variables consist of different approaches, for example excluding variables with very high multicollinearity, selecting variables through expert-based procedures supported by empirical evidence, and letting algorithms eliminate variables in order to optimize the fit of the ecological niche model produced. Furthermore, as well described by the authors, the final set of environmental variables in ecological niche modeling depends on many different factors. First, ecological niche models can be produced to address two different aims: to analyze the ecology of the target species and its environmental limits, or to predict the geographic distribution of the target species; consequently, these two aims require different environmental variables. Moreover, the set of environmental variables may vary depending on the spatial resolution of the environmental data used, on the areas used for model calibration, and on the algorithm used. These last three aspects require further examination, given that few studies have directly investigated these questions. In this context, the authors developed ecological niche models for Spirodela polyrhiza, a cosmopolitan free-floating aquatic plant, on different calibration areas, using environmental variables at different spatial resolutions, using two niche modeling algorithms, and applying a multi-step approach to define the environmental variables.
The occurrence data were downloaded from GBIF and the Botanical Information and Ecology Network at global scale; these data were then filtered, retaining 964 occurrences. The authors used environmental variables with different spatial resolutions: bioclimatic and solar radiation variables at 10’ resolution acquired from WorldClim v2.1; soil variables (cation exchange capacity, organic carbon, and pH) from the World Soil Information database at a fine resolution of 250 m; and coarser soil variables (total phosphorus, labile inorganic phosphorus, and organic phosphorus) at 30’ resolution. Furthermore, the authors considered four different areas for model calibration: the first area was defined as buffers of 5° around S. polyrhiza occurrences; the second consisted of concave-hull polygons with a buffer of 5° around S. polyrhiza occurrences; the third was the ecoregions occupied by the species buffered by 1°; and the fourth was the intersection of the previous three areas. Concerning the ecological niche modeling algorithms, the authors used generalized linear models (GLM, with different weights: 1 for S. polyrhiza occurrences and 10,000 for pseudo-absences) and Maxent, using 20,000 pseudo-absence and background points, respectively. Finally, the environmental variables were selected using a multi-step approach that summarizes well a large number of qualitative and quantitative approaches applied individually in previous studies. The first step (which is only described in Figure 2) consists of inspection and/or treatment of variables, then a measure of linear correlation among variables, followed by exploration of variable values inside and outside the S. polyrhiza occurrence areas. At the end of the third step, the authors proposed a first selection of variables supported by ecological and historical information on the target species. Afterwards, the variables were assembled in all combinations from two to the total number of variables.
Finally, all these datasets were used in ecological niche models with different calibration areas and algorithms (GLM, Maxent): a total of 10,180 and 5,065 GLM models were tested at spatial resolutions of 10’ and 30’, respectively, and 61,080 and 30,390 Maxent models were produced at 10’ and 30’, respectively. The performance of each model was calculated using the following metrics: partial ROC, omission rate, and Akaike information criterion for GLMs, and the AICc for Maxent. Ultimately, the authors assessed the effects of environmental variables by analyzing the best model for each algorithm (GLM, Maxent) and calibration area at the two spatial resolutions (10’, 30’), using jackknife analysis for Maxent and ANOVA for GLM.

The results of this research demonstrate the potential of this approach to define the best set of environmental variables. First, the graphical explorations and the linear correlations of environmental conditions across calibration areas and S. polyrhiza occurrences made it possible to identify the variables with more suitable conditions, reducing the set to 11 variables at 10’ spatial resolution and 10 variables at 30’ spatial resolution. Interestingly, the environmental variables selected differed between resolutions: at 10’ resolution, the soil and solar radiation variables were more suitable for analyzing the S. polyrhiza occurrences, whereas at 30’ resolution the bioclimatic and solar radiation variables were more suitable. Finally, the Maxent algorithm seems to work better than GLM, given that the Maxent projections showed higher variability across spatial resolutions and calibration areas.
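
The model counts quoted in this summary can be cross-checked with simple arithmetic. The Python sketch below counts all variable combinations of size two or more for the 11 (10’) and 10 (30’) retained variables and divides the reported totals by those counts. What the resulting multipliers (5 for GLM, 30 for Maxent) correspond to is not stated in this review, so the code leaves them uninterpreted; the function name `n_variable_sets` is hypothetical.

```python
from math import comb

def n_variable_sets(n_vars, min_size=2):
    """Number of subsets of n_vars variables with at least min_size members."""
    return sum(comb(n_vars, k) for k in range(min_size, n_vars + 1))

# Retained variables per resolution (in arc-minutes), as stated in the review.
n_sets = {10: n_variable_sets(11), 30: n_variable_sets(10)}

# Model totals reported in the review, keyed by (algorithm, resolution).
reported = {
    ("GLM", 10): 10180,
    ("GLM", 30): 5065,
    ("Maxent", 10): 61080,
    ("Maxent", 30): 30390,
}

# (quotient, remainder) of each reported total divided by the number of sets.
multipliers = {key: divmod(total, n_sets[key[1]])
               for key, total in reported.items()}

print(n_sets)
print(multipliers)
```

The zero remainders indicate that the reported totals are whole multiples of the number of candidate variable sets at each resolution.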

In general, the authors examine very well the problem of selecting environmental variables for ecological niche modeling. The manuscript is well structured, particularly the introduction, methods, and discussion; the results section is less so, as it requires much more information, such as the results of model evaluation (partial ROC, omission rate, Akaike information criterion for GLM, and AICc for Maxent) and the results of the jackknife analysis for Maxent and ANOVA for GLM used to measure/explore deviances for each environmental variable selected. Moreover, the results obtained for the S. polyrhiza niche models are in line with a large number of previous studies on different target species, in which bioclimatic, solar radiation, and soil variables were identified as the most important environmental variables limiting species growth at global scale. Consequently, the manuscript lacks a clear paragraph describing the innovative aspects of using these environmental variables to produce S. polyrhiza niche models. I consider that the use of traditional methods to define the set of environmental variables would have achieved the same or similar results. Please provide a response to this point. Furthermore, several steps in the text require more detail and adequate justification: for example, the ecological reasons for the dimensions and shapes of the calibration areas are missing, as are the ecological reasons for the distance (30’) used to reduce the bias of spatial autocorrelation, and the ecological reasons for choosing these environmental variables (bioclimatic, solar radiation, and soil conditions; for more details see Specific comments). Finally, the use of GBIF and BIEN underestimates the real spatial extent of S. polyrhiza. Can this problem affect the ecological niche models produced?

In consideration of the above, this paper should be addressed in a major revision.

Specific comments

Line 10: Replace the corresponding author's email with an institutional email.

Lines 134 – 136: I did not understand this sentence. Are environmental variables not at global scale? Please rewrite this sentence, thank you.

Lines 137 – 138: The S. polyrhiza occurrences were subjected to a drastic reduction from 45,913 to 964 records. This reduction is due to the minimum point-to-point distance (30’) set to eliminate the spatial autocorrelation problem. Please include in the manuscript an ecological justification specific to S. polyrhiza for this distance.

Paragraph “Environmental variables”: This paragraph lacks a description of how these environmental variables are important to S. polyrhiza growth.

Lines 157 – 160: Add a table with the environmental variables used, describing their ecological importance specific to S. polyrhiza and their spatial resolution.

Lines 167 – 171: Please add references that used similar methods to define calibration areas, and also indicate the importance of these methods (buffers, concave-hull polygons, ecoregions, and the intersection of the previous three areas) for defining calibration areas for S. polyrhiza.

Paragraph “Modeling algorithms”: Even if the calibration areas include a large portion of territory around S. polyrhiza occurrences, these occurrences derive from the GBIF and BIEN datasets, which could underestimate the range of the target species. Consequently, background or pseudo-absence points could be false negatives. How did you consider this question?

Lines 198 – 200 and Figure 2: The main text lacks a description of the first step displayed in Figure 2, “Inspection and/or treatment”. Furthermore, please number the steps in Figure 2 to match those reported in the main text, to make the manuscript easier to read.

Lines 212 – 216: The references to Fig. 3 and Fig. 4 are inverted in the main text. Also, these two figures (Fig. 3, Fig. 4) show preliminary results of the S. polyrhiza ecological niche models. In my own research, I prefer to describe such results in the results section. Please move these two figures and add their description to the results section.

Figures 3, 4, 6, 7: Add, in the main text or in the captions of these figures, the reasons for displaying only the results at 10’ spatial resolution in the main text.

Figures 6 and S11: Add a legend of suitability.

Lines 367 – 370: Based on this sentence, I have a question. The S. polyrhiza occurrences used in this research may not include all real occurrences; can this problem influence the results of the ecological niche models?

Lines 403 – 408: Add specific ecological reasons and references for S. polyrhiza and/or other species to motivate the inclusion of variables with complications in graphical explorations.

Lines 411 – 413: The temperature is certainly the main driver to limit the species growth at global scale, please add references.

Lines 419 – 421: The relationships between the target species and soil variables were more important in the finest ecological niche model. This suggests a higher importance of soil conditions at this fine scale; please add references to support this point.

**********

6. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: Yes: Mauro Fois

Reviewer #2: No

Reviewer #3: No

Reviewer #4: No

**********

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

PLoS One. 2023 May 4;18(5):e0276951. doi: 10.1371/journal.pone.0276951.r002

Author response to Decision Letter 0


10 Apr 2023

Major comment from Editor and Reviewers:

Editor: Regarding this latter point (distance considered to reduce sampling bias), the authors should also make sure that the spatial autocorrelation in models' residuals was actually absent or not significant.

Reviewer 1: L 136-137: the authors mention that to reduce autocorrelation bias, they thinned the records using a minimum point-to-point distance of ~30'. It would be useful to test and compare this autocorrelation to show and then discuss whether it was indeed reduced.

Reviewer 2: … minimum point-to-point distance of ~30’ – how many different environmental condition occurs in ~3600 km2 - not acceptable

Reviewer 3: Line 136 - 138: I suggest the authors provide details on the reference that supports the choice of using the minimum point-to-point distance of ~30’ for the spatial thinning of species occurrence records.

Reviewer 4: … miss ecological reason of distance (30’) required to reduce the bias of spatial autocorrelation. Lines 137 – 138: The S. polyrhiza occurrences were subjected to a drastic reduction from 45,913 to 964 records. This reduction is due to the minimum point-to-point distance (30’) set to eliminate the spatial autocorrelation problem. Please include in the manuscript an ecological motivation, specific to S. polyrhiza, for establishing this distance.

Response: We understand this concern and are aware of the usual problems related to sampling bias and how spatial thinning can help to reduce them. We expect that spatial autocorrelation in model residuals will still exist after our thinning process. However, despite not solving the issue of spatial autocorrelation completely, the distance filter that we used helps to exclude duplicated information and reduce problems derived from autocorrelation without losing more data. We tried distinct filter distances (0, 50, 100, 150, 200, and 250 km) to handle spatial autocorrelation in our study (measured using Moran’s I), but none of these options solves the problem entirely (S1-S2 Tables). Specifically, after applying our thinning distance (30’, ~50 km), observed values of Moran’s I are reduced by orders of magnitude for all predictors; these values do not decrease appreciably further with larger thinning distances. In fact, none of the distances tested erases the problem of spatial autocorrelation (i.e., all P-values were 0.01), and the loss of data increases with larger distances. We believe that this happens because of the spatial arrangement of our data, which are almost global in extent but are biased towards the northern parts of the planet. We also consider that the intensive model selection process that we used (aiming to meet statistical significance, omission, and model complexity criteria) helps to reduce the effect of spatial autocorrelation on our results.
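For readers who want to see the two operations discussed in this response, the following is a minimal Python sketch of distance-based thinning and of Moran’s I for a single predictor. It is not the code used in the study (the analyses were run in R). The greedy thinning rule, the binary distance-band weights, the spherical Earth radius, and all function names are assumptions made for illustration only.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius (spherical approximation)


def haversine_km(p, q):
    """Great-circle distance in km between two (lon, lat) points given in degrees."""
    lon1, lat1, lon2, lat2 = map(math.radians, (p[0], p[1], q[0], q[1]))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def thin_points(points, min_km):
    """Greedy thinning: keep a point only if it is >= min_km from every kept point."""
    kept = []
    for p in points:
        if all(haversine_km(p, k) >= min_km for k in kept):
            kept.append(p)
    return kept


def morans_i(values, points, band_km):
    """Moran's I with binary weights: w_ij = 1 when i != j and d_ij <= band_km.

    Assumes at least one pair of points falls within band_km and that the
    values are not all identical (otherwise a division by zero occurs).
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    cross, w_sum = 0.0, 0
    for i in range(n):
        for j in range(n):
            if i != j and haversine_km(points[i], points[j]) <= band_km:
                cross += dev[i] * dev[j]
                w_sum += 1
    return (n / w_sum) * cross / sum(d * d for d in dev)
```

Under these assumptions, one would thin the records at each candidate distance, extract predictor values at the retained records, and compare the resulting values of Moran’s I across distances. Note that the outcome of greedy thinning depends on the order of the input points.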

We respectfully disagree with the statement that 30’ is an unacceptable distance for such filtering. Analyses in ecological niche modeling can be done at multiple scales and grain sizes. Because we explicitly acknowledge that our analyses are coarse, the uncertainty of coordinates in many places in Europe is high, and our conclusions are directed to broad-scale aspects, we consider that this distance is justified.

We have modified the text in the manuscript to explain our motivation to use this distance (see lines 139-142), and included the table below as part of our supplementary materials.

_____________________________________________

Reviewer #1:

I suggest reporting the authority of the species when first mentioned.

Response: We added the authority as suggested (see line 92).

L 170: please cite the source from which the ecoregions were retrieved.

Response: No particular reference is associated with this layer; however, we have added a reference to the database from which we obtained these data (see line 192).

Attachments: for ease of reading, I suggest changing the format of the supporting figures (e.g. jpeg) or directly merging all this information (including descriptions) into one pdf file.

Response: We included the supporting figures and tables as a separate PDF file for easy reading. However, following the journal requirements the original formats were kept.

_____________________________________________

Reviewer #2:

First of all raster resolution of explanatory variables at 10’ or 30’ cannot be acceptable, especially to investigate the distribution of a species that lives in the ponds.

Response: We understand the reviewer’s concern. Our main focus is to understand which environmental factors are most relevant to the ecological niche and distribution of the species. We decided to use layers at the resolutions mentioned above in view of the broad distribution of the species, the availability of some layers at only 30’ resolution (i.e., phosphorus), and the fact that there are no available layers that apply to and characterize all lentic water bodies. In addition, many of the occurrence points available for the species in a large portion of Europe were georeferenced using a grid of 10 km. For this reason, using layers of a finer resolution would not be adequate. Finally, for global-scale climate data, the real spatial resolution of the information available (e.g., weather station data for WorldClim) is actually much more like 30-60’, and all of the detail to which we are accustomed comes from interpolation via reference to digital elevation models. We understand the limitations of our study, and that is why we are focused on discussing which factors are important at a broad scale—clearly, other processes are acting at finer spatial scales, and are not treated in this analysis.

Secondly, the authors try to disentangle some methodological issues in SDM/ENM (e.g. extent of the calibration areas, algorithms) that cannot be addressed using very coarse variables and considering only one species.

Response: We respectfully disagree. We have set out to elucidate the factors that delimit and constrain the ecological niche and geographic distribution of this species on a global extent. We have treated those same methodological issues in many other publications (led by Cobos or by Peterson), such that this paper is a platform not to propose or decide those methodological issues, but rather to reflect on lessons proposed and learned in other studies already completed.

Specific methodological comment:

- Records that were outside of, but closer than ~5’ to the edge of environmental layers were moved to the nearest pixel with information – what? are you expected to find similar environmental condition within 100 km2

Response: We understand your concern. Considering the first law of geography (that near things are more similar than far things), which is to say that climate dimensions show substantial spatial autocorrelation, conditions should indeed be similar, especially along the coast, where these points are located. We also note that, given the coarse resolution of the data layers used, points falling outside of the raster extent by short distances may be “out” only because of how raster layers are represented (i.e., as square pixels). That is, raster layers do not follow coastlines exactly, and gaps in coverage may exist given the gridded nature of these layers.
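The relocation step under discussion can be illustrated with a minimal Python sketch. It assumes a raster stored as a list of rows in which None marks cells without data (for example, ocean), and it measures distance in cell units between cell centers. The function name, the data layout, and the planar distance are hypothetical choices for illustration, not the procedure used in the study.

```python
def nearest_valid_cell(grid, row, col, max_cells):
    """Return (row, col) of the closest cell with data within max_cells, or None.

    grid is a list of rows; None marks cells without data.
    Ties are resolved in favor of the first cell found in row-major order.
    """
    best, best_d2 = None, None
    row_start, row_stop = max(0, row - max_cells), min(len(grid), row + max_cells + 1)
    for r in range(row_start, row_stop):
        col_stop = min(len(grid[r]), col + max_cells + 1)
        for c in range(max(0, col - max_cells), col_stop):
            if grid[r][c] is None:
                continue
            d2 = (r - row) ** 2 + (c - col) ** 2  # squared distance in cell units
            if d2 <= max_cells ** 2 and (best_d2 is None or d2 < best_d2):
                best, best_d2 = (r, c), d2
    return best
```

With 10’ layers and a tolerance of about 5’, a record that falls just off the coast would in practice be reassigned only to an immediately adjacent pixel, which corresponds to max_cells = 1 in this sketch. Records with no data cell within the tolerance return None and would be excluded.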

- the broad distribution of this species makes it difficult for that method to be applied, why? Maybe is due to the coarse resolution of your approach?

Response: We are very familiar with the method in question, and the reason not to use it is not related to the resolution of our layers. One limitation of this method is that it does not work well when processes are run across large regions where the shape of the planet becomes a factor in calculating distances for dispersal. As our species is distributed across most of the world, these limitations constitute a problem for its application, which is why we did not use it.
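One way to see why the shape of the planet matters for distance calculations at this extent is the short Python sketch below. It assumes a spherical Earth with a radius of 6371 km (an approximation chosen here, not a value from the manuscript), and the function name is hypothetical. It shows only that a fixed step in map units (degrees of longitude) corresponds to very different ground distances at different latitudes.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius (spherical approximation)


def km_per_degree_longitude(lat_deg):
    """Ground length (km) of one degree of longitude at a given latitude."""
    return EARTH_RADIUS_KM * math.radians(1.0) * math.cos(math.radians(lat_deg))


# Compare the same one-degree step at the equator and at higher latitudes.
for lat in (0, 30, 60):
    print(lat, round(km_per_degree_longitude(lat), 1))
```

At 60° latitude, one degree of longitude covers half the ground distance that it covers at the equator, so a dispersal step defined in grid cells of a geographic (unprojected) raster is not equivalent across a near-global study area.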

- Modeling algorithms: how did you generate the pseudoabsence? how did you calibrate the models,? have you performed some how the cross validation?

Response: Please see details on pseudo-absences in lines 211-220, and on model calibration and evaluation in lines 253-266.

- “We transferred all the models across the area comprising the union of the four calibration areas and compared those models to assess whether patterns of suitability values differed as a result of using distinct variables, calibration areas, and algorithms.” WHY? Is it a methodological or ecological MS? Not clear.

Response: This is an ecological study, but assessing choices about environmental variables, calibration areas, and algorithms is important because those choices can and do affect model outcomes. If one did not assess different modeling choices in those dimensions, one would run the risk of interpreting as biological patterns phenomena that are, in reality, just methodological artifacts. As such, we assess a diversity of methodological choices, and only interpret biologically those patterns that are robust to different choices.

- Have you considered the multicollinearity among explanatory variables?

Response: Please see details in lines 225-226, 236.

Please consider reading the many papers that suggest how to perform SDM/ENM, how to create pseudo-absences, how to create ensemble models by combining different algorithms, how to perform cross-validation, and so on. Please do not use a single species to address methodological questions; consider using virtual species for that. Some useful papers follow:

Barbet‐Massin, M., Jiguet, F., Albert, C. H., Thuiller, W. (2012). Selecting pseudo‐absences for species distribution models: how, where and how many?. Methods in ecology and evolution, 3(2), 327-338.

Bucklin, D. N., Basille, M., Benscoter, A. M., Brandt, L. A., Mazzotti, F. J., Romanach, S. S., ... Watling, J. I. (2015). Comparing species distribution models constructed with different subsets of environmental predictors. Diversity and distributions, 21(1), 23-35.

Connor, T., Hull, V., Viña, A., Shortridge, A., Tang, Y., Zhang, J., ... Liu, J. (2018). Effects of grain size and niche breadth on species distribution modeling. Ecography, 41(8), 1270-1282.

Muscarella, R., Galante, P. J., Soley‐Guardia, M., Boria, R. A., Kass, J. M., Uriarte, M., Anderson, R. P. (2014). ENMeval: An R package for conducting spatially independent evaluations and estimating optimal model complexity for Maxent ecological niche models. Methods in ecology and evolution, 5(11), 1198-1205.

VanDerWal, J., Shoo, L. P., Graham, C., Williams, S. E. (2009). Selecting pseudo-absence data for presence-only distribution modeling: how far should you stray from what you know?. Ecological modelling, 220(4), 589-594.

Response: Thank you for the suggestions. We are aware of, and have read, all the references mentioned. We are now citing Barbet‐Massin et al. (2012) in the section in which we explain how GLMs were constructed (see line 217). We also cite Connor et al. (2018) in our discussion (see line 401). While clearly differences exist in terms of methodological choices among labs and among researchers, we believe that the methods used in our study are more current than the ones described in the other references suggested.

_____________________________________________

Reviewer #3:

The abstract accurately describes the main objective of the study. It explains how the study was done, including the model organism used, without exceeding methodological details. The most important results are summarized, but their significance has not been sufficiently emphasized, specifically for the variation of the final set of variables selected based on the combination of algorithm, calibration area, and spatial resolution.

Response: We have added text to our abstract to highlight the significance of our findings (see lines 27-31).

The authors provide a careful overview of the challenge of selecting appropriate environmental variables when characterizing species' ecological niches, and what still ought to be done. The objectives of the study are clearly specified, but hypotheses are missing. The lack of clear hypotheses could prevent readers from fully understanding the significance and importance of the study.

Response: We included a hypothesis in our introduction, as suggested (see lines 96-99).

The Materials and Methods section provides enough detail to allow suitably skilled investigators to replicate the main steps of the study. However, the lack of specific information (e.g., method of generating pseudo-absences, method details of GLMs' model calibration, R packages adopted to evaluate GLMs' performance of candidate models) does not allow a full understanding of the code provided. Specific information should be included in detail, citing articles you followed for the choices/methods applied.

Response: We have added the details suggested in multiple parts of our methods. Regarding the method to generate pseudo-absences, we describe that in lines 211-217. Although not many R packages were used to perform GLM-related procedures, we made it clear that we did most of this using base functions in R (see lines 265-266). Please see lines 253-261 regarding details on GLM calibration and evaluation (this part includes details for Maxent and GLMs).

In addition, I suggest the authors explain why they adopted that specific ratio of the quantity of presence data to the number of background points/pseudo-absence data to fit models, specifying a reference that supports the choice. This is always a very sore point because according to some authors an inadequate number of background points or pseudo-absences (in this study 20,000 points for 964 occurrences; lines 138 and 190-191) could reduce the accuracy of the models. Model accuracy is generally affected both by this ratio, but also by the method used to generate pseudo-absences (see for example Barbet-Massin et al., 2012).

Response: We added some text to explain why we used such a number of points as background/pseudo-absences, including a reference that justifies our decision (see lines 214-217).

Line 160 - 162: "We performed raster aggregation procedures (average of values) on CEC, OC, and pH to match the resolution of BIO variables, and on BIO and SR variables to match the resolution of variables at 30’. [..]" I do not really understand this sentence. Have you used a BIO variable as a snap raster to ensure all cells were properly aligned, and all rasters have the same cell resolutions? If this is the case, please specify the method applied (e.g., nearest neighbour method etc.), if not rephrase the sentence.

Response: We have added two more sentences to specify the way aggregation was performed (see lines 176-177).
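As an illustration of aggregation by averaging, the following minimal Python sketch coarsens a grid by taking the mean of the values in each block of cells (for example, a factor of 3 would take a 10’ layer to 30’). The list-of-rows layout, the use of None for cells without data, the choice to ignore missing cells when averaging, and the function name are all assumptions for illustration; the manuscript and its code describe the procedure actually used.

```python
def aggregate_mean(grid, factor):
    """Aggregate a 2-D grid by averaging non-missing values in factor x factor blocks.

    grid is a list of equally long rows; None marks cells without data.
    A block with no data at all yields None in the output.
    """
    n_rows, n_cols = len(grid), len(grid[0])
    out = []
    for r0 in range(0, n_rows, factor):
        out_row = []
        for c0 in range(0, n_cols, factor):
            block = [grid[r][c]
                     for r in range(r0, min(r0 + factor, n_rows))
                     for c in range(c0, min(c0 + factor, n_cols))
                     if grid[r][c] is not None]
            out_row.append(sum(block) / len(block) if block else None)
        out.append(out_row)
    return out
```

Because the coarse cells are built from whole blocks of fine cells, layers that share the same origin and extent before aggregation remain aligned afterwards, without any resampling step.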

Line 162 - 164: "Although the set of variables representing soil conditions used at 10’ differs from the one at 30’, variable selection analyses will help to identify whether the variables selected differ between the two resolutions. [..]" Does any other author support this??

Response: We rewrote our statement to make it easier to understand and to reflect that we consider that our analyses will allow us to check whether some variables that are present in the two groups (e.g., bioclimatic variables) are selected consistently despite being grouped with distinct variables (see lines 177-180).

Line 171 - 176: "Although a new simulation-based approach has been recently suggested as a reliable tool to estimate calibration areas [59], the broad distribution of this species makes it difficult for that method to be applied. Our chosen calibration areas are therefore reasonable options to calibrate models, considering that such areas should reflect what regions could have been accessible to the species and present relevant environments for comparisons [..]" Does any other author support this? Do you have a specific reference to cite?

Response: Thank you for your questions. We do not have a specific reference for this statement. However, we are very familiar with the methods that use simulations for calibration area delimitation. One limitation of such methods is that they do not work well when processes are run across large regions where the shape of the planet becomes a factor in calculating distances for dispersal. As our species is distributed across most of the world, these limitations constitute a problem for their application, which is why we did not use them. The reason why we consider the areas selected as reasonable is that we have used all these options before, and they all allow us to include areas close to the ones where the species has been reported. Since nearby areas are more likely to be accessed, we think model calibration processes were done considering relevant environments (see lines 182-190).

Line 216-217: "The latter consideration assumes that using variables for which the entire spectrum of responses can be characterized makes for better models. [..]" Please cite and provide a reference for this sentence.

Response: There are no references for this particular statement; however, it is related to ideas presented in previous studies about truncated and complete responses, and how non-truncated responses benefit model construction and projection (Chevalier et al. 2021, Owens et al. 2013, Peterson et al. 2011, Thuiller et al. 2004). We have added these references accordingly (see lines 239-241).

Line 218-219: "Biological relevance of variables was determined based on details about the species' natural history, phenology, and physiology in the literature, and our own experience with populations in the field and controlled environments [..]" No bibliographic reference was cited to justify the selection of biologically relevant variables. Please add a citation.

Response: Thank you for your suggestion. We have included references as appropriate in lines 241-243.

Line 260: “see below”. Where? Please, specify.

Response: We have removed this part as it makes reference to one of the latest results we obtained, which is properly referenced in the results section (see lines 349-360).

The results relate to the research question, and the language adopted to express results is clear and concise. The tables and figures are appropriate, but very fragmented across many appendix documents. To explore the results in detail, the reader has to open as many as 11 documents (.docx format) and needs vector graphics software to open the 26 images attached in .eps format. I strongly suggest aggregating the appendixes and exporting the images in a simple file format (e.g., .jpg or .tif).

Response: We included the supporting figures and tables as a separate PDF file for easy reading. However, following the journal requirements the original formats were kept.

_____________________________________________

Reviewer #4:

In general, the authors examine very well the problem of selecting environmental variables for ecological niche modelling. The manuscript is well structured, in particular the introduction, methods, and discussion; less so the results, which require more information, such as the results of model evaluation (partial ROC, omission rate, Akaike information criterion for GLM and AICc for Maxent), and also the results of the jackknife analysis for Maxent and ANOVA for GLM to measure/explore deviances for each environmental variable selected.

Response: Thank you for your comments. We added a few details to refer to the criteria that selected models met in terms of partial ROC, omission rates and AIC or AICc (see lines 302-303). We would like to keep our text focused on the sets of variables selected, as this was the main focus of our study. Please see lines 328-339 for a description of jackknife and deviance results.
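Two of the criteria named in this exchange can be written down compactly. The following Python sketch gives the standard small-sample correction of AIC and a simple omission rate. The function names are hypothetical, and the way the log-likelihood and the suitability threshold are obtained for each algorithm is outside the sketch (both are assumed to be supplied by the modeling workflow).

```python
def aicc(log_likelihood, k, n):
    """AIC with small-sample correction.

    k is the number of estimated parameters and n the number of observations;
    the correction term requires n > k + 1.
    """
    aic = 2 * k - 2 * log_likelihood
    return aic + (2 * k * (k + 1)) / (n - k - 1)


def omission_rate(test_suitability, threshold):
    """Fraction of independent test presences predicted below the threshold."""
    omitted = sum(1 for s in test_suitability if s < threshold)
    return omitted / len(test_suitability)
```

Lower AICc values indicate a better balance between fit and complexity among candidate models fitted to the same data, and an omission rate close to the error rate allowed when setting the threshold indicates that the model predicts independent records about as well as expected.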

Moreover, the results obtained for S. polyrhiza niche models are in line with a large number of previous research with different species targets in which the bioclimatic, solar radiation and soil conditions variables were identified as the most important environmental variables that limit species growth at global scale. Consequently, in the manuscript, I miss a clear paragraph that describes innovative aspects of using these environmental variables in order to produce S. polyrhiza niche models.

Response: We understand the reviewer’s concern and agree that these types of variables have been used in previous studies to characterize niches and potential distributions of species. We would like to clarify that our goal was not to test new types of variables, but rather we wanted to understand which variables from a multiple and diverse set of predictors could be more relevant to understanding what shapes the niche and distribution of the species (at broad-scale).

I consider that the use of traditional methods to define the set of environmental variables would have achieved the same or similar results. Please provide a justification on this point.

Response: We respectfully disagree. Traditional methods do not include one of the critical steps in our study (i.e., testing multiple model options in which one of the factors changed is the set of variables). This means that traditionally only one set of variables would have been tested with distinct model parameterizations. Considering that including or excluding certain variables plays a big role in model performance, the answers that we obtained are certainly different and, most likely, more appropriate.

Line 10: Change the corresponding author email with an institutional email.

Response: Thank you for the suggestion. The corresponding author prefers to keep the email as is to allow for long term availability of this means of communication.

Lines 134 – 136: I did not understand this sentence. Are environmental variables not at global scale? Please rewrite this sentence, thank you.

Response: The variables are indeed global, but they are restricted to terrestrial areas. We have added some wording to make it clear (see lines 137-138).

… miss the ecological reasons for choosing these environmental variables (bioclimatic, solar radiation, and soil conditions …)

Paragraph “Environmental variables”: This paragraph is missing a description of how these environmental variables are important to S. polyrhiza growth.

Response: We have added some text as a general statement of why these variables are important (see lines 153-157). We also added references in the section “Variable selection process” to guide the readers to the original studies that demonstrated the importance of these types of variables for the species at local level or via experimentation (see lines 157-159).

Lines 157 – 160: Add a table with the environmental variables used describing their ecological importance specific to S. polyrhiza and the spatial resolution.

Response: We have added a table as a supplementary material and not as part of the main manuscript to avoid redundancy between the text and the table, considering changes that have been made based on request from all reviewers (S3 Table).

… several steps in text require more details and adequate motivations, for example miss ecological reason of dimensions and shapes of calibration areas.

Lines 167 – 171: Please add references that used similar methods to define calibration areas, and also indicate the importance of these methods (buffer, concave-hull polygons, ecoregions, and the intersection of the previous three areas) as calibration areas for S. polyrhiza.

Response: A motivation for distances used has been added in lines 186-189. We also added references as requested (see lines 193-194). We consider that with the addition of these references, the explanation of areas used in lines 183-186, and the code provided, the readers should now be able to fully follow the steps we performed.

Lines 198 – 200 and Figure 2: The main text is missing a description of the first step displayed in Figure 2, “Inspection and/or treatment”. Furthermore, please add to Figure 2 the same number used for each step in the main text, to make the manuscript easier to read.

Response: We changed our text to account for this step included in figure 2 (see lines 224-225). We decided not to include the numbers from the text in figure 2 as this figure describes the process and intermediate set of decisions to be made. Including the numbers in the figure may result in more confusion.

Lines 212 – 216: The references to Fig. 3 and Fig. 4 were inverted in the main text. In addition, these two figures (Fig. 3, Fig. 4) show preliminary results of the S. polyrhiza ecological niche model. In my own research, I prefer to describe such results in the results paragraph. Please move these two figures and add their description to the results paragraph.

Response: We have corrected the order of the figures in the text. Considering that the interpretation of the results presented in the figures is actually in the first paragraph of results, we would like to keep the place of these figures as it is to make the methods easier to follow using the examples. We understand that the editorial team of the journal will be the one determining the final position of these figures and we would like to wait to see what they consider more appropriate.

Figures 3, 4, 6, 7: Add in the main text or in the caption of these figures the reasons to display in the main text only the results with 10’ in spatial resolution.

Response: The reasons to display only results for variables at 10’ are: 1) the figure in geographic space is more detailed at this resolution; 2) many of the patterns observed are similar; and 3) to be consistent across the document regarding the results shown. We modified our captions to briefly explain this.

Figures 6 and S11: Add a legend of suitability.

Response: The legend has been added.

The use of GBIF and BIEN underestimates the real spatial extent of S. polyrhiza. Can this problem affect the ecological niche model produced?

Paragraph “Modeling algorithms”: Even if the calibration areas include a large portion of territory around S. polyrhiza occurrences, these occurrences derive from the GBIF and BIEN datasets, which could underestimate the range of the target species. Consequently, background or pseudo-absence points could be false negatives. How did you consider this question?

Lines 367 – 370: Based on this sentence, I have a question. The S. polyrhiza occurrences used in this research may not include all real occurrences; can this problem influence the results of the ecological niche models?

Response: If we understood correctly, the reviewer asks whether the inclusion of occurrence records of the plant that are currently unknown because of lack of sampling could influence the results of ecological niche models. Our short answer is yes, and that is directly related to the uncertainty derived from data in any modeling exercise (almost impossible to measure because we do not know the truth). However, the degree of influence of new records depends on how environmentally different they are compared to existing ones. Although no other source of occurrence records of this plant exists and we are not able to answer this question directly, we suspect that the records currently used appropriately summarize the whole set of environmental conditions used by the species. Therefore, the addition of further information may not change our characterizations dramatically.

Regarding our background/pseudo-absence data potentially being false negatives, we are not too concerned about that, as Maxent does not assume the background to be absences. GLMs are usually constructed with presences and pseudo-absences (ideally absences); however, weighting our presences and pseudo-absences differently helps to reduce problems related to false negatives. This process makes GLMs more similar to Maxent, in the sense that pseudo-absences become something more like a background (see our explanation in lines 218-220).
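The idea of weighting presences and pseudo-absences differently can be sketched as follows. The specific weighting scheme used in the study is not stated in this letter, so the sketch assumes one common option: each presence receives weight 1 and each pseudo-absence receives a weight such that both groups contribute equal total weight. The function names are hypothetical, and the counts in the note below (964 presences, 20,000 pseudo-absences) are those mentioned by Reviewer 3.

```python
import math


def balancing_weights(n_presence, n_background):
    """Weights giving presences and pseudo-absences equal total weight (assumed scheme)."""
    w_presence = 1.0
    w_background = n_presence / n_background
    return w_presence, w_background


def weighted_log_likelihood(y, p, w):
    """Weighted Bernoulli log-likelihood.

    y holds 1 for presences and 0 for pseudo-absences, p holds fitted
    probabilities strictly between 0 and 1, and w holds case weights.
    """
    return sum(wi * (yi * math.log(pi) + (1 - yi) * math.log(1 - pi))
               for yi, pi, wi in zip(y, p, w))
```

With 964 presences and 20,000 pseudo-absences, this scheme gives each pseudo-absence a weight of about 0.048, so an individual pseudo-absence that happens to be a false negative has little influence on the fitted response.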

Lines 403 – 408: Add specific ecological reasons and references for S. polyrhiza and/or other species to motivate the inclusion of variables with complications in graphical explorations.

Response: We have added a sentence after this part about the importance of making decisions based not only on statistics but also on ecological considerations (see lines 428-430).

Lines 411 – 413: The temperature is certainly the main driver to limit the species growth at global scale, please add references.

Response: Thank you for the suggestion. Please see two references specifically related to the importance of temperature for the species in line 433 (the sentence before the one indicated here).

Lines 419 – 421: The relationships between the target species and soil variables were more important in the finest ecological niche model. This suggests a higher importance of soil conditions at this fine scale; please add references to support this point.

Response: Thank you for the suggestion. We would like to refrain from adding this type of interpretation because the soil variables were not the same at the two resolutions. We are concerned that a statement that soil variables may be more important at finer resolutions is not fully supported by our results and could lead to misinterpretation.

Attachment

Submitted filename: response_reviewers.docx

Decision Letter 1

Mirko Di Febbraro

18 Apr 2023

Broad-scale factors shaping the ecological niche and geographic distribution of Spirodela polyrhiza

PONE-D-22-28513R1

Dear Dr. Cobos,

We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements.

Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication.

An invoice for payment will follow shortly after the formal acceptance. To ensure an efficient process, please log into Editorial Manager at http://www.editorialmanager.com/pone/, click the 'Update My Information' link at the top of the page, and double check that your user information is up-to-date. If you have any billing related questions, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

Kind regards,

Mirko Di Febbraro

Academic Editor

PLOS ONE

Additional Editor Comments (optional):

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #1: All comments have been addressed

Reviewer #4: (No Response)

**********

2. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes

Reviewer #4: Yes

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #4: Yes

**********

4. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #4: Yes

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #4: Yes

**********

6. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: The authors addressed my previous concerns regarding autocorrelation bias and a few other comments. In my opinion, the manuscript can be accepted in this form.

Reviewer #4: General comments

The paper “Broad-scale factors shaping the ecological niche and geographic distribution of Spirodela polyrhiza” has been considerably improved. In this latest version, all paragraphs are well structured and thorough. The authors have clearly answered my questions and added the required information to the manuscript. Only a few details require further clarification before publication.

Specific comments

Lines 245-246. In the pairwise correlation analysis, which algorithm was used? Please indicate.

Lines 245-249. References to Fig. 3 and Fig. 4 are inverted.

Lines 257-259. In the caption of Fig. 3 indicate the icon for combinations of variables with high values of correlation.

Lines 317-319. The results reported in S4 and S5 Tables do not agree with the statement in this sentence. In S4 Table, the omission rate values are greater than 0.05, and the AIC values in S4 and S5 Tables are on the order of thousands. Please rewrite this sentence.

**********

7. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: Yes: Mauro Fois

Reviewer #4: No

**********

Acceptance letter

Mirko Di Febbraro

24 Apr 2023

PONE-D-22-28513R1

Broad-scale factors shaping the ecological niche and geographic distribution of Spirodela polyrhiza

Dear Dr. Cobos:

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now with our production department.

If your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information please contact onepress@plos.org.

If we can help with anything else, please email us at plosone@plos.org.

Thank you for submitting your work to PLOS ONE and supporting open access.

Kind regards,

PLOS ONE Editorial Office Staff

on behalf of

Dr. Mirko Di Febbraro

Academic Editor

PLOS ONE

Associated Data

    This section collects any data citations, data availability statements, or supplementary materials included in this article.

    Supplementary Materials

    S1 Fig. Results from linear correlation tests for initial variables.

    Values of correlation above |0.8| are magnified threefold. Results for variables at 30’ resolution are shown.

    (TIF)

    S2 Fig. Histograms of environmental variable values in calibration areas and occurrence records.

    Results for variables at 10’ resolution and buffer calibration areas are shown.

    (TIF)

    S3 Fig. Histograms of environmental variable values in calibration areas and occurrence records.

    Results for variables at 10’ resolution and concave calibration areas are shown.

    (TIF)

    S4 Fig. Histograms of environmental variable values in calibration areas and occurrence records.

    Results for variables at 10’ resolution and calibration areas resulting from ecoregions are shown.

    (TIF)

    S5 Fig. Histograms of environmental variable values in calibration areas and occurrence records.

    Results for variables at 30’ resolution and buffer calibration areas are shown.

    (TIF)

    S6 Fig. Histograms of environmental variable values in calibration areas and occurrence records.

    Results for variables at 30’ resolution and concave calibration areas are shown.

    (TIF)

    S7 Fig. Histograms of environmental variable values in calibration areas and occurrence records.

    Results for variables at 30’ resolution and calibration areas resulting from ecoregions are shown.

    (TIF)

    S8 Fig. Histograms of environmental variable values in calibration areas and occurrence records.

    Results for variables at 30’ resolution and calibration areas resulting from intersection are shown.

    (TIF)

    S9 Fig. Predictor contribution to Maxent models created with variables and parameter settings selected after model calibration.

    Results for variables at 10’ resolution are shown.

    (TIF)

    S10 Fig. Predictor contribution to Maxent models created with variables and parameter settings selected after model calibration.

    Results for variables at 30’ resolution are shown.

    (TIF)

    S11 Fig. Geographic projections of suitability values deriving from final models created with the selected variables.

    Results for variables at 30’ resolution are shown.

    (TIF)

    S12 Fig. Projections of suitability values in a three-dimensional environmental space.

    Values of suitability derive from final models created with selected variables and parameters. GLM results for variables at 10’ resolution and calibration areas resulting from buffers are shown.

    (TIF)

    S13 Fig. Projections of suitability values in a three-dimensional environmental space.

    Values of suitability derive from final models created with selected variables and parameters. GLM results for variables at 10’ resolution and calibration areas resulting from concave hulls are shown.

    (TIF)

    S14 Fig. Projections of suitability values in a three-dimensional environmental space.

    Values of suitability derive from final models created with selected variables and parameters. GLM results for variables at 10’ resolution and calibration areas resulting from ecoregions are shown.

    (TIF)

    S15 Fig. Projections of suitability values in a three-dimensional environmental space.

    Values of suitability derive from final models created with selected variables and parameters. GLM results for variables at 30’ resolution and calibration areas resulting from buffers are shown.

    (TIF)

    S16 Fig. Projections of suitability values in a three-dimensional environmental space.

    Values of suitability derive from final models created with selected variables and parameters. GLM results for variables at 30’ resolution and calibration areas resulting from concave hulls are shown.

    (TIF)

    S17 Fig. Projections of suitability values in a three-dimensional environmental space.

    Values of suitability derive from final models created with selected variables and parameters. GLM results for variables at 30’ resolution and calibration areas resulting from ecoregions are shown.

    (TIF)

    S18 Fig. Projections of suitability values in a three-dimensional environmental space.

    Values of suitability derive from final models created with selected variables and parameters. GLM results for variables at 30’ resolution and calibration areas resulting from intersection are shown.

    (TIF)

    S19 Fig. Projections of suitability values in a three-dimensional environmental space.

    Values of suitability derive from final models created with selected variables and parameters. Maxent results for variables at 10’ resolution and calibration areas resulting from buffers are shown.

    (TIF)

    S20 Fig. Projections of suitability values in a three-dimensional environmental space.

    Values of suitability derive from final models created with selected variables and parameters. Maxent results for variables at 10’ resolution and calibration areas resulting from concave hulls are shown.

    (TIF)

    S21 Fig. Projections of suitability values in a three-dimensional environmental space.

    Values of suitability derive from final models created with selected variables and parameters. Maxent results for variables at 10’ resolution and calibration areas resulting from ecoregions are shown.

    (TIF)

    S22 Fig. Projections of suitability values in a three-dimensional environmental space.

    Values of suitability derive from final models created with selected variables and parameters. Maxent results for variables at 10’ resolution and calibration areas resulting from intersection are shown.

    (TIF)

    S23 Fig. Projections of suitability values in a three-dimensional environmental space.

    Values of suitability derive from final models created with selected variables and parameters. Maxent results for variables at 30’ resolution and calibration areas resulting from buffers are shown.

    (TIF)

    S24 Fig. Projections of suitability values in a three-dimensional environmental space.

    Values of suitability derive from final models created with selected variables and parameters. Maxent results for variables at 30’ resolution and calibration areas resulting from concave hulls are shown.

    (TIF)

    S25 Fig. Projections of suitability values in a three-dimensional environmental space.

    Values of suitability derive from final models created with selected variables and parameters. Maxent results for variables at 30’ resolution and calibration areas resulting from ecoregions are shown.

    (TIF)

    S26 Fig. Projections of suitability values in a three-dimensional environmental space.

    Values of suitability derive from final models created with selected variables and parameters. Maxent results for variables at 30’ resolution and calibration areas resulting from intersection are shown.

    (TIF)

    S1 Table. Spatial autocorrelation results for all environmental variables derived from spatial patterns of occurrence data after using distinct distances for spatial thinning.

    Results presented here are for variables at 10’ resolution. Spatial autocorrelation was measured using the statistic Moran’s I.

    (DOCX)

    S2 Table. Spatial autocorrelation results for all environmental variables derived from spatial patterns of occurrence data after using distinct distances for spatial thinning.

    Results presented here are for variables at 30’ resolution. Spatial autocorrelation was measured using the statistic Moran’s I.

    (DOCX)

    S3 Table. Description of ecological importance of variables used for ecological niche modeling exercises with Spirodela polyrhiza.

    (DOCX)

    S4 Table. Selected parameter settings and variables after model calibration for analyses with variables at 10’ resolution.

    AIC/AICc values are not comparable across distinct calibration areas.

    (DOCX)

    S5 Table. Selected parameter settings and variables after model calibration for analyses with variables at 30’ resolution.

    AIC/AICc values are not comparable across distinct calibration areas.

    (DOCX)

    S6 Table. Effects of predictors on GLMs produced using variables and parameter settings selected after model calibration.

    Results for models created with variables at 10’ resolution, using buffer calibration areas are shown. Quadratic = “^2”; Product = “:”.

    (DOCX)

    S7 Table. Effects of predictors on GLMs produced using variables and parameter settings selected after model calibration.

    Results for models created with variables at 10’ resolution, using concave calibration areas are shown. Quadratic = “^2”; Product = “:”.

    (DOCX)

    S8 Table. Effects of predictors on GLMs produced using variables and parameter settings selected after model calibration.

    Results for models created with variables at 10’ resolution, using calibration areas from ecoregions are shown. Quadratic = “^2”; Product = “:”.

    (DOCX)

    S9 Table. Effects of predictors on GLMs produced using variables and parameter settings selected after model calibration.

    Results for models created with variables at 10’ resolution, using calibration areas from intersection are shown. Quadratic = “^2”; Product = “:”.

    (DOCX)

    S10 Table. Effects of predictors on GLMs produced using variables and parameter settings selected after model calibration.

    Results for models created with variables at 30’ resolution, using buffer calibration areas are shown. Quadratic = “^2”; Product = “:”.

    (DOCX)

    S11 Table. Effects of predictors on GLMs produced using variables and parameter settings selected after model calibration.

    Results for models created with variables at 30’ resolution, using concave calibration areas are shown. Quadratic = “^2”; Product = “:”.

    (DOCX)

    S12 Table. Effects of predictors on GLMs produced using variables and parameter settings selected after model calibration.

    Results for models created with variables at 30’ resolution, using calibration areas from ecoregions are shown. Quadratic = “^2”; Product = “:”.

    (DOCX)

    S13 Table. Effects of predictors on GLMs produced using variables and parameter settings selected after model calibration.

    Results for models created with variables at 30’ resolution, using calibration areas from intersection are shown. Quadratic = “^2”; Product = “:”.

    (DOCX)

    Data Availability Statement

    All the data used in this project can be obtained using the code provided or downloaded from the source websites described in the Methods section. Filtered occurrence data and R code to reproduce all analyses are available from a GitHub repository (www.github.com/marlonecobos/Spirodelap_ENM).


    Articles from PLOS ONE are provided here courtesy of PLOS
