PLOS One. 2026 Feb 26;21(2):e0324107. doi: 10.1371/journal.pone.0324107

Predicting dominant terrestrial biomes at a global scale using machine learning algorithms, climate variable indices, and extreme event indices

Hisashi Sato 1,2,*
Editor: Chong Xu3
PMCID: PMC12944746  PMID: 41746926

Abstract

Understanding the global distribution of biomes is essential for biodiversity conservation, climate modeling, and land-use planning. Traditional approaches often summarize climate data into indices, and recent models sometimes include extreme events such as severe droughts or rare cold spells. This study evaluates how the choice of machine learning algorithm, climate data summarization, and extreme climate indices affect the accuracy and robustness of global biome modeling. Four algorithms were tested: random forest (RF), support vector machine (SVM), naive Bayes (NV), and LeNet convolutional neural network (CNN). RF and CNN achieved the highest accuracy, with CNN preferred due to RF’s stronger overfitting. Summarizing climate data into indices reduced accuracy by 1–2%, while adding extreme indices increased accuracy by <2% (except for NV, which performed poorly overall). However, extreme climate data caused large mismatches between observed and predicted climate values, reducing robustness as measured by prediction consistency. These results indicate that including extreme climate data in global biome prediction models offers limited accuracy gains but can significantly weaken robustness, so caution is advised.

Introduction

Biomes—major regional ecological communities defined by distinctive life forms and dominant plant species [1]—are primarily determined by climate [2,3] and, in turn, influence climate through biophysical and biochemical feedbacks [4]. Understanding biome distributions is therefore essential not only for estimating land potential and guiding conservation policy, but also for informing climate projection models (reviewed in Hengl, Walsh [5]).

Many methods have been proposed to model biome distributions (reviewed by Sato and Ise [6]). Traditional approaches often use a limited set of derived climate indices (e.g., annual precipitation, coldest-month temperature) to summarize monthly or seasonal data. However, recent advances in machine learning (ML) allow models to incorporate a larger number of raw variables without summarization, potentially improving accuracy and flexibility. For example, Hengl, Walsh [5] used 160 environmental variables, including soil, topography, and monthly climate variables, to model biome distribution using ML algorithms. However, excessive input dimensionality may reduce generalizability and increase computational cost, underscoring the need to balance model complexity and performance. Among the diverse variables now accessible through ML methods, extreme climate events—such as severe droughts and rare low-temperature incidents—have emerged as particularly important predictors of biome boundaries (reviewed in Beigaite, Tang [7]). Incorporating extreme climate indices alongside regular variables has been found to improve the performance of decision tree-based models [7].

This study evaluates the effectiveness of machine learning models in predicting the global distribution of potential natural vegetation (PNV), i.e., vegetation in the absence of human influence, under current and future climate conditions. It addresses three questions: (1) Which machine learning algorithm performs best in reproducing the current biome distribution? (2) Does using raw monthly climate variables improve model performance compared to using derived climate indices (BIOCLIM)? (3) Does incorporating extreme climate indices (CLIMDEX) improve model robustness under projected future conditions (2061–2080, RCP8.5)? The analysis focuses on prediction accuracy, model stability, and spatial consistency under extrapolative scenarios.

To contextualize the modeling approach adopted in this study, two reference points are particularly relevant. First, process-based dynamic global vegetation models (DGVMs) simulate vegetation dynamics through mechanistic representations of ecosystem processes [8]. These include plant physiological functions, carbon cycling, and interspecies competition. While this allows for detailed representation of ecological mechanisms, DGVMs rely on numerous parameterizations and assumptions, which can introduce uncertainty—especially when projecting into novel future climates. In contrast, this study, like the one by [9], adopts a purely data-driven approach using ensemble machine learning algorithms that infer climate–vegetation relationships directly from observed data. This enables the models to capture empirical patterns with fewer structural assumptions, offering a computationally efficient and flexible alternative for large-scale biome prediction, particularly under extrapolative scenarios. Second, unlike land cover maps derived from satellite imagery, which primarily reflect current human-modified landscapes, this study aims to model potential natural vegetation (PNV)—that is, the vegetation that would exist without anthropogenic disturbance—at a 0.5° resolution. In this context, data-driven modeling provides an effective means for identifying climate-sensitive biome boundaries and anticipating potential shifts under future scenarios, independent of current and future land-use patterns.

Methods

Biome data

This study used the potential natural vegetation (PNV) dataset compiled by Beigaite, Tang [7] for their decision tree-based models of the global PNV distribution. Here, the same dataset is applied to a wider range of machine learning algorithms. In their study, the authors derived the PNV data from the Moderate Resolution Imaging Spectroradiometer (MODIS) MCD12C1 land cover product for 2001 [10], specifically using the International Geosphere Biosphere Programme (IGBP) land cover classification. This dataset, based on the supervised classification of MODIS Terra and Aqua reflectance data [11], contains percent cover for 17 IGBP classes [12] in each grid cell at a resolution of 0.05°. They resampled the data to 50 km × 50 km grids and identified the dominant natural vegetation type in each grid cell. Grid cells partly affected by human activity were retained, under the assumption that the relative proportions of natural vegetation remain stable despite such modifications. Of the original 17 categories, only the 13 representing natural vegetation were used (Fig 1, S1 Table). Cells with 100% human activity, water cover, or both were excluded, as was Antarctica, leaving 52,297 grid cells for analysis. Although Beigaite, Tang [7] did not specify the proportion removed, roughly 10–11% of the land grid cells (excluding Antarctica) are estimated to have been excluded.

Fig 1. Distribution of observation-based PNV data, which were used to train machine learning-based models in this study.

Fig 1

Although many biome classifications exist (e.g., [13,14]), I used this PNV dataset for three practical reasons aligned with global climate–vegetation modelling at 0.5°: (i) the workflow (deriving the dominant natural class from MODIS/IGBP and resampling to ~50-km grids) matches my analysis grid; (ii) its climate inputs and projections are processed consistently with the BIOCLIM (AveI) and CLIMDEX (CEI) indices used here, with harmonized definitions and resolution for present-day data and CMIP5-based futures; and (iii) employing the same dataset as related machine-learning studies facilitates comparability without additional preprocessing.

Because the aim of this study is to model potential natural vegetation (PNV), I retained grid cells that are only partially affected by human activity, on the premise that the dominant natural-vegetation signal remains informative at 0.5° resolution. By contrast, cells with 100% human cover and/or water were excluded, leaving 52,297 land grid cells (approximately 10–11% of land cells excluded, Antarctica removed). This choice minimizes spatial coverage bias while preserving information on the prevailing natural type. I acknowledge, however, that uncertainty in PNV estimation may increase in regions with strong human influence, and I interpret results in those areas with caution.

Machine learning algorithms

Four machine learning algorithms were employed: RF [15], SVM [16], NV [17], and CNN [6]. RF, SVM, and NV were implemented in R v3.3.3 [18] using the randomForest, ksvm, and naiveBayes functions (from the randomForest, kernlab, and e1071 packages, respectively), with commands such as randomForest(VegNo ~ ., DatasetTrain), where DatasetTrain is the training dataset and VegNo is the biome category column. All models were run with default settings to (1) ensure fair comparability by avoiding bias from parameter tuning, (2) keep the implementation straightforward and reproducible, and (3) align with the study’s objective of evaluating algorithms and data combinations rather than optimizing a single best-performing model. This choice also simplified implementation. The RF model, for instance, already achieved 100% accuracy on the training sets with default parameters, indicating strong overfitting and suggesting that further tuning would not improve generalization in this case. While strategies such as cross-validation and variable selection can reduce overfitting [19], the RF results indicate that high-capacity models may still overfit even under best-practice settings.
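As a minimal sketch (not the published code, which is archived in the Zenodo repository), the default-settings training calls look as follows; DatasetTest is a hypothetical held-out data frame with the same columns as DatasetTrain:

    library(randomForest)
    library(kernlab)   # provides ksvm()
    library(e1071)     # provides naiveBayes()

    rf_model  <- randomForest(VegNo ~ ., data = DatasetTrain)  # default ntree = 500
    svm_model <- ksvm(VegNo ~ ., data = DatasetTrain)          # default RBF kernel
    nv_model  <- naiveBayes(VegNo ~ ., data = DatasetTrain)    # assumes predictor independence

    # Test accuracy = proportion of correct predictions on held-out grid cells
    test_acc <- mean(predict(rf_model, DatasetTest) == DatasetTest$VegNo)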

In contrast to Beigaite, Tang [7], a decision tree algorithm was not included in this study. Although decision trees rapidly provide interpretable boundary conditions for the distribution of a given output variable, they are generally inferior to the algorithms explored in this study in terms of reconstruction accuracy [20]. The RF algorithm is an ensemble of decision trees, which should provide higher model accuracy [15].

Although the models other than CNN were trained using numerical climate data, applying the CNN algorithm requires converting the climate data into images. CNNs are typically applied to analyze visual imagery and have been successfully adapted for species distribution modeling at regional [21,22] and global scales [6]. The present CNN was trained following the method of Sato and Ise [6], which represents climatic conditions as graphical images and employs them as training, testing, and prediction data for CNN models. I selected this method because it allows CNNs—originally developed for image analysis—to automatically extract nonlinear seasonal patterns from multiple climate variables while preserving their temporal structure, enabling convolutional filters to identify spatially coherent features. Each graphical image is 256 × 256 pixels and is divided into rectangular cells representing each data point, with tiles in each cell expressing the values in grayscale. Prior to this visualization, climate variables were standardized to the range 0.01–1.00 using a log transformation. The R code for generating the images is available in the online open data repository.
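The following sketch illustrates this encoding under stated assumptions: the log-standardization follows the description above, but the 4 × 6 cell layout for the 24 monthly variables is an illustrative choice, not necessarily the layout used by Sato and Ise [6]:

    # Standardize one climate variable to 0.01-1.00 via a log transformation
    log_standardize <- function(x) {
      y <- log(x - min(x) + 1)                        # shift avoids log(0)
      0.01 + 0.99 * (y - min(y)) / (max(y) - min(y))  # rescale to 0.01-1.00
    }

    # Render one grid cell's 24 standardized monthly values as grayscale tiles
    draw_climate_image <- function(values, file) {
      png(file, width = 256, height = 256)
      par(mar = c(0, 0, 0, 0))
      image(matrix(values, nrow = 4, ncol = 6),       # assumed 4 x 6 cell layout
            col = gray(seq(0, 1, length.out = 256)),  # grayscale palette
            axes = FALSE)
      dev.off()
    }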

In this paper, the term “CNN” is used, for convenience, to refer to the method of Sato and Ise [6], a machine-learning approach that includes transforming climate data into images as an integral part of its framework. In this method, the arbitrariness inherent in converting climate data into images can lead to variation in learning accuracies. However, Sato and Ise [6] demonstrated that, despite employing various strategies for image transformation and evaluating the resulting differences in learning performance, the effects on biome prediction were minimal: the range of training accuracies was 58.3–59.7% among four different imaging methods (such as pie charts and various color palettes), and 56.2–57.8% among four schemes used to transform climatic variables prior to graphical conversion (such as linear, log, and sigmoid transformations). Although these results are based on different climate and biome datasets than those used in the present study, they provide a useful reference.

The four algorithms selected in this study represent contrasting assumptions and capabilities in handling nonlinearity and feature interactions. RF is an ensemble-based decision tree method known for its robustness, but it can easily overfit training data without proper regularization or depth constraints, especially when using default settings [23]. CNNs have a strong capacity to extract complex nonlinear patterns from spatially structured inputs and have demonstrated superior performance in recognizing hierarchical relationships in multidimensional data [24]. SVMs and NV represent more traditional machine learning approaches: SVMs are sensitive to data scaling and kernel choice, while NV classifiers rely on conditional independence assumptions that may not hold for ecological data. Including these diverse algorithms allowed a broad assessment of how model assumptions influence biome classification performance. Although CNNs were trained using visually encoded climate data rather than tabular variables, the underlying information was identical across all algorithms, ensuring comparability despite differences in input format.

Climate data

This study used four climate datasets: averaged monthly air temperature and precipitation (Ave, 24 variables), averaged monthly climate indices (AveI, 16 variables), climate extreme indices representing extreme conditions on a daily scale such as the maximum length of a dry spell (CEI, 27 variables), and a subset of CEI (CEIpart, 21 variables). The variables included in AveI and CEI are listed in Tables 1 and 2, respectively. S1–S3 Figs show the present (1970–2000) and future (2061–2080) distributions of Ave, AveI, and CEI, respectively. Among all of the climatic variables used in this study, only six in the CEI dataset (Tn10p, Tx10p, Tn90p, Tx90p, WSDI, and CSDI) had completely separate distributions between the present and future. Another indexed extreme climate dataset, CEIpart, was constructed by excluding these variables from the CEI dataset.
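The paper does not state the exact criterion used to judge “completely separate distributions,” so the range-overlap test sketched below is an assumption; cei_present and cei_future are hypothetical data frames with one column per CEI variable:

    # Flag variables whose present and future value ranges do not overlap
    ranges_overlap <- function(present, future) {
      max(min(present), min(future)) <= min(max(present), max(future))
    }
    separated <- names(cei_present)[!mapply(ranges_overlap, cei_present, cei_future)]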

Table 1. Indexed (summarized) monthly climate data (AveI).

ID Description Unit
Bio1 Annual mean temperature °C
Bio2 Mean diurnal range (mean of monthly (maximum − minimum temperature)) °C
Bio3 Isothermality (mean diurnal range divided by annual range, × 100) %
Bio5 Maximum temperature of the warmest month °C
Bio6 Minimum temperature of the coldest month °C
Bio8 Mean temperature of the wettest quarter °C
Bio9 Mean temperature of the driest quarter °C
Bio10 Mean temperature of the warmest quarter °C
Bio11 Mean temperature of the coldest quarter °C
Bio12 Annual precipitation mm
Bio13 Precipitation of the wettest month mm
Bio14 Precipitation of the driest month mm
Bio16 Precipitation of the wettest quarter mm
Bio17 Precipitation of the driest quarter mm
Bio18 Precipitation of the warmest quarter mm
Bio19 Precipitation of the coldest quarter mm

Note: Units: °C = degrees Celsius; mm = millimeters.

Table 2. Indexed extreme climate (CEI).

ID Description Unit
FD Number of frost days: annual count of days when TN (daily minimum) < 0°C days
SU Number of summer days: annual count of days when TX (daily maximum temperature) > 25°C days
ID Number of icing days: annual count of days when TX (daily maximum temperature) < 0°C days
TR Number of tropical nights: annual count of days when TN (daily minimum temperature) > 20°C days
GSL Growing season length: annual (January 1 to December 31 in the Northern Hemisphere (NH), July 1 to June 30 in the Southern Hemisphere (SH)) count between first span of at least 6 days with daily mean temperature > 5°C and first span after July 1 in NH (January 1 in SH) of 6 days with TG < 5°C days
TXx Monthly maximum value of daily maximum temperature °C
TNx Monthly maximum value of daily minimum temperature °C
TXn Monthly minimum value of daily maximum temperature °C
TNn Monthly minimum value of daily minimum temperature °C
Tn10p * Cool nights: percentage of days when TN < 10th percentile %
Tx10p * Cool days: percentage of days when TX < 10th percentile %
Tn90p * Warm nights: percentage of days when TN > 90th percentile %
Tx90p * Warm days: percentage of days when TX > 90th percentile %
WSDI * Warm spell duration index: annual count of days with at least 6 consecutive days when TX > 90th percentile days
CSDI * Cold spell duration index: annual count of days with at least 6 consecutive days when TN < 10th percentile days
DTR Diurnal temperature range: monthly mean value of the difference between Tx and Tn °C
Rx1day Monthly maximum consecutive 1-day precipitation mm
Rx5day Monthly maximum consecutive 5-day precipitation mm
SDII Simple precipitation intensity index: annual total precipitation divided by the number of wet days (defined as PRCP ≥ 1.0 mm) in the year mm/day
R10mm Number of heavy precipitation days: annual count of days when PRCP ≥ 10 mm days
R20mm Number of very heavy precipitation days: annual count of days when PRCP ≥ 20 mm days
R1mm Number of wet days: annual count of days when PRCP ≥ 1 mm days
CDD Maximum length of dry spell: maximum number of consecutive days with RR (daily precipitation amount) < 1 mm days
CWD Maximum length of wet spell: maximum number of consecutive days with RR ≥ 1 mm days
R95p Very wet days precipitation: annual total PRCP when RR > 95th percentile mm
R99p Extremely wet days precipitation: annual total PRCP when RR > 99th percentile mm
PRCPTOT Annual total precipitation on wet days (RR ≥ 1 mm) mm

* CEIpart does not contain these six variables.

The Ave data were obtained from the WorldClim version 2.1 product (released January 2020) [25], which represents average monthly air temperature and precipitation for 1970–2000. The original WorldClim 2.1 product [25] was downloaded at a spatial resolution of 10 min and resampled to 50 km × 50 km grids using the nearest-neighbor method. AveI was released by Beigaite, Tang [7], summarizing WorldClim 2.1 properties in terms of annual means (e.g., BIO1 and BIO12), seasonality (e.g., BIO4, BIO7, and BIO15), and limiting environmental factors at a monthly scale (e.g., BIO5, BIO6, and BIO14).

The CEI product was also released by Beigaite, Tang [7], based on the CLIMDEX indices [26,27]. CLIMDEX comprises four datasets derived from different reanalysis products. Among these, Beigaite, Tang [7] used the dataset calculated from the ERA-Interim reanalysis, which accurately reproduces observed climate extremes [28]. The CEI data derived from the ERA-Interim reanalysis cover 32 years (1979–2010). Multi-year CEI values were averaged for each grid; multi-year averages of extreme indices are commonly used to represent average extreme conditions in the past and future [26,27,29]. The original resolution of the CEI data was 1.5° × 1.5°; the data were transformed onto 10 min × 10 min grids through conservative interpolation and then resampled to 50 km × 50 km grids using nearest-neighbor interpolation.
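For illustration, this two-step regridding could be performed with the terra package as sketched below; the paper does not name the software used, and true conservative remapping would require a dedicated tool (e.g., CDO), so the bilinear step here is only a stand-in:

    library(terra)
    cei_raw   <- rast("cei_1.5deg.nc")                      # hypothetical input file
    cei_10min <- resample(cei_raw, rast(resolution = 1/6),  # 10-min intermediate grid
                          method = "bilinear")              # stand-in for conservative remapping
    cei_50km  <- resample(cei_10min, rast(resolution = 0.5),
                          method = "near")                  # nearest neighbor to ~50-km grid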

Future climate conditions (2061–2080) were projected using BIOCLIM [25] and CLIMDEX [26,27] indices derived from the Intergovernmental Panel on Climate Change (IPCC) Coupled Model Intercomparison Project Phase 5 (CMIP5) ensemble means of 11 models, averaged over that period. RCP8.5 was selected as it represents the most severe climate change scenario, providing a stringent test of model robustness under conditions far beyond the present-day range. In the IPCC’s Fifth Assessment Report (AR5, 2014), RCP8.5 assumes continued growth in global GHG emissions throughout the 21st century, reaching ~758 ppm CO2 by 2080 [30]. All variables were standardized to ensure compatibility between present and future datasets. Present-day BIOCLIM indices were obtained directly from WorldClim v2 [25]. Future BIOCLIM indices were generated from CMIP5 outputs that had been bias-corrected and adjusted to match the definitions and resolution of the present-day BIOCLIM data. CLIMDEX indices were calculated using the same methods for historical and projected data [26,27]. These data sources and processing steps follow the same approach used in [7]. Equivalent CMIP6 products with the same resolution and correction methods are not yet widely available.

Intermediate projection periods (e.g., 2041–2060) were not used because the aim was to test model robustness under the most extreme climate conditions. Since RCP8.5 represents the highest emission trajectory, its far-future projection (2061–2080) was considered sufficient for evaluating extrapolative performance.

Data analysis

The learning performance of six climate dataset combinations—Ave, Ave + CEI, Ave + CEIpart, AveI, AveI + CEI, and AveI + CEIpart—was compared to disentangle three effects: (1) summarizing the climate data into indices, (2) adding extreme climate indices, and (3) adding overlapping extreme climate indices whose distributions are shared between current and future conditions. Four machine learning algorithms were applied to each dataset combination, resulting in 24 models in total. By structuring the analysis this way, I avoided the complexity of interpreting results from an exhaustive examination of which specific feature combinations the machine learning algorithms most strongly depended on.

For each model, 25% of all 52,297 grids (13,074 grids) were randomly selected for training, and the remaining 75% (39,223 grids) for testing. This proportion is the reverse of the typical 70–80% allocation to training [31], chosen to emphasize model robustness over performance and to ensure that rare vegetation types (<1%) were adequately represented in the test set. Training accuracy was defined as the proportion of correct predictions on the training data, and test accuracy as the proportion on the test data; their difference was used as the overfitting score [32]. To reduce sampling bias and better assess generalization, each model was evaluated in ten replicate experiments using different random seeds. The results were averaged across the ten independent 25:75 splits, serving a similar role to k-fold cross-validation.
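A minimal sketch of one of the ten replicate experiments follows; dataset stands for the 52,297-grid data frame with the biome class in VegNo (the RF call is shown, but the same split applies to all four algorithms):

    set.seed(seed)                              # a different seed per replicate
    n_train <- round(0.25 * nrow(dataset))      # 13,074 training grids
    idx     <- sample(nrow(dataset), n_train)
    train   <- dataset[idx, ]
    test    <- dataset[-idx, ]

    model     <- randomForest(VegNo ~ ., data = train)
    train_acc <- mean(predict(model, train) == train$VegNo)
    test_acc  <- mean(predict(model, test)  == test$VegNo)
    overfit   <- train_acc - test_acc           # overfitting score [32]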

To evaluate model robustness under future climate scenarios, I compared biome maps produced by different machine learning algorithms trained on the same dataset. Here, robustness refers to the stability of predictions across different algorithms. It was measured using the pairwise coincidence rate—the percentage of grid cells where two models assigned the same biome class. This metric captures agreement among models regardless of their match with the observed map, providing an indicator of consistency rather than accuracy (see Table 4).
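The metric itself reduces to a one-line comparison; in the sketch below, pred_a and pred_b are hypothetical vectors of predicted biome classes for the same grid cells from two models:

    coincidence_rate <- function(pred_a, pred_b) {
      100 * mean(pred_a == pred_b)   # % of grid cells assigned the same biome
    }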

Table 4. Degree of coincidence (%) in pairwise comparisons of simulated PNV under the current climate (rows labeled “Current”) and the RCP8.5 future climate scenario (rows labeled “Future”). Asterisked values are those retained after excluding models with poor accuracies (i.e., models trained with the full CEI dataset); models trained with NV were excluded from all comparisons.

Model pair   Climate   Ave     Ave + CEI   Ave + CEIpart   AveI    AveI + CEI   AveI + CEIpart
RF vs SVM    Current   85.6*   86.4        86.3*           84.4*   86.1         85.5*
             Future    78.6*   3.4         82.8*           81.2*   4.1          82.0*
RF vs CNN    Current   70.8*   71.5        74.1*           70.8*   70.9         73.5*
             Future    56.3*   43.4        63.1*           60.8*   56.8         66.3*
SVM vs CNN   Current   70.1*   72.3        70.6*           71.9*   72.4         70.7*
             Future    65.4*   1.6         51.7*           65.4*   5.1          66.0*

Results

Overall model accuracy

Across all training datasets, three of the four machine learning models—RF, CNN, and SVM—showed high test accuracy in reconstructing global PNV distributions (Table 3). Test accuracy ranged from 80.1–81.5% for RF, 77.1–82.0% for CNN, 74.6–78.0% for SVM, and 43.3–50.1% for NV. Based on these values, the ranking in descending order of accuracy was RF, CNN, SVM, and NV. The NV model performed markedly worse, with large misclassification areas in boreal and tropical forests (Fig 1; S4–S7 Figs).

Table 3. Test accuracy, training accuracy, and overfitting scores (% mean ± standard deviation; n = 10) for models based on four machine learning algorithms: random forest (RF), support vector machine (SVM), Naive Bayes classifier (NV), and convolutional neural network (CNN). Input variable abbreviations: Ave, averaged monthly air temperature and precipitation; AveI, averaged monthly climate indices; CEI, climate extreme indices; and CEIpart, a subset of CEI.

Input variables    Metric        RF             SVM            NV             CNN
Ave                Test          81.2 ± 0.21    76.4 ± 0.15    46.7 ± 0.84    79.1 ± 0.15
                   Training      100.0 ± 0.00   77.8 ± 0.28    46.8 ± 0.70    81.1 ± 0.90
                   Overfitting   18.9 ± 0.21    1.38 ± 0.30    0.11 ± 0.40    2.05 ± 0.99
Ave + CEI          Test          81.4 ± 0.20    78.0 ± 0.15    45.2 ± 1.20    80.1 ± 0.12
                   Training      100.0 ± 0.00   79.9 ± 0.25    45.3 ± 1.02    81.9 ± 0.94
                   Overfitting   18.7 ± 0.20    1.92 ± 0.30    0.15 ± 0.36    1.78 ± 0.91
Ave + CEIpart      Test          81.5 ± 0.22    77.7 ± 0.19    44.2 ± 1.24    81.8 ± 0.30
                   Training      100.0 ± 0.00   79.5 ± 0.32    44.4 ± 1.08    83.0 ± 0.44
                   Overfitting   18.6 ± 0.22    1.83 ± 0.43    0.14 ± 0.32    0.75 ± 0.62
AveI               Test          80.1 ± 0.22    74.6 ± 0.12    50.1 ± 0.88    77.1 ± 0.18
                   Training      100.0 ± 0.00   76.1 ± 0.33    50.5 ± 0.97    78.3 ± 1.02
                   Overfitting   20.0 ± 0.22    1.53 ± 0.39    0.33 ± 0.59    1.19 ± 1.03
AveI + CEI         Test          81.2 ± 0.21    77.7 ± 0.19    44.6 ± 1.82    79.9 ± 0.16
                   Training      100.0 ± 0.00   79.8 ± 0.14    44.7 ± 1.74    82.1 ± 0.90
                   Overfitting   18.8 ± 0.21    2.05 ± 0.29    0.15 ± 0.49    2.17 ± 0.86
AveI + CEIpart     Test          81.3 ± 0.18    76.9 ± 0.16    43.3 ± 2.26    82.0 ± 0.31
                   Training      100.0 ± 0.00   78.5 ± 0.38    43.5 ± 2.14    82.9 ± 0.48
                   Overfitting   18.8 ± 0.18    1.91 ± 0.46    0.16 ± 0.43    1.06 ± 0.57

All models exceeded the baseline accuracy of 17.8%, obtained by predicting all grid cells as the most frequent PNV class, grassland (S1 Table). As a further reference, a naive model assigning each grid cell the most frequent PNV observed at its latitude achieved 49% accuracy—lower than all machine learning models except NV.
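Both baselines can be computed directly from the observations; in the sketch below, obs (the observed biome of each grid cell) and lat_band (each cell’s latitude band) are hypothetical object names:

    global_mode     <- names(which.max(table(obs)))
    baseline_global <- 100 * mean(obs == global_mode)   # ~17.8% (grassland everywhere)

    lat_mode     <- tapply(obs, lat_band, function(x) names(which.max(table(x))))
    baseline_lat <- 100 * mean(obs == lat_mode[as.character(lat_band)])  # ~49%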

The NV model’s low test accuracy was mainly due to overestimating areas dominated by boreal forest, tropical rainforest, and deciduous broadleaf forest (Fig 4). In contrast, the other models primarily showed discrepancies along PNV boundaries (Figs 2, 3, and 5), consistent with the observed fragmentation of biome distributions along PNV boundaries (Fig 1). The biome distributions reconstructed by the models, however, exhibited more continuous structures (S4–S7 Figs). The NV model’s poor performance likely stems from its assumption that predictor variables are independent—a condition violated by climate variables. Due to its poor performance, the NV model was excluded from further analysis and discussion (Fig 4).

Fig 2. Geographical distribution of grid cells where the PNV predicted by the random forest (RF)-based model disagrees with the observation-based data.

Fig 2

Six sets of climate data were used for training and simulation: (a) averaged monthly air temperature and precipitation (Ave), (b) averaged monthly climate indices (AveI), (c) Ave + climate extreme indices (CEI), (d) AveI + CEI, (e) Ave + a subset of CEI (CEIpart), and (f) AveI + CEIpart.

Fig 3. As in Fig 2, but for a support vector machine (SVM)-based model.

Fig 3

Fig 4. As in Fig 2, but for a Naive Bayes classifier (NV)-based model.

Fig 4

Fig 5. As in Fig 2, but for a convolutional neural network (CNN)-based model.

Fig 5

Error analysis

All models showed similar weaknesses in classifying certain small-area biomes, “Wetlands” and “Closed Shrublands” (0.5% and 0.2% of grid cells, respectively). Using the most basic Ave dataset, test accuracy for Wetlands was 23.4% for RF, 0.0% for SVM, and 3.6% for CNN. For Closed Shrublands, the corresponding values were 7.3% for RF, 29.0% for SVM, and 5.7% for CNN (S2–S4 Tables). Notably, the SVM model did not classify any grid cells as “Wetland” across all 39,223 grids × 10 tests. These errors were widely dispersed and occurred in all models. The poor performance for these small-area biomes likely reflects their limited representation in the training data, which in turn reduces classification accuracy in both present-day reconstructions and future projections and increases inter-model variability in predicted range shifts.

Overall accuracy is a standard performance metric, but it can be misleading when class distributions are highly imbalanced, as in this study, where some biomes occupy less than 1% of grid cells. To provide a fairer assessment that accounts for chance agreement, Cohen’s Kappa [33] was calculated from the confusion matrices of models trained with the Ave dataset. Kappa approaches 1 for perfect agreement, with 0.61–0.80 conventionally interpreted as “substantial” agreement and 0.81–1.00 as “almost perfect” agreement. The results were RF: 0.784, SVM: 0.728, and CNN: 0.759, all within the “substantial” range. The ranking by Kappa was consistent with that by test accuracy: RF > CNN > SVM.
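Kappa follows directly from the confusion matrix; a minimal sketch, where cm holds counts with rows as predicted classes and columns as actual classes (as in S2–S4 Tables):

    cohens_kappa <- function(cm) {
      p  <- cm / sum(cm)
      po <- sum(diag(p))                  # observed agreement
      pe <- sum(rowSums(p) * colSums(p))  # agreement expected by chance
      (po - pe) / (1 - pe)
    }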

While Cohen’s Kappa offers a chance-corrected measure of overall agreement, it does not indicate the nature of classification discrepancies. To distinguish whether these discrepancies stem from differences in class proportions or from their spatial arrangement, I decomposed the errors into quantity disagreement and allocation disagreement [34]. Quantity disagreement quantifies the difference in the number of grid cells assigned to each biome class, whereas allocation disagreement measures mismatches in their spatial placement. This breakdown can help guide whether model improvements should focus on adjusting overall class proportions or on refining spatial accuracy.
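Both components follow from the same confusion matrix; a minimal sketch of the decomposition of Pontius and Millones [34], again with cm holding counts and rows as predicted classes:

    disagreement <- function(cm) {
      p        <- cm / sum(cm)
      total    <- 1 - sum(diag(p))                       # total disagreement
      quantity <- sum(abs(rowSums(p) - colSums(p))) / 2  # mismatch in class proportions
      c(quantity = quantity, allocation = total - quantity)
    }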

Quantity disagreement values were RF: 0.019, SVM: 0.045, and CNN: 0.024. Allocation disagreement values were notably higher—RF: 0.169, SVM: 0.191, and CNN: 0.185—across all models. In every case, allocation disagreement (0.169–0.191) exceeded quantity disagreement (0.019–0.045), indicating that spatial allocation errors are the primary source of model uncertainty. Consistent with other metrics, the ranking was RF > CNN > SVM. This pattern likely reflects the fragmented nature of observation-based biome distributions in regions with similar climatic conditions (Fig 1), with most mismatches occurring along biome boundaries (Figs 2, 3, and 5).

Across models, summarizing data into indices reduced test accuracy. The change from Ave to AveI was −1.1% for RF, −1.8% for SVM, and −2.0% for CNN (Table 3). Adding extreme climate indices (CEI) generally improved accuracy. Compared with Ave, Ave + CEI increased accuracy by +0.2% (RF), +1.6% (SVM), and +1.0% (CNN). Compared with AveI, AveI + CEI increased accuracy by +1.1% (RF), +3.1% (SVM), and +2.8% (CNN). Replacing CEI with partial CEI (CEIpart) produced no consistent trend: for RF, the change was negligible (+0.1% vs. +0.1%); for SVM, accuracy decreased (−0.3% vs. −0.8%); for CNN, it increased (+1.7% vs. +2.1%).

Training accuracy, overfitting, and future-climate consistency

For all model–dataset combinations, training accuracy exceeded test accuracy, resulting in a positive overfitting score (training accuracy − test accuracy; Table 3). The RF model consistently achieved 100% training accuracy, leading to the highest overfitting scores (18.6–20.0%). SVM and CNN had much lower overfitting scores, at 1.38–2.05% and 0.75–2.17%, respectively.

Under current climatic conditions, all models reconstructed highly coincident PNV distributions regardless of the training datasets (coincidence: 70.1–86.4%, Table 4). Differences in reconstructed PNV correspondence between model pairs were small: within 2.0% between RF and SVM (coincidence: 84.4–86.4%), within 3.3% between RF and CNN (coincidence: 70.8–74.1%), and within 2.3% between SVM and CNN (coincidence: 70.1–72.4%).

When the trained models were applied to future climatic conditions, i.e., conditions beyond the training data, much larger differences emerged among the PNV distributions generated by different model and dataset combinations (coincidence: 1.6–82.8%; Table 4). Discrepancies were particularly pronounced in PNV maps from models trained on CEI datasets (S8–S11 Figs). SVM models trained on the CEI dataset predicted only evergreen broadleaf forest (S9 Fig, panels c and d), whereas CNN models produced maps dominated by grassland and savanna (S11 Fig, panels c and d). Substituting CEI data with partial CEI (CEIpart) reduced these extreme outputs (S9 and S11 Figs, panels e and f). When models trained with the NV algorithm and the CEI dataset were excluded, the remaining models produced much more consistent PNV distributions under future climate conditions (coincidence: 51.7–82.8%, Table 4).

Discussion

Comparative performance of machine learning algorithms

Across all input dataset combinations, RF and CNN algorithms provided more accurate global PNV models than SVM and NV. A concise rationale for selecting these four algorithms and their contrasting assumptions is provided in the Methods section (“Machine learning algorithms”).

Hengl, Walsh [5] found that the RF algorithm consistently outperformed other machine learning algorithms, including neural networks. In their study, a stack of 160 global maps representing biophysical conditions over the terrestrial surface—including atmospheric, climatic, relief, and lithologic variables—was used as explanatory variables to predict 20 biome classes in the BIOME 6000 dataset [35]. Although a direct comparison with the present study is not possible, their findings support RF as an effective machine learning algorithm for reconstructing biome maps. The present study is the first to compare the performance of a CNN algorithm adapted for biome modeling [6] with that of other machine learning algorithms; this CNN showed performance comparable to that of the RF algorithm.

Model reliability: Overfitting and generalization

The RF and CNN algorithms exhibited comparable test accuracy; however, CNN was preferred because RF produced much higher overfitting scores than the other machine learning algorithms examined in this study. Although overfitting in RF could potentially be reduced by limiting tree depth, doing so would require adjusting the model based on the test data, which would compromise the experimental design by failing to keep calibration and testing independent. Overfitting is an inevitable risk associated with empirical models [32]. Fourcade, Besnard [19] demonstrated an extreme example of pseudo-predictive variables (randomly chosen classical paintings) increasing the accuracy of species distribution modeling; these models sometimes had even higher evaluation scores than models trained with relevant environmental variables. To avoid overfitting or the use of pseudo-predictive variables, Fourcade, Besnard [19] suggested investing greater effort in cross-validation and ensuring the selection of the most important predictors. This approach was followed in the present analysis.

Model robustness under climate extremes

Adding extreme climate data slightly improved test accuracy; however, it substantially reduced model robustness, defined here as the consistency of model predictions under forecast climate conditions. This reduction in robustness was primarily driven by six CEI variables whose distributions deviated markedly from those in the training data, underscoring the importance of assessing the distributions of both training and prediction variables when building empirical models. Because the improvement in test accuracy obtained by including extreme climate data did not outweigh the loss in robustness, I recommend excluding extreme climate data when predicting global biome distributions at the geographical resolution used in this study (0.5°).

Here, “robustness” is operationalized as cross-algorithm consistency under future climates; while this proxy is useful for decision-making, it can also reflect reduced sensitivity to novel or extreme conditions. The observed loss of consistency was primarily driven by a small set of CEI variables whose distributions diverged strongly between training and projection conditions. In practice, whether to include CEI should follow the research question: (i) for a high-agreement baseline map under extrapolative climates, exclude CEI; (ii) to explore a broader possibility space or risk envelopes, include CEI as a supplementary input; and (iii) for species- or local-scale questions, extremes may contribute more and could be prioritized.

Role of climate indices in simplifying input data

The climate index data used in this study reduced the number of variables by one-third (from 24 to 16). However, this only slightly decreased model accuracy (−1.1%, −1.8%, and −2.0% for RF, SVM, and CNN, respectively), demonstrating that the typical climate indices employed here effectively captured the essential climate information relevant to global biome distribution. Although indexing has limited utility for building non-transparent machine learning models, it is essential for constructing interpretable models such as decision trees [7].

Contextualizing with previous studies and future directions

As shown above, the reliability of empirical models cannot be guaranteed beyond the range of their training data. In contrast, process-based models are expected to behave appropriately even when applied to environmental conditions that deviate slightly from those represented in observational datasets. This is one reason why many groups have proposed and developed dynamic global vegetation models (DGVMs) with greater fidelity to ecological processes, aiming to predict biome distributions mechanistically by considering climate, soil, and the fundamentals of plant physiology and ecology [36]. For example, Pugh, Rademacher [37] compared several DGVMs and identified discrepancies in their outputs and mechanisms, although their study did not specifically focus on biome distribution prediction. The expected increase in the frequency of extreme climate values in the future, which could differ significantly from the current distribution, may justify a shift from empirical models toward DGVMs. However, current DGVMs are not yet a reliable option for reconstructing plant population dynamics on a global scale; biome map predictions under commonly used climate change scenarios differ significantly among state-of-the-art DGVMs [37,38]. Therefore, empirical models continue to play an essential role in the approximate mapping of biomes under changing climatic conditions.

There is clear evidence that climate extremes influence plant demographic processes such as growth [39,40], regeneration [41], and mortality [42,43], all of which affect plant species distributions. However, this does not necessarily mean that extreme climate data should always be included to improve biome map reconstruction, as mean climatic values are often strongly correlated with extreme climatic variables. Nevertheless, at local and species levels, extreme climate conditions may serve as more important predictors; Zimmermann, Yoccoz [44] showed that augmenting mean climate predictors with variables representing climate extremes can improve the predictive power of species distribution models.

A crucial disadvantage of the climatic envelope approach is that extrapolating current correlations between climate and biome distributions into the future may lead to substantially biased predictions. Thus, strong model performance under present climate conditions does not guarantee similar performance under novel climatic conditions that may arise in the future. However, no models—except those trained with the NV algorithm and the CEI dataset—showed notable increases in PNV uncertainty under projected climatic scenarios. This finding suggests that robust models can be developed beyond the training data if machine learning algorithms and climatic variables are carefully selected. The climatic envelope approach also has other limitations; for example, it ignores time lags between climate change and vegetation response, changes in atmospheric CO2, and human land-use change (as discussed in Sato and Ise [6]). Nonetheless, the climatic envelope approach remains useful for various applications, including benchmarking DGVMs [36].

Conclusion

This study examined how different machine learning algorithms and representations of climate data influence the accuracy and robustness of global PNV models. Both RF and CNN algorithms produced highly accurate models; however, RF exhibited substantial overfitting, which undermined its robustness, so the CNN model was considered more reliable overall. Consequently, CNN models trained on complete climate datasets without extreme indices provided the most balanced performance in terms of accuracy and robustness.

Summarizing climate data into indices provided minimal benefit and slightly reduced model accuracy (by 1–2% in RF, CNN, and SVM), suggesting that such simplification may be unnecessary for non-transparent models like CNN. Although incorporating extreme climate variables marginally improved accuracy (by 1–2% across the same models), it also reduced robustness—particularly under future climate conditions that deviated from those represented in the training data. These results underscore the need to weigh carefully the trade-off between model accuracy and stability when including extreme climate variables.

Overall, the findings suggest that, within the current modeling framework, CNN-based models trained on complete climate datasets without extreme indices achieve a favorable balance between predictive accuracy and generalizability in global biome modeling. Although the CNN approach uses a graphically encoded input format, the underlying climate data remains consistent across models, enabling cautious yet meaningful comparisons between algorithms.

Supporting information

S1 Fig. Histograms of average monthly air temperature and precipitation (Ave, 24 variables).

Red bars: averages for 1970–2000; blue bars: averages for 2061–2080.

(PDF)

pone.0324107.s001.pdf (424.4KB, pdf)
S2 Fig. Histograms of average monthly climate indices (AveI, 16 variables).

Red bars: averages for 1970–2000; blue bars: averages for 2061–2080.

(PDF)

pone.0324107.s002.pdf (124.9KB, pdf)
S3 Fig. Histograms of climate extreme indices (CEI, 27 variables).

Red bars: averages for 1970–2000; blue bars: averages for 2061–2080.

(PDF)

pone.0324107.s003.pdf (303.8KB, pdf)
S4 Fig. Simulated potential natural vegetation (PNV) under current climatic conditions using the RF model.

Six sets of climate data were used for training and simulation: (a) Ave, (b) AveI, (c) Ave + CEI, (d) AveI + CEI, (e) Ave + CEIpart, and (f) AveI + CEIpart.

(PDF)

pone.0324107.s004.pdf (303.2KB, pdf)
S5 Fig. Simulated PNV under current climatic conditions using the SVM model.

The same experimental setup as in S4 Fig. was used.

(PDF)

pone.0324107.s005.pdf (271KB, pdf)
S6 Fig. Simulated PNV under current climatic conditions using the NV model.

The same experimental setup as in S4 Fig. was used.

(PDF)

pone.0324107.s006.pdf (363.4KB, pdf)
S7 Fig. Simulated PNV under current climatic conditions using the CNN model.

The same experimental setup as in S4 Fig. was used.

(PDF)

pone.0324107.s007.pdf (277.5KB, pdf)
S8 Fig. Simulated PNV under future climatic conditions (2061–2080) projected under the IPCC RCP8.5 scenario using the RF model.

Six climate datasets were used for training and simulation: (a) Ave, (b) AveI, (c) Ave + CEI, (d) AveI + CEI, (e) Ave + CEIpart, and (f) AveI + CEIpart.

(PDF)

pone.0324107.s008.pdf (305.4KB, pdf)
S9 Fig. Simulated PNV under future climatic conditions (2061–2080) using the SVM model.

The same experimental setup as in S8 Fig. was used.

(PDF)

pone.0324107.s009.pdf (262.7KB, pdf)
S10 Fig. Simulated PNV under future climatic conditions (2061–2080) using the NV model.

The same experimental setup as in S8 Fig. was used.

(PDF)

pone.0324107.s010.pdf (280.1KB, pdf)
S11 Fig. Simulated PNV under future climatic conditions (2061–2080) using the CNN model.

The same experimental setup as in S8 Fig. was used.

(PDF)

pone.0324107.s011.pdf (282.3KB, pdf)
S1 Table. Potential Natural Vegetation (PNV) classes used in the modelings.

From the IGBP classification, three human-mediated classes (Croplands, Cropland/Natural Vegetation Mosaics, and Urban and Built-Up Lands) and Water Bodies were excluded. Descriptions were based on Loveland and Belward [12].

(PDF)

pone.0324107.s012.pdf (207.7KB, pdf)
S2 Table. Confusion matrix for biome classification using the AVE climate data set and the RF model.

Columns represent the actual classes, while rows represent the predicted classes. This matrix is based on the test grids only, with a total of 392,230 predictions from 10 independent trials (39,223 grids × 10 tests). Shaded diagonal cells indicate correct classifications. Each cell shows the count (top) and column-wise percentage (bottom) within the actual class.

(PDF)

pone.0324107.s013.pdf (159KB, pdf)
S3 Table. Confusion matrix for biome classification using the SVM model.

The same dataset and evaluation as in S2 Table were used.

(PDF)

pone.0324107.s014.pdf (74KB, pdf)
S4 Table. Confusion matrix for biome classification using the CNN model.

The same dataset and evaluation procedure as in S2 Table were used.

(PDF)

pone.0324107.s015.pdf (79.3KB, pdf)

Acknowledgments

The author thanks the anonymous reviewers of the previous version of the manuscript. Dr. Shuntaro WATANABE (Kagoshima Univ.) and Dr. Takeshi ISE (Kyoto Univ.) offered technical support regarding issues of CNN, including the installation of the pertinent computer environments.

Data Availability

All data required to reproduce the analyses described herein are openly available via Zenodo (https://doi.org/10.5281/zenodo.8113935).

Funding Statement

HS received two grants from the Ministry of Education, Culture, Sports, Science and Technology (MEXT), Japan: Arctic Challenge for Sustainability II (ArCS II) [Program Grant Number JPMXD1420318865] and Arctic Challenge for Sustainability III (ArCS III) [Program Grant Number JPMXD1720251001]. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

References

  • 1.Lincoln R, Boxshall G, Clark P. A Dictionary of Ecology, Evolution and Systematics. 2nd ed. Cambridge: Cambridge University Press; 1998. [Google Scholar]
  • 2.Adams J. Plants on the move. Vegetation-Climate Interaction - How Plants Make the Global Environment. 2nd ed. Springer; 2010. p. 67–96. [Google Scholar]
  • 3.Prentice IC, Cramer W, Harrison SP, Leemans R, Monserud RA, Solomon AM. Special paper: a global biome model based on plant physiology and dominance, soil properties and climate. Journal of Biogeography. 1992;19(2):117. doi: 10.2307/2845499 [DOI] [Google Scholar]
  • 4.Pitman AJ. The evolution of, and revolution in, land surface schemes designed for climate models. Int J Climatol. 2003;23(5):479–510. [Google Scholar]
  • 5.Hengl T, Walsh MG, Sanderman J, Wheeler I, Harrison SP, Prentice IC. Global mapping of potential natural vegetation: an assessment of machine learning algorithms for estimating land potential. PeerJ. 2018;6:e5457. doi: 10.7717/peerj.5457 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 6.Sato H, Ise T. Predicting global terrestrial biomes with the LeNet convolutional neural network. Geosci Model Dev. 2022;15(7):3121–32. doi: 10.5194/gmd-15-3121-2022 [DOI] [Google Scholar]
  • 7.Beigaitė R, Tang H, Bryn A, Skarpaas O, Stordal F, Bjerke JW, et al. Identifying climate thresholds for dominant natural vegetation types at the global scale using machine learning: Average climate versus extremes. Glob Chang Biol. 2022;28(11):3557–79. doi: 10.1111/gcb.16110 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 8.Argles APK, Moore JR, Cox PM. Dynamic global vegetation models: searching for the balance between demographic process representation and computational tractability. PLOS Clim. 2022;1(9):e0000068. doi: 10.1371/journal.pclm.0000068 [DOI] [Google Scholar]
  • 9.Bonannella C, Hengl T, Parente L, de Bruin S. Biomes of the world under climate change scenarios: increasing aridity and higher temperatures lead to significant shifts in natural vegetation. PeerJ. 2023;11:e15593. doi: 10.7717/peerj.15593 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 10.Friedl M, Sulla-Menashe D. MODIS/Terra+Aqua Land Cover Type Yearly L3 Global 0.05Deg CMG V006 [Data set]. NASA EOSDIS Land Processes DAAC (LP DAAC); 2015. [Google Scholar]
  • 11.Friedl MA, Sulla-Menashe D, Tan B, Schneider A, Ramankutty N, Sibley A, et al. MODIS collection 5 global land cover: algorithm refinements and characterization of new datasets. Remote Sensing of Environment. 2010;114(1):168–82. doi: 10.1016/j.rse.2009.08.016 [DOI] [Google Scholar]
  • 12.Loveland TR, Belward AS. The International Geosphere Biosphere Programme Data and Information System global land cover data set (DISCover). Acta Astronautica. 1997;41(4–10):681–9. doi: 10.1016/s0094-5765(98)00050-2 [DOI] [Google Scholar]
  • 13.Beierkuhnlein C, Fischer J-C. Global biomes and ecozones – Conceptual and spatial communalities and discrepancies. Erdkunde. 2021;75(4):249–70. doi: 10.3112/erdkunde.2021.04.01 [DOI] [Google Scholar]
  • 14.Mucina L. Biome: evolution of a crucial ecological and biogeographical concept. New Phytol. 2019;222(1):97–114. doi: 10.1111/nph.15609 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 15.Breiman L. Random forests. Machine Learning. 2001;45(1):5–32. doi: 10.1023/a:1010933404324 [DOI] [Google Scholar]
  • 16.Cortes C, Vapnik V. Support-vector networks. Mach Learn. 1995;20(3):273–97. doi: 10.1007/bf00994018 [DOI] [Google Scholar]
  • 17.Langley P, Iba W, Thompson K. An analysis of Bayesian classifiers. In: Proceedings of the Tenth National Conference on Artificial Intelligence. Menlo Park, CA: AAAI Press; 1992. [Google Scholar]
  • 18.R Core Team. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing; 2018. [Google Scholar]
  • 19.Fourcade Y, Besnard AG, Secondi J. Paintings predict the distribution of species, or the challenge of selecting environmental predictors and evaluation statistics. Global Ecol Biogeogr. 2017;27(2):245–56. doi: 10.1111/geb.12684 [DOI] [Google Scholar]
  • 20.Caruana R, Niculescu-Mizil A. An empirical comparison of supervised learning algorithms. In: ICML ‘06: Proceedings of the 23rd international conference on Machine learning. Pittsburgh, Pennsylvania, USA, 2006. [Google Scholar]
  • 21.Benkendorf DJ, Hawkins CP. Effects of sample size and network depth on a deep learning approach to species distribution modeling. Ecological Informatics. 2020;60:101137. doi: 10.1016/j.ecoinf.2020.101137 [DOI] [Google Scholar]
  • 22.Botella C, Joly A, Bonnet P, Monestiez P, Munoz F. A deep learning approach to species distribution modelling. In: Joly A, Vrochidis S, Karatzas K, Karppinen A, Bonnet P, editors. Multimedia tools and applications for environmental & biodiversity informatics. Springer Nature; 2018. p. 169–99. [Google Scholar]
  • 23.Probst P, Wright MN, Boulesteix A. Hyperparameters and tuning strategies for random forest. WIREs Data Min & Knowl. 2019;9(3). doi: 10.1002/widm.1301 [DOI] [Google Scholar]
  • 24.LeCun Y, Bengio Y, Hinton G. Deep learning. Nature. 2015;521(7553):436–44. doi: 10.1038/nature14539 [DOI] [PubMed] [Google Scholar]
  • 25.Fick SE, Hijmans RJ. WorldClim 2: new 1‐km spatial resolution climate surfaces for global land areas. Intl Journal of Climatology. 2017;37(12):4302–15. doi: 10.1002/joc.5086 [DOI] [Google Scholar]
  • 26.Sillmann J, Kharin VV, Zhang X, Zwiers FW, Bronaugh D. Climate extremes indices in the CMIP5 multimodel ensemble: Part 1. Model evaluation in the present climate. JGR Atmospheres. 2013;118(4):1716–33. doi: 10.1002/jgrd.50203 [DOI] [Google Scholar]
  • 27.Sillmann J, Kharin VV, Zwiers FW, Zhang X, Bronaugh D. Climate extremes indices in the CMIP5 multimodel ensemble: Part 2. Future climate projections. JGR Atmospheres. 2013;118(6):2473–93. doi: 10.1002/jgrd.50188 [DOI] [Google Scholar]
  • 28.Donat MG, Sillmann J, Wild S, Alexander LV, Lippmann T, Zwiers FW. Consistency of temperature and precipitation extremes across various global gridded in situ and reanalysis datasets. J Climate. 2014;27(13):5019–35. doi: 10.1175/jcli-d-13-00405.1 [DOI] [Google Scholar]
  • 29.Seneviratne SI, Hauser M. Regional climate sensitivity of climate extremes in CMIP6 Versus CMIP5 multimodel ensembles. Earths Future. 2020;8(9):e2019EF001474. doi: 10.1029/2019EF001474 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 30.Stocker TF, Qin D, Plattner G-K, Tignor M, Allen SK, Boschung J, et al. Climate change 2013: The physical science basis. Contribution of Working Group I to the fifth assessment report of the Intergovernmental Panel on Climate Change. Cambridge, United Kingdom and New York, NY, USA: Cambridge University Press; 2013. p. 1535. [Google Scholar]
  • 31.Géron A. Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems. 2nd ed. Sebastopol: O’Reilly Media; 2019. [Google Scholar]
  • 32.Leinweber DJ. Stupid data miner tricks. JOI. 2007;16(1):15–22. doi: 10.3905/joi.2007.681820 [DOI] [Google Scholar]
  • 33.Cohen J. A Coefficient of Agreement for Nominal Scales. Educational and Psychological Measurement. 1960;20(1):37–46. doi: 10.1177/001316446002000104 [DOI] [Google Scholar]
  • 34.Pontius RG Jr, Millones M. Death to Kappa: birth of quantity disagreement and allocation disagreement for accuracy assessment. International Journal of Remote Sensing. 2011;32(15):4407–29. doi: 10.1080/01431161.2011.552923 [DOI] [Google Scholar]
  • 35.Harrison SP. BIOME 6000 DB classified plotfile version 1. University of Reading; 2017.
  • 36.Fisher RA, Koven CD, Anderegg WRL, Christoffersen BO, Dietze MC, Farrior CE, et al. Vegetation demographics in Earth System Models: A review of progress and priorities. Glob Chang Biol. 2018;24(1):35–54. doi: 10.1111/gcb.13910 [DOI] [PubMed] [Google Scholar]
  • 37.Pugh TAM, Rademacher T, Shafer SL, Steinkamp J, Barichivich J, Beckage B, et al. Understanding the uncertainty in global forest carbon turnover. Biogeosciences. 2020;17(15):3961–89. doi: 10.5194/bg-17-3961-2020 [DOI] [Google Scholar]
  • 38.Friend AD, Lucht W, Rademacher TT, Keribin R, Betts R, Cadule P, et al. Carbon residence time dominates uncertainty in terrestrial vegetation responses to future climate and atmospheric CO2. Proc Natl Acad Sci U S A. 2014;111(9):3280–5. doi: 10.1073/pnas.1222477110 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 39.Ciais P, Reichstein M, Viovy N, Granier A, Ogée J, Allard V, et al. Europe-wide reduction in primary productivity caused by the heat and drought in 2003. Nature. 2005;437(7058):529–33. doi: 10.1038/nature03972 [DOI] [PubMed] [Google Scholar]
  • 40.Jolly WM, Dobbertin M, Zimmermann NE, Reichstein M. Divergent vegetation growth responses to the 2003 heat wave in the Swiss Alps. Geophysical Research Letters. 2005;32(18). doi: 10.1029/2005gl023252 [DOI] [Google Scholar]
  • 41.Ibáñez I, Clark JS, LaDeau S, Lambers JHR. Exploiting temporal variability to understand tree recruitment response to climate change. Ecological Monographs. 2007;77(2):163–77. doi: 10.1890/06-1097 [DOI] [Google Scholar]
  • 42.Bigler C, Bräker OU, Bugmann H, Dobbertin M, Rigling A. Drought as an inciting mortality factor in scots pine stands of the valais, Switzerland. Ecosystems. 2006;9(3):330–43. doi: 10.1007/s10021-005-0126-2 [DOI] [Google Scholar]
  • 43.Villalba R, Veblen TT. Influences of large-scale climatic variability on episodic tree mortality in northern patagonia. Ecology. 1998;79(8):2624–40. doi: 10.1890/0012-9658(1998)079[2624:iolscv]2.0.co;2 [DOI] [Google Scholar]
  • 44.Zimmermann NE, Yoccoz NG, Edwards TC Jr, Meier ES, Thuiller W, Guisan A, et al. Climatic extremes improve predictions of spatial patterns of tree species. Proc Natl Acad Sci U S A. 2009;106 Suppl 2(Suppl 2):19723–8. doi: 10.1073/pnas.0901643106 [DOI] [PMC free article] [PubMed] [Google Scholar]

Decision Letter 0

Krishna Vadrevu

27 Jun 2025

Dear Dr. Sato,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Please address several comments raised by the reviewers. Also, please address these issues: a). random forests classification and overfitting and hyperparameter tuning issues; b). cross validation issues; c). use of very few grid cells from the global datasets: d). providing additional accuracy metrics; e). poor predictions for some of the classes in the model; f). figure modifications, etc.

Please submit your revised manuscript by Aug 11 2025 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org . When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols . Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols .

We look forward to receiving your revised manuscript.

Kind regards,

Krishna Prasad Vadrevu, Ph.D

Academic Editor

PLOS ONE

Journal Requirements:

1. When submitting your revision, we need you to address these additional requirements. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf

2. We note that the grant information you provided in the ‘Funding Information’ and ‘Financial Disclosure’ sections do not match. When you resubmit, please ensure that you provide the correct grant numbers for the awards you received for your study in the ‘Funding Information’ section.

3. Thank you for stating the following financial disclosure: “Arctic Challenge for Sustainability II (ArCS II) [Program Grant Number JPMXD1420318865]”. Please state what role the funders took in the study. If the funders had no role, please state: "The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript." If this statement is not correct you must amend it as needed. Please include this amended Role of Funder statement in your cover letter; we will change the online submission form on your behalf.

4. Thank you for stating the following in the Acknowledgments Section of your manuscript: “This work was funded by the Arctic Challenge for Sustainability II (ArCS II) [Program Grant Number JPMXD1420318865]. The author thanks the anonymous reviewers of the previous version of the manuscript. The authors declare no conflicts of interest.” We note that you have provided funding information that is not currently declared in your Funding Statement. However, funding information should not appear in the Acknowledgments section or other areas of your manuscript. We will only publish funding information present in the Funding Statement section of the online submission form. Please remove any funding-related text from the manuscript and let us know how you would like to update your Funding Statement. Currently, your Funding Statement reads as follows: “Arctic Challenge for Sustainability II (ArCS II) [Program Grant Number JPMXD1420318865]”. Please include your amended statements within your cover letter; we will change the online submission form on your behalf.

5. Please note that your Data Availability Statement is currently missing the repository name. If your manuscript is accepted for publication, you will be asked to provide these details on a very short timeline. We therefore suggest that you provide this information now, though we will not hold up the peer review process if you are unable.

6. We note that Figures 1 to 5 and S4 to S11 in your submission contain map images which may be copyrighted. All PLOS content is published under the Creative Commons Attribution License (CC BY 4.0), which means that the manuscript, images, and Supporting Information files will be freely available online, and any third party is permitted to access, download, copy, distribute, and use these materials in any way, even commercially, with proper attribution. For these reasons, we cannot publish previously copyrighted maps or satellite images created using proprietary data, such as Google software (Google Maps, Street View, and Earth). For more information, see our copyright guidelines: http://journals.plos.org/plosone/s/licenses-and-copyright.

We require you to either present written permission from the copyright holder to publish these figures specifically under the CC BY 4.0 license, or remove the figures from your submission:

a. You may seek permission from the original copyright holder of Figures 1 to 5 and S4 to S11 to publish the content specifically under the CC BY 4.0 license. We recommend that you contact the original copyright holder with the Content Permission Form (http://journals.plos.org/plosone/s/file?id=7c09/content-permission-form.pdf) and the following text: “I request permission for the open-access journal PLOS ONE to publish XXX under the Creative Commons Attribution License (CCAL) CC BY 4.0 (http://creativecommons.org/licenses/by/4.0/). Please be aware that this license allows unrestricted use and distribution, even commercially, by third parties. Please reply and provide explicit written permission to publish XXX under a CC BY license and complete the attached form.” Please upload the completed Content Permission Form or other proof of granted permissions as an "Other" file with your submission. In the figure caption of the copyrighted figure, please include the following text: “Reprinted from [ref] under a CC BY license, with permission from [name of publisher], original copyright [original copyright year].”

b. If you are unable to obtain permission from the original copyright holder to publish these figures under the CC BY 4.0 license or if the copyright holder’s requirements are incompatible with the CC BY 4.0 license, please either i) remove the figure or ii) supply a replacement figure that complies with the CC BY 4.0 license. Please check copyright information on all replacement figures and update the figure caption with source information. If applicable, please specify in the figure caption text when a figure is similar but not identical to the original image and is therefore for illustrative purposes only.

The following resources for replacing copyrighted map figures may be helpful:

USGS National Map Viewer (public domain): http://viewer.nationalmap.gov/viewer/
The Gateway to Astronaut Photography of Earth (public domain): http://eol.jsc.nasa.gov/sseop/clickmap/
Maps at the CIA (public domain): https://www.cia.gov/library/publications/the-world-factbook/index.html and https://www.cia.gov/library/publications/cia-maps-publications/index.html
NASA Earth Observatory (public domain): http://earthobservatory.nasa.gov/
Landsat: http://landsat.visibleearth.nasa.gov/
USGS EROS (Earth Resources Observation and Science (EROS) Center) (public domain): http://eros.usgs.gov/#
Natural Earth (public domain): http://www.naturalearthdata.com/


Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?

Reviewer #1: Yes

Reviewer #2: No

Reviewer #3: Partly

Reviewer #4: Yes

**********

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #2: No

Reviewer #3: N/A

Reviewer #4: Yes

**********

3. Have the authors made all data underlying the findings in their manuscript fully available?

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: Yes

Reviewer #4: Yes

**********

4. Is the manuscript presented in an intelligible fashion and written in standard English?

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: No

Reviewer #4: Yes

**********

Reviewer #1:  Major Revisions

The manuscript attempts to apply different machine learning techniques to classify biome under present and future climate scenarios. This research topic is relevant and holds significant potential for impactful insights. Below are some suggestions which can help improve the clarity and rigor of the manuscript.

Figures

1. For Figures 1, 2, 3, 4, and 5, please increase the DPI; use at least 300 for image exports. In its current form, the legend is unreadable.

2. Please include a north arrow and scale bar in the spatial maps.

Tables

1. In Table 1, please consider introducing the abbreviations for the units (°C or mm) before their first appearance in the table.

2. In Table 2, the abbreviations must be redefined in the Table caption or as a footnote to provide a better readability.

3. Tables 3, 4, and 5 can be combined into a single table to reduce visual clutter, as the three tables share the same input variable combinations and four models. Perhaps a color-coding scheme for training, test, and overfitting accuracies could help? This would also reduce the overall number of figures and tables.

4. The same applies to Tables 6 and 7.

Typos

1. Page 19, Line 311: “to an RF” must be changed to “to the RF”

2. The term “pseudo-predicting” must be changed to “pseudo-predictive”. The latter seems more reasonable

Suggestions and Questions

1. While the authors acknowledge the issue of overfitting in RF, the conclusion refers to it as ‘robust’. Given that RF achieves 100 percent training accuracy and significant overfitting, calling it robust appears inconsistent with the results. It may help to rephrase this statement altogether to avoid confusion.

2. I believe the authors should have performed hyperparameter tuning, even if minimal, for a more even model comparison.

3. Could the authors clarify whether the image conversion of climatic variables was performed only to train the CNN, or also to test and predict? Could the authors also clarify why they chose the method from Sato and Ise? Are there any advantages to it over other methods?

4. The study lacks a feature sensitivity analysis. Thus, it is not clear which climate variables are driving predictions. Adding such an analysis would strengthen the study.

5. The authors trained using the climate data from the years 1970-2000 and then tried to project for the years 2061-2080. Could the authors provide details on intermediate validation for the years 2000-2060 to account for this gap? This would also help to assess the model’s temporal generalization.

Reviewer #2:  The manuscript is relevant and shows an interesting approach. I have some suggestions for improvement:

- The first sentence of the abstract was too direct. I suggest making it more explanatory.

- In the abstract, I suggest including the accuracy value mentioned so that the phrases “marginally decreased model accuracy” and “improved accuracy slightly” are not vague. How many % of accuracy?

- The last sentence of the abstract could be left in a way that recommends not using extreme climate data. Using “should not” ends up establishing a rule, which is complicated considering only one or a few studies.

- The introduction could be more detailed and show other examples on the topic. I suggest increasing it a little and referencing studies that have already been done.

- Usually, the largest amount of data is intended for training. Why was only 25% used in this study? Using so little for training could be the reason for the overfitting in the end. Despite the justification and the tests supporting the 25% choice, have other studies adopted this methodology? Is it reproducible?

- Were there no tests performed with changes to the algorithm parameters? The use of default parameters is generally not recommended for all analyses and all algorithms. It would be interesting to test whether there was any change in performance and to find suitable hyperparameters. It is said that default settings were used, but cross-validation and other techniques help to ensure that the models are reliable.

- The figures and map captions are blurry. I suggest making them clearer.

- When you say that "while CNN employed graphically converted climate data, preventing a conclusive determination of the superior approach" you limit the comparison with other algorithms and it would be interesting to better justify why you continued with the 4 algorithms.

- The conclusion could be more explanatory, it is very direct.

Reviewer #3: Overview of the Study

This study explores the use of machine learning (ML) models to simulate and project global Potential Natural Vegetation (PNV). The models were trained on baseline climate data from 1970 to 2000 and used to predict future vegetation distributions for the period 2061–2080 under the RCP 8.5 scenario from the IPCC's Coupled Model Intercomparison Project Phase 5 (CMIP5). Four ML algorithms—Random Forest (RF), Support Vector Machine (SVM), Naïve Bayes (NB), and Convolutional Neural Network (CNN)—were tested across six configurations of four climate-related datasets, yielding 24 distinct model scenarios. These datasets included averaged monthly temperature and precipitation, averaged monthly climate indices, climate extreme indices, and a reduced version of the climate extremes dataset with six variables excluded. The author concludes that CNN achieved the best performance. The inclusion of climate indices slightly decreased model accuracy, whereas the use of climate extreme indices led to a slight improvement.

Writing and Organization

Although the manuscript has been reviewed by two native English speakers and is grammatically sound, it lacks logical structure and clarity. The flow of ideas is often difficult to follow, making it challenging for readers to grasp the author's main points. It is recommended that the manuscript be reviewed by a scientific copy editor who can improve coherence, enhance conceptual clarity, and ensure that the writing is logically organized and accessible.

Machine Learning Methodology

The methodological choices in the study raise several concerns. The global dataset consists of 52,297 grid cells, yet only 25% of these were used for training while the remaining 75% were allocated for testing. This deviates from standard ML practice, where typically 70–80% of data is reserved for training. The author repeated this process 10 times to obtain average results, but no explanation is provided for the reversed training-test split or the repetition strategy.

Additionally, the confusion matrices presented in Tables S2 to S4 lack important context. It is unclear whether they correspond to training, test, or full datasets. Examination suggests they were computed on the full dataset, as the total number of records in the matrices is 52,303, which does not match the documented dataset size of 52,297. Furthermore, the matrices are mislabeled, with both rows and columns titled "Predicted Class," without identifying the true labels, which undermines interpretability.

Evaluation Metrics and Imbalanced Data

The manuscript reports only overall accuracy as the evaluation metric across the 24 model configurations. While this metric is commonly used, it is insufficient for imbalanced datasets such as this one, which involves 13 PNV classes based on the International Geosphere-Biosphere Programme classification. Overall accuracy can obscure poor performance on minority classes and misrepresent model effectiveness. To address this issue, the author should include additional performance metrics such as class-level precision, recall, and F1 scores, as well as aggregated measures like Cohen's Kappa and metrics such as the Area Under the ROC Curve (AUC-ROC). These metrics would provide a more robust and informative comparison of model performance. The use of techniques to address class imbalance, such as SMOTE (Synthetic Minority Over-sampling Technique), is also strongly recommended.
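For concreteness, all of the class-level metrics mentioned here, together with Cohen's kappa, can be derived from a single confusion matrix. The following minimal R sketch uses synthetic three-class data (the class names and error rate are hypothetical, not the study's results); note that rows hold the true labels and columns the predictions, which is exactly the labeling that Tables S2 to S4 should make explicit:

    # Minimal R sketch of the suggested metrics (synthetic 3-class data,
    # not the study's results); rows are true labels, columns predictions.
    set.seed(1)
    truth <- factor(sample(c("Forest", "Shrubland", "Wetland"), 1000,
                           replace = TRUE, prob = c(0.70, 0.25, 0.05)))
    pred <- truth
    flip <- sample(1000, 150)                        # inject ~15% random errors
    pred[flip] <- sample(levels(truth), 150, replace = TRUE)

    cm <- table(Truth = truth, Predicted = pred)     # rows = true, cols = predicted

    precision <- diag(cm) / colSums(cm)              # per-class precision
    recall    <- diag(cm) / rowSums(cm)              # per-class recall (sensitivity)
    f1        <- 2 * precision * recall / (precision + recall)

    n  <- sum(cm)
    po <- sum(diag(cm)) / n                          # observed agreement
    pe <- sum(rowSums(cm) * colSums(cm)) / n^2       # agreement expected by chance
    kappa <- (po - pe) / (1 - pe)                    # Cohen's kappa

    round(rbind(precision, recall, f1), 3)
    round(kappa, 3)

AUC-ROC and SMOTE would additionally require per-class probability outputs and a resampling implementation, respectively, and are not sketched here.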

Abstract and Supporting Materials

The abstract is not sufficiently informative. It mentions only two of the four ML algorithms and omits any reference to the comparison of the 24 model-dataset configurations. As a result, readers must progress through a significant portion of the manuscript to understand the scope of the study. The abstract should be revised to clearly summarize the methodological design and key findings.

Other Issues

Several minor but important issues were noted. The figures lack adequate resolution, and Figures 6 through 9 are referenced in line 282 but are missing from the manuscript. The citation style is inconsistent; in some cases, the first two authors are named followed by a reference number, even when more than three authors are involved. Long URLs and access dates appear in the body of the text and should instead be properly formatted as references, in accordance with journal guidelines. There is also an inconsistency in the definition of the future projection period: while the study defines this period as 2061–2080, Supplementary Figures S8 to S11 appear to show data for the year 2100. This discrepancy requires clarification. Finally, the author does not justify the choice of CMIP5 data over the more recent CMIP6 datasets, which are increasingly favored in contemporary climate studies. Providing a rationale for this choice is essential.

Reviewer #4: The author presents a study on modelling biome distribution at global scale using machine learning. The performances of four machine-learning algorithms driven by six combinations of four current climate datasets are compared. The performances of each of the 24 resulting models (i.e., algorithm-climate dataset combinations) are assessed based on its ability to reproduce the current observation-based potential vegetation map produced by Beigaite et al. (2022) with limited overfitting, and its consistency with other models when predicting future potential natural vegetation under a Representative Concentration Pathway 8.5 (RCP8.5) climate scenario. Based on such comparisons of model performances, the author recommends using the CNN algorithm from Sato and Ise (2022) and excluding extreme climate data when building models to predict global-scale biome distribution.

I believe this is an interesting and timely study that aims to clarify and compare the current potential of different machine learning methods for predicting vegetation distribution on a global scale. Overall, while I think the study has the potential to be published, several major revisions are required. I find that the manuscript would particularly benefit from (1) a clarification of the research question, (2) a better contextualisation, and (3) a better description and justification for the methods, products, and programs used. I also find that the manuscript tends to rely too little on the literature, which leads to a certain amount of unsourced methodological choices and statements.

Major comments

(1) Stating a clear question along with some hypotheses from start would improve the readability. Several questions are addressed, and I'm having trouble identifying the real focus: is the priority to identify the best-performing ML algorithm, to assess the value of not summarising climate data into climate indices, to study the potential of incorporating extreme climate indices, or the potential of machine learning to predict the future distribution of biomes in the context of climate change?

(2) To better emphasise the value of the issues addressed in this study, I would find it helpful to describe from the start the general approach of machine learning and the interest of using it to predict the distribution of vegetation. In particular, what are the advantages of using machine learning compared with other existing approaches such as other correlative models or more process-based models. For example, Beigaite et al. (2022) and Bonannella et al. (2023) provide some details in their introductions that could be mentioned here. It is suggested on L40-52 that the main advantage of machine learning is the ability to use a larger amount of climate data, but it is not stated how this gives the machine learning an advantage in terms of its performance over other approaches. In general, I find it difficult to identify use cases for such models, given that biome/land-cover maps derived from satellite observations seem more relevant for mapping current distribution, and that the present manuscript demonstrates the difficulties of such models for predicting future vegetation distribution on a global scale. This point requires further attention and justification.

Bonannella, C., T. Hengl, L. Parente, and S. de Bruin. 2023. Biomes of the world under climate change scenarios: increasing aridity and higher temperatures lead to significant shifts in natural vegetation. PeerJ 11:e15593.

(3) Most of the products and programs used are treated as “black boxes” (machine learning algorithms and their default parameters, the potential natural vegetation map, climate data). Generally, little or nothing is said about the data and methods used to construct and validate them, the assumptions associated with them and the reasons for selecting them for this study. I find that this blurs the implications and conclusions of this study.

(4) The manuscript would benefit from a better description of the fundamental similarities and differences between the four machine learning approaches compared in this study, and the hypotheses associated with them. Hengl et al. (2018) included other machine learning algorithms in the comparison (from Friedman 2002, and Venables and Ripley 2002). Why choose to compare these four in particular? So far, although some specific aspects are being discussed, the algorithms generally appear to be black boxes and it is difficult to understand the origin of the differences observed. Before drawing general conclusions, it would be interesting to detail, at least in the form of hypotheses, the potential reasons why a model outperforms another, and what does it mean for the associated sets of data and assumptions.

Friedman, J. H. 2002. Stochastic gradient boosting. Computational Statistics & Data Analysis 38:367–378.

Venables, W. N., and B. D. Ripley. 2002. Modern applied statistics with S (Fourth Edition). New York: Springer-Verlag.

(5) The performances of the models are compared against their ability to predict the PNV map of Beigaite et al. (2022) which is itself derived from IGBP-MODIS from Friedl et al. (2010). The reasons for this choice are not given, although there are many other biome maps and classifications with numerous differences (reviewed for example in Mucina 2019, Beierkuhnlein and Fischer 2021, Hunter et al. 2021, Fischer et al. 2022, Champreux et al. 2024). After reading the manuscript, the potential effects of the choice of this product on the results and conclusions remain unclear.

Beierkuhnlein, C., and J.-C. Fischer. 2021. Global biomes and ecozones – Conceptual and spatial communalities and discrepancies. Erdkunde 75:249–270.

Champreux, A., F. Saltré, W. Traylor, T. Hickler, and C. J. A. Bradshaw. 2024. How to map biomes: Quantitative comparison and review of biome-mapping methods. Ecological Monographs 94:e1615.

Fischer, J.-C., A. Walentowitz, and C. Beierkuhnlein. 2022. The biome inventory – Standardizing global biogeographical land units. Global Ecology and Biogeography 31:2172–2183.

Hunter, J., S. Franklin, S. Luxton, and J. Loidi. 2021. Terrestrial biomes: a conceptual review. Vegetation Classification and Survey 2:73–85.

Mucina, L. 2019. Biome: evolution of a crucial ecological and biogeographical concept. New Phytologist 222:97–114.

(6) The unequal spatial coverage of the different biomes does not appear to be considered in the manuscript. The model performances are assessed via the percentage of correct answers. Because of its simplicity, such a measure seems to facilitate interpretation. However, it does not consider the variability of the areas covered by the different biomes. The choice of this metric over other common map comparison metrics such as Cohen’s kappa (Cohen 1960, Monserud and Leemans 1992) or the quantity-and-allocation agreement (Pontius and Millones 2011) thus requires better justification. It also seems obvious to me that the models should tend to better predict the most extensive biomes on a global scale since the models necessarily receive more data on them during the training phase. This is perhaps one of the potential reasons for the poor performance of the models in predicting the “Wetland” and “Closed Shrubland” biomes as stated on L236-244, respectively representing 0.5 and 0.2 % of the grid cells according to S1 table. The differences observed in future predictions are also likely to be affected by this unequal distribution of biomes. While the poor prediction of biomes covering smaller total areas globally affects current predictions less, their potential expansion in the future is likely to be poorly captured and to mechanically increase the disparities among models.

Cohen, J. 1960. A Coefficient of Agreement for Nominal Scales. Educational and Psychological Measurement 20:37–46.

Monserud, R. A., and R. Leemans. 1992. Comparing global vegetation maps with the Kappa statistic. Ecological Modelling 62:275–293.

Pontius, R. G., Jr, and M. Millones. 2011. Death to Kappa: birth of quantity disagreement and allocation disagreement for accuracy assessment. International Journal of Remote Sensing 32:4407–4429.
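For reference, the quantity-and-allocation decomposition cited above (Pontius and Millones 2011) is also straightforward to compute once the confusion matrix is expressed as proportions; a minimal R sketch with hypothetical counts:

    # Minimal R sketch of quantity and allocation disagreement
    # (Pontius & Millones 2011) from a confusion matrix with hypothetical counts.
    cm <- matrix(c(60,  5,  2,
                    4, 20,  3,
                    1,  2,  3),
                 nrow = 3, byrow = TRUE)                 # rows = true, cols = predicted
    p <- cm / sum(cm)                                    # counts -> proportions

    total_disagreement <- 1 - sum(diag(p))
    quantity   <- sum(abs(rowSums(p) - colSums(p))) / 2  # mismatch in class proportions
    allocation <- total_disagreement - quantity          # remaining spatial mismatch

    round(c(total = total_disagreement, quantity = quantity, allocation = allocation), 3)

Quantity disagreement captures mismatch in the overall class proportions; allocation disagreement captures the remaining spatial mismatch.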

Minor comments

(7) L18. I would find it helpful to mention the four machine learning algorithms compared in the abstract.

(8) L40. “To date, many methods have been proposed to model biome distributions” Sato and Ise (2022) is cited here to support this statement, but also citing other sources reviewing the various methods to model biome distributions in greater details would be more appropriate. Generally, the statements in this paragraph should refer more to existing literature.

(9) L84-85. I wonder why only grid cells with 100% human activity were eliminated. Keeping other highly impacted areas in the analysis could strongly affect the outputs. For example, Champreux et al. (2024) suggested that human activities could cause disagreement among biome maps, including among PNV maps. In any case, providing the percentage of grid cells that were eliminated under the "100% human activity" criterion would be helpful.

(10) L130-140. The models are trained with current, observation-based climate datasets but future predictions are made with simulated data that are associated with more uncertainties. Although these methodological differences are obviously unavoidable, I wonder whether they are likely to bias the assessment of model performances in future predictions. For example, it is not stated if the variables used have a standard definition that is common to both current and future datasets, or if some equivalencies have been performed.

(11) L134-135. “The Representative Concentration Pathway (RCP) 8.5 was the only used.” The choice of this single scenario should be justified.

(12) L144. Reference for CNN (Sato and Ise, 2022) is missing.

(13) L150-151. The author states “Simplicity was maintained and potential overfitting was mitigated by using the default settings in these commands”. This statement needs to be better justified.

(14) L154-158. This statement needs to be associated with supporting literature.

(15) L181. While I understand that model robustness was assessed based on the consistency among models for future predictions, this part of the analysis is not described in the method section, except for the presentation of the BIOCLIM data on L130-140. Describing it in the “Data analysis” section would be appropriate.

(16) L190. It is unclear if the selected value of 25% corresponds to standards of the discipline or was decided due to other reasons. Justifying it is important as it probably impacts the differences in overfitting scores of the models in the end, especially considering the 100% training accuracy of the RF models mentioned on L256-257.

(17) L199-200. “precisely” should be defined, as no decision threshold was provided in the method section.

(18) L204-205. “all grids at the same latitude were assigned the most frequent PNV at that latitude” This statement is unclear.

(19) L242. I guess that “The incorrectly identified biomes” is a typo that should be replaced with “The incorrectly identified grid cells”.

(20) L298-301. “Default parameter settings were used for all methods adopted in this study, and the models except the CNN were trained with climate data, while CNN employed graphically converted climate data, preventing a conclusive determination of the superior approach.” The implications of this methodological difference are unclear. It would be interesting to elaborate a bit more on this.

(21) Did Hengl et al. (2018) consider this overfitting issue with the RF algorithm?

(22) L322-325. If this statement means that the RF model is still showing strong overfitting despite these considerations, this calls into question the recommendations of Fourcade et al. (2018) and deserves to be discussed.

(23) L335-338. The consistency among future predictions is considered here as a proxy of model robustness. However, it also highlights the weaknesses of the machine learning algorithms for predicting future biome distributions where extremes are supposed to become more frequent and unprecedented. This point is discussed in the subsequent paragraphs, but the use of the consistency metric to assess model robustness is not questioned.

(24) L351-353. This is a rather strong statement justifying the use of machine learning over DGVMs for future biome map predictions. The statement is only supported here by citing Pugh et al. (2020). However, while Pugh et al. (2020) did compare several DGVMs and show several output and mechanism discrepancies, the study did not focus on predicting biome distribution, and did not provide such a conclusion.

(25) L364-367. This statement may be helpful in the introduction section to justify testing the incorporation of extreme climate variables for predictions at larger spatial scales (here biomes).

(26) L383-384. “thus, the CNN model is preferable”. This conclusion seems paradoxical given the earlier claim suggesting that CNN's performance is difficult to compare with that of other models, on L299-301: “the models except the CNN were trained with climate data, while CNN employed graphically converted climate data, preventing a conclusive determination of the superior approach.”

**********

Do you want your identity to be public for this peer review? If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

Reviewer #3: No

Reviewer #4: No

**********

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/ . PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org . Please note that Supporting Information files do not need this step.

PLoS One. 2026 Feb 26;21(2):e0324107. doi: 10.1371/journal.pone.0324107.r002

Author response to Decision Letter 1


13 Aug 2025

Dear Editor and Reviewers,

I sincerely thank the reviewers for their constructive comments, and I greatly appreciate your support and patience throughout the review process. I have revised the manuscript in accordance with the reviewers' suggestions and comments. Please find below my point-by-point responses addressing each concern raised.

In this letter, the line numbers in my responses refer to the clean version of the revised manuscript (i.e., the version without track changes). Throughout this letter, quoted comments from the reviewers are indicated by lines beginning with ">".

Kind regards,

Hisashi Sato (Author)

_____________________________

Revisions according to the editor's comments

1. Manuscript style

I have revised the manuscript so that it meets PLOS ONE's style requirements, including those for file naming.

2. Funding

I removed funding-related text from the manuscript. Please use the following statement for the grant information:

This work has been financed by the Arctic Challenge for Sustainability II (ArCS II) [Program Grant Number JPMXD1420318865]. I received no additional external funding for this study. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

3. Data Availability Statement

Sorry for not stating the repository name in the Data Availability Statement. Please use the following phrase for this section:

All dataset files are available from the online repository: https://doi.org/10.5281/zenodo.8113935 (Zenodo data repository).

4. Copyright of map images in figures

I created these world maps using the R library "maps". The world map data included in the maps package is derived from public domain sources, and there are no copyright restrictions on general use or redistribution. The package's online documentation (https://cran.r-project.org/web/packages/maps/maps.pdf) includes the following statement on page 32, under "Description": This world map (updated in 2013) is imported from the public domain Natural Earth project (the 1:50m resolution version).
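For reference, the public-domain basemap in question is produced with a single call; a minimal R sketch (the fill colors are illustrative choices, not those of the published figures):

    # Minimal R sketch: the world basemap in the "maps" package, which is
    # imported from the public-domain Natural Earth project (1:50m version).
    library(maps)
    map("world", fill = TRUE, col = "grey90", border = "grey60")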

_____________________________

One-to-one response to comments by Reviewer #1

> The manuscript attempts to apply different machine learning techniques to classify biome under present and future climate scenarios. This research topic is relevant and holds significant potential for impactful insights. Below are some suggestions which can help improve the clarity and rigor of the manuscript.

Thank you for your effort in reviewing my manuscript.

I carefully studied your comments and addressed them as follows.

> Figures

> 1. For Figures 1, 2, 3, 4, and 5, please increase the DPI; use at least 300 for image exports. In its current form, the legend is unreadable.

I appreciate the reviewer's comment regarding the low resolution of Figures in the submitted PDF. The reduced quality appears to have resulted from the journal’s submission system automatically converting the manuscript and figures into a single PDF, which down-sampled the images during processing. The original image files meet or exceed 300 dpi. In the revised submission, I have re-uploaded all figures as separate high-resolution files (>=300 dpi) to ensure that the final published version maintains full clarity and readability.

> 2. Please include a north arrow and scale bar in the spatial maps.

Since the maps provided are global-scale Mercator projections with clearly indicated latitude and longitude lines, the direction (north upward) is universally implicit, and the scale varies significantly with latitude, making a single scale bar misleading. Therefore, I opted not to include a north arrow or scale bar, following standard cartographic practice.

> Tables

> 1. In Table 1, please consider introducing the abbreviations for the units (°C or mm) before their first appearance in the table.

I have added a brief note below Table 1 to clarify units used.

Note: Units: °C = degrees Celsius; mm = millimeters.

> 2. In Table 2, the abbreviations must be redefined in the Table caption or as a footnote to provide a better readability.

I want to clarify that Table 2 does not contain any abbreviations that require definition. The letter combinations (e.g., FD) used in this table represent variable IDs, assigned solely for identification purposes, rather than abbreviations. Although the IDs appear to originate from descriptive terms (e.g., 'FD' from 'Frost Days'), they are intended only as identifiers in this context; thus, explicit definitions of their origin in the caption or footnotes are unnecessary. I ask for your understanding regarding this decision.

> 3. Tables 3, 4, and 5 can be combined into a single table to reduce visual clutter, as the three tables share the same input variable combinations and four models. Perhaps a color-coding scheme for training, test, and overfitting accuracies could help? This would also reduce the overall number of figures and tables.

I merged Tables 3-5 into a single table as recommended, revised the caption accordingly, and updated the corresponding in-text references (Table 3).

> 4. The same applies to Tables 6 and 7.

I followed the same approach for Tables 6 and 7: they were combined into a single table, the caption was revised, and all in-text references were updated accordingly. (Table 4)

> Typos

> 1. Page 19, Line 311: "to an RF" must be changed to "to the RF"

Addressed (L378).

> 2. The term "pseudo-predicting" must be changed to "pseudo-predictive". The latter seems more reasonable

Addressed (L387, 390).

> Suggestions and Questions

> 1. While the authors acknowledge the issue of overfitting in RF, the conclusion refers to it as 'robust'. Given that RF achieves 100 percent training accuracy and significant overfitting, calling it robust appears inconsistent with the results. It may help to rephrase this statement altogether to avoid confusion.

I agree with the inconsistency regarding the use of the term "robust" in describing the RF model performance. I have replaced the term "robust" with "effective," acknowledging the presence of overfitting while more accurately highlighting the model's predictive capabilities.

Previous (L306-308):

Although a direct comparison with the findings of the current study is impossible, this previous report supports RF as a robust machine learning algorithm for reconstructing biome maps.

Revised (L373-375):

Although a direct comparison with the present study is not possible, their findings support RF as an effective machine learning algorithm for reconstructing biome maps.

> 2. I believe the authors should have performed hyperparameter tuning, even if minimal, for a more even model comparison.

In the Machine learning algorithms subsection, I added sentences to clarify this rationale.

Inserted (L105-112):

All models were run with default settings to (1) ensure fair comparability by avoiding bias from parameter tuning, (2) keep the implementation straightforward and reproducible, and (3) align with the study’s objective of evaluating algorithms and data combinations rather than optimizing a single best-performing model. This choice also simplified implementation. The RF model, for instance, already achieved 100% accuracies on the training sets with default parameters, indicating strong overfitting and suggesting that further tuning would not improve generalization in this case.
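To illustrate this behavior for the reviewer, the following minimal R sketch (synthetic data and hypothetical dimensions, not the study's climate inputs) reproduces the pattern of near-perfect resubstitution accuracy under default randomForest settings:

    # Minimal R sketch (synthetic data, hypothetical dimensions): randomForest
    # with all-default settings reaches ~100% resubstitution accuracy while
    # test accuracy stays clearly lower.
    library(randomForest)
    set.seed(42)
    n <- 2000; p <- 24                               # e.g., 12 monthly T + 12 monthly P
    X <- as.data.frame(matrix(rnorm(n * p), n, p))
    y <- factor(ifelse(X$V1 + X$V2 + rnorm(n) > 0, "BiomeA", "BiomeB"))

    idx <- sample(n, 0.25 * n)                       # 25% used for training, as in the study
    rf  <- randomForest(x = X[idx, ], y = y[idx])    # defaults (e.g., ntree = 500)

    # predict(rf) alone returns out-of-bag predictions; resubstitution accuracy
    # requires passing the training rows explicitly.
    train_acc <- mean(predict(rf, X[idx, ]) == y[idx])   # typically ~1.00
    test_acc  <- mean(predict(rf, X[-idx, ]) == y[-idx])
    c(train = train_acc, test = test_acc)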

> 3. Could the authors clarify whether the image conversion of climatic variables was performed only to train the CNN, or also to test and predict? Could the authors also clarify why they chose the method from Sato and Ise? Are there any advantages to it over other methods?

I clarified in the revised manuscript that the image conversion was applied not only during training but also during testing and prediction to ensure consistency in the CNN modeling process. I also elaborated on the rationale for choosing the method proposed by Sato and Ise (2022).

Previous (L163-164):

This method represents climatic conditions using graphical images and employs them as training data for CNN models.

Revised (L124-130):

This method represents climatic conditions using graphical images and employs them as training, testing, and prediction data for CNN models. I selected this method because it allows CNNs (originally developed for image analysis) to automatically extract nonlinear seasonal patterns from multiple climate variables relevant to biome classification while preserving their temporal structure, enabling convolutional filters to identify coherent features.
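The exact encoding is described in Sato and Ise (2022) rather than reproduced in this letter; purely to illustrate the general idea of representing a grid cell's monthly climate as a small image-like array, a hypothetical R sketch (the scaling bounds and two-row layout are illustrative assumptions):

    # Hypothetical R sketch of encoding one grid cell's monthly climate as a
    # small image-like array for CNN input; scaling bounds and layout are
    # illustrative assumptions, not the actual Sato & Ise (2022) encoding.
    tmp <- c(-5, -3, 2, 8, 14, 18, 20, 19, 14, 8, 2, -3)      # monthly mean temperature (degC)
    pre <- c(40, 35, 50, 60, 80, 90, 85, 80, 70, 60, 50, 45)  # monthly precipitation (mm)

    scale01 <- function(x, lo, hi) pmin(pmax((x - lo) / (hi - lo), 0), 1)
    img <- rbind(temperature   = scale01(tmp, -40, 40),       # image row 1
                 precipitation = scale01(pre, 0, 400))        # image row 2

    # The same conversion is applied to training, test, and prediction data.
    image(t(img), axes = FALSE, col = gray.colors(256))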

> 4. The study lacks a feature sensitivity analysis. Thus, it is not clear which climate variables are driving predictions. Adding such an analysis would strengthen the study.

I agree that feature sensitivity analysis is valuable in many contexts. However, the primary focus of this study was to evaluate (1) differences in performance among machine learning algorithms, (2) the effect of summarizing climate data into indices, and (3) the impact of incorporating extreme climate indices. Identifying the importance of specific climate variables was beyond the scope of this research.

Furthermore, as the CNN model used in this study relies on automatically extracted features from image-based inputs, conventional feature importance analysis is not readily applicable.

> 5. The authors trained using the climate data from the years 1970-2000 and then tried to project for the years 2061-2080. Could the authors provide details on intermediate validation for the years 2000-2060 to account for this gap? This would also help to assess the model’s temporal generalization.

I agree that intermediate validation could be informative in other contexts. However, as noted in the revised manuscript, this study was designed to test model robustness under maximum climate divergence. Therefore, only the RCP8.5 scenario and its end-of-century projection were used to fulfill this specific goal.

Inserted (L200-203):

Intermediate projection periods (e.g., 2041-2060) were not used because the aim was to test model robustness under the most extreme climate conditions. Since RCP8.5 represents the highest emission trajectory, its far-future projection provides the greatest climate divergence from the training period.

_____________________________

One-to-one response to comments by Reviewer #2

> The manuscript is relevant and shows an interesting approach. I have some suggestions for improvement:

Thank you for your effort in reviewing my manuscript.

I carefully studied your comments and addressed them as follows.

> The first sentence of the abstract was too direct. I suggest making it more explanatory.

Indeed, the original sentence may have been too concise and direct, making the background and context of the study unclear to readers. Therefore, I revised the opening to first address the importance of the research topic before referring to previous studies.

Previous (L12-13):

Many methodologies have been proposed for modeling global biome distributions.

Revised (L12-13):

Understanding the global distribution of biomes is essential for biodiversity conservation, climate modeling, and land-use planning.

> In the abstract, I suggest including the accuracy value mentioned so that the phrases "marginally decreased model accuracy" and "improved accuracy slightly" are not vague. How many % of accuracy?

I revised the abstract and the conclusion to include the specific accuracy changes (a decrease of 1–2% for summarization and an increase of 1–2% for extreme indices), based on the RF, CNN, and SVM models.

Previous (L25-26):

Summarization of climate data into indices marginally decreased model accuracy, whereas incorporating extreme climate indices improved accuracy slightly.

Revised (L20-22):

Summarizing climate data into indices reduced accuracy by 1-2%, while adding extreme indices increased accuracy by <2% (except for NV, which performed poorly overall).

> The last sentence of the abstract could be left in a way that recommends not using extreme climate data. Using "should not" ends up establishing a rule, which is complicated considering only one or a few studies.

To shift from a prescriptive to a more suggestive tone and to allow room for reader interpretation, I revised the sentence as follows.

Previous (L29-31):

Based on these findings, extreme climate data should not be included in global-scale biome prediction models due to their detrimental impact on model robustness.

Revised (L24-26):

These results indicate that including extreme climate data in global biome prediction models offers limited accuracy gains but can significantly weaken robustness, so caution is advised.

> The introduction could be more detailed and show other examples on the topic. I suggest increasing it a little and referencing studies that have already been done.

I added a new paragraph (L57-74) to the Introduction clarifying the advantages of machine learning over process-based models and land cover maps. I believe this addition sufficiently addresses the reviewer's request for more context and examples of previous work.

> Usually, the largest amount of data is intended for training. Why was only 25% used in this study? Using so little for training could be the reason for the overfitting in the end. Despite the justification and the tests supporting the 25% choice, have other studies adopted this methodology? Is it reproducible?

I added a sentence in the Methods section to clarify why a 25% training ratio was used in this study.

Inserted (L215-218):

This proportion is the reverse of the typical 70-80% allocation to training [29], chosen to emphasize model robustness over performance and to ensure that rare vegetation types (<1%) were adequately represented in the test set.
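As an illustration of this protocol (not the study's code), the reversed split with ten repetitions can be sketched in R as follows, with synthetic stand-ins for the climate predictors and biome labels:

    # Illustrative R sketch of the 25%/75% protocol repeated 10 times
    # (synthetic stand-ins for the climate predictors X and biome labels y).
    library(randomForest)
    set.seed(7)
    n <- 1500
    X <- as.data.frame(matrix(rnorm(n * 10), n, 10))
    y <- factor(ifelse(X$V1 > 0, "ForestLike", "GrassLike"))

    test_acc <- replicate(10, {
      idx <- sample(n, size = round(0.25 * n))       # 25% for training
      fit <- randomForest(x = X[idx, ], y = y[idx])
      mean(predict(fit, X[-idx, ]) == y[-idx])       # accuracy on the 75% test set
    })
    round(mean(test_acc), 3)                         # average over the 10 repetitions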

> Were there no tests performed with changes to the algorithm parameters? The use of default parameters is generally not recommended for all analyses and all algorithms. It would be interesting to test whether there was any change in performance and to find suitable hyperparameters. It is said that default settings were used, but cross-validation and other techniques help to ensure that the models are reliable.

In the Machine learning algorithms subsection, I added sentences to clarify this rationale.

Inserted (L105-114):

All models were run with default settings to (1) ensure fair comparability by avoiding bias from parameter tuning, (2) keep the implementation straightforward and reproducible, and (3) align with the study’s objective of evaluating algorithms and data combinations rather than optimizing a single best-performing model. This choice also simplified implementation. The RF model, for instance, already achieved 100% accuracies on the training sets with default parameters, indicating strong overfitting and suggesting that further tuning would not improve generalization in this case. While strategies such as cross-validation and variable selection can reduce overfitting [19], the RF results indicate that high-capacity models may still overfit even under best-practice settings.
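For completeness, the kind of cross-validation referred to here can be sketched as follows (synthetic data; a simple k-fold loop for illustration, not the study's procedure):

    # Minimal R sketch of k-fold cross-validation of the kind referred to
    # above (synthetic data; illustrative, not the study's procedure).
    library(randomForest)
    set.seed(3)
    n <- 1000
    X <- as.data.frame(matrix(rnorm(n * 8), n, 8))
    y <- factor(ifelse(X$V1 + rnorm(n) > 0, "A", "B"))

    k <- 5
    fold <- sample(rep(1:k, length.out = n))         # random fold assignment
    cv_acc <- sapply(1:k, function(i) {
      fit <- randomForest(x = X[fold != i, ], y = y[fold != i])
      mean(predict(fit, X[fold == i, ]) == y[fold == i])
    })
    round(mean(cv_acc), 3)                           # cross-validated accuracy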

> The figures and map captions are blurry. I suggest making them clearer.

I appreciate the reviewer's comment regarding the low resolution of Figures in the submitted PDF. The reduced quality appears to have resulted from the journal’s submission system automatically converting the manuscript and figures into a single PDF, which down-sampled the images during processing. The original image files meet or exceed 300 dpi. In the revised submission, I have re-uploaded all figures as separate high-resolution files (>=300 dpi) to ensure that the final published version maintains full clarity and readability.

> When you say that "while CNN employed graphically converted climate data, preventing a conclusive determination of the superior approach" you limit the comparison with other algorithms and it would be interesting to better justify why you continued with the 4 algorithms.

I deleted the sentence suggesting limited comparability of the CNN, as earlier text already clarifies that the same underlying information was used. I also added brief notes explaining the rationale for retaining all four algorithms.

Attachment

Submitted filename: Responces.txt

pone.0324107.s017.txt (52.4KB, txt)

Decision Letter 1

Chong Xu

17 Nov 2025

Dear Dr. Sato,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Please submit your revised manuscript by Jan 01 2026 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org . When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols . Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols .

We look forward to receiving your revised manuscript.

Kind regards,

Chong Xu

Academic Editor

PLOS ONE

Journal Requirements:

If the reviewer comments include a recommendation to cite specific previously published works, please review and evaluate these publications to determine whether they are relevant and should be cited. There is no requirement to cite these works unless the editor has indicated otherwise.

Please review your reference list to ensure that it is complete and correct. If you have cited papers that have been retracted, please include the rationale for doing so in the manuscript text, or remove these references and replace them with relevant current references. Any changes to the reference list should be mentioned in the rebuttal letter that accompanies your revised manuscript. If you need to cite a retracted article, indicate the article’s retracted status in the References list and also include a citation and full reference for the retraction notice.


Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

Reviewer #1: All comments have been addressed

Reviewer #4: All comments have been addressed

**********

2. Is the manuscript technically sound, and do the data support the conclusions?

Reviewer #1: Yes

Reviewer #4: Yes

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #4: Yes

**********

4. Have the authors made all data underlying the findings in their manuscript fully available?

Reviewer #1: Yes

Reviewer #4: Yes

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

Reviewer #1: Yes

Reviewer #4: Yes

**********

Reviewer #1: I thank the author for addressing my previous comments, and I am satisfied with the responses. The figures and tables have been improved, and the manuscript is now clearer and sound. I recommend acceptance.

Reviewer #4: The author presents a study on modelling biome distribution at global scale using machine learning under present and future climate scenarios. This is the second time I read the manuscript, which has been largely revised, greatly improving its quality compared to the previous version. The author has made an effort to respond carefully to all comments. In particular, the overall research questions are stated more clearly and most of the methodological choices are now better justified.

I only have a few minor comments :

- L86-88. I still question the relevance of retaining in the analysis grid cells that are highly impacted by human activities but do not reach 100%. The manuscript states that "grid cells partially affected by human activity were retained, on the assumption that the relative proportions of natural vegetation remain stable despite these changes." However, it has been shown that PNV maps disagree where human activities are high, suggesting that the proportion of remaining natural vegetation is not sufficiently informative in these grid cells. I therefore find the justification insufficient.

- L93-94. While the manuscript explains the importance of using a PNV map over maps derived from satellite imagery, and describes the product used, the choice of this specific PNV map for this study over others is insufficiently justified. The statement "this dataset was selected because it is particularly well suited for global climate-vegetation modeling" is not associated with any argument.

- L352-366. The paragraph justifying the choice of these four specific ML algorithms and explaining their differences appears in the Discussion section. As it highlights one of the strengths of the study and illustrates well how the manuscript addresses the overall research questions, it would be more appropriate in the ‘Methods’ section under ‘Machine learning algorithms’.

- L392-402. The manuscript recommends not including extreme climate data since it reduces robustness (here, consistency among models) under forecast climate conditions. However, the differences observed in predictions of the future state of vegetation also make it possible to define a universe of possibilities and therefore seem rather interesting. The fact that this extreme climate data improves test accuracy, even slightly, makes this observation even more interesting. The possibility that only one of these models accurately predicts the response of biome distribution to climate change remains plausible, although unprovable. The choice of whether to include or exclude this data therefore seems to me to depend on the research question, and would thus benefit from more discussion.

**********

Do you want your identity to be public for this peer review? If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #4: No

**********

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

To ensure your figures meet our technical requirements, please review our figure guidelines: https://journals.plos.org/plosone/s/figures

You may also use PLOS’s free figure tool, NAAS, to help you prepare publication quality figures: https://journals.plos.org/plosone/s/figures#loc-tools-for-figure-preparation.

NAAS will assess whether your figures meet our technical requirements by comparing each figure against our figure specifications.

PLoS One. 2026 Feb 26;21(2):e0324107. doi: 10.1371/journal.pone.0324107.r004

Author response to Decision Letter 2


3 Dec 2025

______________________________________

Reviewer #1:

> I thank the author for addressing my previous comments, and I am satisfied with the responses. The figures and tables have been improved, and the manuscript is now clearer and sound. I recommend acceptance.

Response:

Thank you very much for your positive evaluation and supportive comments.

I sincerely appreciate your careful reading of the revised manuscript and your recognition of the improvements.

______________________________________

Reviewer #4:

> The author presents a study on modelling biome distribution at global scale using machine learning under present and future climate scenarios. This is the second time I read the manuscript, which has been largely revised, greatly improving its quality compared to the previous version. The author has made an effort to respond carefully to all comments. In particular, the overall research questions are stated more clearly and most of the methodological choices are now better justified.

Response:

Thank you very much for your positive evaluation and supportive comments.

I sincerely appreciate your careful reading of the revised manuscript and your recognition of the improvements.

Below, I provide point-by-point responses to each of your comments.

(1) Comment on L86-88.

> I still question the relevance of retaining in the analysis grid cells that are highly impacted by human activities but do not reach 100%. The manuscript states that "grid cells partially affected by human activity were retained, on the assumption that the relative proportions of natural vegetation remain stable despite these changes." However, it has been shown that PNV maps disagree when human activities are high, suggesting that the proportion of remaining natural vegetation is not sufficiently informative in these grid cells. I therefore find the justification insufficient.

Response:

In Methods (section "Biome data"), I added a brief explanation of the study aim (estimating PNV at 0.5°) and of why grid cells with partial human influence still retain an informative dominant natural-vegetation signal at this scale. I also specified the exclusion of cells with 100% human cover and/or water and the corresponding exclusion ratio, and explicitly noted increased uncertainty in heavily impacted regions.

[Inserted text (L102-109)]

Because the aim of this study is to model potential natural vegetation (PNV), I retained grid cells that are only partially affected by human activity, on the premise that the dominant natural-vegetation signal remains informative at 0.5° resolution. By contrast, cells with 100% human cover and/or water were excluded, leaving 52,297 land grid cells (approximately 10–11% of land cells excluded, Antarctica removed). This choice minimizes spatial coverage bias while preserving information on the prevailing natural type. I acknowledge, however, that uncertainty in PNV estimation may increase in regions with strong human influence, and I interpret results in those areas with caution.
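For illustration only, the cell-selection rule described above can be sketched in a few lines of Python; the array names (`human_frac`, `water_frac`) and the toy grid are hypothetical stand-ins, not the preprocessing pipeline actually used in the study.

```python
import numpy as np

def select_land_cells(human_frac: np.ndarray, water_frac: np.ndarray) -> np.ndarray:
    """Boolean mask of grid cells retained for PNV modeling.

    Cells are dropped only when human cover and/or water together occupy
    the entire cell; partially affected cells are kept.
    """
    fully_converted = (human_frac + water_frac) >= 1.0  # 100% human and/or water
    return ~fully_converted

# Toy 360 x 720 grid corresponding to 0.5-degree resolution (hypothetical data)
rng = np.random.default_rng(0)
human_frac = rng.uniform(0.0, 1.0, size=(360, 720))
water_frac = rng.uniform(0.0, 0.2, size=(360, 720))
mask = select_land_cells(human_frac, water_frac)
print(f"retained cells: {mask.sum()}")
```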

(2) Comment on L93-94.

> While the manuscript explains the importance of using a PNV map over maps derived from satellite imagery, and describes the product used, the choice of this specific PNV map for this study over others is insufficiently justified. The statement "this dataset was selected because it is particularly well suited for global climate-vegetation modeling" is not associated with any argument.

Response:

In Methods, I added sentences giving practical reasons for selecting this PNV dataset (alignment with the analysis resolution, consistent processing with AveI/CEI and future projections, and improved comparability with related studies).

[Inserted text (L94-101)]

I used this PNV dataset for three practical reasons aligned with global climate-vegetation modelling at 0.5°: (i) the workflow (deriving the dominant natural class from MODIS/IGBP and resampling to ~50 km grids) matches my analysis grid; (ii) its climate inputs and projections are processed consistently with the BIOCLIM (AveI) and CLIMDEX (CEI) indices used here, with harmonized definitions and resolution for present data and CMIP5-based futures; and (iii) employing the same dataset as related machine-learning studies facilitates comparability without additional preprocessing.
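As a rough illustration of step (i), a block-mode (dominant-class) aggregation can be written as below; the raster `igbp` and the aggregation factor are hypothetical, and the actual PNV product was produced by its own workflow.

```python
import numpy as np

def dominant_class(igbp: np.ndarray, factor: int) -> np.ndarray:
    """Modal (most frequent) class within each factor x factor block.

    `igbp` is an integer class raster whose shape is assumed to be an
    exact multiple of `factor`.
    """
    h, w = igbp.shape
    blocks = igbp.reshape(h // factor, factor, w // factor, factor)
    blocks = blocks.transpose(0, 2, 1, 3).reshape(h // factor, w // factor, -1)
    out = np.empty(blocks.shape[:2], dtype=igbp.dtype)
    for i in range(blocks.shape[0]):
        for j in range(blocks.shape[1]):
            # classes are small non-negative integers, so bincount suffices
            out[i, j] = np.bincount(blocks[i, j]).argmax()
    return out

fine = np.random.default_rng(1).integers(1, 15, size=(120, 240))  # toy raster
coarse = dominant_class(fine, factor=10)
print(coarse.shape)  # (12, 24)
```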

(3) Comment on L352-366.

> The paragraph justifying the choice of these four specific ML algorithms and explaining their differences appears in the Discussion section. As it highlights one of the strengths of the study and illustrates well how the manuscript addresses the overall research questions, it would be more appropriate in the ‘Methods’ section under ‘Machine learning algorithms’.

Response:

As suggested, I kept only the opening performance-summary sentence in the Discussion and moved the subsequent rationale and contrasts among the algorithms to Methods (Machine learning algorithms). I also added a one-sentence pointer in the Discussion directing readers to Methods; the substantive content itself was unchanged.

[Inserted text (L382-384)]

A concise rationale for selecting these four algorithms and their contrasting assumptions is provided in the Methods section ("Machine learning algorithms").

(4) Comment on L392-402.

> The manuscript recommends not including extreme climate data since it reduces robustness (here, consistency among models) under forecast climate conditions. However, the differences observed in predictions of the future state of vegetation also make it possible to define a universe of possibilities and therefore seem rather interesting. The fact that this extreme climate data improves test accuracy, even slightly, makes this observation even more interesting. The possibility that only one of these models accurately predicts the response of biome distribution to climate change remains plausible, although unprovable. The choice of whether to include or exclude this data therefore seems to me to depend on the research question, and would thus benefit from more discussion.

Response:

To avoid redundancy, I deleted the original closing sentence of the paragraph and replaced it with three sentences that (i) define our operational use of robustness, (ii) attribute the loss of consistency to CEI variables with strong distributional shifts, and (iii) provide purpose-dependent guidance on when to include or exclude CEI.

[Deleted L400-402 of the cleaned previous manuscript]

Although consistency is used here as a proxy for robustness, it may also indicate a lack of sensitivity to novel or extreme conditions.

[Inserted text (L419-427)]

Here, "robustness" is operationalized as cross-algorithm consistency under future climates; while this proxy is useful for decision-making, it can also reflect reduced sensitivity to novel or extreme conditions. The observed loss of consistency was primarily driven by a small set of CEI variables whose distributions diverged strongly between training and projection conditions. In practice, whether to include CEI should follow the research question: (i) for a high-agreement baseline map under extrapolative climates, exclude CEI; (ii) to explore a broader possibility space or risk envelopes, include CEI as a supplementary input; and (iii) for species- or local-scale questions, extremes may contribute more and could be prioritized.

Attachment

Submitted filename: ReponseLetter.docx

pone.0324107.s018.docx (17.6KB, docx)

Decision Letter 2

Chong Xu

10 Dec 2025

Predicting dominant terrestrial biomes at a global scale using machine learning algorithms, climate variable indices, and extreme event indices

PONE-D-25-21376R2

Dear Dr. Sato,

We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted once it meets all outstanding technical requirements.

Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication.

An invoice will be generated when your article is formally accepted. Please note, if your institution has a publishing partnership with PLOS and your article meets the relevant criteria, all or part of your publication costs will be covered. Please make sure your user information is up-to-date by logging into Editorial Manager® and clicking the ‘Update My Information’ link at the top of the page. For questions related to billing, please contact billing support.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible, and no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

Kind regards,

Chong Xu

Academic Editor

PLOS One


Acceptance letter

Chong Xu

PONE-D-25-21376R2

PLOS One

Dear Dr. Sato,

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS One. Congratulations! Your manuscript is now being handed over to our production team.

At this stage, our production department will prepare your paper for publication. This includes ensuring the following:

* All references, tables, and figures are properly cited

* All relevant supporting information is included in the manuscript submission

* There are no issues that prevent the paper from being properly typeset

You will receive further instructions from the production team, including instructions on how to review your proof when it is ready. Please keep in mind that we are working through a large volume of accepted articles, so it may take a few days before we review your paper and let you know the next and final steps.

Lastly, if your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

You will receive an invoice from PLOS for your publication fee after your manuscript has reached the completed accept phase. If you receive an email requesting payment before acceptance or for any other service, this may be a phishing scheme. Learn how to identify phishing emails and protect your accounts at https://explore.plos.org/phishing.

If we can help with anything else, please email us at customercare@plos.org.

Thank you for submitting your work to PLOS ONE and supporting open access.

Kind regards,

PLOS ONE Editorial Office Staff

on behalf of

Dr. Chong Xu

Academic Editor

PLOS One

Associated Data

    This section collects any data citations, data availability statements, or supplementary materials included in this article.

    Supplementary Materials

    S1 Fig. Histograms of average monthly air temperature and precipitation (Ave, 24 variables).

    Red bars: averages for 1970–2000; blue bars: averages for 2061–2080.

    (PDF)

    pone.0324107.s001.pdf (424.4KB, pdf)
    S2 Fig. Histograms of average monthly climate indices (AveI, 16 variables).

    Red bars: averages for 1970–2000; blue bars: averages for 2061–2080.

    (PDF)

    pone.0324107.s002.pdf (124.9KB, pdf)
    S3 Fig. Histograms of climate extreme indices (CEI, 27 variables).

    Red bars: averages for 1970–2000; blue bars: averages for 2061–2080.

    (PDF)

    pone.0324107.s003.pdf (303.8KB, pdf)
    S4 Fig. Simulated potential natural vegetation (PNV) under current climatic conditions using the RF model.

    Six sets of climate data were used for training and simulation: (a) Ave, (b) AveI, (c) Ave + CEI, (d) AveI + CEI, (e) Ave + CEIpart, and (f) AveI + CEIpart.

    (PDF)

    pone.0324107.s004.pdf (303.2KB, pdf)
    S5 Fig. Simulated PNV under current climatic conditions using the SVM model.

    The same experimental setup as in S4 Fig. was used.

    (PDF)

    pone.0324107.s005.pdf (271KB, pdf)
    S6 Fig. Simulated PNV under current climatic conditions using the NV model.

    The same experimental setup as in S4 Fig. was used.

    (PDF)

    pone.0324107.s006.pdf (363.4KB, pdf)
    S7 Fig. Simulated PNV under current climatic conditions using the CNN model.

    The same experimental setup as in S4 Fig. was used.

    (PDF)

    pone.0324107.s007.pdf (277.5KB, pdf)
    S8 Fig. Simulated PNV under future climatic conditions (2061–2080) projected under the IPCC RCP8.5 scenario using the RF model.

    Six climate datasets were used for training and simulation: (a) Ave, (b) AveI, (c) Ave + CEI, (d) AveI + CEI, (e) Ave + CEIpart, and (f) AveI + CEIpart.

    (PDF)

    pone.0324107.s008.pdf (305.4KB, pdf)
    S9 Fig. Simulated PNV under future climatic conditions (2061–2080) using the SVM model.

    The same experimental setup as in S8 Fig. was used.

    (PDF)

    pone.0324107.s009.pdf (262.7KB, pdf)
    S10 Fig. Simulated PNV under future climatic conditions (2061–2080) using the NV model.

    The same experimental setup as in S8 Fig. was used.

    (PDF)

    pone.0324107.s010.pdf (280.1KB, pdf)
    S11 Fig. Simulated PNV under future climatic conditions (2061–2080) using the CNN model.

    The same experimental setup as in S8 Fig. was used.

    (PDF)

    pone.0324107.s011.pdf (282.3KB, pdf)
    S1 Table. Potential Natural Vegetation (PNV) classes used in the modeling.

    From the IGBP classification, three human-mediated classes (Croplands, Cropland/Natural Vegetation Mosaics, and Urban and Built-Up Lands) and Water Bodies were excluded. Descriptions were based on Loveland and Belward.

    (PDF)

    pone.0324107.s012.pdf (207.7KB, pdf)
    S2 Table. Confusion matrix for biome classification using the Ave climate dataset and the RF model.

    Columns represent the actual classes, while rows represent the predicted classes. This matrix is based on the test grids only, with a total of 392,230 predictions from 10 independent trials (39,223 grids × 10 trials). Shaded diagonal cells indicate correct classifications. Each cell shows the count (top) and column-wise percentage (bottom) within the actual class.

    (PDF)

    pone.0324107.s013.pdf (159KB, pdf)
    S3 Table. Confusion matrix for biome classification using the SVM model.

    The same dataset and evaluation as in S2 Table were used.

    (PDF)

    pone.0324107.s014.pdf (74KB, pdf)
    S4 Table. Confusion matrix for biome classification using the CNN model.

    The same dataset and evaluation procedure as in S2 Table were used.

    (PDF)

    pone.0324107.s015.pdf (79.3KB, pdf)
    Attachment

    Submitted filename: Responces.txt

    pone.0324107.s017.txt (52.4KB, txt)
    Attachment

    Submitted filename: ReponseLetter.docx

    pone.0324107.s018.docx (17.6KB, docx)

    Data Availability Statement

    All data required to reproduce the analyses described herein are openly available via Zenodo (https://doi.org/10.5281/zenodo.8113935).

