A long-term gridded dataset of aboveground net primary productivity for global natural grasslands

Ziwei Chen; Dongsheng Zhao; Zhiyuan Zhang; Liming Zhang; Du Zheng

doi:10.1038/s41597-026-06944-7

2026 Feb 27;13:550. doi: 10.1038/s41597-026-06944-7

Ziwei Chen ¹,², Dongsheng Zhao ²,✉, Zhiyuan Zhang ¹,✉, Liming Zhang ¹, Du Zheng ²

PMCID: PMC13066554 PMID: 41748605

Abstract

A long-term dataset of aboveground net primary productivity (ANPP) for global natural grasslands is essential for carbon dynamics modeling and sustainable land management. However, existing datasets are limited: they often fail to separate above- and below-ground productivity or reflect only post-disturbance conditions. To address these gaps, we developed a gridded annual ANPP dataset using machine learning, spanning historical (1958–2023) and future (2015–2100) periods. Historical ANPP data were derived from TerraClimate at 1/24° spatial resolution, while future projections came from CMIP6 models under SSP245 and SSP585 scenarios at 1/2° resolution. Our model performed robustly (R² = 0.675 ± 0.009), showing temporal and spatial reliability through cross-validation with published products. Notably, systematic ANPP underestimation occurs in high-productivity regions (>700 g m⁻²) due to sparse field observations, so values in these areas should be interpreted with caution. Our dataset provides a spatially explicit baseline of climate-driven productivity, supporting precise evaluation of human impacts on grasslands and informing adaptive management under climate change.

Subject terms: Grassland ecology, Carbon cycle, Agroecology

Background & Summary

Grasslands cover around half of the world’s ice-free land¹ and provide about one-third of terrestrial net primary productivity (NPP)². Aboveground NPP (ANPP) directly supplies energy for animal products and underpins livestock farming³. ANPP also reflects plant carbon uptake and thus affects the global carbon balance⁴. Mapping the global distribution of grassland ANPP is essential for assessing the sustainability of grazing ecosystems and advancing carbon cycle research.

Climate change and human activities, especially livestock grazing, significantly influence grassland ANPP^5,6. Quantifying the isolated impacts of human activities on ANPP facilitates the development of targeted grassland management and restoration strategies⁷, which requires knowledge of baseline levels of ANPP in natural grasslands without human disturbance⁸. Additionally, since climate change is a long-term issue⁹, compiling multidecadal ANPP datasets for natural grasslands is indispensable for refining ecological models and informing policymaking.

However, the estimation of ANPP in natural grasslands encounters methodological challenges. Climate-based NPP estimations, such as the Miami model¹⁰, and satellite-based NPP estimations, such as the light use efficiency model¹¹, often neglect the NPP allocation within plants, biasing ANPP estimates. Moreover, satellite observations primarily reflect anthropogenically modified vegetation conditions, rendering them unsuitable for reconstructing pre-disturbance natural grassland ANPP¹². While ecosystem process models can simulate carbon uptake and allocation in natural grasslands¹³, they rely on extensive plant physiological parameters¹⁴, posing significant constraints on global-scale calibration and validation¹⁵.

The accumulation of field observations of grassland ANPP and advances in machine learning techniques have enabled global simulation of natural grassland ANPP. For instance, Sun, Feng et al.⁸ used a random forest model to generate a global gridded dataset of natural grassland ANPP, based on site-specific observations. However, their study has some limitations. First, it provides only multi-year averages without time-varying data. Second, mismatches in timing between ANPP observations and covariates reduce prediction accuracy. Lastly, the large number of covariates and the lack of access to some of them limit model generalization and data reuse.

To address these gaps, we developed a new framework to simulate annual ANPP over the long term in natural grasslands (Fig. 1). By combining multi-year ANPP averages from Sun, Feng et al.⁸ with annual weather records, we effectively captured the interannual dynamics of ANPP while minimizing the complexity of model inputs. Ultimately, we established a gridded annual ANPP dataset for global natural grasslands¹⁶, consisting of two subsets: one based on the TerraClimate database, spanning 1958–2023 at 1/24° resolution; the other derived from CMIP6 projections under SSP245 and SSP585 climate scenarios, from 2015 to 2100 at 1/2° resolution. The dataset allows researchers to investigate how global grassland ANPP responds to climate change and human activities while also supporting precise assessments of grassland carrying capacity.

Methods

This study developed a spatially explicit gridded dataset of annual grassland ANPP by integrating published multi-year mean ANPP data with field observations and time-varying meteorological records. This study operates under the premise that interannual ANPP variability is primarily attributed to climatic factors rather than time-invariant regulators, e.g., soil and topography.

The main steps of this study were as follows:

1. Field observations of natural grassland ANPP from undisturbed sites were collected from multiple sources and compiled into a comprehensive dataset.
2. Multi-year mean ANPP values from the published dataset were spatially matched and extracted to align with field-measured ANPP locations.
3. Historical and future meteorological data were aggregated into annual values.
4. Annual anomalies of each climate variable relative to their 1970–2000 multi-year means were quantified.
5. Six machine learning models were used to model grassland ANPP, with the top-performing one selected based on predictive accuracy.
6. The hyperparameters of the top-performing model were optimized via grid search, and the model was trained with the optimal hyperparameters.
7. The best model was applied to historical and future climate scenarios to generate gridded ANPP estimates.
8. Dataset reliability was validated by benchmarking against published ANPP datasets.

Data

Field-measured grassland ANPP data

We integrated three distinct datasets into a comprehensive field-measured ANPP dataset for natural grasslands. The data sources included: (1) the global change experiment dataset by Song et al.¹⁷ (10.6084/m9.figshare.7442915.v13); (2) the grassland ANPP field observation dataset by Sun et al.¹⁸ (10.5061/dryad.7sqv9s4vv); and (3) the livestock grazing experiment dataset extracted from Chen et al.¹⁹. Records were retained if two criteria were met: (1) explicit reporting of sampling location and year; and (2) derivation from natural grasslands (control sites without grazing). After deduplication, the final dataset comprises 1,503 unique records (100 from Song et al.¹⁷, 1,284 from Sun et al.¹⁸, and 119 from Chen et al.¹⁹). Observations in this dataset are spatially extensive, biome-diverse, and globally representative, spanning all continents except Antarctica (Fig. 2).

Fig. 2 — Spatial distribution of aboveground net primary productivity (ANPP) measurement sites. Global grassland distribution was derived from Dixon *et al*.⁵⁷.

Mean annual grassland ANPP gridded data

We sourced the global gridded ANPP dataset for natural grasslands from Sun, Feng et al.⁸ (https://zenodo.org/records/5554579), which features a 1-km spatial resolution and reflects the multi-year average spanning 1970–2000. Subsequently, we spatially aligned and extracted mean annual ANPP values for sites in the field measurement dataset based on their geographic locations.

Time-varying climate data

We obtained time-varying data for five climate variables (temperature, precipitation, solar radiation, potential evapotranspiration, and aridity index) because Sun, Feng et al.⁸ incorporated these variables in their grassland ANPP simulation. Potential evapotranspiration was calculated via the Penman-Monteith method²⁰, while the aridity index was defined as the ratio of precipitation to potential evapotranspiration.

Historical climate data were sourced from the TerraClimate database²¹ (10.7923/G43J3B0R), which was developed based on the WorldClim v1.4²², Climate Research Unit time series (CRU TS) v4.0²³, and the Japanese 55-year Reanalysis (JRA-55) datasets²⁴. This database provides monthly climate data at 1/24° resolution for 1958–2023, later aggregated to annual values.
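
The monthly-to-annual aggregation and the aridity index can be illustrated with a minimal sketch. The aggregation rule is an assumption, since the text does not state it: precipitation and potential evapotranspiration are summed over the year, while temperature and solar radiation are averaged. The function name `annual_climate` and the variable keys are hypothetical.

```python
from statistics import fmean


def annual_climate(monthly):
    """Collapse 12 monthly values per variable into annual values.

    Assumption (not stated in the text): fluxes (precipitation, PET) are
    summed over the year; temperature and solar radiation are averaged.
    """
    for name, values in monthly.items():
        if len(values) != 12:
            raise ValueError(f"{name}: expected 12 monthly values")
    annual = {
        "tmean": fmean(monthly["tmean"]),
        "srad": fmean(monthly["srad"]),
        "ppt": sum(monthly["ppt"]),
        "pet": sum(monthly["pet"]),
    }
    # Aridity index as defined in the text: precipitation / PET.
    annual["ai"] = annual["ppt"] / annual["pet"] if annual["pet"] > 0 else float("nan")
    return annual


# Illustrative values only (not taken from TerraClimate).
example = annual_climate({
    "tmean": [2.0] * 6 + [12.0] * 6,  # deg C
    "srad": [150.0] * 12,             # W m-2
    "ppt": [30.0] * 12,               # mm per month
    "pet": [60.0] * 12,               # mm per month
})
```

With these illustrative inputs, annual precipitation is half of annual potential evapotranspiration, so the aridity index is 0.5.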

We obtained future climate data from the Coupled Model Intercomparison Project Phase 6 (CMIP6, https://esgf-node.llnl.gov/search/cmip6), including monthly climate simulations from 13 CMIP6 models (Table 1) and potential evapotranspiration derived from these simulations²⁵ (10.5281/zenodo.7789759). All CMIP6 data were bias-corrected using the Quantile Delta Mapping (QDM) method²⁶ to align with the TerraClimate baseline (1970–2000). This approach preserves the long-term trends of all quantiles of the climate distribution, thereby improving the reliability of our future projections. We evaluated two Shared Socioeconomic Pathway (SSP) scenarios: SSP245 (intermediate-emission) and SSP585 (high-emission). All CMIP6 data were spatially resampled to 1/2° resolution using bilinear interpolation²⁷ and aggregated to annual values.
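
The idea behind Quantile Delta Mapping can be sketched for a single grid cell as follows. This is a simplified additive variant using empirical quantiles. Whether the authors applied the additive or the multiplicative (ratio) form to each variable is not stated, and the function names are hypothetical.

```python
from bisect import bisect_right


def quantile(sorted_vals, tau):
    """Linearly interpolated quantile of an ascending list, tau in [0, 1]."""
    pos = tau * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (pos - lo) * (sorted_vals[hi] - sorted_vals[lo])


def qdm_additive(obs_hist, mod_hist, mod_fut):
    """Additive QDM sketch: keep the model's change at each quantile.

    obs_hist: observations in the baseline period (e.g. 1970-2000)
    mod_hist: model values in the same baseline period
    mod_fut:  model values in the projection period
    """
    obs_sorted = sorted(obs_hist)
    hist_sorted = sorted(mod_hist)
    fut_sorted = sorted(mod_fut)
    n = len(fut_sorted)
    corrected = []
    for x in mod_fut:
        # Empirical non-exceedance probability of x in the future sample.
        rank = bisect_right(fut_sorted, x)  # number of values <= x
        tau = (rank - 1) / (n - 1) if n > 1 else 0.5
        # Model-projected change at this quantile (the "delta").
        delta = x - quantile(hist_sorted, tau)
        # Add the delta to the observed baseline value at the same quantile.
        corrected.append(quantile(obs_sorted, tau) + delta)
    return corrected


# Toy case: the model is 2 units too warm and projects 1 unit of warming.
obs = [1.0, 2.0, 3.0, 4.0, 5.0]
model_baseline = [v + 2.0 for v in obs]
model_future = [v + 3.0 for v in obs]
corrected_future = qdm_additive(obs, model_baseline, model_future)
```

In this toy case the correction removes the constant 2-unit bias while retaining the 1-unit projected change at every quantile.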

Table 1.

Basic information of the 13 CMIP6 models used in this study.

Model	Institute	Country	Resolution
ACCESS-CM2	Commonwealth Scientific and Industrial Research Organisation	Australia	1.875° × 1.25°
ACCESS-ESM1-5	Commonwealth Scientific and Industrial Research Organisation	Australia	1.875° × 1.2414°
CMCC-CM2-SR5	Fondazione Centro Euro-Mediterraneo	Italy	1.25° × 0.9375°
CMCC-ESM2	Fondazione Centro Euro-Mediterraneo	Italy	1.25° × 0.9375°
EC-Earth3	Swedish Meteorological and Hydrological Institute et al.	Europe	0.7031° × 0.7031°
GFDL-ESM4	National Oceanic and Atmospheric Administration, Geophysical Fluid Dynamics Laboratory	America	1.25° × 1.0°
INM-CM4-8	Institute for Numerical Mathematics	Russia	2.0° × 1.5°
INM-CM5-0	Institute for Numerical Mathematics	Russia	2.0° × 1.5°
IPSL-CM6A-LR	Institut Pierre Simon Laplace	France	2.5° × 1.2587°
MIROC6	Atmosphere and Ocean Research Institute, The University of Tokyo, Japan Agency for Marine-Earth Science and Technology	Japan	1.4063° × 1.4063°
MPI-ESM1-2-HR	Max Planck Institute for Meteorology	Germany	0.9375° × 0.9375°
MPI-ESM1-2-LR	Max Planck Institute for Meteorology	Germany	1.875° × 1.9565°
MRI-ESM2-0	Meteorological Research Institute	Japan	1.25° × 1.25°

Satellite-derived NPP data

We validated our ANPP dataset using two NPP products: GLASS (Global LAnd Surface Satellite) NPP and BEPS (Boreal Ecosystem Productivity Simulator) NPP. The GLASS NPP relies on vegetation indices derived from the AVHRR (Advanced Very High Resolution Radiometer) to quantify vegetation growth²⁸. It employs the revised EC-LUE (Eddy Covariance-Light Use Efficiency) model to simulate gross primary productivity (GPP) and integrates the ratio of autotrophic respiration to GPP simulated by 10 dynamic global vegetation models from the TRENDY project to derive NPP²⁸. This dataset (https://www.glass.hku.hk/archive/NPP/AVHRR/0.05D) spans 1982–2018 with a 0.05° spatial resolution.

The BEPS model is a process-based diagnostic model that fuses AVHRR and MODIS (Moderate Resolution Imaging Spectroradiometer) data to simulate vegetation carbon input²⁹. It calculates GPP for sunlit and shaded leaves using the Farquhar model, accounting for growth and maintenance respiration fluxes to derive NPP²⁹. The BEPS NPP dataset (10.12199/nesdc.ecodb.2016YFA0600200.02.002) covers 1981–2019 at a 0.0727° spatial resolution.

Process-based NPP data

We validated the temporal reliability of our ANPP dataset with independent NPP outputs from 20 DGVMs in TRENDY v14 (Table 2). These models simulate vegetation carbon dynamics through distinct ecological mechanisms, providing true independence from our machine-learning approach³⁰. We extracted monthly NPP outputs under the S2 scenario (historical climate change with constant pre-industrial land use) for 1958–2023 (https://mdosullivan.github.io/GCB). This scenario is widely adopted to isolate climate-driven signals in vegetation productivity³¹. All DGVM outputs were resampled to 1/2° by bilinear interpolation, aggregated to annual values, and ensemble-averaged.

Table 2.

Basic information of the 20 dynamic global vegetation models used in this study.

Model	Institute	Country	Resolution
CABLE-POP	University of Technology Sydney	Australia	1° × 1°
CLASSIC	Environment and Climate Change Canada	Canada	1° × 1°
CLM-FATES	Lawrence Berkeley National Laboratory	America	1.8947° × 2.5°
CLM6.0	National Center for Atmospheric Research	America	0.9424° × 1.25°
DLEM	Boston College	America	0.5° × 0.5°
E3SM	Lawrence Livermore National Laboratory	America	0.9424° × 1.25°
EDv3	University of Maryland	America	0.5° × 0.5°
ELM-FATES	Lawrence Berkeley National Laboratory	America	1.8947° × 2.5°
GDSTEM	University of California, Davis	America	0.5° × 0.5°
IBIS	University of Wisconsin–Madison	America	0.5° × 0.5°
ISAM	University of Illinois at Urbana-Champaign	America	0.5° × 0.5°
JSBACH	Max Planck Institute for Meteorology	Germany	1.8606° × 1.875°
JULES-ES	Centre for Ecology & Hydrology	United Kingdom	0.5° × 0.5°
LPJ-EOSIM	Potsdam Institute for Climate Impact Research	Germany	0.5° × 0.5°
LPJ-GUESS	Karlsruhe Institute of Technology	Germany	0.5° × 0.5°
LPX-Bern	University of Bern	Switzerland	0.5° × 0.5°
ORCHIDEE	Institute Pierre Simon Laplace	France	0.5° × 0.5°
SDGVM	Oak Ridge National Laboratory	America	1° × 1°
VISIT-UT	The University of Tokyo	Japan	0.5° × 0.5°
iMAPLE	Nanjing University of Information Science & Technology	China	1° × 1°

Soil and topography data

To evaluate the contribution of time-invariant regulators, we compiled soil and topography variables for all field sites. Soil data were extracted from the WISE30sec database³² (https://data.isric.org/geonetwork/srv/api/records/dc7b283a-8f19-45e1-aaed-e9bd515119bc), which provides global soil property estimates at 30 arc-second resolution. Selected variables included soil pH, total nitrogen content (g kg⁻¹), soil organic carbon content (g kg⁻¹), soil carbon-to-nitrogen (C/N) ratio, clay content (%), silt content (%), and bulk density (kg m⁻³). Topographic data were derived from the Shuttle Radar Topography Mission (SRTM) digital elevation model (DEM) at 30 arc-second resolution³³ (https://worldclim.org/data/worldclim21.html). Elevation (m), slope (°), and aspect (°) were extracted for each site. These variables were spatially matched to measurement sites and incorporated into an extended model to test whether explicitly representing static regulators improves predictive performance compared to the baseline model, which implicitly captures these effects through multi-year mean ANPP.

Methodology

Climatic anomalies calculation

Annual grassland ANPP was predicted using machine learning models, with predictors including sampling-year climate anomalies. The climate anomalies accounted for deviations between the ANPP baseline and actual measurements. Due to strong correlations between sampling year and mean annual climate data, climate anomalies were used as model inputs rather than raw data to mitigate the impact of multicollinearity.

For the four climate factors except temperature (precipitation, solar radiation, potential evapotranspiration, and aridity index), anomalies were calculated as:

$$X_{ano} = \frac{X_{act} - X_{mean}}{X_{mean}} \tag{1}$$

where $X_{ano}$ was the climate anomaly, $X_{act}$ was the actual climate data observed in the sampling year, and $X_{mean}$ was the mean annual climate data (1970–2000).

Since the mean annual air temperature can approach 0 °C in some sites, calculating temperature anomaly using Eq. (1) may produce extreme outliers and distort the direction of changes, thereby masking real ecological signals. Therefore, the temperature anomaly was calculated as³⁴:

$$X_{ano} = X_{act} - X_{mean} \tag{2}$$
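
The two anomaly definitions can be written as a short sketch: a relative anomaly for precipitation, solar radiation, potential evapotranspiration, and aridity index, and an absolute difference for temperature. The function name and dictionary keys are hypothetical, and the numbers are illustrative only.

```python
def climate_anomalies(actual, baseline, absolute_vars=("temperature",)):
    """Return the anomaly of each climate variable for one site-year.

    actual:   sampling-year values, keyed by variable name
    baseline: 1970-2000 mean values, keyed by the same names
    Variables in `absolute_vars` use X_act - X_mean; all others use
    the relative form (X_act - X_mean) / X_mean.
    """
    anomalies = {}
    for name, x_act in actual.items():
        x_mean = baseline[name]
        if name in absolute_vars:
            anomalies[name] = x_act - x_mean
        else:
            anomalies[name] = (x_act - x_mean) / x_mean
    return anomalies


# Illustrative site-year: a cold site whose baseline temperature is near 0 deg C.
baseline = {"temperature": 0.5, "precipitation": 500.0, "aridity_index": 0.8}
sampling_year = {"temperature": 1.5, "precipitation": 550.0, "aridity_index": 0.6}
anomalies = climate_anomalies(sampling_year, baseline)
```

Here the temperature anomaly is +1.0 °C, whereas the relative form would report a 200% change for the same 1 °C of warming, which illustrates why the relative form is avoided when the baseline approaches 0 °C.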

Model comparison and selection

Machine learning techniques are particularly advantageous for modeling complex, highly nonlinear ecosystems³⁵, as they do not rely on the simplistic statistical assumptions or rigidly prescribed variable interactions that constrain traditional statistical methods³⁶. We evaluated six algorithms to simulate grassland ANPP: linear model, support vector machine, random forest, artificial neural network, bagged classification and regression tree (CART), and eXtreme Gradient Boosting (XGBoost). Comprehensive descriptions of these algorithms can be found in references^37–39.

Our models used measured ANPP as the dependent variable, with mean annual ANPP, climatic baselines, and annual climatic anomalies as predictors. We employed cross-validation to evaluate model performance and prevent overfitting. To comprehensively assess generalization ability and evaluate potential overestimation due to spatial autocorrelation, we systematically compared three cross-validation strategies⁴⁰. First, random 10-fold cross-validation served as the baseline. Second, spatial-block cross-validation divided the data into geographically contiguous blocks assigned to different folds. This ensured spatial independence between training and validation sets, reducing inflated performance estimates caused by spatial autocorrelation. Third, environmentally-stratified cross-validation clustered observations into 10 climate classes using k-means⁴¹ on multi-year mean ANPP and five climatic variables. Each class was then distributed proportionally across folds to preserve environmental representativeness and avoid extrapolation bias. Each strategy was repeated 10 times with different random seeds. Model performance was measured via the coefficient of determination (R²), root mean square error (RMSE), and mean absolute error (MAE). Differences between validation strategies were assessed with repeated-measures ANOVA and Tukey’s HSD test.
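
The fold assignment in environmentally-stratified cross-validation can be sketched as below, assuming the climate class of each observation (for example, from k-means) is already available. The function name `stratified_folds` and the round-robin dealing are illustrative assumptions, not the authors' implementation.

```python
import random
from collections import defaultdict


def stratified_folds(class_labels, n_folds=10, seed=0):
    """Assign each observation to a fold so that every climate class is
    spread as evenly as possible across folds."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(class_labels):
        by_class[label].append(idx)

    fold_of = [None] * len(class_labels)
    for label in sorted(by_class):
        members = by_class[label]
        rng.shuffle(members)
        # A random starting fold avoids always giving fold 0 the extra sample.
        offset = rng.randrange(n_folds)
        for k, idx in enumerate(members):
            fold_of[idx] = (k + offset) % n_folds
    return fold_of


# Toy labels standing in for 10 k-means climate classes over 1,503 records.
toy_labels = [i % 10 for i in range(1503)]
toy_folds = stratified_folds(toy_labels, n_folds=10, seed=42)
```

Repeating the call with different `seed` values corresponds to repeating the validation with different random seeds.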

Under both random 10-fold and environmentally-stratified cross-validation, the random forest model consistently performed best, followed by the XGBoost model (p < 0.001, Fig. 3). This consistency across strategies reinforced confidence in the random forest model’s superiority. Spatial-block cross-validation reduced performance and increased variability for all models (Fig. 3). This occurs because the strategy disrupts continuous spatial environmental gradients. As a result, large differences arise in predictive space between training and validation sets, yielding overly pessimistic error estimates⁴⁰. Given that our goal is to produce a global dataset covering the full environmental gradient, not extreme extrapolation, environmentally-stratified cross-validation offers a more balanced evaluation framework. We therefore tuned and trained the random forest model exclusively within this framework.

Fig. 3 — Performance comparison of six machine learning algorithms under three cross-validation (CV) strategies. (a) Coefficient of determination (R²), (b) root mean square error (RMSE), and (c) mean absolute error (MAE). Higher R² values and lower RMSE/MAE values indicate superior predictive performance. For each algorithm, three validation strategies are displayed from left to right: random 10-fold CV, spatial-block CV, and environmentally-stratified CV. Box plots show medians, quartiles, non-outlier ranges, and outliers from 10 repetitions with different random seeds. LM = linear model, SVR = support vector machine, RF = random forest, ANN = artificial neural network, BAG = bagged classification and regression tree, and XGB = eXtreme Gradient Boosting.

This study assumes that interannual variability in grassland ANPP is mainly controlled by climatic factors, while time-invariant regulators (e.g., soil and topography) are largely reflected in multi-year mean ANPP. To validate this, we tested whether including static environmental factors could improve model performance. An extended model was built by incorporating ten soil and topographic predictors (see Soil and topography data), and its performance was compared to the baseline model using environmentally-stratified cross-validation. Results showed statistically significant yet marginal improvement with the extended model (p < 0.001, Fig. 4).

Fig. 4 — Performance comparison of three model setups under environmentally-stratified cross-validation. (a) Coefficient of determination (R²), (b) root mean square error (RMSE), and (c) mean absolute error (MAE). Higher R² values and lower RMSE/MAE values indicate superior predictive performance. Model 1 is the baseline random forest model incorporating multi-year mean ANPP, climatic baselines, and anomalies. Model 2 represents the ensemble mean of random forest and eXtreme Gradient Boosting predictions. Model 3 extends the random forest model with soil and topographic factors. Box plots show medians, quartiles, non-outlier range, and outliers across 10 repetitions with different random seeds.

Given that random forest performed best, we further examined whether combining it with other high-performing models could reduce uncertainty. We created an ensemble model by averaging predictions from random forest and XGBoost (the second-best) models. However, this ensemble did not significantly outperform the single random forest model (p > 0.05, Fig. 4). Since neither extended variables nor model integration yielded substantial gains, we retained the original random forest model for its parsimony, computational efficiency, and reproducibility.

Model tuning and prediction

Random forest model tuning targeted two hyperparameters⁴²: the number of trees (ntree) and the number of random variables at each node (mtry). We evaluated 25 parameter combinations (ntree: 200, 400, 600, 800, 1000; mtry: 2, 4, 6, 8, 10) over 10 repeated training cycles. In each cycle, model performance was assessed via environmentally-stratified cross-validation using field-measured grassland ANPP, with R² guiding hyperparameter selection. The optimal combination was ntree = 400 and mtry = 2, achieving a cross-validated R² value of 0.672 (Table 3).

Table 3.

Hyperparameter selection for the grassland ANPP prediction model.

mtry	ntree	R² (cycle 1)	R² (cycle 2)	R² (cycle 3)	R² (cycle 4)	R² (cycle 5)	R² (cycle 6)	R² (cycle 7)	R² (cycle 8)	R² (cycle 9)	R² (cycle 10)	Mean R²
2	200	0.672	0.669	0.671	0.672	0.672	0.669	0.672	0.670	0.671	0.671	0.6709
4	200	0.669	0.670	0.672	0.671	0.671	0.673	0.670	0.672	0.672	0.673	0.6713
6	200	0.672	0.673	0.674	0.672	0.667	0.670	0.671	0.672	0.670	0.670	0.6712
8	200	0.669	0.673	0.669	0.669	0.673	0.670	0.669	0.668	0.671	0.671	0.6702
10	200	0.671	0.668	0.671	0.671	0.669	0.669	0.670	0.671	0.669	0.669	0.6697
2	*400*	*0.672*	*0.672*	*0.672*	*0.672*	*0.673*	*0.673*	*0.673*	*0.671*	*0.673*	*0.669*	*0.6720*
4	400	0.668	0.673	0.673	0.671	0.670	0.671	0.671	0.669	0.670	0.672	0.6707
6	400	0.671	0.670	0.670	0.674	0.671	0.669	0.671	0.672	0.673	0.671	0.6712
8	400	0.672	0.671	0.669	0.669	0.669	0.669	0.672	0.669	0.672	0.671	0.6704
10	400	0.671	0.672	0.672	0.670	0.669	0.669	0.671	0.671	0.670	0.670	0.6705
2	600	0.671	0.671	0.671	0.672	0.675	0.673	0.670	0.669	0.672	0.672	0.6717
4	600	0.671	0.671	0.671	0.671	0.671	0.672	0.673	0.672	0.673	0.673	0.6719
6	600	0.669	0.670	0.671	0.671	0.671	0.671	0.670	0.669	0.672	0.672	0.6706
8	600	0.670	0.669	0.670	0.672	0.672	0.671	0.671	0.672	0.671	0.672	0.6709
10	600	0.671	0.670	0.669	0.673	0.670	0.669	0.671	0.669	0.668	0.671	0.6702
2	800	0.671	0.675	0.670	0.670	0.671	0.673	0.671	0.670	0.670	0.672	0.6711
4	800	0.673	0.672	0.671	0.671	0.673	0.672	0.670	0.669	0.672	0.672	0.6715
6	800	0.671	0.670	0.672	0.672	0.672	0.668	0.670	0.669	0.671	0.672	0.6708
8	800	0.670	0.669	0.668	0.673	0.669	0.673	0.670	0.671	0.671	0.670	0.6703
10	800	0.669	0.669	0.670	0.669	0.669	0.668	0.670	0.670	0.671	0.669	0.6694
2	1000	0.672	0.670	0.671	0.670	0.671	0.672	0.673	0.671	0.673	0.670	0.6712
4	1000	0.672	0.672	0.673	0.671	0.672	0.669	0.673	0.673	0.670	0.672	0.6718
6	1000	0.671	0.669	0.672	0.669	0.672	0.671	0.671	0.670	0.671	0.672	0.6707
8	1000	0.672	0.671	0.670	0.671	0.671	0.668	0.672	0.669	0.670	0.670	0.6702
10	1000	0.670	0.673	0.671	0.670	0.669	0.668	0.672	0.670	0.670	0.670	0.6704
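
The selection step of the grid search can be reproduced from the mean R² values reported in Table 3. The sketch below only covers choosing the best combination. In a full workflow, each score would come from repeated environmentally-stratified cross-validation of a random forest. Variable names are hypothetical.

```python
from itertools import product

NTREE = [200, 400, 600, 800, 1000]
MTRY = [2, 4, 6, 8, 10]

# Mean cross-validated R2 from Table 3, ordered by mtry = 2, 4, 6, 8, 10.
MEAN_R2 = {
    200: [0.6709, 0.6713, 0.6712, 0.6702, 0.6697],
    400: [0.6720, 0.6707, 0.6712, 0.6704, 0.6705],
    600: [0.6717, 0.6719, 0.6706, 0.6709, 0.6702],
    800: [0.6711, 0.6715, 0.6708, 0.6703, 0.6694],
    1000: [0.6712, 0.6718, 0.6707, 0.6702, 0.6704],
}

# Enumerate all 25 hyperparameter combinations.
grid = list(product(NTREE, MTRY))
scores = {(ntree, mtry): MEAN_R2[ntree][MTRY.index(mtry)] for ntree, mtry in grid}

# Keep the combination with the highest mean R2.
best_ntree, best_mtry = max(scores, key=scores.get)
spread = max(scores.values()) - min(scores.values())
```

The spread of mean R² across the whole grid is below 0.003, so the model is largely insensitive to these two hyperparameters within the tested range.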

Using the optimal hyperparameters, we trained random forest models 500 times to generate 500 independent models. Each model underwent environmentally-stratified cross-validation for performance assessment. Ideally, ensemble predictions from all 500 models would provide robust grassland ANPP estimates and uncertainty quantification⁴³. However, given the computational and temporal constraints of high-resolution, long-term ANPP predictions, we selected the single best-performing model for final predictions, rather than averaging all 500.

Annual grassland ANPP was estimated at 1/24° spatial resolution for 1958–2023 by integrating mean annual ANPP data with time-varying climate variables from the TerraClimate database. Additionally, natural grassland ANPP projections for 2015–2100 at 1/2° resolution were derived from SSP245 and SSP585 scenario data from 13 CMIP6 models.

Spatial validation

Multiple global ANPP datasets have been developed using statistical^44–46 and machine learning methods⁸. To validate our ANPP dataset’s spatial distribution, comparisons were performed with these four published datasets. Notably, most rely on 1970–2000 climatic averages, so our corresponding results during this period were used for comparisons. For Del Grosso et al.⁴⁵, which uses 1961–1990 climate averages, we substituted 1970–2000 climatic averages into their statistical model to generate comparable ANPP estimates.

Specifically, we conducted three quantitative spatial analyses to quantify the spatial pattern consistency between this dataset and existing products. First, we classified continuous ANPP data into five categories based on their cumulative distribution percentiles (0, 20, 40, 60, 80, 100%) and then calculated Kappa statistics between our dataset and published datasets⁴⁷. Kappa values above 0.6 indicate substantial agreement⁴⁸. Second, we plotted a Taylor diagram to visualize comparisons between published datasets and our dataset⁴⁹. This diagram simultaneously displays three key statistics: correlation coefficients with our dataset, standard deviations for each dataset, and centered root mean square errors. This allows for an intuitive assessment of each dataset’s similarity to our dataset in terms of spatial variability magnitude and pattern. Third, we computed global and local Moran’s I, and generated local indicators of spatial association (LISA) cluster maps to evaluate whether our dataset exhibits realistic spatial autocorrelation patterns comparable to prior studies⁵⁰.
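
The first of these analyses can be sketched as follows: each map is split into five classes at its 20th, 40th, 60th, and 80th percentiles, and Cohen's Kappa is computed between the two class maps. The sketch assumes each dataset is classified using its own percentiles and that both maps are flattened to equally ordered lists of valid cells. Function names are hypothetical.

```python
from bisect import bisect_left
from collections import Counter


def quantile(sorted_vals, tau):
    """Linearly interpolated quantile of an ascending list, tau in [0, 1]."""
    pos = tau * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (pos - lo) * (sorted_vals[hi] - sorted_vals[lo])


def percentile_classes(values, probs=(0.2, 0.4, 0.6, 0.8)):
    """Map each value to a class 0..4 using the dataset's own percentiles."""
    ordered = sorted(values)
    cuts = [quantile(ordered, p) for p in probs]
    return [bisect_left(cuts, v) for v in values]


def cohen_kappa(classes_a, classes_b):
    """Cohen's Kappa: agreement beyond what is expected by chance."""
    n = len(classes_a)
    observed = sum(a == b for a, b in zip(classes_a, classes_b)) / n
    freq_a, freq_b = Counter(classes_a), Counter(classes_b)
    expected = sum(freq_a[k] * freq_b[k] for k in freq_a) / (n * n)
    return (observed - expected) / (1 - expected)


# Toy "maps": identical patterns agree perfectly, reversed patterns do not.
map_a = [float(v) for v in range(100)]
map_b = list(reversed(map_a))
kappa_same = cohen_kappa(percentile_classes(map_a), percentile_classes(map_a))
kappa_reversed = cohen_kappa(percentile_classes(map_a), percentile_classes(map_b))
```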

Temporal validation

Direct validation of our dataset’s temporal reliability (i.e., the accuracy of interannual variability) is hindered by the absence of a long-term global grassland ANPP dataset suitable for benchmarking. The inherent limitations of satellite-based NPP products also compromise their utility for indirect validation. First, ANPP constitutes the aboveground portion of NPP, and its proportion relative to total NPP varies with biome types and environmental conditions⁵¹. Empirical conversions from NPP to ANPP inherently introduce uncertainties. Second, satellite-derived NPP products often lack sufficient temporal coverage. For example, MOD17, one of the most widely used remote sensing NPP products, is only available post-2000⁵², precluding comparisons with the pre-2000 portion of our dataset. Third, these products rely on vegetation indices (e.g., NDVI, SIF) and thus reflect real-world conditions, failing to isolate anthropogenic disturbances like grazing.

Despite these constraints, we preliminarily validated the temporal reliability of our dataset through three approaches. First, we assessed systematic model bias by analyzing the relationship between model residuals (differences between observed and predicted ANPP) and sampling years. Since residuals theoretically follow a normal distribution, unusually large residuals may correlate with large ANPP values or sample sizes. To test whether residual outliers could be attributed to observed data characteristics, we grouped sampling years into five-year intervals and calculated the relative residual standard deviation (standard deviation of residuals divided by mean observed ANPP) and sample size for each group. Second, we compared our dataset with two long-term NPP products—GLASS NPP (1982–2018) and BEPS NPP (1981–2019)—under the assumptions of a stable “ANPP/NPP ratio” and “anthropogenic disturbance intensity” within individual grids. It has been indicated that ANPP/NPP ratios are primarily regulated by mean annual precipitation⁴⁶. For a specific grid cell, mean annual precipitation is a constant, so the ANPP/NPP ratio is also considered temporally stable. We computed correlations between remote sensing NPP and our predicted ANPP for each grid to evaluate their dynamic consistency. Globally positive correlations would reinforce confidence in our dataset’s temporal accuracy. Third, to overcome temporal coverage limitations of satellite products and provide a more independent assessment, we validated our dataset using 20 DGVMs from the TRENDY project. For each grid cell, we calculated the correlation between the ANPP time series and the DGVMs’ NPP outputs (1958–2023). We assessed the robustness of our ANPP dataset’s temporal dynamics based on the 20 DGVMs and the TRENDY ensemble mean.
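
The first approach, grouping residuals into five-year intervals, can be illustrated with a minimal sketch. The alignment of the intervals (starting at years divisible by five) and the use of the sample standard deviation are assumptions. The function name and the toy numbers are hypothetical.

```python
from collections import defaultdict
from math import nan
from statistics import fmean, stdev


def residual_spread_by_period(years, observed, predicted, width=5):
    """Relative residual standard deviation and sample size per interval.

    Relative residual SD = SD(observed - predicted) / mean(observed),
    computed within each `width`-year group of sampling years.
    """
    groups = defaultdict(list)
    for year, obs, pred in zip(years, observed, predicted):
        start = (year // width) * width  # e.g. 2003 -> 2000 for width 5
        groups[start].append((obs, obs - pred))

    summary = {}
    for start in sorted(groups):
        obs_vals = [o for o, _ in groups[start]]
        residuals = [r for _, r in groups[start]]
        n = len(residuals)
        # The sample SD needs at least two records in the interval.
        rel_sd = stdev(residuals) / fmean(obs_vals) if n > 1 else nan
        summary[start] = {"n": n, "relative_sd": rel_sd}
    return summary


# Toy records (g m-2), not taken from the field dataset.
toy_summary = residual_spread_by_period(
    years=[2001, 2003, 2004, 2006, 2008],
    observed=[100.0, 200.0, 300.0, 150.0, 250.0],
    predicted=[110.0, 190.0, 300.0, 150.0, 230.0],
)
```

Plotting `relative_sd` against `n` for each interval is one way to check whether unusually large residuals coincide with large samples or high observed ANPP.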

Data Records

The global gridded dataset of natural grassland ANPP developed in this study is publicly accessible at 10.5281/zenodo.18171957¹⁶. The dataset comprises two compressed files: “historical_ANPP_map” and “future_ANPP_map”. After decompression, all files adopt the GeoTIFF format with annual temporal resolution. ANPP data are measured in g m⁻², and NoData values are set to NaN.

The “historical_ANPP_map” dataset, derived from TerraClimate data, features a 1/24° spatial resolution. It follows the naming convention map_YYYY.tif, where YYYY corresponds to the years 1958–2023.

The “future_ANPP_map” dataset, derived from CMIP6 data, features a 1/2° resolution. It follows the naming convention map_MODEL_SSP_YYYY.tif, where MODEL represents the 13 CMIP6 models (Table 1), SSP denotes the SSP245 and SSP585 scenarios, and YYYY corresponds to the years 2015–2100. To enable users to assess uncertainties in future projections, we provide statistical layers for each year based on the ensemble of 13 models. They follow the naming convention map_STATISTIC_SSP_YYYY.tif, where STATISTIC denotes the mean, standard deviation (std), median, 5th percentile (p05), and 95th percentile (p95).
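
The naming conventions can be enumerated with a short sketch. The exact tokens used in the archive (for example, the capitalization of `SSP245`, or `mean` and `median` as statistic labels) are assumptions inferred from the description and should be checked against the downloaded files.

```python
MODELS = [
    "ACCESS-CM2", "ACCESS-ESM1-5", "CMCC-CM2-SR5", "CMCC-ESM2", "EC-Earth3",
    "GFDL-ESM4", "INM-CM4-8", "INM-CM5-0", "IPSL-CM6A-LR", "MIROC6",
    "MPI-ESM1-2-HR", "MPI-ESM1-2-LR", "MRI-ESM2-0",
]
SCENARIOS = ["SSP245", "SSP585"]                       # assumed spelling
STATISTICS = ["mean", "std", "median", "p05", "p95"]   # "mean"/"median" assumed

HIST_YEARS = range(1958, 2024)   # 1958-2023 inclusive
FUT_YEARS = range(2015, 2101)    # 2015-2100 inclusive

historical_files = [f"map_{year}.tif" for year in HIST_YEARS]
future_files = [
    f"map_{model}_{ssp}_{year}.tif"
    for model in MODELS for ssp in SCENARIOS for year in FUT_YEARS
]
statistic_files = [
    f"map_{stat}_{ssp}_{year}.tif"
    for stat in STATISTICS for ssp in SCENARIOS for year in FUT_YEARS
]


def parse_future_name(filename):
    """Split 'map_TOKEN_SSP_YYYY.tif' into (token, scenario, year).

    TOKEN is a model name or a statistic label; model names contain
    hyphens but no underscores, so splitting on '_' is unambiguous.
    """
    stem = filename[: -len(".tif")]
    prefix, token, ssp, year = stem.split("_")
    if prefix != "map":
        raise ValueError(f"unexpected file name: {filename}")
    return token, ssp, int(year)
```

Under these assumptions the archive would hold 66 historical layers, 2,236 model-specific future layers, and 860 ensemble statistic layers.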

Technical Validation

Model validation based on field measurements

Based on the optimized hyperparameters (ntree = 400 and mtry = 2), 500 random forest models were generated by training on the complete dataset. The performance of these models exhibited normal distribution characteristics (Fig. 5). Specifically, R² was 0.675 ± 0.009, RMSE was 100.4 ± 1.3 g m⁻², and MAE was 61.5 ± 0.5 g m⁻² (mean ± standard deviation). These results demonstrate the robust predictive performance of random forest models.
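
For reference, the three metrics can be computed as in the sketch below. It assumes R² is defined as 1 − SS_res/SS_tot. Some toolkits instead report the squared Pearson correlation, and the text does not specify which definition was used. The function name and toy values are hypothetical.

```python
from math import sqrt
from statistics import fmean


def regression_metrics(observed, predicted):
    """Return R2, RMSE, and MAE for paired observed and predicted values."""
    residuals = [o - p for o, p in zip(observed, predicted)]
    mean_obs = fmean(observed)
    ss_res = sum(r * r for r in residuals)
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return {
        "r2": 1.0 - ss_res / ss_tot,              # assumed definition
        "rmse": sqrt(ss_res / len(residuals)),
        "mae": fmean([abs(r) for r in residuals]),
    }


# Toy ANPP values in g m-2, for illustration only.
toy_metrics = regression_metrics(
    observed=[100.0, 200.0, 300.0, 400.0],
    predicted=[110.0, 190.0, 310.0, 390.0],
)
```

RMSE and MAE carry the units of ANPP (g m⁻²), whereas R² is dimensionless.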

Fig. 5 — Performance distributions across 500 random forest models. (a) R² (coefficient of determination), (b) RMSE (root mean square error), and (c) MAE (mean absolute error). Red curves denote fitted normal distributions, scatters indicate mean values, and error bars represent standard deviations.

Given computational and time constraints, we employed the top-performing model from the 500 candidates for grassland ANPP prediction. The resulting ANPP dataset demonstrates overall reliability as simulated and measured values were uniformly distributed around the 1:1 line (Fig. 6a). Further validation against Sun, Feng et al.⁸ confirmed the model’s superior accuracy in specific-year estimates (Fig. 6), attributed to its incorporation of interannual climate variability.

Fig. 6 — Comparison of observed and predicted aboveground net primary productivity (ANPP) for natural grasslands. (a) Specific-year results from this study and (b) multi-year average results from Sun, Feng et al.⁸.

Spatial reliability assessment

The spatial pattern of our natural grassland ANPP dataset was validated against four published datasets. Qualitatively, our estimates align broadly with prior products (Fig. 7a–e), capturing the established biogeographic gradient: high ANPP in the savannas of central Africa and eastern South America, and low ANPP on the Tibetan Plateau.

Fig. 7 — Spatial distribution and quantitative comparison of global grassland aboveground net primary productivity (ANPP) datasets. (a–d) Multi-year mean ANPP (1970–2000) from four published datasets, (e) multi-year mean ANPP (1970–2000) from this study, and (f) Taylor diagram comparing the five datasets, summarizing standard deviation, Pearson correlation coefficient, and centered root-mean-square difference. Letters a–e in (f) correspond to the datasets in (a–e), with our dataset as the reference.

Quantitative metrics further support this consistency. First, Kappa statistics (after classifying ANPP into five percentile-based levels) show substantial agreement with existing datasets, with values ranging from 0.65 to 0.74 (all > 0.6). Second, a Taylor diagram (Fig. 7f) demonstrates strong spatial correlations (all > 0.8) between our dataset and each previous study, while its standard deviation falls within the range of the others, confirming realistic spatial variability. Third, global Moran’s I analysis reveals a moderate, reasonable degree of spatial autocorrelation (I = 0.32), comparable to published datasets (I = 0.28–0.37). Local spatial cluster analysis revealed that core areas of high-high and low-low clustering show strong congruence across all datasets (Fig. 8).
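The Kappa comparison can be illustrated with a short sketch. It assumes that each dataset is classified by its own 20th, 40th, 60th, and 80th percentiles and that agreement is measured with unweighted Cohen's kappa. The text does not say whether the weighted or unweighted form was used, so this is a simplification. The function names are hypothetical.

```python
from bisect import bisect_right


def percentile(values, q):
    """q-th percentile (0-100), linear interpolation between order statistics."""
    s = sorted(values)
    pos = (len(s) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def percentile_classes(values, cuts=(20, 40, 60, 80)):
    """Assign each value to a level 0-4 using the dataset's own percentiles."""
    thresholds = [percentile(values, q) for q in cuts]
    return [bisect_right(thresholds, v) for v in values]


def cohen_kappa(classes_a, classes_b, n_levels=5):
    """Unweighted Cohen's kappa between two classifications of the same cells."""
    n = len(classes_a)
    p_observed = sum(a == b for a, b in zip(classes_a, classes_b)) / n
    p_expected = sum(
        (classes_a.count(k) / n) * (classes_b.count(k) / n)
        for k in range(n_levels)
    )
    return (p_observed - p_expected) / (1.0 - p_expected)
```

Both inputs must list the same grid cells in the same order. Identical classifications give a kappa of 1, and agreement no better than chance gives a kappa near 0.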

Fig. 8 — Spatial autocorrelation patterns of global grassland aboveground net primary productivity (ANPP) datasets. Local indicators of spatial association (LISA) cluster maps and global Moran’s I statistics are shown for (a–d) published datasets and (e) our dataset.
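As a reminder of what the global Moran's I statistic measures, the sketch below computes it for a small two-dimensional grid. It assumes binary rook (four-neighbour) contiguity weights and marks no-data cells with `None`. The spatial weights used in the study are not stated here, so these choices are assumptions, and `morans_i` is a hypothetical helper name.

```python
def morans_i(grid):
    """Global Moran's I for a 2-D grid with binary rook (4-neighbour) weights.

    grid is a list of rows; None marks cells without data.
    """
    cells = {
        (r, c): v
        for r, row in enumerate(grid)
        for c, v in enumerate(row)
        if v is not None
    }
    n = len(cells)
    mean = sum(cells.values()) / n
    dev = {rc: v - mean for rc, v in cells.items()}

    cross = 0.0      # sum over neighbour pairs of dev_i * dev_j
    weight_sum = 0   # total number of (directed) neighbour links
    for (r, c), d in dev.items():
        for nb in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if nb in dev:
                cross += d * dev[nb]
                weight_sum += 1

    denom = sum(d * d for d in dev.values())
    return (n / weight_sum) * (cross / denom)
```

Values near +1 indicate that similar values cluster together, values near 0 indicate no spatial structure, and negative values indicate alternation between neighbouring cells.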

Despite overall spatial concordance, our ANPP estimates differ systematically from previous studies: they exceed Del Grosso et al.⁴⁵ but are lower than Sun, Feng et al.⁸, Sun, Yang et al.⁴⁴, and Gherardi & Sala⁴⁶. This divergence arises primarily from underestimation in high-productivity regions (>700 g m⁻², Fig. 6a), where field measurements are scarce (fewer than 5% of training samples). Limited training data reduced model accuracy in these areas. While our dataset reliably captures spatial patterns and global gradients, users should exercise caution when interpreting absolute ANPP magnitudes in high-productivity ecosystems such as African and South American savannas. Additional field measurements in these underrepresented regions are needed to reduce model uncertainty and improve prediction accuracy.

Temporal reliability assessment

We used three methods to validate the temporal reliability of our ANPP dataset. First, we examined the relationship between model residuals (differences between observed and predicted ANPP values) and sampling years. Residuals are evenly distributed around the y = 0 line across all sampling years (Fig. 9a), although unusually large or small residuals occasionally occurred in specific years (e.g., 2003). Such outliers are expected if residuals are approximately normally distributed: extreme values become more likely as sample size or observed ANPP increases. To verify this, we partitioned the sampling years into five-year intervals and analyzed the correlation between the relative residual standard deviation (the standard deviation of residuals divided by the mean observed ANPP) and the sample size within each interval. Once residual variation was scaled by mean observed ANPP, sample size explained about half of it (Fig. 9b, R² = 0.54, p = 0.006). These findings indicate no significant temporal pattern in model residuals and no year-specific over- or underestimation.
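The five-year residual analysis can be sketched as follows. The sketch assumes that residuals are computed as observed minus predicted, that intervals are aligned to multiples of five years, and that the sample standard deviation is used. None of these details is specified in the text, and the function names are hypothetical.

```python
import math
from collections import defaultdict


def relative_residual_sd(records, width=5):
    """Relative residual SD per interval.

    records: iterable of (year, observed, predicted).
    Returns {interval_start_year: (sample_size, residual SD / mean observed)}.
    """
    bins = defaultdict(list)
    for year, obs, pred in records:
        bins[year - year % width].append((obs, obs - pred))

    out = {}
    for start, items in sorted(bins.items()):
        n = len(items)
        if n < 2:
            continue  # the sample SD is undefined for a single record
        residuals = [res for _, res in items]
        mean_res = sum(residuals) / n
        sd = math.sqrt(sum((res - mean_res) ** 2 for res in residuals) / (n - 1))
        mean_obs = sum(obs for obs, _ in items) / n
        out[start] = (n, sd / mean_obs)
    return out


def r_squared_linear(x, y):
    """R² of a simple linear regression of y on x (squared Pearson correlation)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)
```

Passing the sample sizes and relative residual standard deviations returned by `relative_residual_sd` to `r_squared_linear` gives an R² of the kind reported for Fig. 9b.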

Fig. 9 — Temporal reliability assessment based on residual analysis. (a) Model residuals versus sampling year. (b) Relative residual standard deviation versus sample size. The model residual is the difference between observed and predicted aboveground net primary productivity (ANPP), and the relative residual standard deviation is the standard deviation of model residuals divided by the mean observed ANPP within each five-year interval.

Second, we used two long-term (~40-year) satellite-derived NPP datasets (GLASS and BEPS) to assess whether our ANPP dataset exhibits comparable interannual variability. Grid-level correlations between our ANPP and annual NPP from these products show strong temporal consistency: 78% of grassland grids exhibit positive correlations with GLASS NPP, and 86% with BEPS NPP (Fig. 10a,b). This indicates that our model effectively captures interannual grassland productivity dynamics.
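The grid-level comparison amounts to a per-cell Pearson correlation between two annual time series, followed by a count of positive coefficients. The sketch below assumes that both products have already been resampled to a common grid and restricted to overlapping years. The dictionary layout and the function names are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length series; None if either is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return None
    return sxy / math.sqrt(sxx * syy)


def share_positive(anpp_series, npp_series):
    """Percentage of grid cells whose annual ANPP and NPP correlate positively.

    Both arguments map a cell identifier to a list of annual values covering
    the same years. Cells with an undefined correlation are excluded.
    """
    coefficients = [
        pearson(anpp_series[cell], npp_series[cell])
        for cell in anpp_series
        if cell in npp_series
    ]
    valid = [r for r in coefficients if r is not None]
    return 100.0 * sum(r > 0 for r in valid) / len(valid)
```

Note that the share of positive correlations summarizes the sign of the relationship only. It does not indicate how strong the correlation is in any individual cell.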

Fig. 10 — Temporal reliability assessment against satellite-derived net primary productivity (NPP) products and dynamic global vegetation model (DGVM) simulations. Maps show Pearson correlation coefficients between our aboveground net primary productivity (ANPP) dataset and (a) GLASS NPP (1982–2018), (b) BEPS NPP (1981–2019), and (c) TRENDY ensemble mean NPP (1958–2023). The inset in (c) shows the percentage of grassland grids with a positive correlation between our ANPP and NPP from each of the 20 individual DGVMs and from the ensemble mean.

Third, complementing the satellite-based validation, a comparison with 20 TRENDY DGVMs provides independent support for the temporal reliability of our dataset. Our ANPP correlates positively with the TRENDY multi-model ensemble mean NPP over 85% of global natural grassland area (Fig. 10c). Among individual DGVMs, 18 of 20 models exhibited positive correlations in >70% of grids (Fig. 10c), with the CLASSIC model achieving the highest agreement (87% positive correlation). This consistency across diverse DGVMs strongly suggests that our dataset robustly captures climate-driven interannual variability in grassland productivity, supporting its use for long-term trend analysis and climate-impact studies.

Nevertheless, persistently negative correlations between NPP and ANPP are observed in the East Sudanian savanna, Tibetan Plateau alpine steppe, and East European forest steppe across multiple datasets. These patterns reflect region-specific mechanisms. In mixed herbaceous-woody systems, woody expansion can raise total NPP⁵³ while suppressing herbaceous productivity through competition⁵⁴. On the Tibetan Plateau, warming-enhanced evapotranspiration²⁰ and water availability shifts induced by permafrost degradation⁵⁵ promote belowground allocation⁵⁶, increasing total NPP but reducing ANPP. Such discrepancies highlight the need for region-specific interpretation. We recommend local validation in these areas and using observed mismatches to guide further mechanistic research.

Usage Notes

This study provides a global gridded annual ANPP dataset for natural grasslands. It distinguishes itself from existing NPP datasets by focusing exclusively on aboveground NPP, a rarely isolated metric in global-scale products. Unlike satellite-derived products that reflect post-disturbance conditions, our dataset explicitly models ANPP under undisturbed, natural grassland conditions. This dataset’s temporal resolution (annual) and coverage (extending back to 1958) exceed those of mean-annual studies, e.g., Sun, Feng et al.⁸, and shorter-term remote sensing products, e.g., the MOD17 product. Its simplicity—rooted in accessible model inputs and transparent machine learning workflows—enhances reproducibility and broad applicability across ecological and agricultural research.

This dataset supports a range of applications, including but not limited to climate change analysis, carbon allocation investigation, and anthropogenic impact evaluations. Long-term, high-resolution ANPP data enable spatiotemporal assessments of grassland productivity trends, interannual variability, and regional hotspots under climate change. When combined with remote sensing NPP products, the dataset clarifies spatial and temporal patterns in above- vs. below-ground carbon allocation, revealing vegetation adaptation strategies. By contrasting natural ANPP estimates with field-measured or satellite-derived post-disturbance grasslands, researchers can quantify human impacts (e.g., grazing or land use change), assess grassland carrying capacity, and inform sustainable management.

This dataset represents natural grassland ANPP and should not be used to estimate total NPP or the ANPP of disturbed grasslands. It systematically underestimates ANPP in high-productivity regions (ANPP > 700 g m⁻²), particularly in the savannas of Africa and South America, because field observations from these ecosystems are scarce in the training data (<5%). Users should therefore exercise caution when conducting quantitative analyses in these regions. We strongly recommend that applications in high-productivity areas be calibrated and validated with local field measurements to correct for potential bias.
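One simple way to act on this recommendation is to flag cells above the 700 g m⁻² threshold and fit a local linear correction wherever paired field measurements exist. The sketch below is one possible approach under that assumption, not a procedure prescribed by the authors. The ordinary least-squares form and all function names are hypothetical choices.

```python
HIGH_ANPP = 700.0  # g m^-2, the high-productivity threshold stated in the text


def needs_caution(anpp):
    """True if a predicted value lies in the range flagged as underestimated."""
    return anpp > HIGH_ANPP


def fit_linear_calibration(predicted, observed):
    """Ordinary least-squares fit of observed = intercept + slope * predicted."""
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_o = sum(observed) / n
    sxy = sum((p - mean_p) * (o - mean_o) for p, o in zip(predicted, observed))
    sxx = sum((p - mean_p) ** 2 for p in predicted)
    slope = sxy / sxx
    intercept = mean_o - slope * mean_p
    return intercept, slope


def calibrate(anpp, intercept, slope):
    """Apply a fitted local correction to one predicted value."""
    return intercept + slope * anpp
```

A correction fitted this way is only valid within the region and the ANPP range covered by the local measurements, and it should be checked against held-out field data before use.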

Acknowledgements

The research presented in this paper was funded by the National Natural Science Foundation of China (42207477 & 42271489), the Natural Science Foundation Program of Fujian Province, China (2024J01412), and the Science and Technology Innovation Fund Project of Fujian Agriculture and Forestry University (KFB24131A).

Author contributions

Dongsheng Zhao and Ziwei Chen conceptualized the study. Ziwei Chen generated the dataset and drafted the work. Zhiyuan Zhang conducted quality control on the dataset. All authors contributed to preparing the manuscript and approved the submission.

Data availability

The data supporting this study and the ANPP dataset generated in this study are publicly available at 10.5281/zenodo.18171957.

Code availability

The R code supporting this study is available at 10.5281/zenodo.18171957.

Competing interests

The authors declare no competing interests.

Footnotes

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Contributor Information

Dongsheng Zhao, Email: zhaods@igsnrr.ac.cn.

Zhiyuan Zhang, Email: zhangzhiyuancn@foxmail.com.

References

1. Cobon, D. H. et al. Agroclimatology in Grasslands. Agroclimatology 369–423, 10.2134/agronmonogr60.2016.0013 (2020).
2. Fay, P. A. et al. Grassland productivity limited by multiple nutrients. Nat. Plants 1, 15080, 10.1038/nplants.2015.80 (2015).
3. Roux, N. et al. Embodied HANPP of feed and animal products: Tracing pressure on ecosystems along trilateral livestock supply chains 1986–2013. Sci. Total Environ. 851, 158198, 10.1016/j.scitotenv.2022.158198 (2022).
4. Piao, S., He, Y., Wang, X. & Chen, F. Estimation of China’s terrestrial ecosystem carbon sink: Methods, progress and prospects. Sci. China Earth Sci. 65, 641–651, 10.1007/s11430-021-9892-6 (2022).
5. Zhou, G. et al. Effects of livestock grazing on grassland carbon storage and release override impacts associated with global climate change. Global Change Biol. 25, 1119–1132, 10.1111/gcb.14533 (2019).
6. He, M. et al. Grazing and global change factors differentially affect biodiversity-ecosystem functioning relationships in grassland ecosystems. Global Change Biol. 28, 5492–5504, 10.1111/gcb.16305 (2022).
7. Bardgett, R. D. et al. Combatting global grassland degradation. Nature Reviews Earth & Environment 2, 720–735, 10.1038/s43017-021-00207-2 (2021).
8. Sun, Y. et al. Field-Based Estimation of Net Primary Productivity and Its Above- and Belowground Partitioning in Global Grasslands. J. Geophys. Res.: Biogeosci. 126, e2021JG006472, 10.1029/2021JG006472 (2021).
9. IPCC. Climate change 2021: The physical science basis. Contribution of working group I to the sixth assessment report of the Intergovernmental Panel on Climate Change (2021).
10. Lieth, H. Modeling the Primary Productivity of the World, 10.1007/978-3-642-80913-2_12 (Springer Berlin Heidelberg, Berlin, Heidelberg, 1975).
11. Wang, J. et al. New Global MuSyQ GPP/NPP Remote Sensing Products From 1981 to 2018. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 14, 5596–5612, 10.1109/JSTARS.2021.3076075 (2021).
12. Hilker, T., Natsagdorj, E., Waring, R. H., Lyapustin, A. & Wang, Y. Satellite observed widespread decline in Mongolian grasslands largely due to overgrazing. Global Change Biol. 20, 418–428, 10.1111/gcb.12365 (2014).
13. Hartman, M. D. et al. Seasonal grassland productivity forecast for the U.S. Great Plains using Grass-Cast. Ecosphere 11, e03280, 10.1002/ecs2.3280 (2020).
14. Maselli, F., Argenti, G., Chiesi, M., Angeli, L. & Papale, D. Simulation of grassland productivity by the combination of ground and satellite data. Agric. Ecosyst. Environ. 165, 163–172, 10.1016/j.agee.2012.11.006 (2013).
15. Hu, Q. et al. Intercomparison of global terrestrial carbon fluxes estimated by MODIS and Earth system models. Sci. Total Environ. 810, 152231, 10.1016/j.scitotenv.2021.152231 (2022).
16. Chen, Z., Zhao, D., Zhang, Z., Zhang, L. & Zheng, D. A global gridded ANPP dataset for natural grasslands from 1958 to 2100. Zenodo, 10.5281/zenodo.18171957 (2026).
17. Song, J. et al. A global database of plant production and carbon exchange from global change manipulative experiments. figshare, 10.6084/m9.figshare.7442915.v13 (2020).
18. Sun, Y., Chang, J. & Fang, J. Above- and belowground net-primary productivity: a field-based global database of grasslands. DRYAD, 10.5061/dryad.7sqv9s4vv (2022).
19. Chen, Z. et al. Single-peaked responses of grassland productivity to grazing intensity. Agric. Ecosyst. Environ. 387, 109630, 10.1016/j.agee.2025.109630 (2025).
20. Zomer, R. J., Xu, J. & Trabucco, A. Version 3 of the Global Aridity Index and Potential Evapotranspiration Database. Sci. Data 9, 409, 10.1038/s41597-022-01493-1 (2022).
21. Abatzoglou, J., Dobrowski, S., Parks, S. & Hegewisch, K. Monthly climate and climatic water balance for global terrestrial surfaces from 1958–2015. University of Idaho, 10.7923/G43J3B0R (2017).
22. Fick, S. E. & Hijmans, R. J. WorldClim 2: new 1-km spatial resolution climate surfaces for global land areas. Int. J. Climatol. 37, 4302–4315, 10.1002/joc.5086 (2017).
23. Harris, I., Jones, P. D., Osborn, T. J. & Lister, D. H. Updated high-resolution grids of monthly climatic observations – the CRU TS3.10 Dataset. Int. J. Climatol. 34, 623–642, 10.1002/joc.3711 (2014).
24. Kobayashi, S. et al. The JRA-55 Reanalysis: General Specifications and Basic Characteristics. Journal of the Meteorological Society of Japan. Ser. II 93, 5–48, 10.2151/jmsj.2015-001 (2015).
25. Bjarke, N., Barsugli, J. & Livneh, B. Ensemble of CMIP6 derived reference and potential evapotranspiration with radiative and advective components. Sci. Data 10, 417, 10.1038/s41597-023-02290-0 (2023).
26. Cannon, A. J., Sobie, S. R. & Murdock, T. Q. Bias Correction of GCM Precipitation by Quantile Mapping: How Well Do Methods Preserve Changes in Quantiles and Extremes? J. Clim. 28, 6938–6959, 10.1175/JCLI-D-14-00754.1 (2015).
27. Deng, S. et al. Global Distribution and Projected Variations of Compound Drought-Extreme Precipitation Events. Earth’s Future 12, e2024EF004809, 10.1029/2024EF004809 (2024).
28. Liang, S. et al. Updates on Global LAnd Surface Satellite (GLASS) products suite. National Remote Sensing Bulletin 27, 831–856, 10.11834/jrs.20232462 (2023).
29. Chen, J. M. et al. Vegetation structural change since 1981 significantly enhanced the terrestrial carbon sink. Nat. Commun. 10, 4259, 10.1038/s41467-019-12257-8 (2019).
30. Friedlingstein, P. et al. Global Carbon Budget 2024. Earth Syst. Sci. Data 17, 965–1039, 10.5194/essd-17-965-2025 (2025).
31. Sitch, S. et al. Trends and Drivers of Terrestrial Sources and Sinks of Carbon Dioxide: An Overview of the TRENDY Project. Global Biogeochem. Cycles 38, e2024GB008102, 10.1029/2024GB008102 (2024).
32. Batjes, N. H. Harmonized soil property values for broad-scale modelling (WISE30sec) with estimates of global soil carbon stocks. Geoderma 269, 61–68, 10.1016/j.geoderma.2016.01.034 (2016).
33. Farr, T. G. et al. The Shuttle Radar Topography Mission. Rev. Geophys. 45, 10.1029/2005RG000183 (2007).
34. O’Sullivan, M. et al. Climate-Driven Variability and Trends in Plant Productivity Over Recent Decades Based on Three Global Products. Global Biogeochem. Cycles 34, e2020GB006613, 10.1029/2020GB006613 (2020).
35. Pichler, M. & Hartig, F. Machine learning and deep learning—A review for ecologists. Methods Ecol. Evol. 14, 994–1016, 10.1111/2041-210X.14061 (2023).
36. Scowen, M., Athanasiadis, I. N., Bullock, J. M., Eigenbrod, F. & Willcock, S. The current and future uses of machine learning in ecosystem service research. Sci. Total Environ. 799, 149263, 10.1016/j.scitotenv.2021.149263 (2021).
37. Hsieh, W. W. Machine Learning Methods in the Environmental Sciences: Neural Networks and Kernels, 10.1017/CBO9780511627217 (Cambridge University Press, Cambridge, 2009).
38. Huettmann, F. et al. Use of Machine Learning (ML) for Predicting and Analyzing Ecological and ‘Presence Only’ Data: An Overview of Applications and a Good Outlook. Machine Learning for Ecology and Sustainable Natural Resource Management (eds. Humphries, G., Magness, D. R. & Huettmann, F.) 27–61, 10.1007/978-3-319-96978-7_2 (Springer International Publishing, Cham, 2018).
39. Olden, J. D., Lawler, J. J. & Poff, N. L. Machine Learning Methods Without Tears: A Primer for Ecologists. The Quarterly Review of Biology 83, 171–193, 10.1086/587826 (2008).
40. Roberts, D. R. et al. Cross-validation strategies for data with temporal, spatial, hierarchical, or phylogenetic structure. Ecography 40, 913–929, 10.1111/ecog.02881 (2017).
41. McQueen, J. B. Some methods of classification and analysis of multivariate observations. Proc. of 5th Berkeley Symposium on Math. Stat. and Prob. 281–297 (1967).
42. Hu, Y. et al. A long-term daily gridded snow depth dataset for the Northern Hemisphere from 1980 to 2019 based on machine learning. Big Earth Data 8, 274–301, 10.1080/20964471.2023.2177435 (2024).
43. Haaf, D., Six, J. & Doetterl, S. Global patterns of geo-ecological controls on the response of soil respiration to warming. Nat. Clim. Change 11, 623–627, 10.1038/s41558-021-01068-9 (2021).
44. Sun, Y. et al. Global patterns and climatic drivers of above- and belowground net primary productivity in grasslands. Sci. China Life Sci. 64, 739–751, 10.1007/s11427-020-1837-9 (2021).
45. Del Grosso, S. et al. Global potential net primary production predicted from vegetation class, precipitation, and temperature. Ecology 89, 2117–2126, 10.1890/07-0850.1 (2008).
46. Gherardi, L. A. & Sala, O. E. Global patterns and climatic controls of belowground net carbon fixation. Proceedings of the National Academy of Sciences 117, 20038–20043, 10.1073/pnas.2006715117 (2020).
47. Cohen, J. Weighted kappa: nominal scale agreement with provision for scaled disagreement or partial credit. Psychological Bulletin 70, 213–220, 10.1037/h0026256 (1968).
48. Landis, J. R. & Koch, G. G. The measurement of observer agreement for categorical data. Biometrics 33, 159–174, 10.2307/2529310 (1977).
49. Taylor, K. E. Summarizing multiple aspects of model performance in a single diagram. J. Geophys. Res.: Atmos. 106, 7183–7192, 10.1029/2000JD900719 (2001).
50. Goodchild, M. F. Spatial autocorrelation (Geo, Norwich, 1986).
51. Hui, D. & Jackson, R. B. Geographical and interannual variability in biomass partitioning in grassland ecosystems: a synthesis of field data. New Phytol. 169, 85–93, 10.1111/j.1469-8137.2005.01569.x (2006).
52. Zhao, M., Heinsch, F. A., Nemani, R. R. & Running, S. W. Improvements of the MODIS terrestrial gross and net primary production global data set. Remote Sens. Environ. 95, 164–176, 10.1016/j.rse.2004.12.011 (2005).
53. Stevens, N., Lehmann, C. E. R., Murphy, B. P. & Durigan, G. Savanna woody encroachment is widespread across three continents. Global Change Biol. 23, 235–244, 10.1111/gcb.13409 (2017).
54. Archer, S. R. et al. Woody plant encroachment: causes and consequences. Rangeland systems: Processes, management and challenges 25–84 (Springer International Publishing, Cham, 2017).
55. Jin, X.-Y. et al. Impacts of climate-induced permafrost degradation on vegetation: A review. Adv. Clim. Change Res. 12, 29–47, 10.1016/j.accre.2020.07.002 (2021).
56. Zhao, J., Yang, W., Tian, L., Qu, G. & Wu, G.-L. Warming differentially affects above- and belowground ecosystem functioning of the semi-arid alpine grasslands. Sci. Total Environ. 914, 170061, 10.1016/j.scitotenv.2024.170061 (2024).
57. Dixon, A. P., Faber-Langendoen, D., Josse, C., Morrison, J. & Loucks, C. J. Distribution mapping of world grassland types. J. Biogeogr. 41, 2003–2019, 10.1111/jbi.12381 (2014).


[CR1] 1.Cobon, D.H. et al. Agroclimatology in Grasslands. Agroclimatology 369-423 10.2134/agronmonogr60.2016.0013 (2020).

[CR2] 2.Fay, P. A. et al. Grassland productivity limited by multiple nutrients. Nat. Plants1, 15080, 10.1038/nplants.2015.80 (2015). [DOI] [PubMed] [Google Scholar]

[CR3] 3.Roux, N. et al. Embodied HANPP of feed and animal products: Tracing pressure on ecosystems along trilateral livestock supply chains 1986–2013. Sci. Total Environ.851, 158198, 10.1016/j.scitotenv.2022.158198 (2022). [DOI] [PubMed] [Google Scholar]

[CR4] 4.Piao, S., He, Y., Wang, X. & Chen, F. Estimation of China’s terrestrial ecosystem carbon sink: Methods, progress and prospects. Sci. China Earth Sci.65, 641–651, 10.1007/s11430-021-9892-6 (2022). [Google Scholar]

[CR5] 5.Zhou, G. et al. Effects of livestock grazing on grassland carbon storage and release override impacts associated with global climate change. Global Change Biol.25, 1119–1132, 10.1111/gcb.14533 (2019). [DOI] [PubMed] [Google Scholar]

[CR6] 6.He, M. et al. Grazing and global change factors differentially affect biodiversity-ecosystem functioning relationships in grassland ecosystems. Global Change Biol.28, 5492–5504, 10.1111/gcb.16305 (2022). [DOI] [PubMed] [Google Scholar]

[CR7] 7.Bardgett, R. D. et al. Combatting global grassland degradation. Nature Reviews Earth & Environment2, 720–735, 10.1038/s43017-021-00207-2 (2021). [Google Scholar]

[CR8] 8.Sun, Y. et al. Field-Based Estimation of Net Primary Productivity and Its Above- and Belowground Partitioning in Global Grasslands. J. Geophys. Res.: Biogeosci.126, e2021JG006472, 10.1029/2021JG006472 (2021). [Google Scholar]

[CR9] 9.IPCC. Climate change 2021: The physical science basis. Contribution of working group I to the sixth assessment report of the Intergovernmental Panel on Climate Change (2021).

[CR10] 10.Lieth, H. Modeling the Primary Productivity of the World, 10.1007/978-3-642-80913-2_12 (Springer Berlin Heidelberg, Berlin, Heidelberg, 1975).

[CR11] 11.Wang, J. et al. New Global MuSyQ GPP/NPP Remote Sensing Products From 1981 to 2018. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens.14, 5596–5612, 10.1109/JSTARS.2021.3076075 (2021). [Google Scholar]

[CR12] 12.Hilker, T., Natsagdorj, E., Waring, R. H., Lyapustin, A. & Wang, Y. Satellite observed widespread decline in Mongolian grasslands largely due to overgrazing. Global Change Biol.20, 418–428, 10.1111/gcb.12365 (2014). [DOI] [PubMed] [Google Scholar]

[CR13] 13.Hartman, M. D. et al. Seasonal grassland productivity forecast for the U.S. Great Plains using Grass-Cast. Ecosphere11, e03280, 10.1002/ecs2.3280 (2020). [Google Scholar]

[CR14] 14.Maselli, F., Argenti, G., Chiesi, M., Angeli, L. & Papale, D. Simulation of grassland productivity by the combination of ground and satellite data. Agric. Ecosyst. Environ.165, 163–172, 10.1016/j.agee.2012.11.006 (2013). [Google Scholar]

[CR15] 15.Hu, Q. et al. Intercomparison of global terrestrial carbon fluxes estimated by MODIS and Earth system models. Sci. Total Environ.810, 152231, 10.1016/j.scitotenv.2021.152231 (2022). [DOI] [PubMed] [Google Scholar]

[CR16] 16.Chen, Z., Zhao, D., Zhang, Z., Zhang, L. & Zheng, D. A global gridded ANPP dataset for natural grasslands from 1958 to 2100. Zenodo10.5281/zenodo.18171957 (2026).

[CR17] 17.Song, J. et al. A global database of plant production and carbon exchange from global change manipulative experiments. figshare10.6084/m9.figshare.7442915.v13 (2020). [DOI] [PMC free article] [PubMed]

[CR18] 18.Sun, Y., Chang, J. & Fang, J. Above- and belowground net-primary productivity: a field-based global database of grasslands. DRYAD10.5061/dryad.7sqv9s4vv (2022). [DOI] [PubMed]

[CR19] 19.Chen, Z. et al. Single-peaked responses of grassland productivity to grazing intensity. Agric. Ecosyst. Environ.387, 109630, 10.1016/j.agee.2025.109630 (2025). [Google Scholar]

[CR20] 20.Zomer, R. J., Xu, J. & Trabucco, A. Version 3 of the Global Aridity Index and Potential Evapotranspiration Database. Sci. Data9, 409, 10.1038/s41597-022-01493-1 (2022). [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR21] 21.Abatzoglou, J., Dobrowski, S., Parks, S. & Hegewisch, K. Monthly climate and climatic water balance for global terrestrial surfaces from 1958-2015. University of Idaho10.7923/G43J3B0R (2017).

[CR22] 22.Fick, S. E. & Hijmans, R. J. WorldClim 2: new 1-km spatial resolution climate surfaces for global land areas. Int. J. Climatol.37, 4302–4315, 10.1002/joc.5086 (2017). [Google Scholar]

[CR23] 23.Harris, I., Jones, P. D., Osborn, T. J. & Lister, D. H. Updated high-resolution grids of monthly climatic observations – the CRU TS3.10 Dataset. Int. J. Climatol.34, 623–642, 10.1002/joc.3711 (2014). [Google Scholar]

[CR24] 24.Kobayashi, S. et al. The JRA-55 Reanalysis: General Specifications and Basic Characteristics. Journal of the Meteorological Society of Japan. Ser. II93, 5–48, 10.2151/jmsj.2015-001 (2015). [Google Scholar]

[CR25] 25.Bjarke, N., Barsugli, J. & Livneh, B. Ensemble of CMIP6 derived reference and potential evapotranspiration with radiative and advective components. Sci. Data10, 417, 10.1038/s41597-023-02290-0 (2023). [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR26] 26.Cannon, A. J., Sobie, S. R. & Murdock, T. Q. Bias Correction of GCM Precipitation by Quantile Mapping: How Well Do Methods Preserve Changes in Quantiles and Extremes? J. Clim.28, 6938–6959, 10.1175/JCLI-D-14-00754.1 (2015). [Google Scholar]

[CR27] 27.Deng, S. et al. Global Distribution and Projected Variations of Compound Drought-Extreme Precipitation Events. Earth’s Future12, e2024EF004809, 10.1029/2024EF004809 (2024). [Google Scholar]

[CR28] 28.Liang, S. et al. Updates on Global LAnd Surface Satellite (GLASS) products suite. National Remote Sensing Bulletin27, 831–856, 10.11834/jrs.20232462 (2023). [Google Scholar]

[CR29] 29.Chen, J. M. et al. Vegetation structural change since 1981 significantly enhanced the terrestrial carbon sink. Nat. Commun.10, 4259, 10.1038/s41467-019-12257-8 (2019). [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR30] 30.Friedlingstein, P. et al. Global Carbon Budget 2024. Earth Syst. Sci. Data17, 965–1039, 10.5194/essd-17-965-2025 (2025). [Google Scholar]

[CR31] 31.Sitch, S. et al. Trends and Drivers of Terrestrial Sources and Sinks of Carbon Dioxide: An Overview of the TRENDY Project. Global Biogeochem. Cycles38, e2024GB008102, 10.1029/2024GB008102 (2024). [Google Scholar]

[CR32] 32.Batjes, N. H. Harmonized soil property values for broad-scale modelling (WISE30sec) with estimates of global soil carbon stocks. Geoderma269, 61–68, 10.1016/j.geoderma.2016.01.034 (2016). [Google Scholar]

[CR33] 33.Farr, T.G. et al. The Shuttle Radar Topography Mission. Rev. Geophys. 4510.1029/2005RG000183 (2007).

[CR34] 34.O’Sullivan, M. et al. Climate-Driven Variability and Trends in Plant Productivity Over Recent Decades Based on Three Global Products. Global Biogeochem. Cycles34, e2020GB006613, 10.1029/2020GB006613 (2020). [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR35] 35.Pichler, M. & Hartig, F. Machine learning and deep learning—A review for ecologists. Methods Ecol. Evol.14, 994–1016, 10.1111/2041-210X.14061 (2023). [Google Scholar]

[CR36] 36.Scowen, M., Athanasiadis, I. N., Bullock, J. M., Eigenbrod, F. & Willcock, S. The current and future uses of machine learning in ecosystem service research. Sci. Total Environ.799, 149263, 10.1016/j.scitotenv.2021.149263 (2021). [DOI] [PubMed] [Google Scholar]

[CR37] 37.Hsieh, W.W. Machine Learning Methods in the Environmental Sciences: Neural Networks and Kernels, 10.1017/CBO9780511627217 (Cambridge University Press, Cambridge, 2009).

[CR38] 38.Huettmann, F. et al. Use of Machine Learning (ML) for Predicting and Analyzing Ecological and ‘Presence Only’ Data: An Overview of Applications and a Good Outlook. Machine Learning for Ecology and Sustainable Natural Resource Management (eds. Humphries, G., Magness, D.R. & Huettmann, F.) 27-61, 10.1007/978-3-319-96978-7_2 (Springer International Publishing, Cham, 2018).

[CR39] 39.Olden, J. D., Lawler, J. J. & Poff, N. L. Machine Learning Methods Without Tears: A Primer for Ecologists. The Quarterly Review of Biology83, 171–193, 10.1086/587826 (2008). [DOI] [PubMed] [Google Scholar]

[CR40] 40.Roberts, D. R. et al. Cross-validation strategies for data with temporal, spatial, hierarchical, or phylogenetic structure. Ecography40, 913–929, 10.1111/ecog.02881 (2017). [Google Scholar]

[CR41] 41.McQueen, J.B. Some methods of classification and analysis of multivariate observations. Proc. of 5th Berkeley Symposium on Math. Stat. and Prob. 281-297 (1967).

42. Hu, Y. et al. A long-term daily gridded snow depth dataset for the Northern Hemisphere from 1980 to 2019 based on machine learning. Big Earth Data 8, 274–301, 10.1080/20964471.2023.2177435 (2024).

43. Haaf, D., Six, J. & Doetterl, S. Global patterns of geo-ecological controls on the response of soil respiration to warming. Nat. Clim. Change 11, 623–627, 10.1038/s41558-021-01068-9 (2021).

44. Sun, Y. et al. Global patterns and climatic drivers of above- and belowground net primary productivity in grasslands. Sci. China Life Sci. 64, 739–751, 10.1007/s11427-020-1837-9 (2021).

45. Del Grosso, S. et al. Global potential net primary production predicted from vegetation class, precipitation, and temperature. Ecology 89, 2117–2126, 10.1890/07-0850.1 (2008).

46. Gherardi, L. A. & Sala, O. E. Global patterns and climatic controls of belowground net carbon fixation. Proceedings of the National Academy of Sciences 117, 20038–20043, 10.1073/pnas.2006715117 (2020).

47. Cohen, J. Weighted kappa: nominal scale agreement with provision for scaled disagreement or partial credit. Psychological Bulletin 70, 213–220, 10.1037/h0026256 (1968).

48. Landis, J. R. & Koch, G. G. The measurement of observer agreement for categorical data. Biometrics 33, 159–174, 10.2307/2529310 (1977).

49. Taylor, K. E. Summarizing multiple aspects of model performance in a single diagram. J. Geophys. Res.: Atmos. 106, 7183–7192, 10.1029/2000JD900719 (2001).

50. Goodchild, M. F. Spatial autocorrelation (Geo, Norwich, 1986).

51. Hui, D. & Jackson, R. B. Geographical and interannual variability in biomass partitioning in grassland ecosystems: a synthesis of field data. New Phytol. 169, 85–93, 10.1111/j.1469-8137.2005.01569.x (2006).

52. Zhao, M., Heinsch, F. A., Nemani, R. R. & Running, S. W. Improvements of the MODIS terrestrial gross and net primary production global data set. Remote Sens. Environ. 95, 164–176, 10.1016/j.rse.2004.12.011 (2005).

53. Stevens, N., Lehmann, C. E. R., Murphy, B. P. & Durigan, G. Savanna woody encroachment is widespread across three continents. Global Change Biol. 23, 235–244, 10.1111/gcb.13409 (2017).

54. Archer, S. R. et al. Woody plant encroachment: causes and consequences. Rangeland systems: Processes, management and challenges 25–84 (Springer International Publishing, Cham, 2017).

55. Jin, X.-Y. et al. Impacts of climate-induced permafrost degradation on vegetation: A review. Adv. Clim. Change Res. 12, 29–47, 10.1016/j.accre.2020.07.002 (2021).

56. Zhao, J., Yang, W., Tian, L., Qu, G. & Wu, G.-L. Warming differentially affects above- and belowground ecosystem functioning of the semi-arid alpine grasslands. Sci. Total Environ. 914, 170061, 10.1016/j.scitotenv.2024.170061 (2024).

57. Dixon, A. P., Faber-Langendoen, D., Josse, C., Morrison, J. & Loucks, C. J. Distribution mapping of world grassland types. J. Biogeogr. 41, 2003–2019, 10.1111/jbi.12381 (2014).
