Proceedings of the National Academy of Sciences of the United States of America. 2026 Mar 4;123(10):e2524808123. doi: 10.1073/pnas.2524808123

Coupled machine learning–ecosystem ensemble models substantially improve predictions of nitrous oxide (N2O) fluxes from US croplands

Prateek Sharma a, Bruno Basso a,b,c,1, Aditya Manuraj a, Michael S Murillo d, Neville Millar a, Tommaso Tadiello a, Mukta Sharma a, Mathieu Delandmeter a,e, G Philip Robertson b,c,f,1
PMCID: PMC12974439  PMID: 41779779

Significance

Nitrous oxide (N2O) is a potent and increasingly important greenhouse gas currently responsible for ~7% of human-caused atmospheric warming. Agriculture is a major emitter of N2O globally, and agricultural soils are a major if still uncertain source. In large part this uncertainty stems from the challenge of accurately predicting emissions from fertilized crops. Here, we show how an ensemble modeling system that couples a group of ecosystem models with a group of machine learning models can substantially improve cropland N2O flux predictions. The system additionally generates insights that can improve existing ecosystem models, guide field measurement efforts, and advance N2O mitigation strategies under diverse soils and climates in food and bioenergy cropping systems.

Keywords: ensemble modeling, nitrous oxide emissions, multimodel ensemble, machine learning, AI

Abstract

Nitrous oxide (N2O) is a potent and persistent greenhouse gas, with rising atmospheric concentrations driven in part by inefficient use of synthetic nitrogen (N) fertilizers in agriculture. Predicting soil N2O emissions is challenging due to high spatial and temporal variability arising from complex soil biogeochemical processes. Process-based ecosystem models and standalone machine learning (ML) approaches without extensive site-specific calibration often miss high-emission episodes. Here, we show how an Ensemble Modeling System (EMS) based on outputs from an ensemble of ecosystem models coupled to an ensemble of ML models can improve predictions and understanding of N2O fluxes from US cropland. Trained and validated on ~12,000 N2O chamber measurements at 17 US Midwest sites (six crops, 35 management practices), the EMS accurately predicted daily fluxes of N2O at both training (R2 = 0.84, RMSE = 16.4 g N ha−1 d−1) and held-out testing sites (R2 = 0.84, RMSE = 6.2 g N ha−1 d−1). Analyses identified six dominant N2O drivers: soil organic carbon (SOC), NH4+, NO3-, water-filled pore space, temperature, and aboveground biomass production. Wet, warm soils produced large N2O peaks only with sufficient SOC and mineral N; in low-SOC soils, fluxes remained low. Incorporating these drivers into process-based models might significantly improve their predictive capacity. The EMS demonstrates a strong potential to predict N2O fluxes at unseen sites, enabling more reliable regional inventories, improved gap-filling where measurements are sparse, and enhanced understanding of mechanisms to advance targeted mitigation strategies in food, feed, and bioenergy crops.


Atmospheric nitrous oxide (N2O) concentrations, increasing currently at 0.9 ppb y−1, which is 44% faster than at the start of the century, account for 6 to 7% of global anthropogenic radiative forcing (1, 2). Nearly 60% of this rise originates from nitrogen (N) fertilized croplands (3), where pulses of N2O primarily released during nitrification and denitrification are governed by soil mineral-N supply, water content, temperature, pH, and organic carbon (4), factors that are influenced by management practices such as tillage and fertilizer timing as well as weather. Direct measurements identify these controls but operationalizing and validating their influence, which is often site-specific, can be expensive and time-consuming and thus difficult to upscale (5, 6). Consequently, many national greenhouse gas (GHG) inventories for agriculture still rely on approaches that use a single, fixed value emission factor (EF), commonly a global default of 1% of added N (7), to estimate N2O emissions. This approach is tied almost exclusively to N inputs, which masks field-scale management differences and nonlinear N rate effects (8, 9).

Process-based ecosystem or biogeochemical models can simulate daily N2O emissions mechanistically (10–18), yet they often miss or inadequately represent high-flux events (19) and typically require site-specific calibration. Saha et al. (20) reported that two of the most commonly used process-based biogeochemical models explained, on average, only ~20% of variability in daily N2O fluxes across all of the cropping systems for which uncalibrated results had thus far been reported. Similarly, Ehrhardt et al. (21) showed comparably large uncertainties for an ensemble of 24 process-based models.

Machine-learning (ML) algorithms, including random forest, gradient boosting, long short-term memory networks, and other deep architectures, improve predictability but require detailed observations of N2O fluxes and associated plant and soil properties. ML models generalize poorly beyond their training domain and frequently struggle with imbalanced datasets that typically include frequent observations of low-to-moderate N2O fluxes and few but important high-emission events. They also lack a mechanistic understanding (20, 22–25) and cannot be reliably used to evaluate what-if scenarios or to predict the influence of future weather trends or management practices not included in the training datasets. Perhaps most importantly, few training datasets contain the full suite of potential environmental drivers.

Hybrid modeling approaches can overcome this last limitation by allowing process-based models to provide the detailed plant–soil variables needed as inputs for ML algorithms. For example, Saha et al. (20) improved N2O flux predictions by feeding SALUS (15) outputs of soil nitrate (NO3-) and ammonium (NH4+) into a random forest model to reduce daily RMSE by ~45% when compared to the use of random forest alone. Hybrid frameworks have similarly enhanced ML predictions for other ecosystem processes (26–28). However, approaches relying on a single ecosystem model often remain constrained by inherent structural biases, limited transferability across sites, and challenges related to model interpretability (29).

Here, we show how an Ensemble Modeling System (EMS) can overcome these limitations by combining five process-based ecosystem models [APSIM (12), EPIC (14), SALUS (15), DSSAT (13), and STICS (16)] in an ensemble to provide daily plant and soil properties that are in turn used as input data for an ensemble of four ML models [Random Forest (30), Gradient Boosting (31), Support-Vector Regression (SVR) (32), and XGBoost (33)] blended by an SVR metalearner. We used 12,181 individual chamber observations of N2O fluxes from 17 long-term experimental sites in the US Midwest and Great Plains (13 sites for training and four for testing; Fig. 1 and Table 1) spanning six crops and 35 management regimes. SHapley Additive exPlanations (SHAP) (34) analysis reveals drivers that corroborate the well-known biogeochemical processes that generate N2O fluxes, which is valuable for advancing understanding of the mechanisms most important for improving process-based models (29).

Fig. 1.

Map of the experiment locations. Symbols mark the individual sites: red stars identify the 13 experiments used for model training and cross-validation, while yellow stars denote the four independent experiments reserved exclusively for out-of-sample testing (see Table 1 and Dataset S1 for a detailed description of each site). Basemap: Esri World Imagery (sources listed on map).

Table 1.

Studied site location, soil conditions, climate, data points, and years

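| Site ID | U.S. State | SOC (%) | Sand (%) | Clay (%) | MAT (°C) | MAP (mm) | Observations (n) | Treatments (n) | Crops | Reference |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | SD | 2.1 | 47 | 6 | 7 | 701 | 60 | 1 | Corn | (35, 36) |
| 2 | IN | 1.7 | 17 | 18 | 11 | 990 | 375 | 1 | Corn | (35, 37) |
| 3 | MN | 3.5 | 36 | 30 | 6 | 673 | 676 | 10 | Alfalfa, Corn, Soybean, Wheat | (35, 38) |
| 4 | NE | 1.8 | 35 | 14 | 11 | 764 | 2,016 | 2 | Corn | (35, 37) |
| 5 | IN | 1.7 | 17 | 18 | 11 | 990 | 817 | 5 | Corn, Rye, Soybean | (35, 37) |
| 6 | IN | 2.6 | 10 | 33 | 11 | 994 | 548 | 4 | Corn, Rye, Soybean | (35, 39) |
| 7 | MN | 2.0 | 10 | 22 | 8 | 842 | 518 | 5 | Corn | (35, 40) |
| 8 | PA | 1.5 | 22 | 35 | 10 | 1,036 | 42 | 1 | Alfalfa, Corn, Soybean | (35, 41) |
| 9 | IA | 4.1 | 32 | 31 | 8 | 831 | 105 | 2 | Corn | (42) |
| 10 | IA | 3.5 | 6 | 30 | 9 | 846 | 727 | 18 | Corn | (43) |
| 11 | WI | 0.9 | 48 | 11 | 8 | 888 | 473 | 1 | Corn | (44, 45) |
| 12 | MI | 0.8 | 43 | 19 | 10 | 1,013 | 742 | 2 | Corn, Soybean, Wheat | (46) |
| 13 | ID | 0.9 | 14 | 17 | 9 | 266 | 317 | 1 | Alfalfa, Barley, Corn | (35, 47) |
| 14 | MT | 2.9 | 10 | 21 | 7 | 505 | 53 | 1 | Wheat | (35, 48) |
| 15 | MI | 2.3 | 10 | 22 | 10 | 987 | 790 | 1 | Corn | (49, 50) |
| 16 | KY | 1.7 | 4 | 19 | 15 | 1,298 | 2,125 | 9 | Corn | (35, 51) |
| 17 | CO | 0.9 | 43 | 19 | 9 | 392 | 1,797 | 13 | Corn | (35, 52) |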

Data from site IDs 1, 8, 9, and 14 were used for testing; data from the other sites were used to train the model. SOC = soil organic carbon, MAT = mean annual temperature, MAP = mean annual precipitation. Treatments differ by site and include tillage, fertilizer rate, and residue management. Yield, soil temperature, and soil moisture observations are available for site IDs 1–6 and 12–14. See Dataset S1 for site-specific details.

Our objective is to establish a high-resolution EMS capable of i) accurately representing daily N2O dynamics, including episodic emission peaks not reliably predicted by current models; ii) generalizing across diverse sites and agricultural management practices without the need for site-specific calibration; and iii) identifying the key variables and potential thresholds that drive emissions in order to improve process-based models. By offering enhanced accuracy, transferability, and transparency, this proof-of-concept approach could provide a means for refining national GHG inventories and informing effective field-specific N2O mitigation strategies.

Results and Discussion

The EMS captured 84% of the daily N2O flux variance at the 13 training sites (Fig. 2A) and sustained the same explanatory power (R2 = 0.84) at four independent test sites (Fig. 2 B and C). At the individual test sites, the EMS reliably reproduced both the magnitude and timing of emission peaks (SI Appendix, Fig. S1), explaining 89% of daily variance at the first site (Site ID 1; RMSE = 2.09 g N2O-N ha−1 d−1), 90% at the second (Site ID 8; 2.85 g N2O-N ha−1 d−1), 63% at the third (Site ID 9; 9.14 g N2O-N ha−1 d−1), and 98% at the fourth (Site ID 14; 3.34 g N2O-N ha−1 d−1).

Fig. 2.

Predictive performance of the EMS compared with observations. (A) Scatterplot of daily N2O fluxes for the 13 training sites (n = 11,936). (B) Scatterplot for the four fully withheld test sites (n = 260). (C) Violin-plus-box plots for each test site (IDs 1, 8, 9, and 14) comparing the distribution of observed N2O fluxes (gray), and EMS N2O predictions (red). Dashed lines in (A) and (B) are the 1:1 fits. Statistical fit is reported as the coefficient of determination (R2), RMSE (g N ha−1 d−1), and two-tailed significance (P).

For training data, we compared the full range of observed and predicted values across the 13 training sites (SI Appendix, Fig. S2). Across all training sites, the distributions of predicted values align with those of the observations. Next, we examined time-series alignments at each training site (SI Appendix, Fig. S3). For all locations, the EMS closely followed measured flux events, capturing both the timing and magnitude of emission peaks. Fits at individual sites are consistently strong (P < 0.001), with site-level R2 ranging from 0.79 to 0.95 and RMSE between 1.8 and 24.9 g N ha−1 d−1.

Without ecosystem model inputs, i.e., relying only on site-level environmental data, the ML ensemble generalized poorly: although it fit the training data well (fivefold cross-validation R2 = 0.79), its performance at the independent test sites was weak (R2 = 0.26; see SI Appendix, Fig. S4). This loss of predictive skill at independent test sites indicates that emergent behavior captured by the process-based models is critical for generalizing N2O flux predictions beyond the training sites.

To validate the performance of the ecosystem models for capturing soil water and N dynamics, we used yield as an overarching integrator in the absence of appropriate soil data from each site. We compared simulated and observed crop yields for the six sites in our study with yield data (Table 1) supplemented with an additional 17 Midwest sites previously analyzed with the exact same models (53) (SI Appendix, Fig. S5 and Table S1). Individual models predicted yields with R2 = 0.59 to 0.70, and the multimodel ensemble (MME) achieved R2 = 0.73. We also compared modeled vs. observed data for soil temperature and water-filled pore space (WFPS) at the seven sites with available measurements (Table 1; see SI Appendix, Fig. S6), also finding close agreement (r2 = 0.73 to 0.80). Additionally, we compared N2O fluxes simulated by the ensemble of process-based models (53) with those from the EMS (SI Appendix, Table S2). Across all sites, the EMS achieved much lower RMSE and higher R2 than the process-based ensemble’s corresponding predictions.

SHAP Analysis N2O Flux Impact Drivers.

Average SHAP values from the four ML models show that SOC and NH4+ are the two most influential drivers of EMS output, followed by above-ground biomass, WFPS, NO3-, and soil temperature (Fig. 3). These variables are widely recognized as key drivers of N2O emissions (54, 55).

Fig. 3.

Relative contribution of the 10 most influential predictors of daily N2O fluxes as identified with SHAP. For every observation, the EMS generates four SHAP values, one from each base learner (RF, XGB, GB, and SVR). The points plotted here are the averages of those four values, so each dot shows the net impact of a single predictor on a single daily prediction. Predictors are ordered by their mean absolute SHAP value; larger means indicate stronger influence on model output. Together, the 10 variables shown account for ~82% of the model’s explanatory power. Point color scales from low (yellow) to high (blue) values of the corresponding predictor, revealing whether large or small magnitudes push the prediction up or down.
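
The averaging and ranking described in this caption can be sketched as follows. This is only an illustration under stated assumptions: the SHAP values are taken as already computed for each base learner, the numbers and the three-feature subset are hypothetical, and the variable names are not from the paper.

```python
from statistics import fmean

features = ["SOC", "NH4", "WFPS"]

# shap_by_model[model][observation][feature]: hypothetical, precomputed SHAP values
shap_by_model = {
    "RF":  [[0.8, 0.4, -0.2], [-0.6, 0.2, 0.2]],
    "GB":  [[0.6, 0.4, -0.2], [-0.4, 0.2, 0.2]],
    "XGB": [[0.6, 0.2, 0.0], [-0.6, 0.4, 0.0]],
    "SVR": [[0.4, 0.2, 0.4], [-0.4, 0.0, 0.4]],
}

n_obs = len(next(iter(shap_by_model.values())))

# One averaged SHAP value per observation and feature (the dots in Fig. 3)
averaged = [
    [fmean(model[i][j] for model in shap_by_model.values()) for j in range(len(features))]
    for i in range(n_obs)
]

# Mean absolute SHAP per feature, used to order the predictors
importance = {
    name: fmean(abs(row[j]) for row in averaged) for j, name in enumerate(features)
}
total = sum(importance.values())
shares = {name: value / total for name, value in importance.items()}
ranking = sorted(features, key=lambda name: importance[name], reverse=True)

for name in ranking:
    print(f"{name:5s} mean|SHAP| = {importance[name]:.2f}  share = {shares[name]:.0%}")
```

Averaging before taking absolute values lets opposing contributions from different learners cancel, which is one reading of "net impact" in the caption; averaging the absolute values per learner instead would be an equally plausible choice.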

The dominance of SOC and mineral-N variables confirms that large quantities of electron donors and C and N substrates must be present in the soil to enable substantial N2O production (4). Above-ground biomass ranks third and highlights the well-known coupling between plant N demand and soil N2O potential (56), while the presence of a soil moisture metric (WFPS) and soil temperature among the top six factors aligns with studies showing that their variability solely or in combination has a major impact on N2O production (57).

Across all four ML models (SI Appendix, Fig. S7), SOC, NH4+, NO3-, WFPS, above-ground biomass, and soil temperature lead the top 10 factors, underscoring their central role in daily N2O fluxes. The three tree-based algorithms (RF, GB, XGB) consistently place SOC and NH4+ at the top of their rankings, together explaining 82 to 87% of total SHAP importance. SVR, on the other hand, elevates WFPS to first place and introduces relative humidity among its top 10 features, while pushing NH4+ out of the ranking. This difference likely reflects SVR’s sensitivity to linear correlations after standardization, whereas tree methods capture hierarchical splits. Averaging across all four ML models balances these architecture-specific preferences. Plant-related features also rank among the top 10 factors, including leaf area index (LAI) and belowground biomass (Fig. 3; see SI Appendix, Fig. S7), both of which show relationships to N2O that differ from aboveground biomass. LAI is strongly correlated with aboveground biomass early in the season, then plateaus while biomass continues to accumulate, decoupling their combined effects on N2O. Belowground biomass, in turn, can act both as a sink for nitrogen and as a source of labile carbon, leading to a more complex and weaker net influence on N2O (SI Appendix, Text S1).

SHAP Profiles of the Major Impact Variables.

The N2O flux dependence profiles for the six dominant variables (SI Appendix, Fig. S8) exhibit apparent nonlinear thresholds for each within the study region. For soil NH4+ (SI Appendix, Fig. S8A), the SHAP values are negative at low values (0 to 10 kg N ha−1), increase sharply to >0 at ≈15 kg N ha−1, are maximized at ≈30 to 40 kg N ha−1, and on average remain relatively consistent beyond. For NO3- (SI Appendix, Fig. S8B), SHAP values are more consistently negative until relatively higher concentrations are reached. Values > 0 appear more frequently at ~50 to 60 kg N ha−1 and are consistently >0 at ~125 kg N ha−1, plateauing at ~200 kg N ha−1. A recent ML study based on long-term measurements in row crop rotations under conventional and no-till management found that the importance of both NH4+ and NO3- was enhanced beyond concentrations of ~10 to 15 kg N ha−1, peaking at ~20 (NH4+) and 30 to 40 (NO3-) kg N ha−1, then remaining consistent above these values (58). Process-based modeling using field and laboratory data has also shown threshold values for NH4+ of ~8 to 10 kg N ha−1 (59), beyond which N2O fluxes were more strongly impacted. Higher NO3- concentrations are also known to inhibit the conversion from N2O to N2 (60), potentially explaining its continuing impact on N2O fluxes even at very high concentrations (e.g., >300 kg N ha−1).
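
One crude way to read a sign-change threshold off a SHAP dependence profile is to bin the predictor and find the first bin whose mean SHAP value is positive. The sketch below assumes this binning approach, which is not described in the paper; the function name, bin width, and data are hypothetical.

```python
from statistics import fmean


def first_positive_bin(feature_values, shap_values, bin_width):
    """Return the lower edge of the first bin whose mean SHAP value is > 0."""
    bins = {}
    for x, s in zip(feature_values, shap_values):
        bins.setdefault(int(x // bin_width), []).append(s)
    for b in sorted(bins):
        if fmean(bins[b]) > 0:
            return b * bin_width
    return None  # SHAP never turns positive on average


# Hypothetical soil NH4+ values (kg N ha-1) and their SHAP values
nh4 = [2, 6, 11, 14, 17, 19, 32, 38]
shap_nh4 = [-0.4, -0.3, -0.2, -0.1, 0.1, 0.3, 0.6, 0.5]

threshold = first_positive_bin(nh4, shap_nh4, bin_width=5)
print(f"mean SHAP first turns positive in the bin starting at {threshold} kg N ha-1")
```

With noisy profiles the first positive bin can be an outlier, so a smoothed curve or a minimum bin count would be a sensible refinement.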

With respect to WFPS (SI Appendix, Fig. S8C), although there is great variability, there is a positive SHAP peak value at ~0.40, decreasing thereafter, with an inflection point at ~0.60, at which point nitrification and denitrification have been shown to trade places as the dominant N2O production process in laboratory studies (61, 62). Beyond 0.70, SHAP values are more consistently positive, likely due to larger contributions from denitrification under anaerobic conditions (63, 64), a trend also identified by an earlier ML study (20). In coarser sandy soils, the optimum WFPS for high N2O emissions can occur at a lower range (0.40 to 0.60) likely due to increased nitrification facilitated by improved aeration and oxygen availability (65). The shallow trough in the response (0.50 to 0.60) likely reflects texture-dependent optima whereby N2O fluxes may be optimized below and above this range for coarse- (nitrification) and fine- (denitrification) textured soils, respectively.

For SOC content (SI Appendix, Fig. S8D), SHAP values rise quickly and linearly between about 20 and 50 Mg C ha−1, turning predominantly positive ~60 Mg C ha−1, beyond which the response slows, with values remaining consistent beyond ~80 Mg C ha−1, on average. This suggests that higher SOC contents increase the likelihood of larger N2O fluxes (66, 67) but that at sufficiently high levels, SOC can perhaps inhibit N2O production by favoring reduction to N2 (68).

The soil temperature profile shows predominantly negative SHAP values below 20 °C (SI Appendix, Fig. S8E), with a trough and the lowest values ~10 °C. There is a clear threshold ~20 °C, with values consistently positive above, and a peak at ~25 °C.

Nonlinear increases of N2O emissions are well known with increasing temperature (69), and denitrification is extremely sensitive to rising temperatures due to the tight coupling of the microbial C and N cycles and the succession of several temperature-sensitive microbial processes during the process (55), especially following long-term fertilization (70). The positive SHAP values at or below 0 °C may reflect active soil microbes during freeze/thaw processes that can lead to substantial pulses of N2O (71) potentially driven by the release of stored C due to macroaggregate fracturing (72).

The SHAP values for aboveground biomass (SI Appendix, Fig. S8F) show an inverse pattern, unique among the major impact variables (see also Fig. 3). Overall values are predominantly positive at low biomass and decline toward zero and then turn negative as biomass increases, indicating that higher biomass is generally associated with lower predicted N2O fluxes—but the effect is crop-specific (SI Appendix, Fig. S9), reflecting both differences in crop N uptake dynamics as well as the amount of crop residue returned to soil (73, 74). Lower aboveground biomass values have a stronger predictive influence on N2O fluxes than larger values, suggesting that the growing crop is taking up available mineral N from the soil and/or mineral N is being immobilized in crop residue, in either case suppressing N2O fluxes, whereas when N uptake and by extension biomass accumulation (crop growth) slows, the available N is more readily captured by microbes that produce N2O. While the effects of crop N uptake dynamics (e.g., ref. 70) and trade-off thresholds for N2O emissions (56, 75, 76) are well known, these thresholds have not been well captured in prior ML studies (20, 58).

We recognize that these threshold ranges are likely region-dependent, reflecting local soils, climate, and management, and may change elsewhere. However, the ranges we identify are consistent with previous analyses of long-term regional data, underscoring the general veracity of EMS thresholds.

Identifying Highly Episodic N2O Fluxes.

Fig. 4 visualizes the requirements that are conducive to large daily N2O fluxes that arise only when i) the soil has sufficient mineral N and labile C substrate and ii) the dual moisture–temperature threshold has been exceeded. The presence and separation of blue-green “low” and red-purple “high” point clusters reflect the spatial heterogeneity of N2O fluxes observed at subfield scales ranging from centimeters to tens of meters (65, 77–80).

Fig. 4.

Daily N2O flux predictions (~12,000) that cross SOC content (columns) with substrate indicators (rows). Within each panel, the point color denotes the model-predicted N2O flux (see color bins at Bottom) and the point size represents the magnitude of other row-specific variables (NO3- in the Top row, A–C; NH4+ in the Middle row, D–F; and aboveground biomass (ABG) in the Bottom row, G–I). The horizontal dashed line indicates WFPS = 0.65, and the vertical dashed line indicates soil temperature = 15 °C.

To elaborate, we grouped all daily predictions from the EMS into three SOC classes (78): Low SOC (<48 Mg C ha−1), intermediate SOC (48 to 78 Mg C ha−1), and high SOC (>78 Mg C ha−1) (three columns in Fig. 4) and, within each class, show the magnitude of N2O fluxes (different colors in Fig. 4) for different combinations of soil temperature (x-axis in Fig. 4) and WFPS (y-axis in Fig. 4). To further explore the interaction, we separate factors based on the amount (bubble size) of NO3- and NH4+ contents as well as aboveground biomass (three rows in Fig. 4). This reveals i) the combinations of C and N substrate, moisture, and temperature factors that generate the largest daily N2O peaks, and ii) whether high plant N uptake (or biomass accumulation) dampens these peaks under otherwise favorable conditions.
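
The grouping behind Fig. 4 can be illustrated with a short sketch that assigns each daily prediction to an SOC class and to the warm, wet quadrant marked by the dashed lines (WFPS = 0.65, soil temperature = 15 °C). The records, the function names, and the treatment of values falling exactly on a boundary are assumptions made for illustration.

```python
from collections import defaultdict
from statistics import fmean

WFPS_LINE = 0.65  # horizontal dashed line in Fig. 4
TEMP_LINE = 15.0  # vertical dashed line in Fig. 4 (deg C)


def soc_class(soc_mg_ha: float) -> str:
    """SOC class from the stock in Mg C ha-1 (boundary handling is assumed)."""
    if soc_mg_ha < 48:
        return "low"
    if soc_mg_ha <= 78:
        return "intermediate"
    return "high"


def is_warm_wet(wfps: float, soil_temp: float) -> bool:
    """True for the top right-hand quadrant of a Fig. 4 panel."""
    return wfps >= WFPS_LINE and soil_temp >= TEMP_LINE


# Hypothetical records: (SOC, WFPS, soil temp, predicted flux in g N ha-1 d-1)
records = [
    (40.0, 0.75, 20.0, 4.0),
    (40.0, 0.45, 10.0, 1.0),
    (60.0, 0.80, 22.0, 60.0),
    (60.0, 0.50, 12.0, 6.0),
    (90.0, 0.85, 24.0, 140.0),
    (90.0, 0.70, 18.0, 100.0),
]

flux_by_group = defaultdict(list)
for soc, wfps, temp, flux in records:
    flux_by_group[(soc_class(soc), is_warm_wet(wfps, temp))].append(flux)

mean_flux = {group: fmean(values) for group, values in flux_by_group.items()}
for (cls, warm_wet), value in sorted(mean_flux.items()):
    print(f"{cls:12s} warm_wet={warm_wet!s:5s} mean flux = {value:.1f}")
```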

In low SOC soils (Fig. 4 A, D, and G), even under high soil moisture (WFPS ≥ 0.7) and warm soil temperatures (≥15 °C), and alongside moderate to large concentrations of mineral N (e.g., >50 kg NO3- or >30 kg NH4+), many of the fluxes are low (<5 g N2O-N ha−1 d−1, blue points), with the vast majority less than 20 g N2O-N ha−1 d−1 (blue or green points), which we interpret as a consequence of low SOC, which in turn implies low dissolved organic carbon (DOC). This is consistent with the “cannon model” of Zhang et al. (78), wherein localized sites with sufficient NO3- and soil moisture lack the DOC needed to sustain substantial N2O production. Under intermediate SOC (Fig. 4 B, E, and H) and the same soil temperature and moisture conditions (i.e., top right-hand quadrants), daily fluxes of N2O are on average considerably larger than in low SOC soils (>50 g N2O-N ha−1 d−1, orange-red-purple points). Here, the mineral N concentrations are also typically larger, i.e., NO3- > 100 kg N ha−1 or NH4+ > 30 kg N ha−1, but high fluxes are also present with lower amounts, particularly NH4+ (i.e., <30 kg N ha−1). These higher SOC content soils presumably provide more labile DOC that can serve as an energy substrate for N2O-producing microbes (49), resulting in higher fluxes than equivalent conditions under low SOC, a scenario also consistent with Zhang et al.’s (78) conceptual model.

There is further evidence of the role of labile SOC on daily N2O fluxes in high SOC soils (Fig. 4 C, F, and I). In the top right-hand quadrants, the soil can be considered primed for denitrification, with warm, wet conditions accompanied by large soil NO3- or NH4+ pools generating many high flux episodes (>100 g N2O-N ha−1 d−1), visible as clusters of large red-purple points. Here, also, large fluxes (50 to 200 g N2O-N ha−1 d−1, orange-red points) can be seen under drier (WFPS ~ 0.30 to 0.50) conditions with large NO3- concentrations (>150 kg N ha−1) and colder (~8 to 12 °C) scenarios with large NH4+ concentrations (>50 kg N ha−1). That fluxes do not increase at WFPS > 0.9 and as soil temperature increases between 25 and 30 °C (i.e., do not show a color trend toward more red and purple points) suggests that in high SOC soils under very wet and warm conditions N2O fluxes level off, perhaps in deference to greater denitrification, which consumes N2O.

Across SOC stock levels (Fig. 4 G, H, and I), variation in aboveground biomass displays a more prominent threshold with daily soil temperature than with soil moisture. Data points representing greater biomass (>8 Mg ha−1) occur when soil temperatures are >15 °C, whereas at temperatures <15 °C aboveground biomass is predominantly <2 Mg ha−1. This reflects annual crop growth seasonality, where lower soil temperatures occur outside the main crop growing season and during the late fall to early spring period when winter cover crops may be grown, for example, in the US Midwest. As with mineral N variables, the highest fluxes (>100 g N2O-N ha−1 d−1, red-purple points) are most common in the top-right quadrant but are less frequent than when mineral N concentrations are highest.

Although the EMS captures overall daily N2O fluxes well and identifies key drivers of variability, winter/fallow emissions remain a critical, often underrepresented contributor to annual cropland N2O budgets (81, 82). To evaluate model performance during this period, we isolated 1,564 observations from November–February, representing ~13% of the total number of daily records. For this subset, the EMS reproduced the observed winter range and variability well in the training data (r2 = 0.84; see SI Appendix, Fig. S10), but performance declined at the independent test sites (r2 = 0.50), likely reflecting the limited number of winter/fallow observations in the data. These results highlight winter and fallow periods as a key remaining source of uncertainty that warrants greater measurement and modeling attention.
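
Isolating the winter/fallow subset amounts to a filter on the measurement month. The sketch below uses hypothetical dates and a hypothetical helper name; the two counts in the final calculation are the ones reported in the text (1,564 winter observations out of 12,181 in total).

```python
from datetime import date

WINTER_MONTHS = {11, 12, 1, 2}  # November through February


def is_winter(day: date) -> bool:
    """True if the measurement date falls in the November-February window."""
    return day.month in WINTER_MONTHS


# Hypothetical measurement dates
dates = [date(2014, 11, 3), date(2015, 1, 20), date(2015, 5, 14), date(2015, 7, 2)]
winter_dates = [d for d in dates if is_winter(d)]

# Share of winter records, from the counts reported in the text
winter_share = 1564 / 12181
print(f"{len(winter_dates)} of {len(dates)} example dates are winter; "
      f"reported share = {winter_share:.1%}")
```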

Conclusions

The EMS shows how an ensemble of process-based ecosystem and ML models can substantially enhance the accuracy of N2O flux predictions across both spatial and temporal scales, capturing emission dynamics that traditional approaches often miss. By integrating outputs from multiple ecosystem models, the system both reduces prediction errors and reveals, in general terms, the underlying threshold responses for key drivers of flux variability. These insights provide a stronger scientific basis for refining ecosystem process-based models, perhaps enabling better representation of soil–plant–atmosphere interactions, and should improve their capacity to simulate GHG emissions at scale under diverse food, feed, and bioenergy crops, management practices, and environmental conditions. Incorporating additional data across broader geographies and different crops and management regimes will benefit the future ability of hybrid models such as the EMS to adequately represent soil N2O fluxes, allowing for better assessments of the impacts of farming on N2O emissions and for designing more effective N2O abatement technologies and practices in agriculture.

Materials and Methods

Chamber-Based Measurements.

A total of 17 experimental sites using static chambers to measure N2O fluxes were identified across the northern United States (Fig. 1). Data for 12 of the 17 locations were obtained from the GRACEnet/REAP network of US field experiments (35). Two locations are associated with the KBS Long-term Ecological Research site in Michigan (44, 46), and the remaining two sites were reconstructed from published studies in Iowa (42, 43). In total, data from 544 site-years were used to populate the dataset (Dataset S1A). Sites are broadly representative of regional soils, climates, cropping systems, and management practices (Table 1). Soils span the dominant fine-textured glacial till and loess soils of the Corn Belt (83) and northern Great Plains, and cropping systems represent the main commercial field-crop systems of the region (continuous corn; corn-soybean; corn-soybean-wheat/alfalfa rotations) under contrasting managements, including no-till, strip-till, and conventional tillage; rainfed and irrigated conditions; residue retained and removed; and different N fertilizers applied at rates from unfertilized controls to fertilizer excess. Together, sites account for a large and agronomically important share of US cropland (>50 Mha, ~75% of US corn and soybean area).

Data Generation and Processing.

The dataset was categorized into soil, crop, and weather groups to form the input feature set for our stacked ML model (SI Appendix, Table S3). Soil and crop variables were obtained by running five uncalibrated process-based ecosystem models (described below) for each experimental site, utilizing site-specific soil, weather, and management data. When initial soil conditions were not explicitly available from published studies, we sourced data from the nearest gSSURGO dataset (84) at 30 m resolution. Each process-based model simulates daily carbon-nitrogen dynamics, soil hydrology, and crop phenological stages, generating key state variables such as soil mineral N pools, WFPS, and above- and below-ground biomass production. In postprocessing, simulated volumetric soil–water content was converted to WFPS using site-specific soil bulk density values. Subsequently, all soil-related variables, including WFPS, mineral-N pools, and soil temperature, were applied to the top 30 cm of soil depth and used to calculate an unweighted model ensemble mean for all soil and crop variables. Weather data, including daily precipitation, maximum and minimum air temperature, short-wave radiation, wind speed, and specific humidity, were extracted from the nearest NASA POWER grid cell (85).
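
The WFPS conversion and the unweighted ensemble mean can be sketched as below. The conversion divides volumetric water content by total porosity, with porosity derived from bulk density; the particle density of 2.65 g cm−3 is a common default assumed here, not a value stated in the paper, and the model outputs and bulk density are hypothetical.

```python
from statistics import fmean

PARTICLE_DENSITY = 2.65  # g cm-3; assumed mineral particle density, not from the paper


def wfps(theta_v: float, bulk_density: float) -> float:
    """Water-filled pore space as a fraction of total porosity."""
    porosity = 1.0 - bulk_density / PARTICLE_DENSITY
    if porosity <= 0:
        raise ValueError("bulk density must be below particle density")
    return min(theta_v / porosity, 1.0)  # cap at saturation


def ensemble_mean(model_outputs: dict) -> float:
    """Unweighted mean of one variable across the process-based models."""
    return fmean(model_outputs.values())


# Hypothetical same-day volumetric water content (cm3 cm-3), top 30 cm, per model
theta_by_model = {"APSIM": 0.30, "DSSAT": 0.28, "EPIC": 0.32, "SALUS": 0.29, "STICS": 0.31}
bulk_density = 1.325  # g cm-3, hypothetical site value

wfps_by_model = {name: wfps(theta, bulk_density) for name, theta in theta_by_model.items()}
wfps_ensemble = ensemble_mean(wfps_by_model)
print(f"ensemble WFPS = {wfps_ensemble:.2f}")
```

Whether the conversion is applied per model before averaging, as here, or to the averaged water content makes no difference when a single bulk density is used, because the conversion is linear below saturation.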

Because soil, crop, and weather variables were available daily, while N2O fluxes were measured intermittently, we aligned the datasets by extracting the daily soil, crop, and weather values for the exact dates of the flux measurements, so each flux observation is paired with same-day features. Given the highly skewed distribution of observed N2O fluxes, we applied a log transformation for modeling and then backtransformed predictions to their original physical units by exponentiation. All feature set variables were standardized (centered and scaled to unit variance) to ensure compatibility across different feature scales. After integrating all sources, our compiled dataset consisted of >12,000 site-day records, including 19 feature set variables and observed N2O flux values (Dataset S2).
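
The three preprocessing steps in this paragraph (same-day pairing, log transformation with back-transformation, and standardization) can be sketched in plain Python. The data are hypothetical, and the additive offset used to keep the logarithm defined for zero fluxes is an assumption; the paper does not state how zero or negative fluxes were handled.

```python
import math
from datetime import date
from statistics import fmean, pstdev

# Hypothetical daily ensemble features keyed by date (two of the 19 features)
daily_features = {
    date(2015, 5, 1): {"wfps": 0.55, "soil_temp": 12.0},
    date(2015, 5, 2): {"wfps": 0.70, "soil_temp": 14.0},
    date(2015, 5, 3): {"wfps": 0.65, "soil_temp": 16.0},
    date(2015, 5, 4): {"wfps": 0.50, "soil_temp": 18.0},
}
# Hypothetical intermittent chamber measurements (g N ha-1 d-1)
flux_obs = {date(2015, 5, 2): 20.0, date(2015, 5, 4): 5.0}

# 1) Same-day pairing: keep features only for dates with a flux measurement
paired = [(daily_features[d], flux) for d, flux in sorted(flux_obs.items())
          if d in daily_features]

# 2) Log-transform the target; OFFSET is an assumed guard for zero fluxes
OFFSET = 1.0
y_log = [math.log(flux + OFFSET) for _, flux in paired]


def back_transform(y: float) -> float:
    """Return a log-space prediction to g N ha-1 d-1 by exponentiation."""
    return math.exp(y) - OFFSET


# 3) Standardize each feature to zero mean and unit variance
def standardize(values):
    mu, sd = fmean(values), pstdev(values)
    if sd == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sd for v in values]


X_std = {name: standardize([row[name] for row, _ in paired])
         for name in ("wfps", "soil_temp")}
print(X_std, [round(back_transform(y), 1) for y in y_log])
```

In a real workflow the mean and standard deviation would be estimated on the training data only and then reused for the test sites, so that no information leaks from held-out sites.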

Model Building.

Process-based ecosystem models.

We employed an ensemble of five process-based ecosystem models: APSIM (12), DSSAT (13), EPIC (14), SALUS (15), and STICS (16) (SI Appendix, Table S4 and Text S2). These models were selected based on their demonstrated ability to represent daily interactions among soil, plants, and the atmosphere, capturing variations in crop yields and ecosystem processes (53). Each model is unique in its mathematical representation of these interactions (SI Appendix, Table S5), but all operate on a daily time step and respond to various crop management practices, including variations in N fertilization, tillage, and irrigation. The outputs of the process-based model ensemble were coupled to the ML model ensemble as inputs. Although each process-based model also produces an N2O flux prediction, we did not use N2O as an ML input variable.

Machine learning algorithms.

The EMS integrates four supervised ML algorithms: Random Forest (30), Gradient Boosting Machine (31), SVR (32), and Extreme Gradient Boosting Regression (XGBoost) (33) (SI Appendix, Text S3). We applied a stacking approach, whereby the predictions of the four base models are fed to a metalearner fitted by ridge regression, which reduces the effect of collinearity among the base-model predictions without forcing uniform weighting (86). Several studies have found that stacked ML ensembles demonstrate superior predictive performance relative to individual models (87–89). Hyperparameters for these algorithms were fixed based on preliminary grid searches to avoid cross-validation bias (SI Appendix, Text S4).
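In generic notation, a ridge metalearner over four base models can be written as below. The symbols are introduced here for illustration and are not taken from the paper: \hat{f}_m(x_i) is the prediction of base model m for record i, y_i is the log-transformed flux, and \lambda \ge 0 is the penalty strength. The value of \lambda and whether the base-model predictions were generated out of fold are not stated in this section.

```latex
\hat{y}_i = \beta_0 + \sum_{m=1}^{4} \beta_m \, \hat{f}_m(x_i),
\qquad
(\hat{\beta}_0, \hat{\boldsymbol{\beta}})
  = \arg\min_{\beta_0,\, \boldsymbol{\beta}}
    \sum_{i=1}^{n} \Bigl( y_i - \beta_0 - \sum_{m=1}^{4} \beta_m \, \hat{f}_m(x_i) \Bigr)^{2}
    + \lambda \sum_{m=1}^{4} \beta_m^{2}
```

Because the four base-model predictions are strongly correlated, the penalty term shrinks the weights toward stable values, while each \beta_m remains free to differ from the others.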

Model validation.

Our validation strategy is two-pronged. First, we used k-fold cross-validation (SI Appendix, Text S4) to assess the model's ability to reproduce fluxes withheld during training at the training sites (13 of the 17 sites). In this approach (k = 5), we randomly partitioned all training data into five equal folds. In each iteration, one fold (1/5 of the data) was withheld for testing and the remaining four folds were used for training. Repeating this procedure five times, with a different fold held out each time, helps reduce selection bias and yields an overall measure of model fit. Model evaluation metrics (R2 and RMSE) for training sites were computed on the held-out fold in each iteration of fivefold cross-validation and then averaged across all folds for the final results.
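The fivefold procedure and the fold-averaged metrics can be sketched in a few lines. This is an illustration under stated assumptions: R2 is computed as the coefficient of determination, the one-feature least-squares learner is a stand-in for the EMS, and the data are hypothetical and noise-free.

```python
import math
import random


def kfold_indices(n, k=5, seed=0):
    """Shuffle record indices and split them into k near-equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def r2_rmse(obs, pred):
    """Coefficient of determination and root-mean-square error."""
    mean_obs = sum(obs) / len(obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot, math.sqrt(ss_res / len(obs))


def cross_validate(x, y, fit, k=5, seed=0):
    """Average R2 and RMSE over k held-out folds; `fit` returns a predict function."""
    scores = []
    for held_out in kfold_indices(len(y), k, seed):
        held = set(held_out)
        train = [i for i in range(len(y)) if i not in held]
        predict = fit([x[i] for i in train], [y[i] for i in train])
        scores.append(r2_rmse([y[i] for i in held_out],
                              [predict(x[i]) for i in held_out]))
    return sum(s[0] for s in scores) / k, sum(s[1] for s in scores) / k


def fit_line(xs, ys):
    """Stand-in learner (ordinary least squares on one feature), not the EMS."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = (sum((a - mx) * (b - my) for a, b in zip(xs, ys))
             / sum((a - mx) ** 2 for a in xs))
    return lambda v: my + slope * (v - mx)


x = [float(i) for i in range(50)]
y = [2.0 * v + 1.0 for v in x]  # hypothetical noise-free data
r2, rmse = cross_validate(x, y, fit_line)
```

Every record is held out exactly once, so the averaged metrics describe performance on data the fitted model did not see in that iteration.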

Second, we assigned 20% of the sites (4 of 17) to an independent test set used solely to evaluate overall model performance. Twenty percent is a standard proportion (90) commonly used in wide-ranging fields including medicine (91, 92), geophysics (93), agriculture (94), hydrology (95), and climate science (96–102), as well as in other N2O flux studies (20, 103, 104). Details of the test-site selection procedure appear in SI Appendix, Text S5.

Model interpretation using SHAP.

We applied SHAP analysis (34) to help interpret ML model outputs (SI Appendix, Text S6), decomposing predictions into contributions from individual features. SHAP values were calculated separately for tree-based models (TreeSHAP) (105) and SVR (KernelSHAP) (106). These were visualized using beeswarm summary plots (Fig. 3) and one-variable dependence plots (SI Appendix, Fig. S8) to identify threshold behaviors for key predictors.
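The additive decomposition that SHAP provides can be illustrated with exact Shapley values on a toy function. This brute-force enumeration is not TreeSHAP or KernelSHAP, which are efficient algorithms for the same quantity; the response function, feature values, and baseline below are hypothetical, and absent features are represented by a single baseline vector as a simplifying assumption.

```python
from itertools import combinations
from math import factorial


def shapley_values(f, x, baseline):
    """Exact Shapley values of f at x, with absent features set to `baseline`."""
    n = len(x)
    phi = [0.0] * n
    for j in range(n):
        others = [i for i in range(n) if i != j]
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                present = set(subset)
                without_j = [x[i] if i in present else baseline[i] for i in range(n)]
                with_j = list(without_j)
                with_j[j] = x[j]
                # Marginal contribution of feature j to this coalition.
                phi[j] += weight * (f(with_j) - f(without_j))
    return phi


def toy_flux(features):
    """Hypothetical response: wetness matters only when nitrate is available."""
    wfps, nitrate, temp = features
    return 2.0 * wfps * nitrate + 0.1 * temp


x = [0.8, 10.0, 20.0]        # hypothetical WFPS, NO3- and soil temperature
baseline = [0.4, 2.0, 15.0]  # hypothetical reference point
phi = shapley_values(toy_flux, x, baseline)
```

The contributions sum to the difference between the prediction at x and the prediction at the baseline, which is the property that lets per-feature values be plotted against feature magnitude in beeswarm and dependence plots.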

Uncertainty Analysis.

We quantified EMS prediction uncertainty with a Monte-Carlo approach (107). First, the spread of soil and plant variables was estimated from the five process-based models by treating their ensemble mean (μ) and SD (σ) as the parameters of a normal distribution N(μ, σ). Second, independent input errors were imposed on NASA-POWER weather forcings, using published uncertainty bounds: ±1.5 °C for air temperature (108); ±20% for precipitation (109); ±10% for short-wave radiation (110); and ±10% for wind speed (assumed equal to radiation uncertainty).

For each observation record in the training set, we ran 2,000 Monte-Carlo iterations by i) sampling process-based model soil and plant variables from N(μ, σ) and ii) perturbing the weather inputs within their stated error ranges. Resulting daily flux predictions were summarized by their mean, SD, and 5th to 95th percentiles (Dataset S3).
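The two sampling steps and the summary statistics can be sketched for a single record as follows. This is an illustration only: the predictor is a hypothetical stand-in for the trained EMS, the record values are invented, and drawing weather perturbations uniformly within the stated bounds is an assumption, since the error distribution is not specified in this section.

```python
import random
from statistics import fmean, quantiles, stdev


def toy_predict(wfps, nitrate, air_temp, precip):
    """Hypothetical stand-in for the trained EMS (returns g N ha^-1 d^-1)."""
    return max(0.0, 0.5 * wfps * nitrate + 0.05 * air_temp + 0.02 * precip)


def monte_carlo(record, n_iter=2000, seed=42):
    """Propagate input uncertainty for one record and summarize the predictions."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n_iter):
        # i) Soil and plant inputs: normal with ensemble mean and SD.
        wfps = rng.gauss(*record["wfps"])
        nitrate = rng.gauss(*record["nitrate"])
        # ii) Weather inputs: perturbed within stated bounds (uniform is assumed).
        air_temp = record["air_temp"] + rng.uniform(-1.5, 1.5)
        precip = record["precip"] * (1.0 + rng.uniform(-0.20, 0.20))
        draws.append(toy_predict(wfps, nitrate, air_temp, precip))
    pct = quantiles(draws, n=20)  # cut points at 5%, 10%, ..., 95%
    return {"mean": fmean(draws), "sd": stdev(draws), "p05": pct[0], "p95": pct[-1]}


# Hypothetical record: (mean, SD) pairs for ensemble variables, scalars for weather.
record = {"wfps": (0.60, 0.05), "nitrate": (12.0, 3.0), "air_temp": 22.0, "precip": 8.0}
summary = monte_carlo(record)
```

Fixing the random seed makes the summary reproducible, which is useful when the same procedure is repeated over thousands of records.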

Supplementary Material

Appendix 01 (PDF)

Dataset S01 (XLSX)

pnas.2524808123.sd01.xlsx (33.7KB, xlsx)

Dataset S02 (XLSX)

Dataset S03 (XLSX)

pnas.2524808123.sd03.xlsx (859.4KB, xlsx)

Acknowledgments

Support for this research was provided by the Great Lakes Bioenergy Research Center, U.S. Department of Energy, Office of Science, Biological and Environmental Research Program under Award Number DE-SC0018409; the NSF Long-term Ecological Research Program (DEB 2224712) at the Kellogg Biological Station; USDA NIFA (Award no. 2020-67021-32799); Michigan State University AgBioResearch; the CERCA-FFAR project; Climate Trace; and the Soil Inventory Project. M.D. was funded by the F.R.S.-FNRS, Belgium. We gratefully acknowledge the US Department of Agriculture, Natural Resources Conservation Service (USDA-NRCS) for the gSSURGO soils data and the USDA Agricultural Research Service GRACEnet/REAP program for providing N2O flux data via the USDA Ag Data Commons. Several sites are also part of the USDA ARS Long-Term Agroecosystem Research Network, for which we also acknowledge support. Portions of this article previously appeared as part of the PhD thesis of coauthor P.S.

Author contributions

P.S., B.B., and G.P.R. designed the research; P.S. and B.B. conducted the research; P.S., A.M., and M.S.M. developed the modeling strategy; P.S., B.B., A.M., T.T., M.S., and M.D. performed the model simulations and analyzed data; P.S. and B.B. conceived the study; and P.S., B.B., N.M., and G.P.R. wrote the paper.

Competing interests

B.B. is cofounder of CIBO Technologies and The Soil Inventory Project. B.B. and G.P.R. hold stock in CIBO Technologies.

Footnotes

Reviewers: S.D.G., Colorado State University; and D.R., Queensland University of Technology Faculty of Science.

Contributor Information

Bruno Basso, Email: basso@msu.edu.

G. Philip Robertson, Email: robert30@msu.edu.

Data, Materials, and Software Availability

Numeric data have been deposited in datadryad.org (10.5061/dryad.pvmcvdnzx) (111).

Supporting Information

References

  • 1. Thompson R. L., et al., Acceleration of global N2O emissions seen from two decades of atmospheric inversion. Nat. Clim. Chang. 9, 993–998 (2019).
  • 2. IPCC, Climate Change 2021: The physical science basis. Contribution of Working Group I to the Sixth Assessment Report of the Intergovernmental Panel on Climate Change (Cambridge University Press, Cambridge UK and New York, NY, 2021), 10.1017/9781009157896.
  • 3. Tian H., et al., Global nitrous oxide budget (1980–2020). Earth Syst. Sci. Data 16, 2543–2604 (2024).
  • 4. Robertson G. P., Groffman P. M., “Chapter 14—Nitrogen transformations” in Soil Microbiology, Ecology and Biochemistry, Paul E. A., Frey S. D., Eds. (Elsevier, ed. 5, 2024), pp. 407–438.
  • 5. Robertson G. P., Denitrification and the challenge of scaling microsite knowledge to the globe. mLife 2, 229–238 (2023).
  • 6. Levy P., et al., Challenges in scaling up greenhouse gas fluxes: Experience from the UK greenhouse gas emissions and feedbacks program. J. Geophys. Res. Biogeosci. 127, e2021JG006743 (2022).
  • 7. IPCC, “2019 refinement to the 2006 IPCC guidelines for national greenhouse gas inventories, Volume 4: Agriculture, forestry and other land use. Chapter 11: N2O emissions from managed soils, and CO2 emissions from lime and urea application” (Intergovernmental Panel on Climate Change, Switzerland, 2019).
  • 8. Shcherbak I., Millar N., Robertson G. P., Global metaanalysis of the nonlinear response of soil nitrous oxide (N2O) emissions to fertilizer nitrogen. Proc. Natl. Acad. Sci. U.S.A. 111, 9199–9204 (2014).
  • 9. Griffis T. J., et al., Reconciling the differences between top-down and bottom-up estimates of nitrous oxide emissions for the U.S. Corn Belt. Glob. Biogeochem. Cycles 27, 746–754 (2013).
  • 10. Campbell E. E., et al., Assessing the soil carbon, biomass production, and nitrous oxide emission impact of corn stover management for bioenergy feedstock production using DAYCENT. Bioenerg. Res. 7, 491–502 (2014).
  • 11. Del Grosso S. J., Halvorson A. D., Parton W. J., Testing DAYCENT model simulations of corn yields and nitrous oxide emissions in irrigated tillage systems in Colorado. J. Environ. Qual. 37, 1383–1389 (2008).
  • 12. Keating B. A., et al., An overview of APSIM, a model designed for farming systems simulation. Eur. J. Agron. 18, 267–288 (2003).
  • 13. Jones J. W., et al., The DSSAT cropping system model. Eur. J. Agron. 18, 235–265 (2003).
  • 14. Izaurralde R. C., Williams J. R., McGill W. B., Rosenberg N. J., Jakas M. C. Q., Simulating soil C dynamics with EPIC: Model description and testing against long-term data. Ecol. Modell. 192, 362–384 (2006).
  • 15. Basso B., Ritchie J. T., “Simulating crop growth and biogeochemical fluxes in response to land management using the SALUS model” in The Ecology of Agricultural Landscapes: Long-Term Research on the Path to Sustainability, Hamilton S. K., Doll J. E., Robertson G. P., Eds. (Oxford University Press, NY, 2015), pp. 252–274.
  • 16. Beaudoin N., et al., STICS Soil-Crop Model (Éditions Quae, 2023).
  • 17. Parton W. J., Hartman M., Ojima D., Schimel D., DAYCENT and its land surface submodel: Description and testing. Glob. Planet. Change 19, 35–48 (1998).
  • 18. Li C. S., Modeling trace gas emissions from agricultural ecosystems. Nutr. Cycl. Agroecosyst. 58, 259–276 (2000).
  • 19. Gaillard R. K., et al., Underestimation of N2O emissions in a comparison of the Daycent, DNDC, and EPIC models. Ecol. Appl. 28, 694–708 (2018).
  • 20. Saha D., Basso B., Robertson G. P., Machine learning improves predictions of agricultural nitrous oxide (N2O) emissions from intensively managed cropping systems. Environ. Res. Lett. 16, 024004 (2021).
  • 21. Ehrhardt F., et al., Assessing uncertainties in crop and pasture ensemble model simulations of productivity and N2O emissions. Glob. Change Biol. 24, e603–e616 (2018).
  • 22. Hamrani A., Akbarzadeh A., Madramootoo C. A., Machine learning for predicting greenhouse gas emissions from agricultural soils. Sci. Total Environ. 741, 140338 (2020).
  • 23. Kim T., et al., Quantifying nitrogen loss hotspots and mitigation potential for individual fields in the US Corn Belt with a metamodeling approach. Environ. Res. Lett. 16, 075008 (2021).
  • 24. Liu L., et al., KGML-ag: A modeling framework of knowledge-guided machine learning to simulate agroecosystems: A case study of estimating N2O emission using data from mesocosm experiments. Geosci. Model Dev. 15, 2839–2858 (2022).
  • 25. Karpatne A., et al., Theory-guided data science: A new paradigm for scientific discovery from data. IEEE Trans. Knowl. Data Eng. 29, 2318–2331 (2017).
  • 26. Liu L., et al., Knowledge-guided machine learning can improve carbon cycle quantification in agroecosystems. Nat. Commun. 15, 357 (2024).
  • 27. Yang Q., et al., A flexible and efficient knowledge-guided machine learning data assimilation (KGML-DA) framework for agroecosystem prediction in the US Midwest. Remote Sens. Environ. 299, 113880 (2023).
  • 28. ElGhawi R., et al., Hybrid modeling of evapotranspiration: Inferring stomatal and aerodynamic resistances using combined physics-based and machine learning. Environ. Res. Lett. 18, 034039 (2023).
  • 29. Aderele M. O., Srivastava A. K., Butterbach-Bahl K., Rahimi J., Integrating machine learning with agroecosystem modelling: Current state and future challenges. Eur. J. Agron. 168, 127610 (2025).
  • 30. Breiman L., Random forests. Mach. Learn. 45, 5–32 (2001).
  • 31. Friedman J. H., Greedy function approximation: A gradient boosting machine. Ann. Stat. 29, 1189–1232 (2001).
  • 32. Drucker H., Burges C. J. C., Kaufman L., Smola A., Vapnik V., “Support vector regression machines” in Proceedings of the 10th International Conference on Neural Information Processing Systems, NIPS’96, Jordan M. I., Petsche T., Eds. (MIT Press, 1996), pp. 155–161.
  • 33. Chen T., Guestrin C., “XGBoost: A scalable tree boosting system” in Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD’16 (Association for Computing Machinery, 2016), pp. 785–794.
  • 34. Vega García M., Aznarte J. L., Shapley additive explanations for NO2 forecasting. Ecol. Inform. 56, 101039 (2020).
  • 35. Del Grosso S. J., et al., Introducing the GRACEnet/REAP data contribution, discovery, and retrieval system. J. Environ. Qual. 42, 1274–1280 (2013).
  • 36. Hammerbeck A. L., Stetson S. J., Osborne S. L., Schumacher T. E., Pikul J. L., Corn residue removal impact on soil aggregates in a no-till corn/soybean rotation. Soil Sci. Soc. Am. J. 76, 1390–1398 (2012).
  • 37. Jin V. L., et al., Soil greenhouse gas emissions in response to corn stover removal and tillage management across the U.S. Corn Belt. Bioenerg. Res. 7, 517–527 (2014).
  • 38. Johnson J. M. F., Archer D., Barbour N., Greenhouse gas emission from contrasting management scenarios in the northern corn belt. Soil Sci. Soc. Am. J. 74, 396–406 (2010).
  • 39. Smith D. R., Hernandez-Ramirez G., Armstrong S. D., Bucholtz D. L., Stott D. E., Fertilizer and tillage management impacts on non-carbon-dioxide greenhouse gas emissions. Soil Sci. Soc. Am. J. 75, 1070–1082 (2011).
  • 40. Venterea R. T., Coulter J. A., Dolan M. S., Evaluation of intensive “4R” strategies for decreasing nitrous oxide emissions and nitrogen surplus in rainfed corn. J. Environ. Qual. 45, 1186–1195 (2016).
  • 41. Skinner R. H., Corson M. S., Rotz C. A., Comparison of two pasture growth models of differing complexity. Agric. Syst. 99, 35–43 (2008).
  • 42. Bremner J. M., Breitenbeck G. A., Blackmer A. M., Effect of nitrapyrin on emission of nitrous oxide from soil fertilized with anhydrous ammonia. Geophys. Res. Lett. 8, 353–356 (1981).
  • 43. Guzman J., Al-Kaisi M., Parkin T., Greenhouse gas emissions dynamics as influenced by corn residue removal in a continuous corn system. Soil Sci. Soc. Am. J. 79, 612–625 (2015).
  • 44. Oates L. G., et al., Nitrous oxide emissions during establishment of eight alternative cellulosic bioenergy cropping systems in the North Central United States. GCB Bioenergy 8, 539–549 (2016).
  • 45. Gelfand I., et al., Empirical evidence for the potential climate benefits of decarbonizing light vehicle transport in the U.S. with bioenergy from purpose-grown biomass with and without BECCS. Environ. Sci. Technol. 54, 2961–2974 (2020).
  • 46. Robertson G. P., Hamilton S. K., “Long-term ecological research in agricultural landscapes at the Kellogg Biological Station LTER site: Conceptual and experimental framework” in The Ecology of Agricultural Landscapes, Hamilton S. K., Doll J. E., Robertson G. P., Eds. (Oxford University Press, 2015), pp. 1–32.
  • 47. Dungan R. S., Leytem A. B., Tarkalson D. D., Ippolito J. A., Bjorneberg D. L., Greenhouse gas emissions from an irrigated dairy forage rotation as influenced by fertilizer and manure applications. Soil Sci. Soc. Am. J. 81, 537–545 (2017).
  • 48. Barsotti J. L., Sainju U. M., Lenssen A. W., Montagne C., Hatfield P. G., Net greenhouse gas emissions affected by sheep grazing in dryland cropping systems. Soil Sci. Soc. Am. J. 77, 1012–1025 (2013).
  • 49. Ren T., Ukalska-Jaruga A., Smreczak B., Cai A., Dissolved organic carbon in cropland soils: A global meta-analysis of management effects. Agric. Ecosyst. Environ. 371, 109080 (2024).
  • 50. Millar N., Robertson G. P., “Nitrogen transfers and transformations in row-crop ecosystems” in The Ecology of Agricultural Landscapes, Hamilton S. K., Doll J. E., Robertson G. P., Eds. (Oxford University Press, 2015), pp. 213–251.
  • 51. Sistani K. R., Warren J. G., Lovanh N., Higgins S., Shearer S., Greenhouse gas emissions from swine effluent applied to soil by different methods. Soil Sci. Soc. Am. J. 74, 429–435 (2010).
  • 52. Mosier A. R., Halvorson A. D., Reule C. A., Liu X. J., Net global warming potential and greenhouse gas intensity in irrigated cropping systems in northeastern Colorado. J. Environ. Qual. 35, 1584–1598 (2006).
  • 53. Basso B., et al., A multi model ensemble reveals net climate benefits from regenerative practices in US Midwest croplands. Sci. Rep. 15, 24881 (2025).
  • 54. Gelfand I., Shcherbak I., Millar N., Kravchenko A. N., Robertson G. P., Long-term nitrous oxide fluxes in annual and perennial agricultural and unmanaged ecosystems in the upper Midwest USA. Glob. Change Biol. 22, 3594–3607 (2016).
  • 55. Butterbach-Bahl K., Baggs E. M., Dannenmann M., Kiese R., Zechmeister-Boltenstern S., Nitrous oxide emissions from soils: How well do we understand the processes and their controls? Phil. Trans. R. Soc. Lond. B Biol. Sci. 368, 20130122 (2013).
  • 56. Timilsina A., et al., Plants mitigate ecosystem nitrous oxide emissions primarily through reductions in soil nitrate content: Evidence from a meta-analysis. Sci. Total Environ. 949, 175115 (2024).
  • 57. Grundmann G. L., Renault P., Rosso L., Bardin R., Differential effects of soil water content and temperature on nitrification and aeration. Soil Sci. Soc. Am. J. 59, 1342–1349 (1995).
  • 58. Dhaliwal J. K., Panday D., Robertson G. P., Saha D., Machine learning reveals dynamic controls of soil nitrous oxide emissions from diverse long-term cropping systems. J. Environ. Qual. 54, 132–146 (2025).
  • 59. Parton W. J., et al., Generalized model for N2 and N2O production from nitrification and denitrification. Glob. Biogeochem. Cycles 10, 401–412 (1996).
  • 60. Weier K. L., et al., Denitrification and the dinitrogen/nitrous oxide ratio as affected by soil water, available carbon, and nitrate. Soil Sci. Soc. Am. J. 57, 66–72 (1993).
  • 61. Linn D. M., Doran J. W., Effect of water-filled pore space on carbon dioxide and nitrous oxide production in tilled and nontilled soils. Soil Sci. Soc. Am. J. 48, 1267–1272 (1984).
  • 62. Wang H., et al., Quantifying nitrous oxide production rates from nitrification and denitrification under various moisture conditions in agricultural soils: Laboratory study and literature synthesis. Front. Microbiol. 13, 1110151 (2023).
  • 63. Davidson E. A., Keller M., Erickson H. E., Verchot L. V., Veldkamp E., Testing a conceptual model of soil emissions of nitrous and nitric oxides: Using two functions based on soil nitrogen availability and soil water content, the hole-in-the-pipe model characterizes a large fraction of the observed variation of nitric oxide and nitrous oxide emissions from soils. BioScience 50, 667–680 (2000).
  • 64. Bateman E. J., Baggs E. M., Contributions of nitrification and denitrification to N2O emissions from soils at different water-filled pore space. Biol. Fertil. Soils 41, 379–388 (2005).
  • 65. Zentgraf I., et al., How scale affects N2O emissions in heterogeneous fields of a diversified agricultural landscape. Sci. Rep. 15, 11013 (2025).
  • 66. Li C., Frolking S., Butterbach-Bahl K., Carbon sequestration in arable soils is likely to increase nitrous oxide emissions, offsetting reductions in climate radiative forcing. Clim. Change 72, 321–338 (2005).
  • 67. Kelley L. A., et al., Changes in soil N2O emissions and nitrogen use efficiency following long-term soil carbon storage: Evidence from a mesocosm experiment. Agric. Ecosyst. Environ. 370, 109054 (2024).
  • 68. Mathieu O., et al., Emissions and spatial variability of N2O, N2 and nitrous oxide mole fraction at the field scale, revealed with 15N isotopic techniques. Soil Biol. Biochem. 38, 941–951 (2006).
  • 69. Schaufler G., et al., Greenhouse gas emissions from European soils under different land use: Effects of soil moisture and temperature. Eur. J. Soil Sci. 61, 683–696 (2010).
  • 70. Cui P., et al., Long-term organic and inorganic fertilization alters temperature sensitivity of potential N2O emissions and associated microbes. Soil Biol. Biochem. 93, 131–141 (2016).
  • 71. Wagner-Riddle C., Congreves K. A., Brown S. E., Helgason W. D., Farrell R. E., Overwinter and spring thaw nitrous oxide fluxes in a northern prairie cropland are limited but a significant proportion of annual emissions. Glob. Biogeochem. Cycles 38, e2023GB008051 (2024).
  • 72. Ruan L., Robertson G. P., Reduced snow cover increases wintertime nitrous oxide (N2O) emissions from an agricultural soil in the upper U.S. Midwest. Ecosystems 20, 917–927 (2017).
  • 73. Chen H., Li X., Hu F., Shi W., Soil nitrous oxide emissions following crop residue addition: A meta-analysis. Glob. Change Biol. 19, 2956–2964 (2013).
  • 74. Abalos D., et al., A review and meta-analysis of mitigation measures for nitrous oxide emissions from crop residues. Sci. Total Environ. 828, 154388 (2022).
  • 75. Van Groenigen J. W., Velthof G. L., Oenema O., Van Groenigen K. J., Van Kessel C., Towards an agronomic assessment of N2O emissions: A case study for arable crops. Eur. J. Soil Sci. 61, 903–913 (2010).
  • 76. McSwiney C. P., Robertson G. P., Nonlinear response of N2O flux to incremental fertilizer addition in a continuous maize (Zea mays L.) cropping system. Glob. Change Biol. 11, 1712–1719 (2005).
  • 77. Kim N., et al., Spatial variability of agricultural soil carbon dioxide and nitrous oxide fluxes: Characterization and recommendations from spatially high-resolution, multi-year dataset. Agric. Ecosyst. Environ. 387, 109636 (2025).
  • 78. Zhang Z., Eddy W. C., Stuchiner E. R., DeLucia E. H., Yang W. H., A conceptual model explaining spatial variation in soil nitrous oxide emissions in agricultural fields. Commun. Earth Environ. 5, 1–11 (2024).
  • 79. Turner P. A., Griffis T. J., Mulla D. J., Baker J. M., Venterea R. T., A geostatistical approach to identify and mitigate agricultural nitrous oxide emission hotspots. Sci. Total Environ. 572, 442–449 (2016).
  • 80. Kravchenko A. N., et al., Hotspots of soil N2O emission enhanced through water absorption by plant residue. Nat. Geosci. 10, 496–500 (2017).
  • 81. Dungan R. S., et al., Growing and non-growing season nitrous oxide emissions from a manured semiarid cropland soil under irrigation. Agric. Ecosyst. Environ. 348, 108413 (2023).
  • 82. Wagner-Riddle C., Baggs E. M., Clough T. J., Fuchs K., Petersen S. O., Mitigation of nitrous oxide emissions in the context of nitrogen loss reduction from agroecosystems: Managing hot spots and hot moments. Curr. Opin. Environ. Sustain. 47, 46–53 (2020).
  • 83. Thaler E. A., Larsen I. J., Yu Q., The extent of soil loss across the US Corn Belt. Proc. Natl. Acad. Sci. U.S.A. 118, e1922375118 (2021).
  • 84. Soil Survey Staff, “Gridded Soil Survey Geographic (gSSURGO) Database for the Conterminous United States” (United States Department of Agriculture, Natural Resources Conservation Service). https://www.nrcs.usda.gov/resources/data-and-reports/gridded-soil-survey-geographic-gssurgo-database. Deposited 16 November 2020.
  • 85. NASA, Langley Research Center (LaRC), POWER data. https://power.larc.nasa.gov/docs/services/. Deposited 2 February 2024.
  • 86. McDonald G. C., Ridge regression. WIREs Comput. Stat. 1, 93–100 (2009).
  • 87. Kalule R., Abderrahmane H. A., Alameri W., Sassi M., Stacked ensemble machine learning for porosity and absolute permeability prediction of carbonate rock plugs. Sci. Rep. 13, 9855 (2023).
  • 88. Lin C., Yeap T., Kiringa I., “Stacked bidirectional LSTM for predicting emission of nitrous oxide” in Proceedings of the Canadian Conference on Artificial Intelligence (Canadian Artificial Intelligence Association, 2022), 10.21428/594757db.daad1be1.
  • 89. Ghasemian A., Hosseinmardi H., Galstyan A., Airoldi E. M., Clauset A., Stacking models for nearly optimal link prediction in complex networks. Proc. Natl. Acad. Sci. U.S.A. 117, 23393–23400 (2020).
  • 90. Park C., et al., Unifying machine learning and interpolation theory via interpolating neural networks. Nat. Commun. 16, 8753 (2025).
  • 91. Takura T., Hirano Goto K., Honda A., Development of a predictive model for integrated medical and long-term care resource consumption based on health behaviour: Application of healthcare big data of patients with circulatory diseases. BMC Med. 19, 15 (2021).
  • 92. Tayebi Arasteh S., et al., Large language models streamline automated machine learning for clinical studies. Nat. Commun. 15, 1603 (2024).
  • 93. Rezaei Mirghaed B., Dehghan Monfared A., Ranjbar A., Enhanced petrophysical evaluation through machine learning and well logging data in an Iranian oil field. Sci. Rep. 14, 28941 (2024).
  • 94. Dhanaraj R. K., Maragatharajan M., Sureshkumar A., Balakannan S. P., On-device AI for climate-resilient farming with intelligent crop yield prediction using lightweight models on smart agricultural devices. Sci. Rep. 15, 31195 (2025).
  • 95. Hou H., et al., The response of meteorological drought to extreme climate in the water-receiving area of the Tao river diversion project in China. Sci. Rep. 15, 42077 (2025).
  • 96. Baruah A., et al., A novel spatiotemporal prediction approach to fill air pollution data gaps using mobile sensors, machine learning and citizen science techniques. NPJ Clim. Atmos. Sci. 7, 310 (2024).
  • 97. Wani O. A., et al., Predicting rainfall using machine learning, deep learning, and time series models across an altitudinal gradient in the North-Western Himalayas. Sci. Rep. 14, 27876 (2024).
  • 98. Khosravi Y., Ouarda T. B. M. J., Homayouni S., Developing an ensemble machine learning framework for enhanced climate projections using CMIP6 data in the Middle East. NPJ Clim. Atmos. Sci. 8, 174 (2025).
  • 99. Balasubramaniam T., Mohotti W. A., Sabir K., Nayak R., Feature engineering on climate data with machine learning to understand time-lagging effects in pasture yield prediction. Ecol. Inform. 86, 103011 (2025).
  • 100. Dahal D., et al., Analyzing climate dynamics and developing machine learning models for flood prediction in Sacramento, California. Hydroecol. Eng. 1, 10003 (2024).
  • 101. Khanal R., Dhungel S., Brewer S. C., Barber M. E., Statistical modeling to predict climate change effects on watershed-scale evapotranspiration. Atmosphere 12, 1565 (2021).
  • 102. Wang Z., Wilby R. L., Yu D., Forecasting global rainfall in a changing climate: A machine learning approach using Köppen-Geiger zones. Earth Syst. Environ. 10.1007/s41748-025-00876-9 (2025).
  • 103. Bofa A., Zewotir T., Machine learning analysis of greenhouse gas sources impacting Africa’s food security nexus. Sci. Rep. 15, 28665 (2025).
  • 104. Gnisia G., et al., Machine learning-based prediction of nitrous oxide emissions from arable farming: Exploring management practices as predictor variables. Ecol. Indic. 172, 113233 (2025).
  • 105. Yang J., Fast TreeSHAP: Accelerating SHAP value computation for trees. arXiv [Preprint] (2022). http://arxiv.org/abs/2109.09847 (Accessed 24 June 2025).
  • 106. Lundberg S. M., Lee S.-I., “A unified approach to interpreting model predictions” in Proceedings of the 31st International Conference on Neural Information Processing Systems, NIPS’17, Guyon I., et al. Eds. (Curran Associates Inc., 2017), pp. 4768–4777.
  • 107. Gentle J. E., “Monte Carlo simulation” in Wiley StatsRef: Statistics Reference Online, Balakrishnan N., et al. Eds. (John Wiley & Sons Ltd, 2015), pp. 1–11.
  • 108. Rodrigues G. C., Braga R. P., Evaluation of NASA POWER reanalysis products to estimate daily weather variables in a hot summer Mediterranean climate. Agronomy 11, 1207 (2021).
  • 109. Tan M. L., et al., Evaluation of NASA POWER and ERA5-Land for estimating tropical precipitation and temperature extremes. J. Hydrol. 624, 129940 (2023).
  • 110. White J. W., Hoogenboom G., Wilkens P. W., Stackhouse P. W. Jr., Hoel J. M., Evaluation of satellite-based, model-derived daily solar radiation data for the continental United States. Agron. J. 103, 1242–1251 (2011).
  • 111. Sharma P., et al., Data from “Coupled machine-learning ensemble models substantially improve predictions of nitrous oxide (N2O) fluxes from US cropland” [Dataset]. Dryad. 10.5061/dryad.pvmcvdnzx. Deposited 9 September 2025.

Articles from Proceedings of the National Academy of Sciences of the United States of America are provided here courtesy of National Academy of Sciences