Scientific Reports. 2024 Jun 26;14:14699. doi: 10.1038/s41598-024-64269-0

Securing China’s rice harvest: unveiling dominant factors in production using multi-source data and hybrid machine learning models

Ali Mokhtar 1,2, Hongming He 1, Mohsen Nabil 3, Saber Kouadri 4, Ali Salem 5,6, Ahmed Elbeltagi 7
PMCID: PMC11208568  PMID: 38926368

Abstract

Ensuring the security of China’s rice harvest is imperative for sustainable food production. The present study addresses this need with a comprehensive approach that integrates multi-source data, including climate, remote sensing, soil properties and agricultural statistics from 2000 to 2017. The research evaluates six artificial intelligence (AI) models, comprising machine learning (ML) models, deep learning (DL) models and their hybrids, for predicting rice production across China, focusing on the main rice cultivation areas. The models were random forest (RF), extreme gradient boosting (XGB), convolutional neural network (CNN) and long short-term memory (LSTM), together with the hybrids RF-XGB and CNN-LSTM, each tested on eleven combinations (scenarios) of input variables. The results show that the hybrid models performed better than the single models. The best performance was recorded in scenarios 8 (soil variables and sown area) and 11 (all variables) with RF-XGB, which decreased the root mean square error (RMSE) by 38% and 31%, respectively. In both scenarios, RF-XGB also produced a high coefficient of determination (R2) of 0.97 in comparison with the other developed models. Soil properties were the predominant factors influencing rice production, with contributions of 87% and 53% in east and southeast China, respectively. In addition, yearly increases of 0.16 °C and 0.19 °C in maximum and minimum temperatures (Tmax and Tmin), coupled with a 20 mm/year decrease in precipitation, were associated with an average 2.23% reduction in rice production during the study period in southeast China. This research provides valuable insights into the dynamic interplay of environmental factors affecting China’s rice production, informing strategic measures to enhance food security in the face of evolving climatic conditions.

Keywords: Climate change, Vegetation indices, Food security, Hybrid machine learning models, Rice production

Subject terms: Mathematics and computing, Climate change, Plant sciences

Introduction

Early and accurate crop production forecasting is essential for policymakers to make timely decisions on export–import commerce, which is the foundation of a country's food security1. It also helps agricultural producers avoid poor crop selection, which could cause incalculable losses in profits due to over-production and under-production2–4. Moreover, the cropland loss observed in various nations over the past years, together with high food demand owing to population growth, requires accurate and up-to-date crop yield forecasting to maintain food security5. Predicting crop production is therefore required to prevent these losses. However, human predictions are not effective with increasing amounts of agricultural data. Instead, machine learning has emerged as a promising option for this goal6.

Machine learning emerged from data mining as a methodology for teaching computers concepts7–11. It uses the idea of learning to predict new data from large datasets through training and testing. The present study selected rice because it is one of the world's three major crops, extensively farmed and consumed along with wheat and maize12–14. Nearly 88% of the world's rice is grown in Asian nations, where 2.4 billion people eat rice daily15.

Given the importance of rice to national food security, several studies implemented various machine-learning techniques for forecasting rice yield. Jabjone and Jiamrum16 developed an artificial neural network (ANN) model to predict rice production in the Phimai district, Thailand. The developed ANN model achieved highly accurate estimation with low errors (low RMSE) in rice yield forecasting using meteorological factors, including rainfall, water distribution, evapotranspiration, temperature, humidity, and wind speed16. Marndi, Ramesh17 applied long short-term memory (LSTM) for predicting rice yield using different input scenarios. The best LSTM model was achieved using rainfall as an input variable for rice yield forecasting. Sultana and Khanam18 compared the performance of Auto-regressive Integrated Moving Average (ARIMA) and Artificial Neural Network (ANN) on univariate time series data of yearly rice production from 1972 to 2013. According to this study, the ARIMA model outperforms the ANN model since the estimated error of ANN was significantly higher than ARIMA errors. In addition, Balakrishnan and Muthukumarasamy1 suggested an ensemble model to predict crop production over time based on the Ada support vector machine (SVM) and Ada and Naive Bayes (Naive), where Ada SVM and Ada Naive performed better than SVM and Naive Bayes.

Multiple input variables have been used in rice yield estimation, including climatic data, remote sensing data, and statistical data (e.g. sown area). Climatic variables showed a significant relationship with rice yield in several studies16,17,19,20. For example, a temperature increase of 1–2 °C during the paddy earing stage decreases paddy rice production by 10–20%21. Compared to technology, input, and social and economic factors, climate factors alone explain 84% of the variation in paddy rice production22. Moreover, remote sensing vegetation indices such as the normalized difference vegetation index (NDVI) and radar vegetation index (RVI) were found to be highly efficient in evaluating rice production since they quantify the crop photosynthetic activity responsible for biomass formation23. NDVI derived from Moderate Resolution Imaging Spectroradiometer (MODIS) (AQUA/TERRA) imagery achieved a high correlation (R2 = 0.85) with rice production as estimated by Faisal, Rahman24, and R2 of 0.76 to 0.86 as estimated by Mosleh and Hassan15. SAR data captured by RADARSAT have also shown high accuracy (97.4% and 96.6%) in estimating rice production based on backscatter25.

Although several studies have discussed the use of machine learning in rice yield prediction, hybrid models that integrate two models are still poorly documented. In addition, the integration of multiple data sources such as climate data, remote sensing, and agricultural statistics in rice yield estimation has rarely been tested. Therefore, the present study aims to (1) develop multiple single and hybrid machine learning models for predicting rice production across China, the world's biggest rice producer with 211 million tons26, and test multi-input scenarios (climatic variables, remote sensing, agricultural statistics and soil properties) to define the optimal combination of input variables for the most accurate rice production model; (2) identify the dominant factors (climate, soil, remote sensing and sown area) that influence rice production at the zonal scale; and (3) propose solutions for improving rice production across China. This research is critical in determining the best approach (optimal model and input variables) for simple, rapid, and inexpensive prediction of rice production that is timely and reliable at regional scales across China. The main contributions of the research paper are as follows.

  1. This study attempts to model and predict rice production using multi-source data and hybrid machine-learning algorithms.

  2. This study provides an in-depth comparative analysis of the proposed hybrid models against single machine learning models, namely random forest (RF), extreme gradient boosting (XGB), convolutional neural network (CNN) and long short-term memory (LSTM), and the hybrid RF-XGB and CNN-LSTM algorithms, with eleven combinations (scenarios) of input variables across China.

  3. This study identifies the dominant factors for rice production across China’s main rice counties based on multi-input scenarios (climatic variables, remote sensing, agricultural statistics and soil properties).

Materials

Study area

In this study, we focused on the main rice cultivation areas in mainland China, which are dominated by the single-rice system (i.e. one rice harvest per year in a given field) and the double-rice system (i.e. two rice harvests per year in a given field) (Fig. 1). The study area covers approximately 29 million hectares in nine provinces. This region, between 20° 10′ N and 53° 33′ N and between 105° 54′ E and 135° 05′ E, is the most important food basket in China, accounting for ~96% of the total rice cultivation area and ~94% of the total rice production in China27–29. China, the world’s largest rice producer (about 206 million metric tons of annual production), accounts for 28% of the world’s rice production30. Rice makes up 41% of total grain production on only 35% of the cropland area in China and feeds roughly 65% of Chinese people31. The nine provinces are Heilongjiang, Shaanxi, Liaoning, Hainan, Anhui, Hebei, Henan, Guangdong and Shandong. The large difference in latitude leads to a pronounced variation in illumination conditions during the year: in South China, the minimum and maximum daily sunshine durations are 11 and 13 h, while in North China they are 7 and 17 h, respectively. Owing to its location at the eastern margin of the Eurasian continent, the eastern part of China has a monsoonal climate with warm, humid summers and temperate, dry winters.

Figure 1.

(a) China’s rice districts and distribution of meteorological stations, (b) the flowchart of the methodology. The map in (a) was generated with the ArcGIS 10.8 software and (b) was generated with Microsoft PowerPoint.

Datasets

The monthly meteorological datasets over the rice districts in nine provinces across China were retrieved from the China National Meteorological Data Sharing Platform32–35. The data on rice production and sown area of 64 rice districts from 2000 to 2017 were extracted from the National Bureau of Statistics of China (Table 1). For the remote sensing datasets, three vegetation indices (VIs) and two biophysical parameters (BPs) were used in the present study to estimate rice production. These five parameters are available on Google Earth Engine (GEE, https://developers.google.com/earth-engine/datasets/) with a spatial resolution of 500 m. The VIs were widely used in earlier studies as production estimators due to their relevance to vegetation health18,36,37. BPs were also used in wheat yield prediction3. Compared to VIs, the BPs are usually more reliable in estimating crop production since they reflect the state of the crops more adequately and thus could be more accurate in predicting crop yield and production. The present study used GEE to estimate the average annual value of all five parameters over the 64 rice districts in China. In addition to weather data, soil properties including soil depth, soil organic matter, pH, cation exchange capacity, porosity, bulk density, NPK and soil texture for the topsoil layer (0–30 cm) and the subsoil layer (30–100 cm) at 0.00833° (~1 km) were also collected; they are detailed at http://globalechange.bnu.edu.cn (Ref.38).

Table 1.

Summary of the collected datasets.

Category | Variables | Spatial resolution | Temporal resolution | Time coverage | Source
Climate data | Tmax, Tmin, Tave, Pre, RH, WS, SS | 1 km | Daily | 2000–2017 | China National Meteorological Data Sharing Platform (http://data.cma.cn/en)
Remote sensing data | NDVI, EVI, LAI, NDWI; NPP | 500 m | 16-day; Yearly | 2000–2017 | MODIS Terra Daily (https://lpdaac.usgs.gov/data/)
Production | Rice (Pro) and Sown area (SA) | County | Year | 2000–2017 | National Bureau of Statistics of China (www.epSchinadata.com/)
Soil data | SOM, pH, DEP, POR, BD, NPK, Texture, CEC | 1 km | | | Ref.38 (http://globalechange.bnu.edu.cn)

Tmax maximum temperature, Tmin minimum temperature, Pre precipitation, RH relative humidity, WS wind speed, SS sunshine, NDVI normalized difference vegetation index, EVI Enhanced vegetation index, LAI leaf area index, NPP net primary productivity and NDWI normalized difference water index, DEP soil depth, SOM soil organic matter, pH POR porosity, BD bulk density, N nitrogen, P phosphorus, K potassium, CEC cation exchange capacity.

Methodology

The general methodology of the present study is shown in Fig. 1b. The study used multi-source data, including remote sensing, climate data, agricultural statistics and soil properties, as input variables to single and hybrid algorithms for predicting rice production. The single and hybrid models developed in this work are described as follows:

Single models

Extreme gradient boosting (XGB)

The XGB algorithm suggested by Ref.39 is an improvement of the Gradient Boosting Machine based on regression trees. The algorithm is based on the idea of “boosting”, which combines the predictions of a set of “weak” learners to develop a “strong” learner through additive training strategies. More detailed information and the computation procedures of the XGB algorithm can be found in Ref.39. We tuned XGB with a grid search over the number of estimators (number of trees) and the maximum tree depth.
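
The grid search mentioned above can be sketched as an exhaustive loop over candidate settings. The snippet below is a minimal, library-free illustration under stated assumptions: the candidate values and the `toy_validation_rmse` scoring function are hypothetical (the source does not report the grid it searched), and in a real workflow the score would come from fitting an XGB model and measuring its validation error.

```python
from itertools import product

# Hypothetical candidate values; the source does not report the actual grid.
PARAM_GRID = {
    "n_estimators": [100, 300, 500],  # number of trees
    "max_depth": [3, 5, 7],           # maximum tree depth
}

def toy_validation_rmse(n_estimators, max_depth):
    """Stand-in for 'fit the model and return its validation RMSE' (assumption)."""
    return abs(n_estimators - 300) / 100 + abs(max_depth - 5)

def grid_search(param_grid, score_fn):
    """Evaluate every parameter combination and return (best_params, best_score)."""
    names = list(param_grid)
    best_params, best_score = None, float("inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = score_fn(**params)
        if score < best_score:  # a lower RMSE is better
            best_params, best_score = params, score
    return best_params, best_score

best_params, best_score = grid_search(PARAM_GRID, toy_validation_rmse)
```

The loop evaluates all nine combinations here; the cost grows multiplicatively with each added parameter, which is why grids are usually kept small.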

Random forest (RF)

The RF model, developed by Breiman40, is based on an ensemble of decision trees with controlled variance. The RF model has been widely used for regression and classification problems such as land use/cover mapping41 and water quality studies42,43. The detailed data and computation procedure of the RF model can be found in Refs.40,44.

Long short-term memory (LSTM)

LSTM is a special type of recurrent neural network (RNN)45 used to handle sequential data, with advantages over traditional RNNs. An LSTM network contains memory blocks that are linked through layers. Each layer includes a set of recurrently connected memory cells and three multiplicative units, namely the input, forget, and output gates46,47. The Adam training algorithm was used; the learning rate was set to 0.0001 and the batch size was set to 548.
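
For readers unfamiliar with the three gates, the textbook LSTM update is written out below. This is the standard formulation, given as an aid only; the source does not state which LSTM variant was implemented. Here $x_t$ is the input at step $t$, $h_t$ the hidden state, $c_t$ the cell state, $\sigma$ the logistic sigmoid, $\odot$ the element-wise product, and $W$, $U$, $b$ are learned weights and biases.

```latex
\begin{aligned}
f_t &= \sigma\left(W_f x_t + U_f h_{t-1} + b_f\right) && \text{(forget gate)}\\
i_t &= \sigma\left(W_i x_t + U_i h_{t-1} + b_i\right) && \text{(input gate)}\\
o_t &= \sigma\left(W_o x_t + U_o h_{t-1} + b_o\right) && \text{(output gate)}\\
\tilde{c}_t &= \tanh\left(W_c x_t + U_c h_{t-1} + b_c\right) && \text{(candidate cell state)}\\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t\\
h_t &= o_t \odot \tanh\left(c_t\right)
\end{aligned}
```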

Convolutional neural network (CNN)

The convolution layers are the main difference between a CNN and a conventional ANN. These layers perform automatic feature extraction, capturing features of the input data that are key to determining the relationship between the input and output parameters. In this study, a CNN with one-dimensional (1D) convolutional filters (1D CNN) was used44,49. Detailed information about the CNN architecture and specification can be found in Refs.33,50,51.

Hybrid models

Hybrid RF and XGB

The hybridization of the RF and XGB models aimed to improve on the performance of the single models, each of which is described in the previous sections. RF-XGB has been reported to achieve high accuracy compared to other ML models (e.g. ANN and SVM) in agricultural applications, such as determining irrigation timing52 and detecting plant diseases53. Hence, the present study tests the performance of the RF-XGB hybrid model in predicting rice yield compared to single models.

Hybrid LSTM and CNN

LSTM and CNN were trained on the same inputs and combined into a hybrid to produce the forecast. The proposed hybrid CNN-LSTM model uses CNN layers for feature extraction from the input data and LSTM layers for sequence learning. CNN and LSTM are among the most commonly used deep learning models. The present study aimed to test the efficiency of the hybrid LSTM–CNN model in rice yield forecasting. The hyper-parameters of the hybrid LSTM–CNN model, including the training algorithm, learning rate, batch size, and number of training epochs, were set to match those of the single CNN and LSTM models, as explained earlier.

Input scenarios and performance evaluation

This study investigated eleven input scenarios covering various combinations of climatic, soil, agricultural and remote sensing variables. To predict rice production accurately and evaluate each variable’s contribution, the multi-source data were divided into eleven scenarios that offer different options for predicting rice production from the available data (Table 2). There are two main approaches to selecting the input combinations. The first relies on previous studies that trained and tested multiple scenarios to arrive at a combination with high accuracy and low error. The second, which we followed in this study, trains and tests various variable combinations directly to select the best scenarios for predicting rice production. Each scenario was designed to reveal the weight and significance of particular variables. For example, scenario 1 used only the sown area, one of the main variables affecting rice production according to previous studies. Scenarios 3, 4 and 6 illustrate the individual impact of the soil, climate and remote sensing parameters on rice production, in order to identify management practices that help ensure food security in China. The other scenarios combine the important parameters from climate, soil, and remote sensing. The input datasets were divided into 70% for training and 30% for testing. The root mean square error (RMSE), Nash–Sutcliffe model efficiency coefficient (NSE), mean absolute error (MAE), and coefficient of determination (R2) were used to assess the performance of the applied models. These statistics are defined as:

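$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(P_i - O_i\right)^2} \tag{1}$$

$$\mathrm{NSE} = 1 - \frac{\sum_{i=1}^{n}\left(P_i - O_i\right)^2}{\sum_{i=1}^{n}\left(\bar{O} - O_i\right)^2} \tag{2}$$

$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|O_i - P_i\right| \tag{3}$$

$$R^2 = \left[\frac{\sum_{i=1}^{n}\left(O_i - \bar{O}\right)\left(P_i - \bar{P}\right)}{\sqrt{\sum_{i=1}^{n}\left(O_i - \bar{O}\right)^2 \sum_{i=1}^{n}\left(P_i - \bar{P}\right)^2}}\right]^2 \tag{4}$$

where $O_i$ and $P_i$ are the actual and the predicted production, respectively, $\bar{O}$ and $\bar{P}$ are the average values of the actual and predicted production, and $n$ is the number of observations.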
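
The four statistics in Eqs. (1)–(4) can be computed directly from paired observations and predictions. The sketch below uses only the Python standard library; the function names and the sample values are illustrative assumptions and are not data from the study.

```python
import math

def rmse(obs, pred):
    """Root mean square error, Eq. (1)."""
    return math.sqrt(sum((p - o) ** 2 for o, p in zip(obs, pred)) / len(obs))

def nse(obs, pred):
    """Nash-Sutcliffe efficiency, Eq. (2)."""
    o_bar = sum(obs) / len(obs)
    num = sum((p - o) ** 2 for o, p in zip(obs, pred))
    den = sum((o_bar - o) ** 2 for o in obs)
    return 1 - num / den

def mae(obs, pred):
    """Mean absolute error, Eq. (3)."""
    return sum(abs(o - p) for o, p in zip(obs, pred)) / len(obs)

def r2(obs, pred):
    """Squared Pearson correlation between observations and predictions, Eq. (4)."""
    o_bar = sum(obs) / len(obs)
    p_bar = sum(pred) / len(pred)
    cov = sum((o - o_bar) * (p - p_bar) for o, p in zip(obs, pred))
    var_o = sum((o - o_bar) ** 2 for o in obs)
    var_p = sum((p - p_bar) ** 2 for p in pred)
    return cov ** 2 / (var_o * var_p)

# Hypothetical production values, for illustration only.
observed = [10, 20, 30, 40]
predicted = [12, 18, 33, 37]
scores = {
    "RMSE": rmse(observed, predicted),
    "NSE": nse(observed, predicted),
    "MAE": mae(observed, predicted),
    "R2": r2(observed, predicted),
}
```

Note that NSE and R2 as defined here can differ: R2 measures linear association only, while NSE also penalizes bias in the predictions.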

Table 2.

Input combinations (scenarios) for the applied models.

Scenario Inputs
Sc1 SA
Sc2 Sunshine, Tmin, Tmax and SW
Sc3 Soil variables (pH, BD, porosity, CEC, DEP, SOM, clay, sand, TN, TP, TK)
Sc4 Climate variables (Pre, Sunshine, Tave, Tmin, Tmax, WS and RH)
Sc5 Climate variables + SA
Sc6 Remote sensing (EVI, LAI, NDVI, NDWI, NPP)
Sc7 Remote sensing + SA
Sc8 Soil variables + SA
Sc9 Pre, Sunshine, SA, NDVI, NDWI
Sc10 Pre, Sunshine, Tave, Tmin, Tmax, WS, RH, Kc, ETc, EVI, LAI, NDVI, NDWI and NPP
Sc11 Climate + Soil + Remote sensing + (Kc, ETc and SA)
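
The scenario definitions in Table 2 map naturally onto a small configuration structure. The sketch below is one possible encoding, not the authors' code: the variable abbreviations are copied from Table 2, the helper `select_inputs` is hypothetical, and the soil variables are listed once even though the text describes separate topsoil and subsoil layers.

```python
# Variable groups as abbreviated in Table 2.
CLIMATE = ["Pre", "Sunshine", "Tave", "Tmin", "Tmax", "WS", "RH"]
SOIL = ["pH", "BD", "porosity", "CEC", "DEP", "SOM", "clay", "sand", "TN", "TP", "TK"]
REMOTE_SENSING = ["EVI", "LAI", "NDVI", "NDWI", "NPP"]

SCENARIOS = {
    "Sc1": ["SA"],
    "Sc2": ["Sunshine", "Tmin", "Tmax", "SW"],  # "SW" kept exactly as printed in Table 2
    "Sc3": SOIL,
    "Sc4": CLIMATE,
    "Sc5": CLIMATE + ["SA"],
    "Sc6": REMOTE_SENSING,
    "Sc7": REMOTE_SENSING + ["SA"],
    "Sc8": SOIL + ["SA"],
    "Sc9": ["Pre", "Sunshine", "SA", "NDVI", "NDWI"],
    "Sc10": CLIMATE + ["Kc", "ETc"] + REMOTE_SENSING,
    "Sc11": CLIMATE + SOIL + REMOTE_SENSING + ["Kc", "ETc", "SA"],
}

def select_inputs(record, scenario):
    """Return the values of one record (a dict) for the variables of a scenario."""
    return [record[name] for name in SCENARIOS[scenario]]
```

Keeping the scenarios in one dictionary makes it straightforward to loop over all eleven input sets with the same training and evaluation code.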

The standardized yield residuals series (SYRS)

Crop yield is affected by many variables besides climate and typically shows a positive trend54,55, largely because mechanization and innovation in agriculture have increased over the last century55. To remove bias introduced by non-climate factors, the original yield time series were transformed to standardized yield residuals series (SYRS)56,57. The indicator of agricultural drought risk is given by the residuals of the detrended yield, following Ref.55:

$$y_i^{T} = y_i^{0} - y_i^{(\tau)} \tag{5}$$

where yi0 is the observed crop yield and yi(τ) is the value of the fitted quadratic polynomial regression model. The SYRS is computed as:

$$\mathrm{SYRS} = \frac{y_i^{T} - \mu}{\sigma} \tag{6}$$

where μ is the mean of the yield residuals and σ is the standard deviation of the yield residuals55.

The standardized precipitation evapotranspiration index at 3- and 6-month timescales (SPEI-3 and SPEI-6) was analyzed to assess the effect of drought severity and to evaluate the vegetation response to drought58. To assess the impacts of drought on crop yields, the percentage of annual yield loss (YL%) was estimated as:

$$\mathrm{YL} = \frac{y_i^{0} - y_i^{(\tau)}}{y_i^{(\tau)}} \times 100 \tag{7}$$
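
The detrending, standardization and yield-loss steps in Eqs. (5)–(7) can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions: the quadratic trend is fitted by ordinary least squares through the normal equations, σ is taken as the population standard deviation (the source does not say which form was used), and the yield series is hypothetical.

```python
import math

def fit_quadratic(t, y):
    """Least-squares fit y ~ a + b*t + c*t**2 via the 3x3 normal equations."""
    s = [sum(ti ** k for ti in t) for k in range(5)]
    rhs = [sum(yi * ti ** k for ti, yi in zip(t, y)) for k in range(3)]
    m = [[s[i + j] for j in range(3)] + [rhs[i]] for i in range(3)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(3):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[col])]
    return [m[i][3] / m[i][i] for i in range(3)]

def syrs_and_yield_loss(years, yields):
    """Return (SYRS, YL%) lists following Eqs. (5)-(7)."""
    t = [yr - years[0] for yr in years]  # shift years to keep the fit well conditioned
    a, b, c = fit_quadratic(t, yields)
    trend = [a + b * ti + c * ti ** 2 for ti in t]          # fitted trend value
    resid = [y - tr for y, tr in zip(yields, trend)]         # Eq. (5)
    mu = sum(resid) / len(resid)
    sigma = math.sqrt(sum((r - mu) ** 2 for r in resid) / len(resid))
    syrs = [(r - mu) / sigma for r in resid]                 # Eq. (6)
    yield_loss = [100 * r / tr for r, tr in zip(resid, trend)]  # Eq. (7)
    return syrs, yield_loss

# Hypothetical yield series (t/ha), for illustration only.
years = list(range(2000, 2008))
yields = [6.0, 6.3, 6.1, 6.6, 6.4, 6.9, 6.8, 7.2]
syrs, yield_loss = syrs_and_yield_loss(years, yields)
```

Negative SYRS and YL values mark years in which the observed yield fell below the fitted trend.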

Results

Model performance

Performance of the single and hybrid models

To compare the accuracy of the single and hybrid models, this study tested the four single models (RF, XGB, CNN, and LSTM) against the two hybrid models (RF-XGB and CNN-LSTM). Overall, averaged over all input scenarios, the hybrid models estimated rice production better than the single models (Table 3). Notably, the sown area alone achieved relatively high performance, with an average R2 of 0.825, NSE of 0.823, and RMSE of 35.592 × 10^4 ton across all ML methods. Without SA, the integration of climatic and remote sensing variables achieved only moderate performance (Sc10, R2 = 0.533; Table 3). The highest R2 (0.8593) and NSE (0.8556) and the lowest RMSE (26.6903 × 10^4 ton) were achieved by the hybrid RF-XGB model, followed by LSTM-CNN. In contrast, the lowest performance was that of the LSTM model, with 0.6786, 0.6693 and 43.9143 × 10^4 ton for R2, NSE and RMSE, respectively.

Table 3.

The performance evaluation of applied models in rice production.

Models R2 NSE RMSE (× 10^4 ton)
RF 0.767 0.762 38.359
XGB 0.760 0.756 38.374
RF-XGB 0.859 0.856 26.690
LSTM 0.757 0.755 39.200
CNN 0.679 0.669 43.914
LSTM-CNN 0.851 0.850 29.052
Scenario
Sc1 0.825 0.823 35.59
Sc2 0.895 0.900 26.92
Sc3 0.883 0.872 28.68
Sc4 0.498 0.489 58.30
Sc5 0.899 0.898 26.31
Sc6 0.362 0.340 68.66
Sc7 0.894 0.892 27.32
Sc8 0.950 0.950 18.69
Sc9 0.881 0.879 28.69
Sc10 0.533 0.529 56.80
Sc11 0.948 0.948 19.30

The values in the table were estimated as averages for the applied models and input scenarios. Significant values are in bold.

Optimum input scenario for rice production

The tested models performed differently across the input scenarios. On average, the best results were observed in scenario 8 (soil variables and sown area) and scenario 11 (all variables) (Table 3). In both scenarios, the R2 and NSE were 0.95, and the RMSE was 19.69 × 10^4 ton and 19.3 × 10^4 ton, respectively. On the other hand, remote sensing indices alone gave the lowest-performing scenario (Sc6) for rice production estimation (R2 = 0.362, NSE = 0.340, RMSE = 68.659 × 10^4 ton), whereas adding the sown area to remote sensing (scenario 7) enhanced model performance significantly (R2 = 0.899, NSE = 0.898, RMSE = 27.32 × 10^4 ton).

To investigate the performance of each single and hybrid model under the eleven scenarios, the R2, NSE and MAE indices were calculated for every model and scenario (Table 4). The lowest-performing single model was LSTM in scenarios 10 and 4, with MAE of 51.38 × 10^4 and 50.35 × 10^4 ton, respectively. Meanwhile, the best-performing model was RF-XGB in scenarios 8 (soil variables and SA) and 5 (climate variables and SA), with MAE of 5.85 × 10^4 and 7.70 × 10^4 ton, respectively. The highest R2 values (0.97) were recorded in scenarios 8 and 11 for RF-XGB and LSTM-CNN, and the lowest R2 values were in scenario 4 (climate variables) in the LSTM model, followed by scenario 10, at 0.11 and 0.13. Similarly, the NSE index identifies RF-XGB and LSTM-CNN as the best models, with 0.97 for both in scenarios 8 and 11. The lowest NSE values were 0.27 and 0.32 in scenario 6 (remote sensing) with the XGB and RF models, respectively. In scenario 3 (soil variables), the NSE was higher than 0.82 for all models, and in scenario 11 it rose to higher than 0.92 for all models.

The radar chart shows the RMSE of the applied models in the different scenarios (Fig. 2a). The lowest-performing single model was LSTM in scenario 4 (climate variables), with an RMSE of 81.85 × 10^4 ton, followed by XGB in scenario 6 (remote sensing), with an RMSE of 73.13 × 10^4 ton. However, accuracy improved when the hybrid model was applied; for example, scenarios 4 and 5 with the RF-XGB model achieved RMSE of 38.45 × 10^4 ton and 6.45 × 10^4 ton, respectively; the hybrid model improved the RMSE to 13.65 × 10^4 ton, followed by scenario 11 (all variables) with the LSTM-CNN and RF-XGB models (RMSE of 14.90 × 10^4 ton). Based on these results, it is clear that the hybrid models performed better in rice production estimation than the single models. Among the hybrid models, the lowest performance across all scenarios was in scenarios 6 (remote sensing) and 4 (climate), respectively, whereas the highest performance was in scenarios 8 and 11, respectively.

Table 4.

The performance evaluation of applied models in rice production.

Sc1 Sc2 Sc3 Sc4 Sc5 Sc6 Sc7 Sc8 Sc9 Sc10 Sc11
R2
 RF 0.81 0.88 0.89 0.47 0.89 0.35 0.90 0.93 0.90 0.50 0.92
 XGB 0.80 0.91 0.89 0.44 0.91 0.29 0.88 0.95 0.81 0.52 0.95
 RF-XGB 0.87 0.94 0.87 0.69 0.94 0.52 0.94 0.97 0.94 0.80 0.97
 LSTM 0.82 0.84 0.88 0.54 0.84 0.34 0.82 0.96 0.83 0.51 0.96
 CNN 0.82 0.84 0.88 0.11 0.86 0.25 0.89 0.92 0.84 0.13 0.92
 LSTM-CNN 0.82 0.96 0.89 0.74 0.96 0.42 0.95 0.97 0.95 0.74 0.97
NSE
 RF 0.81 0.88 0.89 0.46 0.88 0.32 0.90 0.93 0.90 0.48 0.92
 XGB 0.80 0.91 0.89 0.43 0.91 0.27 0.87 0.95 0.81 0.52 0.95
 RF-XGB 0.87 0.94 0.87 0.68 0.94 0.49 0.94 0.97 0.94 0.80 0.97
 LSTM 0.82 0.84 0.87 0.53 0.84 0.33 0.82 0.96 0.83 0.51 0.96
 CNN 0.82 0.88 0.82 0.09 0.86 0.20 0.89 0.92 0.84 0.13 0.92
 LSTM-CNN 0.82 0.96 0.89 0.74 0.96 0.42 0.94 0.97 0.95 0.73 0.97
MAE
 RF 15.3 11.4 14.3 35.1 11.7 39.7 11.5 9.4 11.8 33.1 9.8
 XGB 16.3 11.1 14.2 36.9 12.8 43.8 14.1 7.9 16.8 32.6 10.7
 RF-XGB 11.6 7.4 13.8 19.1 7.7 30.9 8.2 5.9 8.5 17.5 6.8
 LSTM 14.9 15.4 16.8 37.9 15.0 41.1 15.1 8.4 15.9 37.9 9.8
 CNN 14.7 15.9 16.1 50.4 14.9 41.7 13.2 9.8 14.0 51.4 12.3
 LSTM-CNN 15.7 10.1 12.6 26.8 9.3 38.5 9.9 9.4 10.3 23.8 8.3

Figure 2.

(a) Radar chart of the RMSE of the applied models (Sc: scenario); (b) boxplot of the error distribution of the developed RF-XGB and LSTM-CNN models in scenarios 8 and 11. The figures were generated with the Origin 2023b software.

Therefore, to select the best hybrid model and scenario, box plots were developed for scenarios 8 and 11 with RF-XGB and LSTM-CNN to compare the models based on their residuals (estimation errors). Positive and negative estimation errors indicate under- and over-estimation, respectively. The RF-XGB model in scenarios 8 and 11 appears to be the best model, with errors 53% and 23% lower than those of the LSTM and XGB models, respectively. The scenario with the lowest error was scenario 8 (soil + SA) with RF-XGB. For scenario 8, RF-XGB has a lower quartile (Q1) of −3.32 compared with −9.47 for LSTM-CNN; for scenario 11, Q1 was −3.59 for RF-XGB and −4.45 for LSTM-CNN. Moreover, the smaller interquartile range (IQR = Q3 − Q1) of the RF-XGB model clearly shows that its error distribution is much tighter than that of the LSTM-CNN model (Fig. 2b): 1.41 and 1.45 for scenarios 8 and 11, respectively, versus 10.37 and 8.46 for LSTM-CNN. Therefore, the RF-XGB model shows a clear superiority in scenarios 8 and 11.
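
The quartile comparison above can be reproduced for any pair of observed and predicted series. The sketch below is illustrative only: the residual is taken as observed minus predicted (so positive values mean under-estimation, matching the convention in the text), the sample values are hypothetical, and `statistics.quantiles` uses its default "exclusive" method, which may differ slightly from the quartile rule applied by the plotting software.

```python
from statistics import quantiles

def residual_summary(observed, predicted):
    """Return Q1, Q3 and IQR of the residuals (observed minus predicted)."""
    residuals = [o - p for o, p in zip(observed, predicted)]
    q1, _median, q3 = quantiles(residuals, n=4)  # quartile cut points
    return {"Q1": q1, "Q3": q3, "IQR": q3 - q1}

# Hypothetical values, for illustration only.
observed = [11, 22, 33, 44, 55, 66, 77, 88]
predicted = [10, 20, 30, 40, 50, 60, 70, 80]
summary = residual_summary(observed, predicted)
```

A smaller IQR means the middle half of the errors is packed more tightly, which is the property used in the text to prefer RF-XGB over LSTM-CNN.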

Importance of predictor variables in rice production estimation

Based on the results obtained from the single RF and XGB models, the XGB model was superior to the RF model; thus, the XGB model was applied to analyse the joint contributions of subsets of features while maintaining fast convergence during iterations. The predictor variables in the XGB model were used to rank their importance. The ranking at the regional and zonal scales showed that the variables differ in their importance for rice production estimation (Fig. 3). At the regional scale, the most important feature was sown area (53%), followed by soil properties (32%) and climate (7%) (Fig. 3a). The importance of the sown area decreased to 8% and 27% in east and southeast China, respectively. On the other hand, the sown area was highly important for rice production estimation in northeast China and southeast China (90% and 27%, respectively). To analyze the factors of climate, soil and remote sensing separately, Fig. 3c–e were developed. For example, soil texture contributed 18% out of the total contribution of the soil properties (32%) for rice production estimation across China. This contribution increased markedly to 82% in East China, out of the total soil contribution of 87%, whereas the contribution of texture was 24% in South China. In contrast, the contribution of climate was low in all zones: relative humidity contributed 3.5% of the total contribution of climate at the regional scale, whereas in southeast China temperature contributed almost half of the total contribution of climate (2.95%) (Fig. 3e). Evapotranspiration was at the bottom of the importance ranking owing to the low importance of the climate factors.

Meanwhile, at the zonal scale, in northeast China the sown area became the dominant factor for rice production estimation, reaching 90%, followed by soil properties at 4% (Fig. 3b). On the other hand, soil properties were the dominant factor affecting rice production in east and southeast China, at 87% and 57%, respectively.
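
Group-level percentages like those reported above can be obtained by summing per-feature importances within each category and rescaling to 100%. The sketch below shows that aggregation only; the raw importance values and the feature-to-group mapping are hypothetical and are not the values from Fig. 3.

```python
def group_importance(feature_importance, feature_groups):
    """Sum per-feature importances by group and rescale them to percentages."""
    total = sum(feature_importance.values())
    shares = {}
    for feature, value in feature_importance.items():
        group = feature_groups[feature]
        shares[group] = shares.get(group, 0.0) + 100.0 * value / total
    return shares

# Hypothetical raw importances (not values from the study).
importance = {"SA": 0.50, "clay": 0.20, "SOM": 0.10, "Tmax": 0.05, "Pre": 0.05, "NDVI": 0.10}
groups = {
    "SA": "sown area",
    "clay": "soil",
    "SOM": "soil",
    "Tmax": "climate",
    "Pre": "climate",
    "NDVI": "remote sensing",
}
shares = group_importance(importance, groups)
```

The same function applied to the features of a single group gives within-group shares, such as the contribution of texture relative to all soil properties.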

Figure 3.

Relative importance ranking of the features in rice production estimation for the regional and zonal scale. The figures were generated with the Origin 2023b software.

Solution for improving rice production

To improve rice production in each zone, we exchanged the soil properties between zones: from northeast China to southeast China and from east to southeast China. Figure 4a shows the effect of changing the soil properties in scenario 8: the RMSE decreased by 38% in northeast China when its soil properties were changed to those of southeast China. In contrast, when the soil properties of southeast China were changed to those of northeast China, the RMSE did not decrease significantly (0.6%). In the same manner, the MAE decreased significantly, by 20%, when the soil properties of northeast China were changed to those of southeast China. Scenario 11 was consistent with scenario 8: the RMSE decreased significantly, by 26%, when the soil properties of northeast China were changed to those of southeast China (Fig. 4b). On the other hand, when the soil properties of east China were replaced by those of southeast China, model performance decreased; for example, the RMSE and MAE increased by 6% and 31%, respectively. One of the major suggested solutions is to increase the soil organic matter (SOM) to enhance rice production. Therefore, we simulated the effect of increasing SOM by 15% on rice production (scenario 8). Figure 4c shows that the performance of the hybrid RF-XGB model improved significantly when SOM was increased by 15% in northeast and southeast China: the RMSE declined by 16% and 10%, respectively, in comparison with the current SOM. However, increasing SOM in East China had a negative effect on the rice production estimation, with the RMSE increasing by 21%.
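
The what-if experiments above rest on two small operations: perturbing one input variable and expressing the resulting change in an error metric as a percentage. The sketch below illustrates both with hypothetical helper names and hypothetical numbers; the source does not say whether the model was retrained or only re-evaluated on the perturbed inputs, so that step is deliberately left out.

```python
def scale_feature(records, name, factor):
    """Return copies of the records with one feature multiplied by factor."""
    return [{**rec, name: rec[name] * factor} for rec in records]

def percent_change(baseline, modified):
    """Relative change of a metric in percent; negative values mean a decrease."""
    return 100.0 * (modified - baseline) / baseline

# Hypothetical records and RMSE values, for illustration only.
records = [{"SOM": 2.0, "pH": 6.5}, {"SOM": 3.0, "pH": 7.0}]
records_som_plus_15 = scale_feature(records, "SOM", 1.15)  # SOM increased by 15%

rmse_current = 20.0    # hypothetical RMSE with current SOM
rmse_modified = 16.8   # hypothetical RMSE after the SOM increase
change = percent_change(rmse_current, rmse_modified)
```

With these illustrative numbers the RMSE falls by 16%, the same form of statement as the percentages quoted in the paragraph above.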

Figure 4.

Changing the soil properties (a,b) and increasing SOM by 15% (c) in each zone. The figures were generated with the Origin 2023b software.

On the other hand, as shown in Fig. 5, the decreasing trend of precipitation and the increasing temperature in southeast China negatively affected rice production. The maximum and minimum temperatures increased by 0.16 and 0.19 °C/year, while precipitation decreased by 20 mm/year, which resulted in an average decrease in rice production of 2.23% in southeast China. In contrast, the production increase in northeast China may be attributable to the non-significant trends in precipitation and temperature (maximum and minimum) and to improved irrigation, which positively affects rice production even during dry years59,60. Therefore, the SPEI drought index was analyzed to investigate drought conditions during the study period and how they relate to the production anomaly.

Figure 5.

Time series of precipitation, maximum and minimum temperature, sunshine, sown area and production across zones. The figures were generated with the Origin 2023b software.
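The yearly rates quoted above (°C/year, mm/year) are slopes of a trend line through annual series. The sketch below assumes an ordinary least-squares fit; this section does not state which trend estimator was used, so that choice is an assumption. The series are synthetic, built only so that the fitted slopes are easy to check.

```python
def linear_trend(years, values):
    """Ordinary least-squares slope of `values` against `years` (units per year)."""
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, values))
    return sxy / sxx


# Synthetic annual series for 2000-2017; NOT the study's data.
years = list(range(2000, 2018))
tmax = [30.0 + 0.16 * (y - 2000) for y in years]        # degrees C
precip = [1800.0 - 20.0 * (y - 2000) for y in years]    # mm

print(f"Tmax trend: {linear_trend(years, tmax):.2f} degC/year")
print(f"Precipitation trend: {linear_trend(years, precip):.1f} mm/year")
```

With real data the slope should be accompanied by a significance test, since the text distinguishes significant from non-significant trends.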

The SPEI series at 3- and 6-month timescales fluctuated during the study period (Fig. 6a and b). In northeast China, the drought (SPEI-3) from 2009 to 2012 was classified as extreme, especially in 2009, when it occurred during the rice-season months of May, June and July. In east China, the drought from 2009 to 2013 can be classified as severe. In southeast China, the drought from 2011 to 2015 can also be classified as severe; extreme drought was found only in June and September 2011.
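The labels "severe" and "extreme" above correspond to ranges of the SPEI value. The sketch below maps a value to a class using thresholds that are widely used for standardized drought indices (extreme at or below -2.0, severe at or below -1.5, moderate at or below -1.0). These cut-offs are an assumption here, because this section does not list the thresholds the authors applied.

```python
def classify_spei(value):
    """Map an SPEI value to a drought class.

    Thresholds are commonly used conventions and are an assumption,
    not cut-offs taken from this study.
    """
    if value <= -2.0:
        return "extreme drought"
    if value <= -1.5:
        return "severe drought"
    if value <= -1.0:
        return "moderate drought"
    if value < 1.0:
        return "near normal"
    return "wet"


# Hypothetical monthly SPEI-3 values, for illustration only.
sample = {"May": -2.3, "June": -1.7, "July": -1.1, "August": 0.2}
for month, spei in sample.items():
    print(month, spei, classify_spei(spei))
```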

Figure 6.

The temporal evolution of SPEI-3 and SPEI-6 (a and b), the Pearson correlation coefficient (r) of the linear regression between the SPEIs at 3- and 6-month timescale and the SYRS of rice yield in the three zones (c) and yield losses across the three regions (d). The figures were generated with the Sigma plot software.
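The correlation in panel (c) relates SPEI to the standardized yield residual series (SYRS). A minimal sketch of one way to compute it follows, assuming that SYRS is obtained by removing a linear trend from the yield series and standardizing the residuals; the detrending method is an assumption, and the yield and SPEI numbers are synthetic.

```python
import math


def linear_fit(x, y):
    """Least-squares slope and intercept of y against x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def syrs(years, yields):
    """Standardized residuals of yield after removing a linear trend (assumed method)."""
    slope, intercept = linear_fit(years, yields)
    residuals = [y - (intercept + slope * t) for t, y in zip(years, yields)]
    n = len(residuals)
    mean_r = sum(residuals) / n
    sd = math.sqrt(sum((r - mean_r) ** 2 for r in residuals) / (n - 1))
    return [(r - mean_r) / sd for r in residuals]


def pearson_r(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


# Synthetic example: yield = linear trend + a drought-driven anomaly.
years = [2000, 2001, 2002, 2003, 2004, 2005]
spei = [1.0, -1.5, 0.5, 0.5, -1.5, 1.0]          # hypothetical SPEI values
yields = [6.0 + 0.1 * (t - 2000) + 0.5 * s for t, s in zip(years, spei)]

residual_series = syrs(years, yields)
r = pearson_r(residual_series, spei)
print(f"r between SYRS and SPEI: {r:.3f}")
```

Detrending first matters because a long-term rise in yield from technology or management would otherwise mask the year-to-year drought signal.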

In the period from 2002 to 2008, no drought events occurred. Fig. 6c shows the correlation analysis between SPEI-3/6 and the SYRS of rice yield across the three zones. The correlation coefficient between the SYRS of rice in southeast China and SPEI in April and May (initial stage) is the highest among all months, indicating that rice yield is more prone to drought in the initial stage. Meanwhile, in northeast and east China, rice yield is less correlated with drought than in southeast China, which may be because improved irrigation positively affects rice yield even during dry years. The degree of yield loss varied during the study period across the three regions because of drought/wet impacts on the various crop stages. In east China, 2003 was the year with the highest rice failure, with yield losses reaching 60%. In southeast China, the highest losses occurred in 2001, 2002 and 2003 (20%, 27% and 18%, respectively), with average losses of 2.23% over the whole study period (Fig. 6d).

Besides the climate variables, soil properties play a vital role in improving rice production. The results of this study indicated that clay (30–100 cm) was positively correlated with rice production in the three zones, especially in northeast China (Fig. 7). The same was true of sand (0–30 cm) in southeast China; however, the correlation was negative in northeast and east China.

Figure 7.

Variations of sand (0–30), clay (30–100), soil organic matter and porosity in each district. The figures were generated with the Origin 2023b software.

Discussion

Hybrid method importance in rice yield estimation

The results of this study showed that the hybrid RF-XGB and LSTM-CNN models are more flexible and more robust to noisy data than single models, which significantly enhanced their prediction accuracy. Previous research has documented that both climate variables and remote sensing data can exert non-linear and complicated effects on production variations61, which are less well captured by single methods. For example, the RMSE was reduced by more than 30% when applying the hybrid RF-XGB and LSTM-CNN models compared to the single models, which agrees with the findings of Chiu et al.61. The underlying reason may be that using a single machine learning regressor can result in over-fitting and difficulty with generalization, because the regressor may become too complex and fit the noise in the training data rather than the underlying patterns62–64. Further, Huang et al.65 developed, trained, and tested a back-propagation neural network (BP-ANN) model for fiber-reinforced polymer (FRP) reinforced concrete at high temperatures using 151 sets of FRP-reinforced concrete pullout test data at different temperatures reported in the literature. The results showed that the BP-ANN model exhibited greater generality than existing mathematical models. Furthermore, Wang et al.63 combined ANN with a genetic algorithm (GA) or particle swarm optimization (PSO) for model training and testing. The findings indicated that the accuracy of the developed hybrid machine learning model in predicting bond strength in CES structures exceeded that of conventional ANN models and existing empirical equations. In addition, both DL and ML models are black boxes: their complex model structure makes it difficult to produce testable hypotheses that could provide biological insights. Nevertheless, in comparison with traditional production estimation methods (i.e. crop model simulation and statistical regression), the ML and DL methods provide new opportunities for yield prediction27. Combining crop models and DL/ML models for yield estimation, forecasting, and disaster monitoring in large regions is therefore recommended. This might encourage running rice production estimation models at the local scale to account for the variation among rice districts in their agro-environmental conditions and the relative correlation of various factors with rice production.
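To illustrate why combining two regressors can lower the error, the sketch below blends the predictions of two models with a weighted average and compares RMSE. This is one simple form of hybridization and is an assumption for illustration only: this section does not describe how RF and XGB (or CNN and LSTM) were actually combined. The prediction vectors are made up and stand in for the outputs of two already-trained models.

```python
import math


def rmse(observed, predicted):
    """Root mean square error between two equal-length sequences."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def blend(pred_a, pred_b, weight_a=0.5):
    """Weighted average of two prediction vectors (weight_a for model A)."""
    return [weight_a * a + (1.0 - weight_a) * b for a, b in zip(pred_a, pred_b)]


# Made-up observations and predictions from two hypothetical models.
observed = [10.0, 12.0, 9.0, 11.0]
model_a = [11.0, 12.5, 10.0, 11.0]   # tends to over-predict
model_b = [9.5, 11.0, 8.5, 11.5]     # tends to under-predict

hybrid = blend(model_a, model_b, weight_a=0.5)

rmse_a = rmse(observed, model_a)
rmse_b = rmse(observed, model_b)
rmse_hybrid = rmse(observed, hybrid)
print(f"model A: {rmse_a:.3f}, model B: {rmse_b:.3f}, hybrid: {rmse_hybrid:.3f}")
```

The blend helps here because the two models' errors partly cancel; when both models err in the same direction, averaging gives little or no gain. In practice the weight would be chosen on held-out data rather than fixed at 0.5.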

Analysis of driving mechanisms on rice production

Global warming has undoubtedly brought unprecedented challenges to rice production, which is vital for food security in southeast Asian countries and China. Excessively high temperatures increase the risk of heat stress, which not only makes anthers difficult to crack but also reduces pollen, thus affecting the normal process of pollination and fertilization. Meanwhile, excessive heat inhibits rice from synthesizing organic matter and accumulating dry matter, leading to reduced seed setting rate, grain mass, and seed weight66–68. A reduction in rainfall decreases stomatal conductance and inter-cellular CO2 flux, which slows the transpiration rate and restricts photosynthesis69. As a result, the uptake of nutrients is reduced, while respiration consumption increases. Therefore, an increase in precipitation within a moderate range can promote rice yield. Our findings agree with those of Liu et al.70, who reported that the individual contributions of climate change and soil improvement to rice yield differed. Compared with the 1980s, the yield in the 2000s decreased by 19.5% owing to climate change, while it increased by 12.7% owing to soil improvement.

In contrast, the increase in rice production in northeast China may be attributed to the non-significant trends in precipitation and temperature (maximum and minimum), to adequate irrigation and adjusted sowing dates, which positively affect rice production even during dry years59,71–73, and to the appropriate application of chemical fertilizers, which provides ample nutrients for rice growth74. As shown in Fig. 6a and b, the drought in southeast China from 2011 to 2015 can be classified as severe; extreme drought was found only in June and September 2011. Furthermore, the correlation coefficient between the SYRS of rice in southeast China and SPEI in April and May (initial stage) is the highest among all months, indicating that rice yield is more prone to drought in the initial stage. Meanwhile, in northeast and east China, rice yield is less correlated with drought than in southeast China, which may be because improved irrigation positively affects rice yield even during dry years72,73.

Furthermore, the role of climatic variables in rice yield variation was not significant in some regions of China; these results are supported by previous studies75,76. The underlying reason may be that sown area and soil properties represent comprehensive features of a county or a field over a long time, while climate factors represent only part of the information related to crop production for a specific period. High production can be characterized by healthy soils, good water conditions, farmers' experience, agricultural practices such as mulching, well-equipped irrigation facilities, fertilizers and suitable climate conditions75. All these features can be comprehensively represented by spatial location. Furthermore, climatic variables derived from meteorological data performed better in rice production estimation than vegetation parameters derived from remote sensing data. This agrees with earlier studies showing that fluctuations in precipitation and temperature are strongly correlated with rice production21,22. Although remote sensing vegetation indices (VIs) performed worse than climatic variables in rice production estimation at the regional scale, VIs were more important than climate in some rice districts. The explanation may be that satellite indices reflect the effects not only of abiotic factors but also of biotic and management factors (e.g. plant disease, irrigation, and fertilization)77,78, which agrees with the conclusion of Cao et al.27. Moreover, we speculate that monthly EVI and weather data cannot accurately reflect crop growth and development. EVI at 8-day or 16-day intervals might better incorporate crop growth and weather information75. Moreover, a subset of climatic variables in scenario 2 (sunshine, Tmin, Tmax, and sown area) achieved rice production estimates comparable to those obtained with the full set of climatic variables in scenario 5. The reason may be the highly significant correlation between sown area and rice production in the three zones, as shown in Fig. 7.

Soil health is also one of the major factors affecting rice production79. Increasing the clay content could improve soil fertility80. A higher biomass was recorded in rice grown in high-clay soil than in rice grown in low-clay soil80,81. Southern China accounts for 88% of national rice production82. Continuous flooding irrigation is practiced by Chinese farmers in lowland rice, threatening rice production83. Moreover, in regions of southern China, clay-textured soils offer the highest potassium-supplying potential84. The results of this study indicated that clay (30–100 cm) was positively correlated with rice production in the three zones, especially in northeast China, whereas sand (0–30 cm) was positively correlated in southeast China but negatively correlated in northeast and east China. The main reason is that soil texture affects plant growth and nutrient uptake because it alters the availability of water in the soil. When the soil has a high clay content, often with a large proportion of 2:1 clay, it is classified as a Vertisol79. In flooded rice soil, swelling is dominant because clay absorbs water, and the soil is then allowed to dry out before irrigation is applied again85; as such, cracks are dominant in paddy soils86 owing to the removal of water from within and between clay microstructures.

Conclusion

In this study, the key issue was finding the best approach to predict rice production across China’s main rice counties by testing multiple single and hybrid models and input scenarios at various study scales. The main findings of the present study can be summarized as follows:

  • Hybrid models performed better than single models in rice production estimation, significantly improving the prediction accuracy.

  • At the zonal scale, soil properties were the dominant factors in rice production, with contributions of 87% and 53% in east and southeast China, respectively.

  • The increase in temperature and decrease in precipitation restrained rice production, decreasing it by 2.2% on average in southeast China.

  • At the regional scale, climatic variables showed a stronger relationship with rice production than vegetation parameters. However, remote sensing outperformed climatic factors in some local districts.

The paper's innovation lies in its holistic approach to predicting rice production using multi-source data and hybrid machine learning algorithms, offering high-resolution insights into a critical aspect of China's agriculture. Furthermore, one of the main innovative points of this study was the investigation of the dominant factors for rice production across China’s main rice counties. Future research will focus on predicting rice production using agronomic datasets (crop phenology, growing degree days, full grain, panicle number, and plant height) as well as management datasets in addition to the existing datasets.

Acknowledgements

The research work of this article was financially supported by Projects of the Second Tibetan Plateau Scientific Expedition and Research Program (Grant No. 2019QZKK040303), the National Natural Science Foundation of China (No. 42271007), the National Key R&D Program of China (2022YFF1302401), the Comprehensive Scientific Investigation Program of the Gaoligong Mountain National Park in Yunnan Province, and the Project for Wetland Ecological Processes and Impact Assessment of Wetland Birds in the Huanghe Yangqu Hydropower Station Engineering Project (No. 1161-GCJS-FY-[2022]).

Author contributions

Ali Mokhtar: Conceptualization, Methodology, Formal analysis, Software, Validation, Writing – original draft. Hongming He: Conceptualization, Methodology, Formal analysis, Investigation, Software, Writing – review & editing. Mohsen Nabil: Investigation, Resources, Writing – review & editing. Saber Kouadri: Investigation, Resources, Writing – review & editing. Ali Salem: Investigation, Resources, Funding acquisition, Writing – review & editing. Ahmed Elbeltagi: Conceptualization, Investigation, Funding acquisition, Writing – review & editing.

Funding

Open access funding provided by University of Pécs.

Data availability

The datasets used and/or analyzed during the current study are available from the corresponding author on reasonable request.

Competing interests

The authors declare no competing interests.

Footnotes

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Contributor Information

Hongming He, Email: hongming.he@yahoo.com.

Ali Salem, Email: salem.ali@mik.pte.hu.

References

  • 1. Balakrishnan N, Muthukumarasamy G. Crop production-ensemble machine learning model for prediction. Int. J. Comput. Sci. Softw. Eng. 2016;5(7):148.
  • 2. Mekonnen MM, Hoekstra AY. A global and high-resolution assessment of the green, blue and grey water footprint of wheat. Hydrol. Earth Syst. Sci. 2010;14:1259–1276. doi: 10.5194/hess-14-1259-2010.
  • 3. Huang J, Xu C, Ridoutt BG, Chen F. Reducing agricultural water footprints at the farm scale: A case study in the Beijing region. Water. 2015;7:7066–7077. doi: 10.3390/w7126674.
  • 4. Fan J, Jintrawet A, Sangchyoswat C. The relationships between extreme precipitation and rice and maize yields using machine learning in Sichuan Province, China. Curr. Appl. Sci. Technol. 2020;20:453–469.
  • 5. Gillani SA, et al. Appraisal of urban heat island over Gujranwala and its environmental impact assessment using satellite imagery (1995–2016). Int. J. Innov. Sci. Technol. 2019;1(01):1–14. doi: 10.33411/IJIST/2019010101.
  • 6. Lee S-H, Bae J-Y. Predicting crop production for agricultural consultation service. J. Inf. Commun. Converg. Eng. 2019;17(1):8–13.
  • 7. Adnan RM, Mostafa RR, Elbeltagi A, Yaseen ZM, Shahid S, Kisi O. Development of new machine learning model for streamflow prediction: Case studies in Pakistan. Stoch. Environ. Res. Risk Assess. 2022;36:999–1033. doi: 10.1007/s00477-021-02111-z.
  • 8. Kouadri S, Pande CB, Panneerselvam B, Moharir KN, Elbeltagi A. Prediction of irrigation groundwater quality parameters using ANN, LSTM, and MLR models. Environ. Sci. Pollut. Res. 2022;29:21067–21091. doi: 10.1007/s11356-021-17084-3.
  • 9. Mohammed S, et al. A comparative analysis of data mining techniques for agricultural and hydrological drought prediction in the eastern Mediterranean. Comput. Electron. Agric. 2022;197:106925. doi: 10.1016/j.compag.2022.106925.
  • 10. Sakaa B, et al. Water quality index modeling using random forest and improved SMO algorithm for support vector machine in Saf-Saf river basin. Environ. Sci. Pollut. Res. 2022. doi: 10.1007/s11356-022-18644-x.
  • 11. Singh VK, et al. Novel genetic algorithm (GA) based hybrid machine learning-pedotransfer function (ML-PTF) for prediction of spatial pattern of saturated hydraulic conductivity. Eng. Appl. Comput. Fluid Mech. 2022;16:1082–1099.
  • 12. Carlson KM, et al. Greenhouse gas emissions intensity of global croplands. Nat. Clim. Change. 2017;7:63–68. doi: 10.1038/nclimate3158.
  • 13. Naresh R, et al. Water footprint of rice from both production and consumption perspective assessment using remote sensing under subtropical India: A review. Int. J. Chem. Stud. 2017;5:343–350.
  • 14. Zheng J, et al. Assessment of climate change impact on the water footprint in rice production: Historical simulation and future projections at two representative rice cropping sites of China. Sci. Total Environ. 2020;709:136190. doi: 10.1016/j.scitotenv.2019.136190.
  • 15. Mosleh MK, Hassan QK. Development of a remote sensing-based “Boro” rice mapping system. Remote Sens. 2014;6(3):1938–1953. doi: 10.3390/rs6031938.
  • 16. Jabjone S, Jiamrum C. Artificial neural networks for predicting the rice yield in Phimai District of Thailand. Int. J. Electr. Energy. 2013;1(3):177–181. doi: 10.12720/ijoee.1.3.177-181.
  • 17. Marndi A, Ramesh K, Patra G. Crop production estimation using deep learning technique. Curr. Sci. 2021;121(8):1073. doi: 10.18520/cs/v121/i8/1073-1079.
  • 18. Sultana A, Khanam M. Forecasting rice production of Bangladesh using ARIMA and artificial neural network models. Dhaka Univ. J. Sci. 2020;68(2):143–147. doi: 10.3329/dujs.v68i2.54612.
  • 19. Koide N, Robertson AW, Ines AV, Qian J-H, Dewitt DG, Lucero A. Prediction of rice production in the Philippines using seasonal climate forecasts. J. Appl. Meteorol. Climatol. 2013;52:552–569. doi: 10.1175/JAMC-D-11-0254.1.
  • 20. Roberts MG, Dawe D, Falcon WP, Naylor RL. El Niño-Southern oscillation impacts on rice production in Luzon, the Philippines. J. Appl. Meteorol. Climatol. 2009;48:1718–1724. doi: 10.1175/2008JAMC1628.1.
  • 21. Jianping Z, et al. Effect of climate change on the growth and yields of double-harvest rice in the Southern China. Adv. Clim. Change Res. 2005;1(04):151–156.
  • 22. Li W-J, et al. Climate change impact and its contribution share to paddy rice production in Jiangxi, China. J. Integr. Agric. 2014;13(7):1565–1574. doi: 10.1016/S2095-3119(14)60811-X.
  • 23. Prasad A, et al. Use of vegetation index and meteorological parameters for the prediction of crop yield in India. Int. J. Remote Sens. 2007;28(23):5207–5235. doi: 10.1080/01431160601105843.
  • 24. Faisal BR, et al. Relationship between boro rice production and MODIS derived NDVI for rice production forecasting: A case study on Bangladesh. Dhaka Univ. J. Earth Environ. Sci. 2019;8(1):33–40. doi: 10.3329/dujees.v8i1.50759.
  • 25. Chen C, et al. Rice area mapping, yield, and production forecast for the province of Nueva Ecija using RADARSAT imagery. Can. J. Remote Sens. 2011;37(1):1–16. doi: 10.5589/m11-024.
  • 26. Raza SMH, et al. Delineation of potential sites for rice cultivation through multi-criteria evaluation (MCE) using remote sensing and GIS. Int. J. Plant Prod. 2018;12(1):1–11. doi: 10.1007/s42106-017-0001-z.
  • 27. Cao J, et al. Integrating multi-source data for rice yield prediction across China using machine learning and deep learning approaches. Agric. For. Meteorol. 2021;297:108275. doi: 10.1016/j.agrformet.2020.108275.
  • 28. Sun W, Huang Y. Global warming over the period 1961–2008 did not increase high-temperature stress but did reduce low-temperature stress in irrigated rice across China. Agric. For. Meteorol. 2011;151(9):1193–1201. doi: 10.1016/j.agrformet.2011.04.009.
  • 29. Zhang Z, et al. Global warming over 1960–2009 did increase heat stress and reduce cold stress in the major rice-planting areas across China. Eur. J. Agron. 2014;59:49–56. doi: 10.1016/j.eja.2014.05.008.
  • 30. Deng N, et al. Closing yield gaps for rice self-sufficiency in China. Nat. Commun. 2019;10(1):1725. doi: 10.1038/s41467-019-09447-9.
  • 31. Peng S, Tang Q, Zou Y. Current status and challenges of rice production in China. Plant Prod. Sci. 2009;12(1):3–8. doi: 10.1626/pps.12.3.
  • 32. Mokhtar A, et al. Assessment of the effects of spatiotemporal characteristics of drought on crop yields in southwest China. Int. J. Climatol. 2022;42:3056–3075. doi: 10.1002/joc.7407.
  • 33. Mokhtar A, et al. Estimation of SPEI meteorological drought using machine learning algorithms. IEEE Access. 2021;9:65503–65523. doi: 10.1109/ACCESS.2021.3074305.
  • 34. Mokhtar A, et al. Ecosystem water use efficiency response to drought over Southwest China. Ecohydrology. 2021;15:e2317. doi: 10.1002/eco.2317.
  • 35. Mokhtar A, et al. Estimation of the rice water footprint based on machine learning algorithms. Comput. Electron. Agric. 2021;191:106501. doi: 10.1016/j.compag.2021.106501.
  • 36. Han H, Armaghani DJ, Tarinejad R, Zhou J, Tahir M. Random forest and bayesian network techniques for probabilistic prediction of flyrock induced by blasting in quarry sites. Nat. Resour. Res. 2020;29:655–667. doi: 10.1007/s11053-019-09611-4.
  • 37. Salazar L, Kogan F, Roytman L. Use of remote sensing data for estimation of winter wheat yield in the United States. Int. J. Remote Sens. 2007;28:3795–3811. doi: 10.1080/01431160601050395.
  • 38. Shangguan W, Dai Y, Duan Q, Liu B, Yuan H. A global soil data set for earth system modeling. J. Adv. Model. Earth Syst. 2014;6(1):249–263. doi: 10.1002/2013MS000293.
  • 39. Chen, T. and Guestrin, C. Xgboost: A scalable tree boosting system. In Proc. of the 22nd acm sigkdd international conference on knowledge discovery and data mining (2016).
  • 40. Breiman L. Random forests. Mach. Learn. 2001;45(1):5–32. doi: 10.1023/A:1010933404324.
  • 41. Magidi J, et al. Application of the random forest classifier to map irrigated areas using google earth engine. Remote Sens. 2021;13(5):876. doi: 10.3390/rs13050876.
  • 42. Kouadri S, et al. Performance of machine learning methods in predicting water quality index based on irregular data set: Application on Illizi region (Algerian southeast). Appl. Water Sci. 2021;11(12):1–20. doi: 10.1007/s13201-021-01528-9.
  • 43. Trabelsi F, Bel Hadj Ali S. Exploring machine learning models in predicting irrigation groundwater quality indices for effective decision making in Medjerda river Basin Tunisia. Sustainability. 2022;14(4):2341. doi: 10.3390/su14042341.
  • 44. Ferreira LB, da Cunha FF. Multi-step ahead forecasting of daily reference evapotranspiration using deep learning. Comput. Electron. Agric. 2020;178:105728. doi: 10.1016/j.compag.2020.105728.
  • 45. Hochreiter S, Schmidhuber J. Long short-term memory. Neural Comput. 1997;9(8):1735–1780. doi: 10.1162/neco.1997.9.8.1735.
  • 46. Wu Q, Lin H. Daily urban air quality index forecasting based on variational mode decomposition, sample entropy and LSTM neural network. Sustain. Cities Soc. 2019;50:101657. doi: 10.1016/j.scs.2019.101657.
  • 47. Zhu S, et al. Forecasting of water level in multiple temperate lakes using machine learning models. J. Hydrol. 2020;585:124819. doi: 10.1016/j.jhydrol.2020.124819.
  • 48. Kingma, D.P. and Ba, J. Adam: A method for stochastic optimization. Preprint at http://arXiv.org//1412.6980 (2014).
  • 49. Ferreira LB, da Cunha FF. New approach to estimate daily reference evapotranspiration based on hourly temperature and relative humidity using machine learning and deep learning. Agric. Water Manag. 2020;234:106113. doi: 10.1016/j.agwat.2020.106113.
  • 50. Barzegar R, Aalami MT, Adamowski J. Short-term water quality variable prediction using a hybrid CNN–LSTM deep learning model. Stoch. Environ. Res. Risk Assess. 2020;34:1–19. doi: 10.1007/s00477-020-01776-2.
  • 51. Zuo R, Xiong Y, Wang J, Carranza EJM. Deep learning and its application in geochemical mapping. Earth-Sci. Rev. 2019;192:1–14. doi: 10.1016/j.earscirev.2019.02.023.
  • 52. Glória A, Cardoso J, Sebastião P. Sustainable irrigation system for farming supported by machine learning and real-time sensor data. Sensors. 2021;21(9):3079. doi: 10.3390/s21093079.
  • 53. Aquil MAI, Ishak WHW. Evaluation of scratch and pre-trained convolutional neural networks for the classification of Tomato plant diseases. IAES Int. J. Artif. Intell. 2021;10(2):467.
  • 54. Vicente-Serrano SM, Beguería S, López-Moreno JI. A multiscalar drought index sensitive to global warming: The standardized precipitation evapotranspiration index. J. Clim. 2010;23(7):1696–1718. doi: 10.1175/2009JCLI2909.1.
  • 55. Potopová V, et al. Impact of agricultural drought on main crop yields in the Republic of Moldova. Int. J. Climatol. 2016;36(4):2063–2082. doi: 10.1002/joc.4481.
  • 56. Lobell DB, Asner GP. Climate and management contributions to recent trends in U. S. agricultural yields. Science. 2003;299:1032. doi: 10.1126/science.1078475.
  • 57. Wu H, Hubbard KG, Wilhite DA. An agricultural drought risk-assessment model for corn and soybeans. Int. J. Climatol. J. R. Meteorol. Soc. 2004;24:723–741. doi: 10.1002/joc.1028.
  • 58. Tigkas D, Vangelis H, Tsakiris G. Drought characterisation based on an agriculture-oriented standardised precipitation index. Theor. Appl. Climatol. 2019;135(3–4):1435–1447. doi: 10.1007/s00704-018-2451-3.
  • 59. Ding Y, Wang W, Zhuang Q, Luo Y. Adaptation of paddy rice in China to climate change: The effects of shifting sowing date on yield and irrigation water requirement. Agric. Water Manag. 2020;228:105890. doi: 10.1016/j.agwat.2019.105890.
  • 60. Wang J, et al. Growing water scarcity, food security and government responses in China. Glob. Food Secur. 2017;14:9–17. doi: 10.1016/j.gfs.2017.01.003.
  • 61. Chiu M-C, Wen C-Y, Hsu H-W, Wang W-C. Key wastes selection and prediction improvement for biogas production through hybrid machine learning methods. Sustain. Energy Technol. Assess. 2022;52:102223.
  • 62. Huang T, Liu T, Ai Y, Ren Z, Ou J, Li Y, Xu N. Modelling the interface bond strength of corroded reinforced concrete using hybrid machine learning algorithms. J. Build. Eng. 2023;74:106862. doi: 10.1016/j.jobe.2023.106862.
  • 63. Wang P, Hu J, Chen W. A hybrid machine learning model to optimize thermal comfort and carbon emissions of large-space public buildings. J. Clean. Prod. 2023;400:136538. doi: 10.1016/j.jclepro.2023.136538.
  • 64. Sulaiman R, et al. Hybrid ensemble-based machine learning model for predicting phosphorus concentrations in hydroponic solution. Spectrochim. Acta A Mol. Biomol. Spectrosc. 2024;304:123327. doi: 10.1016/j.saa.2023.123327.
  • 65. Huang L, Chen J, Tan X. BP-ANN based bond strength prediction for FRP reinforced concrete at high temperature. Eng. Struct. 2022;257:114026. doi: 10.1016/j.engstruct.2022.114026.
  • 66. Bai H, Tao F, Xiao D, Liu F, Zhang H. Attribution of yield change for rice-wheat rotation system in China to climate change, cultivars and agronomic management in the past three decades. Clim. Change. 2016;135:539–553. doi: 10.1007/s10584-015-1579-8.
  • 67. Chen J, Theller L, Gitau MW, Engel BA, Harbor JM. Urbanization impacts on surface runoff of the contiguous United States. J. Environ. Manag. 2016;187:470–481. doi: 10.1016/j.jenvman.2016.11.017.
  • 68. Chen X, Chen S. China feels the heat: negative impacts of high temperatures on China's rice sector. Aust. J. Agric. Resour. Econ. 2018;62:576–588. doi: 10.1111/1467-8489.12267.
  • 69. Maricle BR, Adler PB. Effects of precipitation on photosynthesis and water potential in Andropogon gerardii and Schizachyrium scoparium in a southern mixed grass prairie. Environ. Exp. Bot. 2011;72(2):223–231. doi: 10.1016/j.envexpbot.2011.03.011.
  • 70. Liu L, Zhu Y, Tang L, Cao W, Wang E. Impacts of climate changes, soil nutrients, variety types and management practices on rice yield in East China: A case study in the Taihu region. Field Crops Res. 2013;149:40–48. doi: 10.1016/j.fcr.2013.04.022.
  • 71. Wang W, et al. Bayesian multi-model projection of irrigation requirement and water use efficiency in three typical rice plantation region of China based on CMIP5. Agric. For. Meteorol. 2017;232:89–105. doi: 10.1016/j.agrformet.2016.08.008.
  • 72. Moseley WG. Agriculture on the brink: Climate change, labor and smallholder farming in Botswana. Land. 2016;5(3):21. doi: 10.3390/land5030021.
  • 73. Sala OE, et al. Global biodiversity scenarios for the year 2100. Science. 2000;287(5459):1770–1774. doi: 10.1126/science.287.5459.1770.
  • 74. Zare M, et al. Simulation of soil erosion under the influence of climate change scenarios. Environ. Earth Sci. 2016;75:1–15. doi: 10.1007/s12665-016-6180-6.
  • 75. Cao J, et al. Wheat yield predictions at a county and field scale with deep learning, machine learning, and google earth engine. Eur. J. Agron. 2021;123:126204. doi: 10.1016/j.eja.2020.126204.
  • 76. Liu Y, Li N, Zhang Z, Huang C, Chen X, Wang F. The central trend in crop yields under climate change in China: A systematic review. Sci. Total Environ. 2020;704:135355. doi: 10.1016/j.scitotenv.2019.135355.
  • 77. Boken VK, Shaykewich CF. Improving an operational wheat yield model using phenological phase-based normalized difference vegetation index. Int. J. Remote Sens. 2002;23(20):4155–4168. doi: 10.1080/014311602320567955.
  • 78. Jiang H, et al. A deep learning approach to conflating heterogeneous geospatial data for corn yield estimation: A case study of the US Corn Belt at the county level. Glob. Change Biol. 2020;26(3):1754–1766. doi: 10.1111/gcb.14885.
  • 79. Alhaj Hamoud Y, et al. Effect of irrigation regimes and soil texture on the potassium utilization efficiency of rice. Agronomy. 2019;9(2):100. doi: 10.3390/agronomy9020100.
  • 80. Dou F, et al. Soil texture and cultivar effects on rice (Oryza sativa, L.) grain yield, yield components and water productivity in three water regimes. PLoS One. 2016;11(3):e0150549. doi: 10.1371/journal.pone.0150549.
  • 81. Rao PR, et al. Influence of boron on spikelet fertility under varied soil conditions in rice genotypes. J. Plant Nutr. 2013;36(3):390–400. doi: 10.1080/01904167.2012.744420.
  • 82. Ma X, et al. Rice re-cultivation in southern China: An option for enhanced climate change resilience in rice production. J. Geogr. Sci. 2013;23:67–84. doi: 10.1007/s11442-013-0994-x.
  • 83. Yao L, et al. Current situation and prospect of rice water-saving irrigation technology in China. Chin. J. Ecol. 2014;33(5):1381.
  • 84. Xie J, Luo J, Ma M. Potassium-supplying potential of different soils and the current potassium balance status in the farmland ecosystems in China. In: Xie J, Luo J, Ma M, editors. Proceedings of the International Symposium on Balanced Fertilization, Soil and Fertilizer Institute of the Chinese Academy of Agricultural Sciences. China Agriculture Press Beijing; 1990.
  • 85. Bouman B, Tuong TP. Field water management to save water and increase its productivity in irrigated lowland rice. Agric. Water Manag. 2001;49(1):11–30. doi: 10.1016/S0378-3774(00)00128-1.
  • 86. Islam M, et al. Influence of cracking on rice seasons and irrigation in Bangladesh. J. Biol. Sci. 2004. doi: 10.3923/jbs.2004.11.14.


Articles from Scientific Reports are provided here courtesy of Nature Publishing Group
