A tree based eXtreme Gradient Boosting (XGBoost) machine learning model to forecast the annual rice production in Bangladesh

Mst Noorunnahar; Arman Hossain Chowdhury; Farhana Arefeen Mila

doi:10.1371/journal.pone.0283452

2023 Mar 27;18(3):e0283452.


Editor: Sathishkumar V E

PMCID: PMC10042373 PMID: 36972270

Abstract

In this study, we forecast annual rice production in Bangladesh using data for 1961–2020 with both the Autoregressive Integrated Moving Average (ARIMA) and the eXtreme Gradient Boosting (XGBoost) methods and compare their performance. An ARIMA (0, 1, 1) model with drift was selected on the basis of the lowest corrected Akaike information criterion (AICc) value, and its parameters were statistically significant. The positive drift parameter indicates an upward trend in rice production. The XGBoost model for time series data was developed by repeatedly adjusting the tuning parameters until the best result was obtained. Four error measures, namely mean absolute error (MAE), mean percentage error (MPE), root mean square error (RMSE), and mean absolute percentage error (MAPE), were used to assess the predictive performance of each model. The error measures of the XGBoost model on the test set were lower than those of the ARIMA model. In particular, the test-set MAPE of the XGBoost model (5.38%) was lower than that of the ARIMA model (7.23%), indicating that XGBoost predicts annual rice production in Bangladesh better than ARIMA. Based on this better performance, we forecasted annual rice production for the next 10 years using the XGBoost model. According to our predictions, annual rice production in Bangladesh will vary from 57,850,318 tons in 2021 to 82,256,944 tons in 2030, indicating that production will increase in the years to come.

Introduction

The world population has expanded rapidly, putting a strain on the agricultural sector [1]. Rice is considered the world’s third most common major crop, and more than 50% of the world’s population eats it as a staple food [2, 3]. As one of the most nutrient-dense grains, rice is an excellent source of carbohydrate as well as vitamins (B, E, thiamine) and minerals (Ca, Mg, Fe) [4]. About 160 million Bangladeshis rely on rice as a basic part of their daily diet and for their survival [5]. Bangladesh’s economy is heavily dependent on rice production, so the price of rice has a considerable impact on GDP growth, inflation, wages, employment, food security, and poverty [6]. The rice industry employs over 48% of the rural population, provides two-thirds of all caloric intake, and accounts for half of the average person’s protein intake [7]. The rice subsector alone contributes about 4.5% of GDP [8]. Nearly all farming households in Bangladesh cultivate rice. It is grown on about 10.5 million hectares of land, which account for about 75% and 80% of the total cropped and irrigated areas, respectively [9].

Accurate and timely estimates of crop production before harvest are essential for food security and administrative planning, especially in the current, ever-changing global environment and international scenario [10, 11]. Rice yield forecasting has been examined extensively around the world using various methods. Kumar and Kumar (2012) added fuzzy values to the time series to forecast rice yield [12]. Alam et al. (2018) applied two hybrid approaches, ARIMAX-ANN and ARIMAX-SVM, to estimate rice yield in India [13]. Jing-feng (2011) used NOAA/AVHRR data to predict rice production in Zhejiang Province through ratio models and regression models [14]. Using a crop growth model, Yun (2003) forecasted regional rice production in South Korea [15]. Koide et al. (2013) employed precipitation hindcasts from one uncoupled general circulation model (GCM) and two coupled GCMs to examine the predictive abilities of retrospective seasonal climate forecasts (hindcasts) customized to Philippine rice production data [16]. Noureldin et al. (2013) used a satellite remote sensing technique to forecast rice production in Egypt [17]. However, to reveal the growth pattern and make the most accurate prediction of rice production in Bangladesh, it is necessary to use a suitable approach that can successfully describe the observed data. Different approaches have been used to estimate yield accurately, and each method has its own strengths and limitations [18]. For example, Rahman (2010), Mahmud (2018), Rahman et al. (2016), and Sultana and Khanam (2020) applied the autoregressive integrated moving average (ARIMA) and artificial neural network (ANN) to predict rice production in Bangladesh [19–22].

Sensor technologies, big data, the Internet of Things, artificial intelligence (AI), and machine learning approaches have recently shown great potential to advance precision agriculture and obtain accurate predictions [23]. According to the aforementioned literature and to the best of the authors’ knowledge, XGBoost is a machine learning algorithm that has not been widely deployed for this purpose. The eXtreme Gradient Boosting (XGBoost) model is a supervised machine learning technique that has emerged in recent years as a method for time series forecasting [24, 25]. It is a novel gradient tree-boosting algorithm that offers efficient out-of-core learning and sparsity awareness. XGBoost ought to be particularly good for the problem of claim prediction with both big training data and missing values, even if commonly used methods such as random forest and neural networks can handle missing values [26, 27]. The robustness of XGBoost has led to increased use of the method in many other applications. For example, Aler et al. used XGBoost in the field of direct-diffuse solar radiation separation by creating two models [28]. Moreover, in the prediction of infectious diseases such as COVID-19, XGBoost achieved greater prediction accuracy [29, 30].

In contrast, the Autoregressive Integrated Moving Average (ARIMA) model developed by Box and Jenkins (1990) is the most widely used model for forecasting time series data because of its capacity to handle non-stationary data [31]. The ARIMA model is a suitable forecasting method in agriculture for different crops and has been extensively used in the fields of economics and finance [31–33]. Therefore, this study aimed to (a) compare the predictive accuracy of the autoregressive integrated moving average (ARIMA) and eXtreme gradient boosting (XGBoost) models for the annual rice production data in Bangladesh; and (b) use the better model to forecast rice production for the next 10 years (Fig 1). The findings of this study will help government officials and development practitioners make more accurate short-term predictions of future rice production to support administrative planning and ensure food security.

Materials and methods

Data source

The annual rice production data from 1961 to 2020 (60 years) used in this study were collected from the website of FAOSTAT [34]. The data were divided into training and test sets. The proportion of training and testing data was 90% and 10%, respectively. The ARIMA and XGBoost models were built using the training data sets. The test data were used to evaluate the predictive ability of the developed models. The data set does not contain any missing values.
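The 90%/10% split described above can be sketched as follows. This is a minimal Python illustration (the authors worked in R); the function name `chronological_split` and the assumption that the test set is the final block of years are introduced here and are not taken from the authors' code.

```python
def chronological_split(series, train_fraction=0.9):
    """Split a time-ordered sequence into a leading training part and a trailing test part."""
    n_train = round(len(series) * train_fraction)
    return series[:n_train], series[n_train:]


# The years stand in for the 60 annual observations (1961-2020).
years = list(range(1961, 2021))
train_years, test_years = chronological_split(years)

print(len(train_years), len(test_years))  # 54 6
print(train_years[-1], test_years[0])     # 2014 2015
```

With 60 observations this gives 54 training and 6 test values, which agrees with the counts reported later in the model comparison (53 values after first differencing, 46 after using eight lags).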

ARIMA model

The autoregressive integrated moving average (ARIMA) is a technique for analyzing and predicting time series data that was initially introduced by Box and Jenkins in 1976 [35]. An ARIMA (p, d, q) time series model has three components: p denotes the autoregressive (AR) order, d denotes the differencing order, and q denotes the moving average (MA) order [36, 37]. The autoregressive component AR(p) expresses the current observation as a linear combination of the p previous observations plus a random shock term, which can be mathematically defined as

Y_{t} = c + φ_{1} Y_{t-1} + φ_{2} Y_{t-2} + \dots + φ_{p} Y_{t-p} + ε_{t}

(1)

where Y_t and ε_t represent the observed value and the random shock term at time t, φ_i (i = 1, 2, …, p) are the model parameters, and c is the constant term. On the other hand, the moving average component MA(q) explains the dependent variable in terms of previous random shock terms, which can be defined as

Y_{t} = μ + θ_{1} ε_{t-1} + θ_{2} ε_{t-2} + \dots + θ_{q} ε_{t-q} + ε_{t}

(2)

where μ represents the mean of the series, θ_j (j = 1, 2, …, q) denotes the model parameters, and q indicates the model’s order [38]. Combining the two components, the ARMA (p, q) model can be defined mathematically as follows:

Y_{t} = c + μ + φ_{1} Y_{t-1} + φ_{2} Y_{t-2} + \dots + φ_{p} Y_{t-p} + θ_{1} ε_{t-1} + θ_{2} ε_{t-2} + \dots + θ_{q} ε_{t-q} + ε_{t}

(3)

The general form of the ARIMA (p, d, q) model with the differenced series may be defined mathematically as follows:

y'_{t} = c + φ_{1} y'_{t-1} + φ_{2} y'_{t-2} + \dots + φ_{p} y'_{t-p} + θ_{1} ε_{t-1} + θ_{2} ε_{t-2} + \dots + θ_{q} ε_{t-q} + ε_{t}

(4)

where y′_t denotes the differenced series (differencing may be applied more than once); φ₁, φ₂, …, φ_p are the coefficients of the AR(p) terms and θ₁, θ₂, …, θ_q are the coefficients of the MA(q) terms. More information regarding the ARIMA model can be found in the literature [30, 39].

XGBoost model

The eXtreme Gradient Boosting (XGBoost) is a boosting method that combines several learners to produce higher prediction accuracy than any of the individual learners, and it is used in several fields [24]. It is a decision tree-based ensemble machine learning approach that is frequently employed in data science. Precise forecasts are obtained by aggregating the outcomes from several individual trees [29]. XGBoost was introduced by Tianqi Chen and Carlos Guestrin, and several researchers have since refined and enhanced it [40]. The XGBoost model uses a gradient descent optimization approach to reduce the loss function [41]. Boosting is an ensemble technique that can assemble thousands of forecasting models with lower performance into a strong, high-performance model by repeatedly merging the models within permissible parameter values [40, 42]. The objective function can be written as follows:

obj(θ) = \sum_{i} L(\hat{y}_{i}, y_{i}) + \sum_{k} Ω(f_{k})

(5)

As shown in Eq (5), the objective function consists of a loss function L and a regularization term Ω(f_k) that reduces the new tree’s output variation. $\hat{y}_{i}$ denotes the predicted value and y_i represents the observed value. Detailed information regarding the XGBoost model can be found in the literature [24, 39].
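For orientation, the XGBoost literature commonly writes the regularization term of a single tree in the form below. The symbols T (number of leaves), w_j (weight of leaf j), γ and λ (penalty parameters) are not defined in this paper and are introduced here only for illustration; the values used by the authors are not stated in this section.

```latex
\Omega(f_k) = \gamma T + \frac{1}{2} \lambda \sum_{j=1}^{T} w_j^{2}
```

In this form, γ penalizes the number of leaves and λ shrinks the leaf weights, so larger values of either favor simpler trees.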

Evaluation parameter of models

One of the major criteria of model evaluation is model accuracy, which describes how close the predicted values are to the actual values. Model accuracy can be calculated by using several measures [43]. This study used four widely used accuracy measures, namely mean absolute percentage error (MAPE), mean percentage error (MPE), mean absolute error (MAE), and root mean square error (RMSE). These measures can be defined mathematically as follows:

MAE = \frac{1}{n} \sum_{i=1}^{n} | \hat{y}_{i} - y_{i} |

(6)

MPE = \frac{1}{n} \sum_{i=1}^{n} \left( \frac{\hat{y}_{i} - y_{i}}{y_{i}} \right) \times 100\%

(7)

RMSE = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (\hat{y}_{i} - y_{i})^{2}}

(8)

MAPE = \frac{1}{n} \sum_{i=1}^{n} \left| \frac{\hat{y}_{i} - y_{i}}{y_{i}} \right| \times 100\%

(9)

where n indicates the number of samples, $\hat{y}_{i}$ denotes the predicted value, y_i represents the observed value, and $\hat{y}_{i} - y_{i}$ is the error. MAPE expresses the errors as a percentage of the observed values. Smaller error values indicate a better fit [41].
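The four measures in Eqs (6) to (9) can be computed as in the short Python sketch below. The function name `forecast_errors` and the toy numbers are illustrative assumptions; this is not the authors' R code.

```python
import math


def forecast_errors(actual, predicted):
    """Return MAE, MPE (%), RMSE and MAPE (%) following Eqs (6)-(9)."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("actual and predicted must be non-empty and of equal length")
    n = len(actual)
    errors = [p - a for a, p in zip(actual, predicted)]  # predicted minus observed
    return {
        "MAE": sum(abs(e) for e in errors) / n,
        "MPE": 100 * sum(e / a for e, a in zip(errors, actual)) / n,
        "RMSE": math.sqrt(sum(e * e for e in errors) / n),
        "MAPE": 100 * sum(abs(e / a) for e, a in zip(errors, actual)) / n,
    }


# Toy numbers for illustration only (not the rice production data).
scores = forecast_errors(actual=[100.0, 200.0], predicted=[90.0, 220.0])
print(scores)
```

In the toy example, one forecast is 10% too low and the other 10% too high, so MPE is zero while MAPE is 10%: signed errors cancel in MPE but not in MAPE, which is why the two are reported together.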

Statistical analyses

The ARIMA and XGBoost predictive models and the statistical analyses were carried out using RStudio (Version 4.2.1) [44]. The ARIMA model was fitted using the "forecast" package [45]. The XGBoost model was constructed with the "forecastxgb" package. The "ggplot2" package was used for graphical visualization. All necessary code and data are available at https://github.com/Arman-Hossain-Chowdhury/Rice-production.

Results

The highest amount of rice produced in Bangladesh was 54,905,891 tons in 2020, and the lowest was 13,304,520 tons in 1962. The average amount of rice produced annually in Bangladesh was 29,960,847.08 tons. The boxplot indicates that the data have no outliers (Fig 2).

We plotted the time series of the annual rice production data from 1961 to 2020 in Bangladesh. The data vary considerably and show a linear pattern. The Augmented Dickey Fuller (ADF) test confirmed that the data are not stationary (Fig 3).

To reduce variation and stabilize the actual data, Box & Cox (1964) presented a parametric power transformation technique [46]. We applied this technique to make the data stable and exhibit less variation (Fig 4) [47].
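The one-parameter Box-Cox transformation can be sketched as follows. This is a minimal Python illustration; the function names, the sample values and the value of lambda are assumptions chosen for the example, and this section does not report the lambda the authors used.

```python
import math


def box_cox(values, lam):
    """One-parameter Box-Cox transform for strictly positive values."""
    if any(v <= 0 for v in values):
        raise ValueError("Box-Cox requires strictly positive values")
    if lam == 0:
        return [math.log(v) for v in values]
    return [(v ** lam - 1) / lam for v in values]


def inverse_box_cox(transformed, lam):
    """Map transformed values back to the original scale."""
    if lam == 0:
        return [math.exp(w) for w in transformed]
    return [(lam * w + 1) ** (1 / lam) for w in transformed]


# Illustrative values and lambda only.
sample = [13.3, 29.9, 54.9]
transformed = box_cox(sample, lam=0.5)
restored = inverse_box_cox(transformed, lam=0.5)
```

The inverse function matters in practice because forecasts made on the transformed scale have to be mapped back to tons before they are compared with observed production.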

We performed the ADF test to check the stationarity of the data and found the data non-stationary at level (p-value = 0.57). To remove the trend observed in Fig 4, we applied first-order differencing to the transformed series (Fig 5). The differenced time series was found to be stationary using the ADF test (p-value = 0.01). So, the parameter d of the ARIMA model was 1.
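First-order differencing replaces each value with its change from the previous year. A minimal Python sketch, with a hypothetical helper name `difference` and toy data, is:

```python
def difference(series, order=1):
    """Apply first differencing `order` times; each pass shortens the series by one."""
    result = list(series)
    for _ in range(order):
        result = [later - earlier for earlier, later in zip(result, result[1:])]
    return result


# Toy series with a steady upward trend (not the rice production data).
trending = [10, 12, 15, 19, 24]
print(difference(trending))           # [2, 3, 4, 5]
print(difference(trending, order=2))  # [1, 1, 1]
```

Each pass removes one observation, which is why a series of 54 training values leaves 53 values after first differencing.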

The ACF plot showed an evident spike at lag 1, suggesting an MA order of 1, and the PACF plot showed an evident spike only at lag 0, suggesting an AR order of 0 (Fig 6). Therefore, the maximum p and q values are 0 and 1, respectively.
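The sample autocorrelations that an ACF plot displays can be computed as in the following Python sketch. The function names and the toy series are assumptions for illustration; the bound of 1.96 divided by the square root of n is the usual approximate 95% limit drawn on such plots.

```python
def sample_acf(series, max_lag):
    """Sample autocorrelation at lags 0..max_lag (biased estimator used in standard ACF plots)."""
    n = len(series)
    mean = sum(series) / n
    centered = [x - mean for x in series]
    denom = sum(c * c for c in centered)
    if denom == 0:
        raise ValueError("series is constant; autocorrelation is undefined")
    return [
        sum(centered[t] * centered[t - k] for t in range(k, n)) / denom
        for k in range(max_lag + 1)
    ]


def white_noise_bound(n):
    """Approximate 95% bound (+/- 1.96 / sqrt(n)) for the autocorrelations of white noise."""
    return 1.96 / n ** 0.5


# Toy alternating series for illustration only.
toy = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]
acf_values = sample_acf(toy, max_lag=2)
```

A lag whose autocorrelation lies outside the bound is read as a candidate order; for 53 differenced values the bound is roughly 0.27.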

Fig 6 — ACF, autocorrelation function; PACF, partial autocorrelation function.

We used the "auto.arima" function to compare candidate models and selected ARIMA (0,1,1) with drift on the basis of the lowest corrected Akaike information criterion (AICc) value. The positive drift estimate indicates an upward trend in rice production (Table 1).

Table 1. Estimated parameters of the ARIMA (0,1,1) with drift model.

Parameters	Estimate	Std. Error	z value	Pr(>|z|)
ma1	-0.32448	0.15445	-2.1008	0.03566*
drift	0.62942	0.14259	4.4142	0.00001***
AICc	201.54

AICc: Corrected Akaike Information Criterion

Std. Error: Standard Error

ARIMA: Autoregressive Integrated Moving Average

One asterisk (*) indicates significance at the 5% level and three asterisks (***) indicate significance at the 0.1% level.
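To make the estimates in Table 1 concrete, the fitted model can be written out under two assumptions that are not stated explicitly in the text: that the MA coefficient follows the sign convention of Eq (2), which is also the convention of R's ARIMA functions, and that the model was fitted to the Box-Cox-transformed series, written here as w_t.

```latex
w_t - w_{t-1} = 0.62942 - 0.32448\,\varepsilon_{t-1} + \varepsilon_t
\qquad\Longrightarrow\qquad
\hat{w}_{t+1} = w_t + 0.62942 - 0.32448\,\hat{\varepsilon}_t
```

Read this way, the transformed series is expected to rise by about 0.63 units per year, and each one-step forecast is corrected by roughly a third of the previous forecast error in the opposite direction.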

The residual plot, the ACF plot of the residuals, and the residual histogram were then drawn, and they indicated normally distributed residuals (Fig 7). Hence, the ARIMA (0,1,1) model with drift was found to be adequate.

The XGBoost model was developed after adjusting several parameters. The adjusted parameters for the model are shown in S4 Table in S1 File. A feature is considered important if replacing it with random noise significantly affects predictive performance. The feature importance of the XGBoost model was computed to see how each feature contributed to the prediction accuracy in the training set. Lag 5 of the training data was found to contribute greatly to the model (Fig 8).

The actual, fitted, and forecast values of the annual rice production in Bangladesh from the ARIMA (0,1,1) with drift model and the XGBoost model are shown in Fig 9. The forecasted values of the XGBoost model were quite close to the actual values.

Fig 9 — ARIMA, autoregressive integrated moving average; XGBoost, eXtreme Gradient Boosting.

Model comparison

The ARIMA (0,1,1) with drift model was built using the first difference of the time series data. As a result, we lost one value in the training set and compared the remaining 53 values. For XGBoost, we used a maximum of eight time-lagged variables as input features, because lags of up to 8 years improved the prediction accuracy of the model. The first eight training values therefore have no complete set of lags, and the remaining 46 values were compared for the XGBoost model. The prediction accuracy for the ARIMA and XGBoost models is shown in Table 2.
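The way eight lagged values turn 54 training observations into 46 usable rows can be sketched as follows. This is a minimal Python illustration with a hypothetical helper `make_lag_features`; the "forecastxgb" package used by the authors may construct additional features, so this shows only the lag part.

```python
def make_lag_features(series, max_lag):
    """Build (features, targets): each row holds the max_lag values preceding its target."""
    if max_lag < 1 or len(series) <= max_lag:
        raise ValueError("series must be longer than max_lag (>= 1)")
    features, targets = [], []
    for t in range(max_lag, len(series)):
        # Ordered lag 1, lag 2, ..., lag max_lag (most recent value first).
        features.append([series[t - k] for k in range(1, max_lag + 1)])
        targets.append(series[t])
    return features, targets


# 54 placeholder values stand in for the training observations.
training = list(range(54))
X, y = make_lag_features(training, max_lag=8)
print(len(X), len(X[0]))  # 46 8
```

The first target is the ninth observation, because it is the earliest one with eight predecessors available.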

Table 2. Evaluation measures of the ARIMA and XGBoost models for rice production in Bangladesh.

Model	Training MAE (tons)	Training MPE (%)	Training RMSE (tons)	Training MAPE (%)	Test MAE (tons)	Test MPE (%)	Test RMSE (tons)	Test MAPE (%)
ARIMA(0,1,1)	1109886	-0.30	1496325	4.55	3755137	-7.23	4093961	7.23
XGBoost	2817876	-5.91	3209634	10.39	2779742	-5.39	3195985	5.38

ARIMA: Autoregressive Integrated Moving Average

MAE: Mean Absolute Error

MPE: Mean Percentage Error

MAPE: Mean Absolute Percentage Error

RMSE: Root Mean Square Error

XGBoost: eXtreme Gradient Boosting.

The test-set MAPE of the XGBoost model (5.38%) was lower than that of the ARIMA model (7.23%), which indicates that XGBoost performs better than ARIMA in predicting the annual rice production in Bangladesh. Detailed information regarding XGBoost model fitting can be found in S1 File.

Finally, based on our preferred XGBoost model, we predicted the annual rice production for the next 10 years (S1 File). According to our forecasts, during the next 10 years, the amount of rice produced annually in Bangladesh will vary between 57,850,318 and 82,256,940 tons, as illustrated in Fig 10.
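One common way to produce a multi-year forecast from a model that uses lagged values is to forecast one year at a time and feed each prediction back in as a lag. The Python sketch below illustrates that idea only; whether the "forecastxgb" package proceeds exactly this way is an assumption, and `naive_drift_model` is a hypothetical stand-in for a fitted XGBoost model.

```python
def recursive_forecast(history, predict_next, max_lag, horizon):
    """Forecast `horizon` steps by feeding each prediction back in as a lag."""
    if len(history) < max_lag:
        raise ValueError("history must contain at least max_lag values")
    extended = list(history)
    for _ in range(horizon):
        lags = extended[-1:-max_lag - 1:-1]  # lag 1 first, then lag 2, ...
        extended.append(predict_next(lags))
    return extended[len(history):]


def naive_drift_model(lags):
    """Hypothetical stand-in for a fitted model: last value plus the latest change."""
    return lags[0] + (lags[0] - lags[1])


# Toy history (not the rice production data); ten steps mirror a ten-year horizon.
forecast = recursive_forecast([10.0, 12.0, 14.0], naive_drift_model, max_lag=2, horizon=10)
print(forecast[:3])  # [16.0, 18.0, 20.0]
```

Because later steps depend on earlier predictions rather than on observed values, errors can accumulate over the horizon, which is one reason to treat the later years of a ten-year forecast with more caution.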

Fig 10 — XGBoost: eXtreme Gradient Boosting.

Discussion

In our study, we found a linear upward pattern in the annual rice production data in Bangladesh. The primary goal of this study was to compare the predictive accuracy of the ARIMA and XGBoost forecasting models and make a short-term prediction with the better model. We examined annual rice production in Bangladesh as a whole from 1961 to 2020. Bangladesh has a subtropical monsoon climate, which is distinguished by significant seasonal changes in precipitation, high temperatures, and humidity. There are three seasons: a warm, humid summer from March to June; a cool, wet monsoon season from June to October; and a cool, dry winter from October to March. In the past, temperatures in Bangladesh have ranged from 15°C to 34°C annually, with an average temperature of roughly 26°C [48, 49]. Food production (e.g. rice, wheat) is particularly vulnerable to climate change because agricultural production is severely affected by climate patterns. Several previous studies found that mean temperature can negatively affect rice production [50, 51], and a previous study found that precipitation had a positive impact on rice production [52]. Time series modeling is therefore crucial for understanding the actual pattern of annual rice production in Bangladesh and forecasting it accurately [53].

The ARIMA model for the annual rice production data was established based on the concept of linear regression to forecast future data points. Without using any other explanatory variable, the ARIMA model can capture the pattern of the historical data and make accurate forecasts, so it is simple to establish [24]. Since ARIMA is a well-known and widely used time series forecasting model, this study compared it with the robust XGBoost machine learning model. The ARIMA model can be fitted well to non-stationary data after Box-Cox transformation and differencing of the original data [39]. However, differencing causes data loss: first-order differencing cost this study one year of data. We built the ARIMA models using the auto.arima function, adjusting the power transformation parameter (lambda), and selected ARIMA (0,1,1) with drift as the optimal model based on the lowest AICc value.

On the other hand, we applied the tree-based ensemble XGBoost supervised machine learning technique to our data. Previous studies used several machine learning models, such as the artificial neural network [22], the random forest [26, 54, 55], and the support vector machine [56, 57], to predict rice production and obtained effective results. eXtreme gradient boosting is a robust machine learning technique for modeling, analyzing, and forecasting time series data [25]. The XGBoost model offers a variety of advantages for forecasting. For example, it does not require any preprocessing of the data. It has a rapid processing speed, robust feature selection, good fitting, greater predictive performance and late scaling penalty than a typical gradient boosting decision tree, which protects the model from overfitting [25, 58]. In addition, the XGBoost model can be used with cross-validation and can automatically identify significant feature vectors. We therefore compared the predictive performance of the ARIMA model with that of the XGBoost model. The MAPE value of the XGBoost model for the test set is lower than that of the ARIMA model, which indicates that XGBoost performs better than ARIMA in predicting the annual rice production in Bangladesh. Therefore, we used the XGBoost model to make a short-term prediction for the next 10 years. The prediction reveals that the amount of rice produced annually in Bangladesh will grow in the coming years.

According to our study, the forecasting accuracy of the XGBoost model on the test set is better than that of the traditional time-series ARIMA model, although the ARIMA model fitted the training set more closely (Table 2). Without requiring any influencing factor, our proposed model can feasibly predict the annual rice production in Bangladesh.

Limitations

In this study, we identified a model that could accurately predict the annual rice production in Bangladesh by comparing the ARIMA and XGBoost models. Other machine learning models, such as Decision Tree and LightGBM, may be more robust and achieve greater prediction accuracy; these models need to be applied in the future to find the best one. We mainly concentrated on the effect of time on rice production, which made it simpler to develop our model and make predictions. As a result, one limitation is that climatic and economic factors such as temperature, rainfall, and consumption, which are well known to affect rice production, were not taken into account in this study. These should be investigated further as data become available.

Conclusion

We built ARIMA and XGBoost models for forecasting the annual rice production in Bangladesh and used them to generate a short-term prediction. The XGBoost model performed better than the ARIMA model in predicting the annual rice production in Bangladesh. The government and development practitioners can therefore employ XGBoost models rather than ARIMA to make more accurate short-term predictions of future crop production.

Supporting information

S1 File. (DOCX, 24.9 KB)

Acknowledgments

We are very grateful to the reviewers for their valuable instructions and suggestions, which improved the study.

Data Availability

All necessary code and data are available on GitHub (https://github.com/Arman-Hossain-Chowdhury/Rice-production).

Funding Statement

The author(s) received no specific funding for this study.

References

1. Godfray HCJ, Beddington JR, Crute IR, Haddad L, Lawrence D, Muir JF, et al. Food Security: The Challenge of Feeding 9 Billion People. Science. 2010;327: 812–818. doi: 10.1126/SCIENCE.1185383
2. Rahman MC, Islam MA, Rahaman MS, Sarkar MAR, Ahmed R, Kabir MS. Identifying the Threshold Level of Flooding for Rice Production in Bangladesh: An Empirical Analysis. J Bangladesh Agric Univ. 2021;19: 243–250. doi: 10.5455/JBAU.53297
3. Khush GS. What it will take to Feed 5.0 Billion Rice consumers in 2030. Plant Mol Biol. 2005;59: 1–6. doi: 10.1007/s11103-005-2159-5
4. Dawe D. The contribution of rice research to poverty alleviation. Stud Plant Sci. 2000;7: 3–12. doi: 10.1016/S0928-3420(00)80003-8
5. Siddique MAB, Sarkar MAR, Rahman MC, Chowdhury A, Rahman MS, Deb L. Rice farmers’ technical efficiency under abiotic stresses in Bangladesh. Asian J Agric Rural Dev. 2017;7: 219–232. doi: 10.18488/JOURNAL.1005/2017.7.11/1005.11.219.232
6. Sayeed KA, Yunus MM. Rice prices and growth, and poverty reduction in Bangladesh. 2018; 1–39. Available: http://www.fao.org/publications/card/en/c/I8332EN
7. BBS 2015. Statistical Yearbook of Bangladesh, Ministry of Planning, Government of the People’s Republic of Bangladesh, Dhaka.
8. BBS 2020. Statistical Yearbook of Bangladesh, Ministry of Planning, Government of the People’s Republic of Bangladesh, Dhaka.
9. Bangladesh Economic Review 2020. Economic Adviser’s Wing, Finance Division, Ministry of Finance, Government of the People’s Republic of Bangladesh.
10. Gebbers R, Adamchuk VI. Precision Agriculture and Food Security. Science. 2010;327: 828–831. doi: 10.1126/science.1183899
11. Ji Z, Pan Y, Zhu X, Wang J, Li Q. Prediction of Crop Yield Using Phenological Information Extracted from Remote Sensing Vegetation Index. Sensors. 2021;21: 1406. doi: 10.3390/s21041406
12. Kumar N. A Novel Method for Rice Production Forecasting Using Fuzzy Time Series. Int J Comput Sci Issues. 2012;9: 455–459.
13. Alam W, Mrinmoy RAY, Kumar RR, Sinha K, Rathod S, Singh KN. Improved ARIMAX modal based on ANN and SVM approaches for forecasting rice yield using weather variables. Indian J Agric Sci. 2018;88: 1909–1913.
14. Jing-feng HU, Zhong-en YA, Ren-chao WA, Hong-wei XU HJ. The rice production forecasting models using NOAA/AVHRR data based on GIS. Remote Sens Technol Appl. 2011;17: 125–128.
15. Yun JI. Predicting regional rice production in South Korea using spatial data and crop-growth modeling. Agric Syst. 2003;77: 23–38. doi: 10.1016/S0308-521X(02)00084-7
16. Koide N, Robertson AW, Ines AVM, Qian JH, Dewitt DG, Lucero A. Prediction of rice production in the Philippines using seasonal climate forecasts. J Appl Meteorol Climatol. 2013;52: 552–569. doi: 10.1175/JAMC-D-11-0254.1
17. Noureldin NA, Aboelghar MA, Saudy HS, Ali AM. Rice yield forecasting models using satellite imagery in Egypt. Egypt J Remote Sens Sp Sci. 2013;16: 125–131. doi: 10.1016/j.ejrs.2013.04.005
18. Bandumula N. Rice Production in Asia: Key to Global Food Security. Proc Natl Acad Sci India Sect B Biol Sci. 2017;88: 1323–1328. doi: 10.1007/S40011-017-0867-7
19. Rahman NMF, Hasan MM, Hossain MI, Baten MA, Hosen S, Ali MA, et al. Forecasting Aus Rice Area and Production in Bangladesh using Box-Jenkins Approach. Bangladesh Rice J. 2016;20: 1–10. doi: 10.3329/BRJ.V20I1.30623
20. Mahmud S. Predicting the Rice Production of Bangladesh by Machine Learning Technique. 2018;7: 7–13.
21. Rahman N. Forecasting of boro rice production in Bangladesh: An ARIMA approach. J Bangladesh Agric Univ. 2010;8: 103–112. doi: 10.3329/JBAU.V8I1.6406
22. Sultana A, Khanam M. Forecasting Rice Production of Bangladesh Using ARIMA and Artificial Neural Network Models. Dhaka Univ J Sci. 2020;68: 143–147. doi: 10.3329/DUJS.V68I2.54612
23. Rodríguez JP, Corrales DC, Griol D, Callejas Z, Corrales JC. A Non-Destructive Time Series Model for the Estimation of Cherry Coffee Production. C Mater Contin. 2022;70: 4725–4743. doi: 10.32604/CMC.2022.019135
24. Lv CX, An SY, Qiao BJ, Wu W. Time series analysis of hemorrhagic fever with renal syndrome in mainland China by using an XGBoost forecasting model. BMC Infect Dis. 2021;21: 1–13. doi: 10.1186/S12879-021-06503-Y
25. Alim M, Ye GH, Guan P, Huang DS, Zhou B Sen, Wu W. Comparison of ARIMA model and XGBoost model for prediction of human brucellosis in mainland China: A time-series study. BMJ Open. 2020;10: 1–8. doi: 10.1136/bmjopen-2020-039676
26. Narasimhamurthy V. Rice Crop Yield Forecasting Using Random Forest Algorithm SML. Int J Res Appl Sci Eng Technol. 2017;V: 1220–1225. doi: 10.22214/ijraset.2017.10176
27. Anitha P, Chakravarthy T. Agricultural Crop Yield Prediction using Artificial Neural Network with Feed Forward Algorithm. Int J Comput Sci Eng. 2018;6: 178–181. doi: 10.26438/ijcse/v6i11.178181
28. Aler R, Galván IM, Ruiz-Arias JA, Gueymard CA. Improving the separation of direct and diffuse solar radiation components using machine learning by gradient boosting. Sol Energy. 2017;150: 558–569. doi: 10.1016/J.SOLENER.2017.05.018
29. Fang ZG, Yang SQ, Lv CX, An SY, Wu W. Application of a data-driven XGBoost model for the prediction of COVID-19 in the USA: a time-series study. BMJ Open. 2022;12: 1–8. doi: 10.1136/bmjopen-2021-056685
30. Rahman MS, Chowdhury AH. A data-driven eXtreme gradient boosting machine learning model to predict COVID-19 transmission with meteorological drivers. 2022; 1–14. doi: 10.1371/journal.pone.0273319
31. Khashei M, Bijari M, Raissi Ardali GA. Hybridization of autoregressive integrated moving average (ARIMA) with probabilistic neural networks (PNNs). Comput Ind Eng. 2012;63: 37–45. doi: 10.1016/J.CIE.2012.01.017
32. Pai PF, Lin CS. A hybrid ARIMA and support vector machines model in stock price forecasting. Omega. 2005;33: 497–505. doi: 10.1016/J.OMEGA.2004.07.024
33. Kabir MS, Salam MU, Chowdhury A, Rahman MF, Iftekharuddaula KM, Rahman MS, et al. Rice Vision for Bangladesh: 2050 and Beyond. Bangladesh Rice J. 2015;19: 1–18. doi: 10.3329/BRJ.V19I2.28160
34. FAOSTAT. Annual Rice Production data of Bangladesh. [cited 8 Dec 2022]. Available: https://www.fao.org/faostat/en/#data
35. Helfenstein U. Box-Jenkins modelling in medical research. 2016;5: 3–22. doi: 10.1177/096228029600500102
36. Amin M, Amanullah M, Akbar A. Time series modeling for forecasting wheat production of Pakistan. J Anim Plant Sci. 2014;24: 1444–1451.
37. Alzahrani SI, Aljamaan IA, Al-Fakih EA. Forecasting the spread of the COVID-19 pandemic in Saudi Arabia using ARIMA prediction model under current public health interventions. J Infect Public Health. 2020;13: 914–919. doi: 10.1016/j.jiph.2020.06.001
38. Sahai AK, Rath N, Sood V, Singh MP. ARIMA modelling & forecasting of COVID-19 in top five affected countries. Diabetes Metab Syndr Clin Res Rev. 2020;14: 1419–1427. doi: 10.1016/J.DSX.2020.07.042
39. Rahman MS, Chowdhury AH, Amrin M. Accuracy comparison of ARIMA and XGBoost forecasting models in predicting the incidence of COVID-19 in Bangladesh. Plos Glob Public Heal. 2022;2019: 1–13. doi: 10.1371/journal.pgph.0000495
40. Li W, Yin Y, Quan X, Zhang H. Gene Expression Value Prediction Based on XGBoost Algorithm. Front Genet. 2019;10: 1–7. doi: 10.3389/fgene.2019.01077
41. Luo J, Zhang Z, Fu Y, Rao F. Time series prediction of COVID-19 transmission in America using LSTM and XGBoost algorithms. Results Phys. 2021;27: 104462. doi: 10.1016/j.rinp.2021.104462
42. Paliari I, Karanikola A, Kotsiantis S. A comparison of the optimized LSTM, XGBOOST and ARIMA in Time Series forecasting. IISA 2021 - 12th Int Conf Information, Intell Syst Appl. 2021. doi: 10.1109/IISA52424.2021.9555520
43. Prajapati S, Swaraj A, Lalwani R, Narwal A, Verma K, Singh G. Comparison of Traditional and Hybrid Time Series Models for Forecasting COVID-19 Cases. 2019;8.
44. RStudio: Integrated Development Environment for R RStudio Team. In: RStudio, PBC, Boston, MA (2022) [Internet]. [cited 18 Dec 2022]. Available: https://www.rstudio.com/
45. Hyndman RJ, Khandakar Y. Automatic Time Series Forecasting: The forecast Package for R. J Stat Softw. 2008;27: 1–22. doi: 10.18637/JSS.V027.I03
46. Sakia RM. The Box-Cox Transformation Technique: A Review. Stat. 1992;41: 169. doi: 10.2307/2348250
47. Curran-Everett D. Explorations in statistics: The log transformation. Adv Physiol Educ. 2018;42: 343–347. doi: 10.1152/advan.00018.2018
48.Bangladesh - Climatology | Climate Change Knowledge Portal. [cited 13 Dec 2022]. Available: https://climateknowledgeportal.worldbank.org/country/bangladesh/climate-data-historical [Google Scholar]
49.Climate of the World: Bangladesh | weatheronline.co.uk. [cited 18 Dec 2022]. Available: https://www.weatheronline.co.uk/reports/climate/Bangladesh.htm [Google Scholar]
50.Stuecker MF, Tigchelaar M, Kantar MB. Climate variability impacts on rice production in the Philippines. PLoS One. 2018;13. doi: 10.1371/journal.pone.0201426 [DOI] [PMC free article] [PubMed] [Google Scholar]
51.Pickson RB, He G, Boateng E. Impacts of climate change on rice production: evidence from 30 Chinese provinces. Environ Dev Sustain 2021 243. 2021;24: 3907–3925. doi: 10.1007/S10668-021-01594-8 [DOI] [Google Scholar]
52.Mahmood N, Ahmad B, Hassan S, Bakhsh K. Impact of temperature ADN precipitation on rice productivity in rice-wheat cropping system of Punjab province. J Anim Plant Sci. 2012;22: 993–997. [Google Scholar]
53.Reddy PCS, Sureshbabu A. An Applied Time Series Forecasting Model for Yield Prediction of Agricultural Crop. Adv Intell Syst Comput. 2020;1118: 177–187. doi: 10.1007/978-981-15-2475-2_16/COVER/ [DOI] [Google Scholar]
54.Kim J, Lee J, Sang W, Shin P, Cho H, Seo M. Random Forest를 이용한 남한지역 쌀 수량 예측 연구 Rice yield prediction in South Korea by using random forest. 2019;21: 75–84. doi: 10.5532/KJAFM.2019.21.2.75 [DOI] [Google Scholar]
55.Choudhary K, Shi W, Dong Y, Paringer R. Random Forest for rice yield mapping and prediction using Sentinel-2 data with Google Earth Engine. Adv Sp Res. 2022;70: 2443–2457. doi: 10.1016/J.ASR.2022.06.073 [DOI] [Google Scholar]
56.Fegade TK, Pawar B V. Crop Prediction Using Artificial Neural Network and Support Vector Machine. Adv Intell Syst Comput. 2020;1016: 311–324. doi: 10.1007/978-981-13-9364-8_23/COVER [DOI] [Google Scholar]
57.Gandhi N, Petkar O, Armstrong LJ, Tripathy AK. Rice crop yield prediction in India using support vector machines. 2016 13th Int Jt Conf Comput Sci Softw Eng JCSSE 2016. 2016. doi: 10.1109/JCSSE.2016.7748856 [DOI] [Google Scholar]
58.Wu W, Guo J, An S, Guan P, Ren Y, Xia L, et al. Comparison of two hybrid models for forecasting the incidence of hemorrhagic fever with renal syndrome in Jiangsu Province, China. PLoS One. 2015;10: 1–13. doi: 10.1371/journal.pone.0135492 [DOI] [PMC free article] [PubMed] [Google Scholar]

PLoS One. doi: 10.1371/journal.pone.0283452.r001

Decision Letter 0

Sathishkumar V E

12 Dec 2022

PONE-D-22-20989

Accuracy Performance of Time Series and Machine Learning Models for Predicting Rice Production in Bangladesh: A Comparative Analysis

PLOS ONE

Dear Dr. Mila,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process. The title of the manuscript should be changed to highlight the core idea of the study. The results and comparative analysis should be improved.

Please submit your revised manuscript by Jan 26 2023 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.
A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.
An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols.

We look forward to receiving your revised manuscript.

Kind regards,

Sathishkumar V E

Academic Editor

PLOS ONE

Journal Requirements:

When submitting your revision, we need you to address these additional requirements.

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at

https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and

https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf

2. We noticed you have some minor occurrence of overlapping text with the following previous publication(s), which needs to be addressed:

Alim M, Ye GH, Guan P, Huang DS, Zhou BS, Wu W. Comparison of ARIMA model and XGBoost model for prediction of human brucellosis in mainland China: a time-series study. BMJ Open. 2020 Dec 7;10(12):e039676. doi: 10.1136/bmjopen-2020-039676. PMID: 33293308; PMCID: PMC7722837.

Rahman MS, Chowdhury AH, Amrin M (2022) Accuracy comparison of ARIMA and XGBoost forecasting models in predicting the incidence of COVID-19 in Bangladesh. PLOS Glob Public Health 2(5): e0000495. https://doi.org/10.1371/journal.pgph.0000495

In your revision ensure you cite all your sources (including your own works), and quote or rephrase any duplicated text outside the methods section. Further consideration is dependent on these concerns being addressed.

3. We suggest you thoroughly copyedit your manuscript for language usage, spelling, and grammar. If you do not know anyone who can help you do this, you may wish to consider employing a professional scientific editing service.

Whilst you may use any professional scientific editing service of your choice, PLOS has partnered with both American Journal Experts (AJE) and Editage to provide discounted services to PLOS authors. Both organizations have experience helping authors meet PLOS guidelines and can provide language editing, translation, manuscript formatting, and figure formatting to ensure your manuscript meets our submission guidelines. To take advantage of our partnership with AJE, visit the AJE website (http://learn.aje.com/plos/) for a 15% discount off AJE services. To take advantage of our partnership with Editage, visit the Editage website (www.editage.com) and enter referral code PLOSEDIT for a 15% discount off Editage services. If the PLOS editorial team finds any language issues in text that either AJE or Editage has edited, the service provider will re-edit the text for free.

Upon resubmission, please provide the following:

The name of the colleague or the details of the professional service that edited your manuscript

A copy of your manuscript showing your changes by either highlighting them or using track changes (uploaded as a *supporting information* file)

A clean copy of the edited manuscript (uploaded as the new *manuscript* file)

4. We note that you have stated that you will provide repository information for your data at acceptance. Should your manuscript be accepted for publication, we will hold it until you provide the relevant accession numbers or DOIs necessary to access your data. If you wish to make changes to your Data Availability statement, please describe these changes in your cover letter and we will update your Data Availability statement to reflect the information you provide.

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Partly

Reviewer #2: Partly

**********

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: No

Reviewer #2: No

**********

3. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: Yes

**********

4. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: No

Reviewer #2: Yes

**********

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: Line numbers are missing, which made the review difficult. The language of the manuscript needs fine-tuning.

“The common previously used methods such as random forest and neural network still cannot handle missing values.” Please provide supporting literature for this statement.

Rice yield prediction has been widely studied all over the world. Please improve the introduction section by discussing the various studies on rice yield forecasting carried out worldwide.

“The XGBoost model with the greatest result for time series data was developed by changing the parameters frequently.” What are the ranges of the parameters tried to get the best result? Please mention that.

Results

“We found the presence of heteroscedasticity and non-normality in the data.” Please present the results of the statistical tests used to assess the heteroscedasticity and non-normality of the data. Please also present the test results after the Box-Cox transformation.
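The kind of before-and-after evidence requested here can be illustrated with a minimal sketch. It is not the authors' analysis: the function names `box_cox` and `jarque_bera`, the choice of the Jarque-Bera statistic as the normality check, the value lambda = 0 (the log transform), and the synthetic series are all assumptions made for illustration. Only normality is covered; a heteroscedasticity test would need to be reported separately.

```python
import math


def box_cox(values, lam):
    """Box-Cox transform of strictly positive values for a given lambda."""
    if any(v <= 0 for v in values):
        raise ValueError("Box-Cox requires strictly positive values")
    if lam == 0:
        # lambda = 0 is defined as the natural log transform.
        return [math.log(v) for v in values]
    return [(v ** lam - 1.0) / lam for v in values]


def jarque_bera(values):
    """Return (JB statistic, asymptotic p-value) for a normality check."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2
    jb = n / 6.0 * (skewness ** 2 + (kurtosis - 3.0) ** 2 / 4.0)
    # JB is asymptotically chi-square with 2 degrees of freedom,
    # whose survival function is exp(-x / 2).
    return jb, math.exp(-jb / 2.0)


# Synthetic, right-skewed series (NOT the rice production data).
raw = [math.exp(0.1 * k) for k in range(1, 41)]
transformed = box_cox(raw, 0)

jb_raw, p_raw = jarque_bera(raw)
jb_transformed, p_transformed = jarque_bera(transformed)
print(f"raw:         JB={jb_raw:.2f}, p={p_raw:.4f}")
print(f"transformed: JB={jb_transformed:.2f}, p={p_transformed:.4f}")
```

Reporting the statistic and p-value for both the raw and the transformed series, as in the two printed lines, is one way to show whether the transformation actually moved the data closer to normality.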

Discussion

“we found an increasing linear trend for the annual rice production data from 1961 to 2020 in Bangladesh.” I suggest that the authors analyse the trend in the rice yield data using the Mann-Kendall test or linear trend analysis.
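For readers unfamiliar with the test named above, the following is a minimal sketch of the Mann-Kendall trend test using only the Python standard library. The function name `mann_kendall` and the toy series are hypothetical; the sketch uses the usual normal approximation with continuity and tie corrections, and it is not the procedure the authors ran.

```python
import math


def mann_kendall(series):
    """Return (S, Z, two-sided p-value) for the Mann-Kendall trend test."""
    n = len(series)
    if n < 3:
        raise ValueError("need at least 3 observations")

    # S is the number of increasing pairs minus the number of decreasing
    # pairs, taken over all pairs (i, j) with i < j.
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = series[j] - series[i]
            s += (diff > 0) - (diff < 0)

    # Variance of S under the no-trend null, corrected for tied values.
    tie_counts = {}
    for value in series:
        tie_counts[value] = tie_counts.get(value, 0) + 1
    tie_term = sum(t * (t - 1) * (2 * t + 5)
                   for t in tie_counts.values() if t > 1)
    var_s = (n * (n - 1) * (2 * n + 5) - tie_term) / 18.0

    # Continuity-corrected normal approximation.
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0

    # Two-sided p-value: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return s, z, p_value


# Toy, strictly increasing series (NOT the rice production data).
toy_series = [float(k) for k in range(1, 11)]
s_stat, z_stat, p_val = mann_kendall(toy_series)
print(s_stat, round(z_stat, 3), p_val)
```

A positive Z with a small p-value indicates a monotonic upward trend; unlike a linear regression slope, the test makes no assumption that the trend is linear.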

Delete “To train these models, we used 90% of our data as training set and test the performance of the model using the remaining 10% of the data.”

The discussion is merely a summary of the study. You should compare your results with previously published literature.

Reviewer #2: The study “Accuracy Performance of Time Series and Machine Learning Models for Predicting Rice Production in Bangladesh: A Comparative Analysis” is interesting. The study is well organized and executed properly, however, attention should be given to the following highlighted points before resubmission.

1. The abstract is verbose and does not highlight the results. It should report results and main findings instead of being generic.

2. The authors may provide some more detailed information regarding the XGBoost model which will be helpful for readers.

3. The authors mention that they used the auto.arima function to select the best ARIMA model: “The ARIMA models were built with the 'forecast' package using auto.arima function for choosing the best model based on the AICc values [34]”. However, the authors also state: “We performed the ADF test to see the stationarity of the data and found the data non stationary (p>0.01). To compensate for the trend shift observed in (Fig 3), we used first-order differencing of the sequence (Fig 4). The differenced time series was found stationary using the ADF test (p<0.01). So, the parameter of the ARIMA model d was 1. In the ACF diagram, there was an evident peak at lag 1 indicating that the MA may become 1 and an evident spike at lags 0 in the PACF diagram, suggesting that the AR may become 0 (Fig 5)”. The authors should clearly state which procedure they used to choose the best ARIMA model.

4. Why did the authors choose the ARIMA model with drift? Please justify this choice from the characteristics of the data.

5. Replace Table 1 and provide the p-values of the parameters along with the complete statistics. Secondly, the authors should also report the corresponding information for the XGBoost model, such as the tuning parameters.

6. “We used 8 time-lagged variables as input features for XGBoost; hence, the remaining 46 values were compared for the XGBoost model”. How were the 8 time lags selected for XGBoost?

7. I have some doubts about the results reported in Table 2. For the testing set, the results are consistent across all accuracy criteria. For the training set, however, the MAE value for XGBoost is very high. In the majority of cases, the MAE value is less than the RMSE value. Please check and rectify.

8. In the goodness-of-fit criteria, the authors may also include the directional statistic (DS) and the Diebold-Mariano (DM) test. Secondly, the MAPE results can be explained within its theoretical bounds. The authors may refer to the following studies: https://doi.org/10.1155/2020/1325071 and 10.1109/ACCESS.2019.2946992
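The consistency check raised in point 7 can be made concrete. For any set of forecast errors, MAE can never exceed RMSE (a consequence of the Cauchy-Schwarz inequality), and the two are equal only when every absolute error is the same, so a table reporting MAE above RMSE for the same data signals a calculation or reporting error. The sketch below computes the four error measures named in the abstract on made-up numbers; the function name `error_measures`, the sign convention (actual minus predicted), and the sample values are assumptions for illustration, not the authors' code or data.

```python
import math


def error_measures(actual, predicted):
    """Return MAE, RMSE, MPE (%) and MAPE (%) for paired observations."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")

    # Error convention assumed here: actual minus predicted.
    errors = [a - p for a, p in zip(actual, predicted)]
    n = len(errors)

    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    # Percentage errors are relative to the actual value, which must be non-zero.
    mpe = 100.0 * sum(e / a for e, a in zip(errors, actual)) / n
    mape = 100.0 * sum(abs(e / a) for e, a in zip(errors, actual)) / n
    return {"MAE": mae, "RMSE": rmse, "MPE": mpe, "MAPE": mape}


# Made-up values (NOT taken from Table 2 of the manuscript).
actual_values = [100.0, 110.0, 120.0, 130.0]
predicted_values = [98.0, 113.0, 119.0, 135.0]

measures = error_measures(actual_values, predicted_values)
for name, value in measures.items():
    print(f"{name}: {value:.4f}")

# Sanity check that can be applied to any reported results table.
assert measures["MAE"] <= measures["RMSE"]
```

The same check extends to MPE and MAPE: because MAPE averages absolute values of the terms that MPE averages with sign, the absolute value of MPE can never exceed MAPE.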

**********

6. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: Yes: Bappa Das

Reviewer #2: No

**********

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

PLoS One. 2023 Mar 27;18(3):e0283452. doi: 10.1371/journal.pone.0283452.r002

Author response to Decision Letter 0

19 Jan 2023

Dear Reviewers,

We are grateful to the reviewers and editors for their insightful advice on how to improve our paper. We have carefully reworked each section of the article based on the reviewers' and editors' feedback. By the authors' decision, we have also changed the title of the manuscript. We have substantially revised the entire manuscript and believe that the modifications made in the new version will be acceptable.

Please let me know if you need any other documents or corrections.

I look forward to hearing from you soon.

Thank you once again.

Best regards,

Farhana Arefeen Mila

Attachment

Submitted filename: Response to Reviewers.docx (34.3 KB)

PLoS One. doi: 10.1371/journal.pone.0283452.r003

Decision Letter 1

Sathishkumar V E

8 Mar 2023

A tree based eXtreme Gradient Boosting (XGBoost) machine learning model to forecast the Annual Rice Production in Bangladesh

PONE-D-22-20989R1

Dear Dr. Mila,

We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements.

Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication.

An invoice for payment will follow shortly after the formal acceptance. To ensure an efficient process, please log into Editorial Manager at http://www.editorialmanager.com/pone/, click the 'Update My Information' link at the top of the page, and double check that your user information is up-to-date. If you have any billing related questions, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

Kind regards,

Sathishkumar V E

Academic Editor

PLOS ONE

Additional Editor Comments (optional):

Reviewers' comments:

PLoS One. doi: 10.1371/journal.pone.0283452.r004

Acceptance letter

Sathishkumar V E

14 Mar 2023

PONE-D-22-20989R1

A tree based eXtreme Gradient Boosting (XGBoost) machine learning model to forecast the Annual Rice Production in Bangladesh

Dear Dr. Mila:

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now with our production department.

If your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information please contact onepress@plos.org.

If we can help with anything else, please email us at plosone@plos.org.

Thank you for submitting your work to PLOS ONE and supporting open access.

Kind regards,

PLOS ONE Editorial Office Staff

on behalf of

Dr. Sathishkumar V E

Academic Editor

PLOS ONE

Associated Data

Supplementary Materials

S1 File (DOCX, 24.9 KB)

Data Availability Statement

All necessary codes and data are available on GitHub (https://github.com/Arman-Hossain-Chowdhury/Rice-production).

[pone.0283452.ref001] 1.Godfray HCJ, Beddington JR, Crute IR, Haddad L, Lawrence D, Muir JF, et al. Food Security: The Challenge of Feeding 9 Billion People. Science (80-). 2010;327: 812–818. doi: 10.1126/SCIENCE.1185383 [DOI] [PubMed] [Google Scholar]

[pone.0283452.ref002] 2.Rahman MC, Islam MA, Rahaman MS, Sarkar MAR, Ahmed R, Kabir MS. Identifying the Threshold Level of Flooding for Rice Production in Bangladesh: An Empirical Analysis. J Bangladesh Agric Univ. 2021;19: 243–250. doi: 10.5455/JBAU.53297 [DOI] [Google Scholar]

[pone.0283452.ref003] 3.Khush GS. What it will take to Feed 5.0 Billion Rice consumers in 2030. Plant Mol Biol 2005 591. 2005;59: 1–6. doi: 10.1007/s11103-005-2159-5 [DOI] [PubMed] [Google Scholar]

[pone.0283452.ref004] 4.Dawe D. The contribution of rice research to poverty alleviation. Stud Plant Sci. 2000;7: 3–12. doi: 10.1016/S0928-3420(00)80003-8 [DOI] [Google Scholar]

[pone.0283452.ref005] 5.Siddique MAB, Sarkar MAR, Rahman MC, Chowdhury A, Rahman MS, Deb L. Rice farmers’ technical efficiency under abiotic stresses in Bangladesh. Asian J Agric Rural Dev. 2017;7: 219–232. doi: 10.18488/JOURNAL.1005/2017.7.11/1005.11.219.232 [DOI] [Google Scholar]

[pone.0283452.ref006] 6.Sayeed KA, Yunus MM. Rice prices and growth, and poverty reduction in Bangladesh. 2018; 1–39. Available: http://www.fao.org/publications/card/en/c/I8332EN [Google Scholar]

[pone.0283452.ref007] 7.BBS 2015. Statistical Yearbook of Bangladesh, Ministry of Planning, Government of the People’s Republic of Bangladesh, Dhaka. [Google Scholar]

[pone.0283452.ref008] 8.BBS 2020. Statistical Yearbook of Bangladesh, Ministry of Planning, Government of the People’s Republic of Bangladesh, Dhaka. [Google Scholar]

[pone.0283452.ref009] 9.Bangladesh Economic Review 2020. Economic Adviser’s Wing, Finance Division, Ministry of Finance, Government of the People’s Republic of Bangladesh.

[pone.0283452.ref010] 10.Gebbers R, Adamchuk VI. Precision Agriculture and Food Security. Science (80-). 2010;327: 828–831. doi: 10.1126/science.1183899 [DOI] [PubMed] [Google Scholar]

[pone.0283452.ref011] 11.Ji Z, Pan Y, Zhu X, Wang J, Li Q. Prediction of Crop Yield Using Phenological Information Extracted from Remote Sensing Vegetation Index. Sensors 2021, Vol 21, Page 1406. 2021;21: 1406. doi: 10.3390/s21041406 [DOI] [PMC free article] [PubMed] [Google Scholar]

[pone.0283452.ref012] 12.Kumar N. A Novel Method for Rice Production Forecasting Using Fuzzy Time Series. Int J Comput Sci Issues. 2012;9: 455–459. [Google Scholar]

[pone.0283452.ref013] 13.Alam W, Mrinmoy RAY, Kumar RR, Sinha K, Rathod S, Singh KN. Improved ARIMAX modal based on ANN and SVM approaches for forecasting rice yield using weather variables. Indian J Agric Sci. 2018;88: 1909–1913. [Google Scholar]

[pone.0283452.ref014] 14.Jing-feng HU, Zhong-en YA, Ren-chao WA, Hong-wei XU HJ. The rice production forecasting models using NOAA/AVHRR data based on GIS. Remote Sens Technol Appl. 2011;17: 125–128. [Google Scholar]

[pone.0283452.ref015] 15.Yun JI. Predicting regional rice production in South Korea using spatial data and crop-growth modeling. Agric Syst. 2003;77: 23–38. doi: 10.1016/S0308-521X(02)00084-7 [DOI] [Google Scholar]

[pone.0283452.ref016] 16.Koide N, Robertson AW, Ines AVM, Qian JH, Dewitt DG, Lucero A. Prediction of rice production in the Philippines using seasonal climate forecasts. J Appl Meteorol Climatol. 2013;52: 552–569. doi: 10.1175/JAMC-D-11-0254.1 [DOI] [Google Scholar]

[pone.0283452.ref017] 17.Noureldin NA, Aboelghar MA, Saudy HS, Ali AM. Rice yield forecasting models using satellite imagery in Egypt. Egypt J Remote Sens Sp Sci. 2013;16: 125–131. doi: 10.1016/j.ejrs.2013.04.005 [DOI] [Google Scholar]

[pone.0283452.ref018] 18.Bandumula N. Rice Production in Asia: Key to Global Food Security. Proc Natl Acad Sci India Sect B Biol Sci 2017 884. 2017;88: 1323–1328. doi: 10.1007/S40011-017-0867-7 [DOI] [Google Scholar]

[pone.0283452.ref019] 19.Rahman NMF, Hasan MM, Hossain MI, Baten MA, Hosen S, Ali MA, et al. Forecasting Aus Rice Area and Production in Bangladesh using Box-Jenkins Approach. Bangladesh Rice J. 2016;20: 1–10. doi: 10.3329/BRJ.V20I1.30623 [DOI] [Google Scholar]

[pone.0283452.ref020] 20.Mahmud S. Predicting the Rice Production of Bangladesh by Machine Learning Technique. 2018;7: 7–13. [Google Scholar]

[pone.0283452.ref021] 21.Rahman N. Forecasting of boro rice production in Bangladesh: An ARIMA approach. J Bangladesh Agric Univ. 1970;8: 103–112. doi: 10.3329/JBAU.V8I1.6406 [DOI] [Google Scholar]

[pone.0283452.ref022] 22.Sultana A, Khanam M. Forecasting Rice Production of Bangladesh Using ARIMA and Artificial Neural Network Models. Dhaka Univ J Sci. 2020;68: 143–147. doi: 10.3329/DUJS.V68I2.54612 [DOI] [Google Scholar]

[pone.0283452.ref023] 23.Rodríguez JP, Corrales DC, Griol D, Callejas Z, Corrales JC. A Non-Destructive Time Series Model for the Estimation of Cherry Coffee Production. C Mater Contin. 2022;70: 4725–4743. doi: 10.32604/CMC.2022.019135 [DOI] [Google Scholar]

[pone.0283452.ref024] 24.Lv CX, An SY, Qiao BJ, Wu W. Time series analysis of hemorrhagic fever with renal syndrome in mainland China by using an XGBoost forecasting model. BMC Infect Dis. 2021;21: 1–13. doi: 10.1186/S12879-021-06503-Y/TABLES/5 [DOI] [PMC free article] [PubMed] [Google Scholar]

[pone.0283452.ref025] 25.Alim M, Ye GH, Guan P, Huang DS, Zhou B Sen, Wu W. Comparison of ARIMA model and XGBoost model for prediction of human brucellosis in mainland China: A time-series study. BMJ Open. 2020;10: 1–8. doi: 10.1136/bmjopen-2020-039676 [DOI] [PMC free article] [PubMed] [Google Scholar]

[pone.0283452.ref026] 26.Narasimhamurthy V. Rice Crop Yield Forecasting Using Random Forest Algorithm SML. Int J Res Appl Sci Eng Technol. 2017;V: 1220–1225. doi: 10.22214/ijraset.2017.10176 [DOI] [Google Scholar]

[pone.0283452.ref027] 27.Anitha P, Chakravarthy T. Agricultural Crop Yield Prediction using Artificial Neural Network with Feed Forward Algorithm. Int J Comput Sci Eng. 2018;6: 178–181. doi: 10.26438/ijcse/v6i11.178181 [DOI] [Google Scholar]

[pone.0283452.ref028] 28.Aler R, Galván IM, Ruiz-Arias JA, Gueymard CA. Improving the separation of direct and diffuse solar radiation components using machine learning by gradient boosting. Sol Energy. 2017;150: 558–569. doi: 10.1016/J.SOLENER.2017.05.018 [DOI] [Google Scholar]

[pone.0283452.ref029] 29.Fang ZG, Yang SQ, Lv CX, An SY, Wu W. Application of a data-driven XGBoost model for the prediction of COVID-19 in the USA: a time-series study. BMJ Open. 2022;12: 1–8. doi: 10.1136/bmjopen-2021-056685 [DOI] [PMC free article] [PubMed] [Google Scholar]

[pone.0283452.ref030] 30.Rahman MS, Chowdhury AH. A data-driven eXtreme gradient boosting machine learning model to predict COVID-19 transmission with meteorological drivers. 2022; 1–14. doi: 10.1371/journal.pone.0273319 [DOI] [PMC free article] [PubMed] [Google Scholar]

[pone.0283452.ref031] 31.Khashei M, Bijari M, Raissi Ardali GA. Hybridization of autoregressive integrated moving average (ARIMA) with probabilistic neural networks (PNNs). Comput Ind Eng. 2012;63: 37–45. doi: 10.1016/J.CIE.2012.01.017 [DOI] [Google Scholar]

[pone.0283452.ref032] 32.Pai PF, Lin CS. A hybrid ARIMA and support vector machines model in stock price forecasting. Omega. 2005;33: 497–505. doi: 10.1016/J.OMEGA.2004.07.024 [DOI] [Google Scholar]

[pone.0283452.ref033] 33.Kabir MS, Salam MU, Chowdhury A, Rahman MF, Iftekharuddaula KM, Rahman MS, et al. Rice Vision for Bangladesh: 2050 and Beyond. Bangladesh Rice J. 2015;19: 1–18. doi: 10.3329/BRJ.V19I2.28160 [DOI] [Google Scholar]

[pone.0283452.ref034] 34.FAOSTAT. Annaul Rice Production data of Bangladesh. [cited 8 Dec 2022]. Available: https://www.fao.org/faostat/en/#data

[pone.0283452.ref035] 35.Helfenstein U. Box-Jenkins modelling in medical research. 2016;5: 3–22. doi: 10.1177/096228029600500102 [DOI] [PubMed] [Google Scholar]

[pone.0283452.ref036] 36.Amin M, Amanullah M, Akbar A. Time series modeling for forecasting wheat production of Pakistan. J Anim Plant Sci. 2014;24: 1444–1451. [Google Scholar]

[pone.0283452.ref037] 37.Alzahrani SI, Aljamaan IA, Al-Fakih EA. Forecasting the spread of the COVID-19 pandemic in Saudi Arabia using ARIMA prediction model under current public health interventions. J Infect Public Health. 2020;13: 914–919. doi: 10.1016/j.jiph.2020.06.001 [DOI] [PMC free article] [PubMed] [Google Scholar]

[pone.0283452.ref038] 38.Sahai AK, Rath N, Sood V, Singh MP. ARIMA modelling & forecasting of COVID-19 in top five affected countries. Diabetes Metab Syndr Clin Res Rev. 2020;14: 1419–1427. doi: 10.1016/J.DSX.2020.07.042 [DOI] [PMC free article] [PubMed] [Google Scholar]

[pone.0283452.ref039] 39.Rahman MS, Chowdhury AH, Amrin M. Accuracy comparison of ARIMA and XGBoost forecasting models in predicting the incidence of COVID-19 in Bangladesh. Plos Glob Public Heal. 2022;2019: 1–13. doi: 10.1371/journal.pgph.0000495 [DOI] [PMC free article] [PubMed] [Google Scholar]

[pone.0283452.ref040] 40.Li W, Yin Y, Quan X, Zhang H. Gene Expression Value Prediction Based on XGBoost Algorithm. Front Genet. 2019;10: 1–7. doi: 10.3389/fgene.2019.01077 [DOI] [PMC free article] [PubMed] [Google Scholar]

[pone.0283452.ref041] 41.Luo J, Zhang Z, Fu Y, Rao F. Time series prediction of COVID-19 transmission in America using LSTM and XGBoost algorithms. Results Phys. 2021;27: 104462. doi: 10.1016/j.rinp.2021.104462 [DOI] [PMC free article] [PubMed] [Google Scholar]

[pone.0283452.ref042] 42.Paliari I, Karanikola A, Kotsiantis S. A comparison of the optimized LSTM, XGBOOST and ARIMA in Time Series forecasting. IISA 2021 - 12th Int Conf Information, Intell Syst Appl. 2021. doi: 10.1109/IISA52424.2021.9555520 [DOI] [Google Scholar]

43. Prajapati S, Swaraj A, Lalwani R, Narwal A, Verma K, Singh G. Comparison of Traditional and Hybrid Time Series Models for Forecasting COVID-19 Cases. 2019;8.

44. RStudio Team. RStudio: Integrated Development Environment for R. RStudio, PBC, Boston, MA; 2022 [Internet]. [cited 18 Dec 2022]. Available: https://www.rstudio.com/

45. Hyndman RJ, Khandakar Y. Automatic Time Series Forecasting: The forecast Package for R. J Stat Softw. 2008;27: 1–22. doi: 10.18637/JSS.V027.I03

46. Sakia RM. The Box-Cox Transformation Technique: A Review. Stat. 1992;41: 169. doi: 10.2307/2348250

47. Curran-Everett D. Explorations in statistics: The log transformation. Adv Physiol Educ. 2018;42: 343–347. doi: 10.1152/advan.00018.2018

48. Bangladesh - Climatology | Climate Change Knowledge Portal. [cited 13 Dec 2022]. Available: https://climateknowledgeportal.worldbank.org/country/bangladesh/climate-data-historical

49. Climate of the World: Bangladesh | weatheronline.co.uk. [cited 18 Dec 2022]. Available: https://www.weatheronline.co.uk/reports/climate/Bangladesh.htm

50. Stuecker MF, Tigchelaar M, Kantar MB. Climate variability impacts on rice production in the Philippines. PLoS One. 2018;13. doi: 10.1371/journal.pone.0201426

51. Pickson RB, He G, Boateng E. Impacts of climate change on rice production: evidence from 30 Chinese provinces. Environ Dev Sustain. 2021;24: 3907–3925. doi: 10.1007/S10668-021-01594-8

52. Mahmood N, Ahmad B, Hassan S, Bakhsh K. Impact of temperature and precipitation on rice productivity in rice-wheat cropping system of Punjab province. J Anim Plant Sci. 2012;22: 993–997.

53. Reddy PCS, Sureshbabu A. An Applied Time Series Forecasting Model for Yield Prediction of Agricultural Crop. Adv Intell Syst Comput. 2020;1118: 177–187. doi: 10.1007/978-981-15-2475-2_16

54. Kim J, Lee J, Sang W, Shin P, Cho H, Seo M. Rice yield prediction in South Korea by using random forest. 2019;21: 75–84. doi: 10.5532/KJAFM.2019.21.2.75

55. Choudhary K, Shi W, Dong Y, Paringer R. Random Forest for rice yield mapping and prediction using Sentinel-2 data with Google Earth Engine. Adv Sp Res. 2022;70: 2443–2457. doi: 10.1016/J.ASR.2022.06.073

56. Fegade TK, Pawar BV. Crop Prediction Using Artificial Neural Network and Support Vector Machine. Adv Intell Syst Comput. 2020;1016: 311–324. doi: 10.1007/978-981-13-9364-8_23

57. Gandhi N, Petkar O, Armstrong LJ, Tripathy AK. Rice crop yield prediction in India using support vector machines. 2016 13th Int Jt Conf Comput Sci Softw Eng JCSSE 2016. 2016. doi: 10.1109/JCSSE.2016.7748856

58. Wu W, Guo J, An S, Guan P, Ren Y, Xia L, et al. Comparison of two hybrid models for forecasting the incidence of hemorrhagic fever with renal syndrome in Jiangsu Province, China. PLoS One. 2015;10: 1–13. doi: 10.1371/journal.pone.0135492
