Abstract
This study analyses a univariate time series using three models: Holt's exponential smoothing, the autoregressive integrated moving average (ARIMA) model, and the neural network autoregression (NNAR) model. The effectiveness of each model is assessed using in-sample forecasts and three accuracy measures: mean absolute percentage error (MAPE), mean absolute scaled error (MASE), and root mean squared logarithmic error (RMSLE). The model whose forecasts are closest to the observed values, and whose residuals pass a residual analysis, is used to predict the area under rice cultivation in India for the following 5 years. The time series was initially found to be non-stationary and was made stationary by differencing before the models were applied for analysis and prediction.
Keywords: Time series analysis, Univariate analysis, ARIMA, Holt's exponential smoothing, NNAR models
Introduction
Rice is one of the leading food crops in India. India is the world's second-largest producer and consumer of rice, behind China, and ranks first among all countries in area under rice cultivation. More than half of the nation's population consumes rice, and this kharif crop is the staple food in eastern and southern India. In 2021, 129.66 million tons of rice were produced on 464 lakh hectares of land, more than any other food crop. India is also the world's largest rice exporter, with 150 countries depending on it for their rice needs. India alone exported 21.5 million tons of rice in 2021, more than the exports of the next four largest rice-exporting countries. To remain a significant player in the rice market, India needs to maintain the amount of land under rice cultivation, so that the country has enough acreage to satisfy the demands of its growing domestic population and of other countries. Time series analysis studies a set of observations, each recorded at a specific time [1].
Related Work
Based on data from 1956 to 1994, D. Balanagammal et al. (2000) used the ARIMA model to forecast the cultivable area, production, and productivity of several crops in the Indian state of Tamil Nadu for the following 5 years [2]. In 2014, Rahul Tripathi et al. predicted rice productivity and production in Odisha using the ARIMA model [3]. Zahra N. et al. (2015) analysed the trends in rice area and yield in Punjab, Pakistan, using linear, quadratic and exponential models [4]. Celik et al. (2017) predicted groundnut production in Turkey for 15 years by studying six different ARIMA models [5]. In the same year, Karadas et al. used three exponential smoothing methods, Holt, Brown and Damped Trend, to predict the production of oil seed crops in Turkey; Holt’s exponential smoothing was found to be the most accurate for forecasting [6]. M. Hemavathi et al. (2018) applied the ARIMA model to time series data to forecast the area, production and productivity of rice in Thanjavur, Tamil Nadu [7]. Shastri et al. (2018) examined the use of the exponential smoothing method in predicting variables in time series data [8].
Later, machine learning and deep learning techniques came into use. Aashiq Reza and Tanmoy Debnath (2020) compared the ARIMA and NNAR models for predicting the prices of wheat and rice. They found that both models forecast well, with the NNAR model slightly more accurate in predicting the prices.
Senthamarai Kannan K. et al. (2020) used the ARIMA model to predict paddy production in four south Indian states. An appropriate ARIMA model was chosen for each state based on accuracy [9]. Milton Soto-Ferrari et al. (2020) used state space models alongside Neural Networks (NN) and ARIMA to predict values in a time series. The ARIMA and NN models performed well, and ARIMA gave the most accurate results for all patterns of time series data [10]. Bhardwaj et al. (2020) employed the NNAR model and classical time series approaches, such as the double moving average method and exponential smoothing, to forecast the rice yield in Karnal, Haryana. Their paper found the NNAR model to be the most suitable [11].
Mgale, Y. J., Yan, Y., and Timothy, S. (2021) forecast the prices of rice in Tanzania using the ARIMA and Holt-Winters Exponential Smoothing models. They compared the two models to find the more accurate one for predicting rice prices, and the predictions made with the Holt-Winters Exponential Smoothing model were more accurate [12]. Abotaleb et al. (2021) applied the BATS and TBATS models, Holt's Linear Trend, the NNAR model and the ARIMA model to forecast rice production in SAARC nations and Iran [13]. M. Miller et al. (2021) attempted to predict the returns on ten cryptocurrencies using methods including recurrent neural networks, deep learning neural networks, Holt’s exponential smoothing, ARIMA, ForecastX, and long short-term memory networks [14]. Rguibi et al. (2022) used the ARIMA and long short-term memory (LSTM) models to forecast the spread of COVID-19 in Morocco for the next 2 months based on time series data about the disease [15].
Research Gap
The literature review shows that few attempts have been made to forecast the area under rice cultivation in India. Hence, this study analyses and predicts this variable based on its time series data.
This article explores the univariate time series data of the area under cultivation of rice in India for the past 72 years. Univariate time series data record a single variable over time at equal intervals. Univariate analysis is carried out when only one variable is involved, and time series analysis studies a variable with respect to time. A univariate time series analysis therefore refers to analysing and forecasting a variable based on its own past values and error terms.
Holt's Exponential Smoothing, ARIMA and NNAR models are applied to the data to predict the area under cultivation of rice for the next 5 years. Initially, the models are used to predict values for the in-sample data of the last 4 years. The predicted values are then compared with the actual observations to determine the accuracy of each model. Based on the accuracy of these predictions, the best model is selected to forecast the target variable for future years.
Data Description
The dataset consists of yearly observations of the area under rice cultivation in India from 1950 to 2021. The dataset was obtained from the Reserve Bank of India (RBI) website. The source is the Ministry of Agriculture & Farmers Welfare, Government of India.
Methodology
The study begins by plotting the series to examine its time series components. Figure 1 provides a time series plot of the variable.
Fig. 1.
Time series plot
As shown in Fig. 1, the series appears to be non-stationary. However, this must be confirmed with an appropriate statistical test. The Augmented Dickey–Fuller (ADF) test extends the Dickey–Fuller test, which the statisticians David Dickey and Wayne Fuller introduced in 1979. It tests whether a time series has a unit root, i.e., whether it is non-stationary.
The null (H0) and alternative (H1) hypotheses are:
H0: The data have a unit root (are non-stationary).
H1: The data are stationary.
The null hypothesis is rejected only if the p-value is below the significance level α = 0.05. The estimated p-value of 0.545 exceeds this level, so the null hypothesis H0 cannot be rejected, and the data are treated as non-stationary at this stage. Since the series shows a trend but no seasonality, differencing is used to remove the trend and make the data stationary. First-order differencing is performed first; if the data are still not stationary, the differenced series is differenced again, and so on until the data become stationary. First-order differencing is given by:
$$\nabla G_t = G_t - G_{t-1} \tag{1}$$

where $G_t$ is the observed value at time $t$.
After differencing the data once, the ADF p-value is 0.01, which is below 0.05, so the null hypothesis of non-stationarity is rejected. The differenced data, plotted in Fig. 2, are therefore stationary.
Fig. 2.
Stationary time series plot
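The differencing step described above can be sketched in a few lines of Python. This is a minimal illustration on a toy series, not the study's data; the function name `difference` and the toy values are assumptions made for the example.

```python
def difference(series, order=1):
    """Difference a series `order` times: each pass maps y_t to y_t - y_{t-1}."""
    result = list(series)
    for _ in range(order):
        # Each pass shortens the series by one observation.
        result = [curr - prev for prev, curr in zip(result, result[1:])]
    return result


# Toy quadratic series (NOT the rice-area data): one pass removes part of the
# trend, a second pass leaves a constant series.
toy = [1, 4, 9, 16, 25]
first = difference(toy)            # [3, 5, 7, 9]
second = difference(toy, order=2)  # [2, 2, 2]
print(first, second)
```

In the study a single pass was enough, which corresponds to `order=1` here.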
In the next phase of the analysis, the area under cultivation of rice in India is predicted using two traditional statistical methods, Holt’s Exponential Smoothing and the Autoregressive Integrated Moving Average (ARIMA) model, and one deep learning technique, the Neural Network Autoregression (NNAR) model.
Holt’s Exponential Smoothing is suited to predicting data that have a trend component. The method uses exponentially weighted moving averages to smooth the values in the time series, which allows for better forecasting of the target value. It is also referred to as double exponential smoothing because it smooths two components, the level and the trend, each with its own smoothing parameter. As Fig. 1 shows that a trend component is present in this data, an in-sample forecast is performed with Holt's exponential smoothing. The time series observations with trend mt and error component et can be represented as:
$$G_t = m_t + e_t \tag{2}$$

where $G_t$ is the observed value at time $t$.
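To make the level and trend updates concrete, the following is a minimal sketch of Holt's linear trend recursion in plain Python. The initialisation (first observation as level, first difference as trend) and the smoothing parameters `alpha` and `beta` are assumptions chosen for illustration; the paper does not report the values it used.

```python
def holt_forecast(series, alpha, beta, horizon):
    """Holt's linear trend method: smooth a level and a trend, then extrapolate.

    alpha smooths the level, beta smooths the trend (both in (0, 1]).
    """
    if len(series) < 2:
        raise ValueError("need at least two observations")
    level = series[0]               # assumed initial level
    trend = series[1] - series[0]   # assumed initial trend
    for y in series[1:]:
        prev_level = level
        # New level: blend the observation with the previous one-step forecast.
        level = alpha * y + (1 - alpha) * (prev_level + trend)
        # New trend: blend the latest change in level with the previous trend.
        trend = beta * (level - prev_level) + (1 - beta) * trend
    # h-step-ahead forecast is the last level plus h times the last trend.
    return [level + h * trend for h in range(1, horizon + 1)]


# Toy linear series (NOT the rice-area data): the forecast continues the line.
print(holt_forecast([2, 4, 6, 8, 10], alpha=0.5, beta=0.5, horizon=2))
```

On a perfectly linear series the recursion reproduces the line regardless of the parameter values, which makes it a convenient sanity check.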
ARIMA is another statistical method that predicts the target variable from the given time series data. It has three components: Autoregression (AR), Integration (I) and Moving Average (MA). Autoregression forecasts by regressing the variable on itself, i.e., it assumes that future values are correlated with past values in the time series. Integration involves differencing the time series data until they become stationary; the number of times the data are differenced is known as the degree of differencing. The MA component, unlike the AR component, which uses past values of the variable, uses errors from past forecasts to predict future values. The model is written ARIMA(p, d, q), where p is the number of autoregressive terms, d is the degree of differencing, and q is the number of lagged forecast errors [1]. The general form for the ARIMA model with these parameters is defined as:
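$$\left(1 - \sum_{i=1}^{p} \phi_i B^i\right)(1 - B)^d G_t = c + \left(1 + \sum_{j=1}^{q} \theta_j B^j\right) e_t \tag{3}$$

where $B$ is the backshift operator ($B G_t = G_{t-1}$), $\phi_i$ are the autoregressive coefficients, $\theta_j$ are the moving average coefficients, $c$ is a constant, and $e_t$ is the error term at time $t$.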
Neural Network Autoregression (NNAR) is a model that uses lagged values of the time series as inputs for prediction. NNAR(p, k) denotes a network with p lagged inputs and k nodes in the hidden layer. It is a deep learning model loosely modelled on the human brain’s neural network [14]. Unlike the earlier two models, which assume that the time series data involve only linear relationships, the NNAR model can capture non-linear relationships between the predictors and the predictand.
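The structure of an NNAR(p, k) forecast can be sketched as a single forward pass through a small feed-forward network. The sketch below assumes sigmoid hidden nodes and a linear output, omits training and input scaling entirely, and uses made-up weights; none of these details are taken from the paper.

```python
import math


def nnar_one_step(lags, hidden_weights, hidden_biases, output_weights, output_bias):
    """One-step forecast from p lagged inputs through k sigmoid hidden nodes."""
    hidden = []
    for weights, bias in zip(hidden_weights, hidden_biases):
        z = bias + sum(w * x for w, x in zip(weights, lags))
        hidden.append(1.0 / (1.0 + math.exp(-z)))  # sigmoid activation
    # Linear output node combines the hidden activations.
    return output_bias + sum(v * h for v, h in zip(output_weights, hidden))


def nnar_forecast(history, p, horizon, params):
    """Iterate one-step forecasts, feeding each forecast back in as a new lag."""
    window = list(history[-p:])
    forecasts = []
    for _ in range(horizon):
        inputs = window[::-1]  # most recent observation first
        y_hat = nnar_one_step(inputs, *params)
        forecasts.append(y_hat)
        window = window[1:] + [y_hat]
    return forecasts


# Hypothetical NNAR(2, 2) parameters, chosen only to exercise the code.
params = (
    [[0.0, 0.0], [0.0, 0.0]],  # hidden weights: k rows of p weights
    [0.0, 0.0],                # hidden biases
    [1.0, 3.0],                # output weights
    0.25,                      # output bias
)
print(nnar_forecast([0.1, 0.2, 0.3], p=2, horizon=3, params=params))
```

Because the hidden activations are non-linear functions of the lags, a trained network of this shape can represent relationships that a linear AR model cannot.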
The accuracy of each model's in-sample forecasts is then measured. The accuracy measures used in this study are Root Mean Squared Logarithmic Error (RMSLE), Mean Absolute Scaled Error (MASE), and Mean Absolute Percentage Error (MAPE). The formulas for these measures are:
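$$\mathrm{RMSLE} = \sqrt{\frac{1}{n}\sum_{t=1}^{n}\left(\log(M_t + 1) - \log(G_t + 1)\right)^2} \tag{4}$$

$$\mathrm{MASE} = \frac{\frac{1}{n}\sum_{t=1}^{n}\left|e_t\right|}{\frac{1}{n-1}\sum_{t=2}^{n}\left|G_t - G_{t-1}\right|} \tag{5}$$

$$\mathrm{MAPE} = \frac{1}{n}\sum_{t=1}^{n}\left|\frac{e_t}{G_t}\right| \tag{6}$$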
Here, n is the number of time periods, Gt is the original value at time t, Mt is the predicted value at time t, and et is the prediction error, equal to (Gt − Mt).
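As a worked check, the three accuracy measures can be computed for the ARIMA column of Table 1. The sketch below assumes that RMSLE uses log(value + 1), that MAPE is reported as a fraction rather than a percentage, and that MASE is scaled by the mean absolute one-step naive change over the same four observations. These assumptions are not stated in the text, but they give values close to the ARIMA row of Table 3.

```python
import math


def rmsle(actual, predicted):
    """Root mean squared logarithmic error, using log(value + 1)."""
    n = len(actual)
    total = sum((math.log(m + 1) - math.log(g + 1)) ** 2
                for g, m in zip(actual, predicted))
    return math.sqrt(total / n)


def mape(actual, predicted):
    """Mean absolute percentage error, expressed as a fraction."""
    return sum(abs((g - m) / g) for g, m in zip(actual, predicted)) / len(actual)


def mase(actual, predicted):
    """Mean absolute error scaled by the mean absolute naive one-step change."""
    n = len(actual)
    mae = sum(abs(g - m) for g, m in zip(actual, predicted)) / n
    naive = sum(abs(actual[t] - actual[t - 1]) for t in range(1, n)) / (n - 1)
    return mae / naive


# Values exactly as listed in Table 1 (actual and ARIMA columns).
actual = [464, 458, 437, 442]
arima = [448.5960, 446.5456, 444.4952, 442.448]

print(rmsle(actual, arima), mape(actual, arima), mase(actual, arima))
```

A MASE below 1 means the model's average error is smaller than that of a naive forecast that repeats the previous observation.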
Next, the best model is chosen based on these accuracy metrics and model assumptions. The extent of land planted with rice in India during the next five years is forecast using the most accurate model.
Analysis
Two traditional statistical models, Holt’s Exponential Smoothing and ARIMA, are used for prediction, along with a deep learning model, the NNAR model. All three models permit the forecasting of a variable recorded as a time series.
First, the Augmented Dickey–Fuller (ADF) test is applied to determine whether the time series data are stationary. Since the calculated p-value is 0.545, which is greater than the significance level α = 0.05, the null hypothesis (H0) cannot be rejected. Thus, the data are found to be non-stationary. This can also be seen in Fig. 3, the Autocorrelation Function (ACF) plot, which shows the correlation of the variable with lagged versions of itself. A lag of k refers to the value of the series k time periods earlier. At lag 0, the series is correlated with itself, so the first value in the plot is 1. The blue dotted lines mark the confidence bounds within which the autocorrelations of a white-noise series are expected to lie.
Fig. 3.

ACF plot before differencing
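The quantities shown in an ACF plot can be computed directly. The following sketch uses the standard sample autocorrelation formula and the usual approximate 95% white-noise bounds of ±1.96/√n; the toy series is an assumption for illustration and is not the study's data.

```python
import math


def acf(series, max_lag):
    """Sample autocorrelations r_0..r_max_lag of a series."""
    n = len(series)
    mean = sum(series) / n
    dev = [y - mean for y in series]
    denom = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t - k] for t in range(k, n)) / denom
            for k in range(max_lag + 1)]


def white_noise_bound(n):
    """Approximate 95% bound for autocorrelations of white noise."""
    return 1.96 / math.sqrt(n)


toy = [1, -1, 1, -1]        # alternating toy series
print(acf(toy, 2))          # [1.0, -0.75, 0.5]
print(white_noise_bound(72))
```

The lag-0 value is always 1 because it is the correlation of the series with itself.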
To eliminate the non-stationarity, differencing is performed on the time series data. The following ACF plot (Fig. 4) has been plotted after the data goes through first-order differencing. At this stage, the data have been transformed to become stationary. This will make the predictions more accurate and reliable.
Fig. 4.

ACF plot after differencing
The Partial Autocorrelation Function (PACF) plot (Fig. 5) depicts the partial correlation between the time series and its lagged versions. The partial correlation at lag k is the correlation between observations k periods apart after the influence of the intermediate lags has been removed. This ensures that the correlation is not spurious, as the effect of the other lags is removed.
Fig. 5.

PACF plot
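One common way to obtain partial autocorrelations from autocorrelations is the Durbin–Levinson recursion, sketched below. This is an illustrative implementation, not necessarily the routine used to produce Fig. 5; the input autocorrelations in the example are those of a hypothetical AR(1) process with coefficient 0.5.

```python
def pacf_from_acf(rho):
    """Partial autocorrelations for lags 1..K from rho[0..K] (rho[0] must be 1).

    Uses the Durbin-Levinson recursion.
    """
    pacf = []
    prev = []  # coefficients phi_{k-1,1}, ..., phi_{k-1,k-1}
    for k in range(1, len(rho)):
        if k == 1:
            phi_kk = rho[1]
            curr = [phi_kk]
        else:
            num = rho[k] - sum(prev[j] * rho[k - 1 - j] for j in range(k - 1))
            den = 1.0 - sum(prev[j] * rho[j + 1] for j in range(k - 1))
            phi_kk = num / den
            # Update the lower-order coefficients, then append phi_kk.
            curr = [prev[j] - phi_kk * prev[k - 2 - j] for j in range(k - 1)]
            curr.append(phi_kk)
        pacf.append(phi_kk)
        prev = curr
    return pacf


# Autocorrelations of an AR(1) process with coefficient 0.5: rho_k = 0.5 ** k.
print(pacf_from_acf([1.0, 0.5, 0.25, 0.125]))
```

For an AR(1) process the PACF cuts off after lag 1, which is the pattern typically used to choose the autoregressive order p.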
Now that the data have been made stationary, the next step is to use the models to predict in-sample observations, as shown in Table 1. This helps determine the accuracy of each model in predicting the target variable.
Table 1.
In-sample forecast of area under cultivation of rice in India (lakh hectares)
| Actual values | ARIMA | Holt exponential smoothing | NNAR |
|---|---|---|---|
| 464 | 448.5960 | 440.5980 | 435.0326 |
| 458 | 446.5456 | 440.0890 | 433.1187 |
| 437 | 444.4952 | 437.5800 | 431.8492 |
| 442 | 442.448 | 439.0709 | 430.9917 |
Residual analysis is performed to assess the adequacy of the fitted models. Residual terms are the differences between the observed values and the fitted or predicted values. The two main assumptions checked by the residual analysis are:
Residuals are uncorrelated
Residuals are normally distributed
If the correlation between the residuals is not significant, the model is said to be a good fit. The p-value is calculated, and if it is greater than 0.05, the residuals are said to be uncorrelated. The hypotheses are:
H0: residuals are uncorrelated.
H1: residuals are correlated.
We also check for normality, i.e., whether the residuals are normally distributed or not, as it is one of the assumptions of a linear model. The hypotheses are:
H0: residuals are normally distributed.
H1: residuals are not normally distributed.
The null hypothesis is rejected if the p-value is less than the significance level of 0.05. In this study, the Box–Ljung test is employed for residual analysis and the Shapiro–Wilk normality test is used for normality.
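The Box–Ljung statistic behind the first test can be sketched as follows. The code computes only the test statistic Q from the residual autocorrelations; the p-value would come from comparing Q with a chi-square distribution, which is left out to keep the example dependency-free. The toy residuals are an assumption for illustration.

```python
def ljung_box_q(residuals, max_lag):
    """Ljung-Box statistic Q = n(n+2) * sum_{k=1..h} r_k^2 / (n - k)."""
    n = len(residuals)
    mean = sum(residuals) / n
    dev = [e - mean for e in residuals]
    denom = sum(d * d for d in dev)
    total = 0.0
    for k in range(1, max_lag + 1):
        # Sample autocorrelation of the residuals at lag k.
        r_k = sum(dev[t] * dev[t - k] for t in range(k, n)) / denom
        total += r_k ** 2 / (n - k)
    return n * (n + 2) * total


# Strongly alternating toy residuals give a large Q (evidence of correlation).
print(ljung_box_q([1, -1, 1, -1], max_lag=2))
```

A large Q (small p-value) indicates that the residuals are still correlated, meaning the model has left structure in the data unexplained.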
First, Holt’s exponential smoothing model is applied to predict the area under cultivation of rice in India for the years 2018 to 2021. These predictions are based on the data from the previous 68 years, i.e., from 1950 to 2017. In Fig. 6, the fitted values and observed values are plotted in the same graph. When testing for normality, the null hypothesis is that the residuals are normally distributed, and the alternative hypothesis is that they are not.
Fig. 6.
Residual plot fitted and ACF plot of area under cultivation using Holt's exponential smoothing
For Holt’s Exponential Smoothing model (Fig. 6), the p-value of the Box–Ljung test is 0.8509, which is greater than 0.05. Hence, the residuals are uncorrelated. The p-value of the normality test is 0.054, which is also greater than 0.05. Therefore, the null hypothesis that the residuals are normally distributed cannot be rejected.
Similarly, for ARIMA, we perform an in-sample forecast, a residual analysis, and a normality test on the data. The p-value calculated for the Box–Ljung test is 0.9709. As it is greater than 0.05, the residuals are uncorrelated. The p-value for the Shapiro–Wilk normality test is 0.051. As this value is greater than 0.05, the null hypothesis cannot be rejected, and the residuals are deemed to be normally distributed (Fig. 7).
Fig. 7.
Residual plot, ACF plot and normality plot of area under cultivation using ARIMA
The NNAR model is also tested for correlation among residuals and for normality, as shown in Fig. 8 and Table 2. The p-value in the Box–Ljung test is 0.01932, which is less than 0.05. Hence, the null hypothesis is rejected and the residuals are correlated. The p-value in the Shapiro–Wilk normality test is 0.05136. Since it is greater than 0.05, the null hypothesis of normality cannot be rejected.
Fig. 8.
Residual plot, ACF plot and normality plot of area under cultivation using NNAR
Table 2.
Residual analysis
| Models | Box–Ljung test (p-value) | Result | Shapiro–Wilk normality test (p-value) | Result |
|---|---|---|---|---|
| Holt’s Exponential Smoothing | 0.8509 | Uncorrelated | 0.054 | Normally Distributed |
| ARIMA | 0.9709 | Uncorrelated | 0.051 | Normally Distributed |
| NNAR | 0.01932 | Correlated | 0.05136 | Normally Distributed |
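The decision rule applied to each p-value in Table 2 can be written out explicitly. The sketch below simply compares the reported p-values with the 0.05 significance level; the dictionary layout and function name are assumptions made for the example.

```python
ALPHA = 0.05  # significance level used throughout the study

# p-values as reported in Table 2: (Box-Ljung, Shapiro-Wilk).
p_values = {
    "Holt": (0.8509, 0.054),
    "ARIMA": (0.9709, 0.051),
    "NNAR": (0.01932, 0.05136),
}


def decide(p_value, alpha=ALPHA):
    """Reject H0 only when the p-value falls below the significance level."""
    return "reject H0" if p_value < alpha else "fail to reject H0"


for model, (box_ljung_p, shapiro_p) in p_values.items():
    print(model, "| uncorrelated residuals:", decide(box_ljung_p),
          "| normal residuals:", decide(shapiro_p))
```

Only the NNAR Box–Ljung test rejects its null hypothesis, which matches the "Correlated" entry in the table; all three normality p-values sit just above 0.05.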
The suitability of each model for forecasting was first assessed against the assumptions about the residuals: Holt's exponential smoothing and ARIMA satisfied both assumptions, whereas the NNAR residuals were correlated. In-sample forecasting was then performed, and the accuracy measures were obtained in order to identify the best model. As seen in Table 3, the ARIMA model had the lowest error on all three measures and was thus chosen for future prediction.
Table 3.
Measures of accuracy
| Models | RMSLE | MAPE | MASE |
|---|---|---|---|
| ARIMA | 0.0227 | 0.0190 | 0.8155 |
| Holt’s exponential smoothing | 0.0328 | 0.0255 | 1.0973 |
| NNAR | 0.0447 | 0.03836 | 1.640 |
Result and discussion
The univariate analysis indicates that ARIMA is the best of the three models for predicting future values of this time series. For this dataset, the traditional time series model is therefore more accurate than the deep learning technique. Using the ARIMA model, the area under rice cultivation in India for the next 5 years is predicted, as shown in Fig. 9 and Table 4.
Fig. 9.

Forecasted area under rice cultivation in India
Table 4.
The forecasted area under rice cultivation in India using ARIMA Model
| Year | ARIMA model |
|---|---|
| 2022 | 459.7745 |
| 2023 | 461.9537 |
| 2024 | 464.1330 |
| 2025 | 466.3123 |
| 2026 | 468.4915 |
The ARIMA model that provides the most accurate forecasts is the ARIMA(0,1,1) model. The area under rice cultivation in the nation is expected to increase over the next 5 years, continuing the trend observed since 1950.
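The forecasts in Table 4 rise by an almost constant amount each year. An ARIMA(0,1,1) model without a constant would give flat forecasts beyond the first step, so this pattern is what one would expect if the fitted model includes a drift (constant) term. The paper does not state this, so the drift interpretation in the sketch below is an assumption.

```python
# Forecast values exactly as listed in Table 4.
forecasts = {2022: 459.7745, 2023: 461.9537, 2024: 464.1330,
             2025: 466.3123, 2026: 468.4915}

values = [forecasts[year] for year in sorted(forecasts)]

# Year-on-year increments of the forecast path.
steps = [round(b - a, 4) for a, b in zip(values, values[1:])]

# Average increment, interpreted here (as an assumption) as the drift term.
drift = (values[-1] - values[0]) / (len(values) - 1)

# A straight line from the first forecast with slope `drift`.
reconstructed = [values[0] + h * drift for h in range(len(values))]
max_gap = max(abs(r - v) for r, v in zip(reconstructed, values))

print(steps, drift, max_gap)
```

The increments agree to within rounding, so the five forecasts lie on a straight line with a slope of roughly 2.18 lakh hectares per year.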
Conclusion
This study was conducted to predict the area under cultivation of rice in India for the next 5 years by applying Holt’s Exponential Smoothing, ARIMA and NNAR models. The predictions are based on data for the variable from 1950 to 2021. The data are first plotted to illustrate the trend component. The Augmented Dickey–Fuller test then shows that the data are non-stationary, so differencing is carried out; the data become stationary after the first differencing. Each method is first tested on in-sample data using the accuracy measures RMSLE, MAPE and MASE. On these measures, ARIMA is the best method for forecasting the target variable. Lastly, the area under cultivation of rice in India for 2022–2026 is forecast using the ARIMA model.
Data Availability
The dataset analysed during the current study is available on the Reserve Bank of India website at [https://dbie.rbi.org.in/DBIE/dbie.rbi?site=statistics].
Declarations
Conflict of interest
All authors declare that they have no conflicts of interest. This paper received no funding for the research, authorship, and/or publication.
Footnotes
This article is part of the topical collection “Advances in Computational Intelligence for Artificial Intelligence, Machine Learning, Internet of Things and Data Analytics” guest edited by S. Meenakshi Sundaram, Young Lee and Gururaj K S.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
References
- 1. Box GE, Jenkins GM, Reinsel GC, Ljung GM. Time series analysis: forecasting and control. John Wiley & Sons; 2015.
- 2. Balanagammal D, Ranganathan CR, Sundaresan R. Forecasting of agricultural scenario in Tamil Nadu a time series analysis. J Indian Soc Agricult Stat. 2000;53(3):273–286.
- 3. Tripathi R, Nayak AK, Raja R, Shahid M, Kumar A, Mohanty S, Panda BB, Lal B, Gautam P. Forecasting rice productivity and production of Odisha, India, using autoregressive integrated moving average models. Adv Agricult. 2014;2014:1–9. doi: 10.1155/2014/621313.
- 4. Zahra N, Akmal N, Siddiqui S, Raza I, Habib N, Naheed S (2015) Trend analysis of rice area and yield in Punjab, Pakistan. Pakistan J Agric Res. 28(4).
- 5. Celik K, Eyduran. Forecasting the production of groundnut in Turkey using ARIMA model. J Animal Plant Sci. 2017;27(3):920–928.
- 6. Karadas, Celik, Eyduran, Hopoglu (2017) Forecasting production of some oil seed crops in Turkey using exponential smoothing methods. J Anim Plant Sci. 27(5).
- 7. Hemavathi M, Prabakaran K (2018) ARIMA model for forecasting of area, production and productivity of rice and its growth status in Thanjavur District of Tamil Nadu, India. doi: 10.20546/ijcmas.2018.702.019.
- 8. Shastri S, Sharma A, Mansotra V, Sharma A, Singh Bhadwal A, Kumari M (2018) A study on exponential smoothing method for forecasting. Int J Comput Sci Eng.
- 9. Senthamarai Kannan K, Karuppasamy KM (2020) Forecasting for agricultural production using ARIMA model. Palarch’s Journal of Archaeology of Egypt/Egyptology 18(7). ISSN 1567–214x.
- 10. Soto-Ferrari M, Chams-Anturi O, Escorcia-Caballero JP (2020) A time-series forecasting performance comparison for neural networks with state space and ARIMA models.
- 11. Mgale YJ, Yan Y, Timothy S. A comparative study of ARIMA and Holt-Winters exponential smoothing models for rice price forecasting in Tanzania. OALib. 2021;08(05):1–9. doi: 10.4236/oalib.1107381.
- 12. Abotaleb M, Ray S, Mishra P, Karakaya K, Shoko C, Al Khatib A, Ray M, Fernando W, Lounis M, Balloo R (2021) Modelling and forecasting of rice production in south Asian countries. Ama, Agricultural Mechanization in Asia, Africa & Latin America. 51:1611–1627.
- 13. Mohamed AR, Moussa N, Madani A, Aaroud A, Zine-dine K (2022) Forecasting Covid-19 transmission with ARIMA and LSTM techniques in Morocco. SN Computer Science 3:133. doi: 10.1007/s42979-022-01019-x.
- 14. Forecasting: Principles and practice (2nd ed). 11.3 Neural network models. (n.d.). Retrieved October 13, 2022, from https://otexts.com/fpp2/nnetar.html
- 15. Biswal, A. (2022, July 12). Time Series forecasting in R: Step-by-step guide with examples [updated]. Simplilearn.com. Retrieved October 20, 2022, from https://www.simplilearn.com/tutorials/data-science-tutorial/time-series-forecasting-in-r





