Abstract
In this study, stochastic gradient boosting (SGB), a commonly adopted soft computing method, was used to estimate reference evapotranspiration (ETo) for the Adiyaman region of southeastern Türkiye. ETo was calculated with the FAO-56 Penman-Monteith method and then estimated using SGB with maximum temperature, minimum temperature, relative humidity, wind speed, and solar radiation obtained from a meteorological station.
- The calculated ETo time series values were decomposed into sub-series using Singular Spectrum Analysis (SSA) to enhance prediction accuracy.
- Each sub-series was trained with the first 70% of observations and tested with the remaining 30% via SGB. Final prediction values were obtained by aggregating the predictions of all sub-series.
- Three lag times were taken into account during the predictions, and both short-term and long-term ETo values were estimated using the proposed framework. The results were evaluated with the root mean square error (RMSE) and Nash-Sutcliffe efficiency (NSE) indicators to ensure that the model produced statistically acceptable outcomes.
Keywords: Reference evapotranspiration, Estimation, Singular spectrum analysis, Stochastic gradient boosting
Method name: Singular Spectrum Analysis and Gradient Boosting Machine
Graphical abstract
Specifications table
| Subject area: | Engineering |
| More specific subject area: | Signal processing and machine learning applications in hydrology. |
| Name of your method: | Singular Spectrum Analysis and Gradient Boosting Machine |
| Name and reference of original method: | Golyandina N., Zhigljavsky A., 2013. Singular Spectrum Analysis for Time Series, Springer. https://doi.org/10.1007/978-3-642-34913-3 Friedman, J., 2001. Greedy function approximation: a gradient boosting machine. Ann. Stat. 29 (5), 1189–1232. https://doi.org/10.1214/aos/1013203451. |
| Resource availability: | R software |
Introduction
Evapotranspiration (ET) – the combination of transpiration from plants and evaporation from non-stomatal surfaces – is a critical term in the hydrologic cycle [1]. ET plays a crucial role in the management of water resources, including agricultural planning and the modeling of hydrologic processes [2]. Many measurement techniques, including lysimeters [3], eddy covariance [4], and Bowen ratio energy balance methods [5], are used to determine ET. However, these techniques are site-specific, expensive, and require continuous calibration. In recent years, various models have been developed to estimate reference ET (ETo), which refers to the ET of a reference crop under well-watered, ideal conditions. The FAO-56 Penman-Monteith method [6] is the most widely used of these models, but accurately estimating ETo across space and time remains a challenge due to the complex nonlinear relationships between evapotranspiration and the factors that control it [7].
In recent years, many prediction models have been developed to estimate ETo using classical linear models [8], artificial neural networks [9], fuzzy logic [1], support vector machines [10], multivariate adaptive regression splines [11], and random forests [12]. Incorporating signal processing methods into the prediction rationale tends to enhance the accuracy of ETo predictions. For instance, Kisi and Alizamir [13] used the wavelet transform to divide the ETo series into sub-bands at different frequencies and estimated ETo values using a hybrid wavelet artificial neural network (WANN). They also benchmarked the WANN model against a standalone extreme learning machine (ELM) to highlight the superiority of the WANN model. In addition, Lu et al. [14] hybridized the backpropagation neural network (BPNN) with the empirical mode decomposition (EMD) and variational mode decomposition (VMD) methods to simulate ETo series, and compared the results with the outcomes of two widely embraced ML algorithms, namely support vector regression (SVR) and gradient boosting regression tree (GBRT). The VMD-BPNN model achieved an RMSE of 0.405 mm/day, while the standalone SVR and GBRT models yielded 1.119 mm/day and 1.132 mm/day, respectively. Furthermore, Zheng et al. [15] proposed an approach to predict 1-day-ahead ETo values. They utilized a multivariate variational mode decomposition (MVMD) technique along with a soft feature filter method in the pre-processing stage to decompose the original time series into its sub-series. For the predictions, the authors used Gated Recurrent Units (GRUs) due to their promising performance in capturing long-term dependencies within time series data and benchmarked the proposed hybrid model against standalone models. Their results showed that the hybrid MVMD-SoFeFilterGRU model outperformed the standalone models in terms of prediction accuracy. Similarly, Araghi et al.
[16] aimed to predict daily evapotranspiration in three distinct climates in Iran using three different machine learning techniques, namely ANN, the adaptive neuro-fuzzy inference system (ANFIS), and multiple linear regression (MLR). The authors applied both standalone machine learning techniques and their hybridized variants integrating the discrete wavelet transform (DWT). The results indicated that the models developed with the DWT had better prediction performance than those developed without a pre-processing technique. In another study, Kang et al. [17] investigated the effects of different preprocessing methods on the prediction of daily ETo values using four meteorological variables: maximum temperature, minimum temperature, sunshine duration, and relative humidity. They decomposed these variables into sub-series using the VMD method, normalized the sub-series with the Box-Cox (BC) transformation, and compared the performance of different models: the standalone SVM model, the hybrid VMD-SVM model, the hybrid BC-SVM model, and the hybrid VMD-BC-SVM model. According to the results, the hybrid VMD-BC-SVM model performed best in predicting daily ETo values.
These results make it clear that signal processing and machine learning methods can improve ETo predictions. The present study therefore combined a signal processing technique, singular spectrum analysis (SSA), with a machine learning technique, stochastic gradient boosting (SGB), to predict ETo values. The ETo values were decomposed into different sub-bands with divergent frequency and time scales using the SSA approach, and the SGB model was then adopted to perform the predictions. It is worth noting that the present research is, to our knowledge, the first in the ET estimation literature to combine the SSA method with SGB to predict ETo values.
Study area and data
In this study, the meteorological parameters observed in the Adiyaman Province in southeastern Anatolia, Turkey were used to calculate the FAO-56-PM ETo values (Fig. 1). According to the Macro Climate Region Map of Turkey, Adiyaman is located in a continental climate zone with very hot and dry summers and very cold winters. The average temperature is 17 °C, total annual precipitation is 935 mm, and the average relative humidity is 49% [18]. For the prediction model, daily maximum temperature, minimum temperature, wind speed, relative humidity, sunshine duration, and solar radiation values measured between 2009 and 2021 at the Adiyaman meteorological observation station were used. The descriptive statistics of the meteorological variables used in the study are given in Table 1.
Fig. 1.
Study area.
Table 1.
Descriptive statistics of meteorological variables.
| | Max. temperature (°C) | Min. temperature (°C) | Relative humidity (%) | Wind speed (m/s) | Sunshine duration (hour) | Solar radiation (MJ/m²/day) |
|---|---|---|---|---|---|---|
| Maximum | 43.8 | 29.8 | 98.8 | 6.7 | 14 | 29.33 |
| Minimum | −0.5 | −6.3 | 9.4 | 0.5 | 0 | 0 |
| Average | 25.9 | 12.33 | 49.87 | 1.53 | 6.93 | 15.79 |
| Standard Deviation | 11.59 | 8.06 | 19.15 | 0.85 | 3.59 | 7.42 |
| Skewness | −0.022 | −0.016 | −0.510 | 2.349 | −0.652 | −0.244 |
FAO-56 Penman-Monteith reference evapotranspiration model
The FAO-56 Penman-Monteith model is recognized as the most accurate empirical approach for calculating ETo in the absence of direct measurements through lysimeters or other techniques [6]. The FAO-56 Penman-Monteith equation is given as:
$$ETo = \frac{0.408\,\Delta\,(R_n - G) + \gamma\,\dfrac{900}{T + 273}\,u_2\,(e_s - e_a)}{\Delta + \gamma\,(1 + 0.34\,u_2)} \tag{1}$$

where ETo is the reference crop evapotranspiration (mm d−1), Δ is the slope of the saturation vapor pressure versus air temperature curve (kPa °C−1), Rn is the net solar radiation at the crop surface (MJ m−2 d−1), G is the soil heat flux (MJ m−2 d−1), γ is the psychrometric constant (kPa °C−1), T is the mean air temperature at 2 m height (°C), u2 is the wind speed at 2 m height (m s−1), es is the saturation vapor pressure (kPa), and ea is the actual vapor pressure (kPa).
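Eq. (1) translates directly into code. The function below is a minimal sketch of the FAO-56 PM calculation; the example input values are illustrative, not data from this study:

```python
def eto_fao56_pm(delta, rn, g, gamma, t_mean, u2, es, ea):
    """FAO-56 Penman-Monteith reference evapotranspiration (mm/day).

    delta  : slope of saturation vapor pressure curve (kPa/degC)
    rn     : net radiation at the crop surface (MJ/m2/day)
    g      : soil heat flux density (MJ/m2/day)
    gamma  : psychrometric constant (kPa/degC)
    t_mean : mean air temperature at 2 m (degC)
    u2     : wind speed at 2 m (m/s)
    es, ea : saturation and actual vapor pressure (kPa)
    """
    num = 0.408 * delta * (rn - g) + gamma * (900.0 / (t_mean + 273.0)) * u2 * (es - ea)
    den = delta + gamma * (1.0 + 0.34 * u2)
    return num / den

# Illustrative mid-season values (not station data from this study):
eto = eto_fao56_pm(delta=0.122, rn=12.0, g=0.0, gamma=0.066,
                   t_mean=17.0, u2=1.5, es=1.938, ea=0.95)
```

In practice, Δ, γ, es, and ea would themselves be computed from the station's temperature, humidity, pressure, and radiation records following the FAO-56 guidelines.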
Fig. 2 illustrates the time series of the calculated ETo values. To train the proposed model, this study used 70% of the data, whereas the remaining 30% (the testing set) was used to validate the accuracy of the predictions. The distribution of the training and testing parts of the time series is shown in Fig. 2. The minimum, average, and maximum ETo values in the corresponding time series were calculated as 0.261 mm/day, 3.46 mm/day, and 7.63 mm/day, respectively, while the standard deviation is 1.73 mm/day and the skewness coefficient is 0.199.
Fig. 2.
FAO-56-PM ETo time series starting from January 1, 2009.
Method details
Singular spectrum analysis
Singular Spectrum Analysis (SSA) is a non-parametric statistical signal processing method used to analyze and decompose time series data. It extracts the intrinsic components of a time series, such as trend, seasonality, and noise, through singular value decomposition (SVD) of the data matrix. The main steps in SSA are:
- 1. Embedding: The first step in SSA is to embed the time series into a two-dimensional matrix. This is done by constructing a time-lagged (trajectory) matrix, where each column is a lagged segment of the original series. For a time series $X = (x_1, x_2, \ldots, x_N)$ of length N and a window length (embedding dimension) L, the trajectory matrix has L rows and K = N − L + 1 columns and captures the temporal dependencies within the data. The series is converted into the trajectory matrix Y as in Eq. (2) [19]:

$$Y = \begin{bmatrix} x_1 & x_2 & \cdots & x_K \\ x_2 & x_3 & \cdots & x_{K+1} \\ \vdots & \vdots & \ddots & \vdots \\ x_L & x_{L+1} & \cdots & x_N \end{bmatrix} \tag{2}$$

where K = N − L + 1 and all elements along the anti-diagonals of Y are equal (i.e., Y is a Hankel matrix).
- 2. Singular Value Decomposition (SVD): SVD is a matrix factorization technique that decomposes a matrix into its orthogonal singular vectors and singular values, which provide information about the structure and importance of the underlying components of the matrix. Let $S = YY^{T}$, with eigenvalues $\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_L \geq 0$ and corresponding eigenvectors $U_1, \ldots, U_L$. The matrix Y is then rewritten as follows:

$$S = YY^{T} \tag{3}$$

$$Y = \sum_{i=1}^{d} \sqrt{\lambda_i}\, U_i V_i^{T} \tag{4}$$

where $d = \operatorname{rank}(Y)$, and $U_i$ and $V_i = Y^{T} U_i / \sqrt{\lambda_i}$ are the left and right eigenvectors of the matrix Y, respectively.
- 3. Grouping of Singular Vectors: After obtaining the singular vectors and values from the SVD, the singular vectors are grouped into several components that describe the intrinsic structure of the time series. The number of components depends on the number of singular values, and the objective is to identify the components that capture the important structures in the data. This is usually done by choosing a certain number of the largest singular values (and the corresponding singular vectors) to represent the underlying components. The matrix Y is then rewritten as Eq. (5):

$$Y = Y_{I_1} + Y_{I_2} + \cdots + Y_{I_m} \tag{5}$$

where the index set $\{1, 2, \ldots, d\}$ is partitioned into m disjoint groups $I_1, \ldots, I_m$ with $m \in [1, d]$, and each $Y_{I_k} = \sum_{i \in I_k} \sqrt{\lambda_i}\, U_i V_i^{T}$ collects the useful components.
- 4. Reconstruction: In the last step of SSA, the original time series is reconstructed from the selected components. Each grouped matrix $Y_{I_k}$ is converted back into a series of length N by diagonal (anti-diagonal) averaging, i.e., averaging its elements along each anti-diagonal, and the resulting series are summed. The reconstructed time series provides a representation of the original series that captures its important structures and patterns.
SSA is a flexible and powerful method for analyzing time series data, as it can handle non-stationary data and missing data, and can extract underlying structures in the data [20]. Additionally, it can be used to perform various tasks such as time series forecasting [21], de-trend analysis [22], and anomaly detection [23], making this technique a valuable tool for many applications.
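The four SSA steps above can be sketched in a few lines of NumPy. This is a minimal illustration with an arbitrary window length and grouping, not the exact configuration used in this study:

```python
import numpy as np

def ssa_decompose(x, L, groups):
    """Decompose series x into components via basic SSA.

    x      : 1-D array of length N
    L      : window length (embedding dimension)
    groups : list of index lists; each group yields one component
    """
    x = np.asarray(x, dtype=float)
    N = len(x)
    K = N - L + 1
    # Step 1: embedding -- build the L x K trajectory (Hankel) matrix
    Y = np.column_stack([x[j:j + L] for j in range(K)])
    # Step 2: SVD of the trajectory matrix
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    components = []
    for idx in groups:
        # Step 3: grouping -- sum the selected rank-one matrices
        Yg = sum(s[i] * np.outer(U[:, i], Vt[i, :]) for i in idx)
        # Step 4: reconstruction -- diagonal (anti-diagonal) averaging
        comp = np.zeros(N)
        counts = np.zeros(N)
        for i in range(L):
            comp[i:i + K] += Yg[i, :]
            counts[i:i + K] += 1
        components.append(comp / counts)
    return components

# Example: trend + seasonality + noise
t = np.arange(300)
series = 0.01 * t + np.sin(2 * np.pi * t / 30) \
         + 0.1 * np.random.default_rng(0).normal(size=300)
parts = ssa_decompose(series, L=60, groups=[[0], [1, 2], list(range(3, 60))])
```

Because the groups here cover all singular values, the components sum back to the original series exactly; in practice, the leading groups capture the trend and seasonal sub-bands while the last group gathers the noise.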
Stochastic gradient boosting
Stochastic Gradient Boosting (SGB), proposed by Friedman [24], is a machine learning algorithm that combines multiple weak decision tree models to create a stronger, more accurate predictive model. The algorithm utilizes an iterative optimization process to continually improve the accuracy of the model. The process begins by training a simple base decision tree on the data. This model is used to make predictions, and the prediction errors are calculated. Subsequent decision trees are then trained on the errors made by the previous model, with each tree focusing on minimizing the remaining errors. This continues iteratively until a stopping criterion is met, such as a pre-defined number of iterations or an improvement in accuracy below a specified threshold. The final SGB model combines the predictions of all the decision trees in the sequence using a weighted combination, where the weights are determined by the performance of each individual tree. The result is a single model capable of making accurate predictions on new data. One of the key advantages of the SGB algorithm is its ability to identify important features in the data [25]. The iterative optimization process allows the algorithm to continually refine its understanding of the relationships between the features and the target variable, leading to a final model that accurately captures the key drivers of the target variable. Additionally, SGB has been shown to be effective in preventing overfitting, since the iterative optimization process helps to control the complexity of the model. Another advantage of SGB is its ability to model non-linear relationships [26]: the algorithm can handle complex systems by combining multiple decision trees, each of which can model non-linear relationships.
This makes SGB a valuable tool for a variety of predictive modeling tasks, especially when the relationship between the features and the target variable is complex.
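The iterative residual-fitting loop described above can be illustrated with a toy implementation that uses depth-1 regression trees (stumps) as the weak learners and row subsampling as the "stochastic" ingredient. The hyperparameter values below are arbitrary, not those tuned in this study:

```python
import numpy as np

def fit_stump(X, r):
    """Fit a depth-1 regression tree (stump) to residuals r."""
    best, best_err = None, np.inf
    for j in range(X.shape[1]):
        for t in np.quantile(X[:, j], [0.25, 0.5, 0.75]):
            left = X[:, j] <= t
            if not left.any() or left.all():
                continue
            lm, rm = r[left].mean(), r[~left].mean()
            err = ((r[left] - lm) ** 2).sum() + ((r[~left] - rm) ** 2).sum()
            if err < best_err:
                best_err, best = err, (j, t, lm, rm)
    return best

def predict_stump(stump, X):
    j, t, lm, rm = stump
    return np.where(X[:, j] <= t, lm, rm)

def sgb_fit(X, y, n_rounds=200, lr=0.1, subsample=0.7, seed=0):
    """Stochastic gradient boosting for squared-error loss."""
    rng = np.random.default_rng(seed)
    f0 = y.mean()                      # base prediction
    pred = np.full(len(y), f0)
    stumps = []
    for _ in range(n_rounds):
        # Stochastic part: each stump sees a random subsample of the rows
        idx = rng.choice(len(y), size=int(subsample * len(y)), replace=False)
        stump = fit_stump(X[idx], y[idx] - pred[idx])  # fit the residuals
        stumps.append(stump)
        pred += lr * predict_stump(stump, X)  # shrink each tree's contribution
    return f0, stumps, pred

rng = np.random.default_rng(42)
X = rng.uniform(size=(300, 2))
y = np.sin(2 * np.pi * X[:, 0]) + 0.1 * rng.normal(size=300)
f0, stumps, pred = sgb_fit(X, y)
```

Production implementations (e.g., gbm in R or scikit-learn's GradientBoostingRegressor) use deeper trees and more refined split search, but the residual-fitting, shrinkage, and subsampling logic is the same.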
Method implementation
The flowchart of the study is presented in Fig. 3. In this study, a machine learning framework comprising the hybridized implementation of SSA and SGB was utilized to predict the ETo time series. Before establishing the predictive model, the ETo time series was decomposed into sub-series using SSA, which resulted in 8 sub-bands (Fig. 4). In Fig. 4, the first band represents the approximation sub-band, reflecting the long-term trend of multi-annual components; the second band represents the seasonal component; and the eighth band denotes white noise. The remaining bands contain various detailed components of the time series.
Fig. 3.
Flowchart of the current research.
Fig. 4.
Components of the FAO-56-PM ETo time series decomposed by SSA.
Each sub-series was predicted using the SGB method. During the model setup, 70% of the data was used for training and the performance of the model was tested with the remaining 30% (Fig. 2). The partial autocorrelation function (PACF) was used to determine the input variables. Thus, both short-term and long-term predictions covering 1, 7, 14, 30, 45, and 60 days ahead were carried out based on the inputs determined through the PACF analysis. The hyperparameters of the SGB algorithm were determined using the well-known 10-fold cross-validation approach [27].
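To make the input-output arrangement concrete, the helper below builds a supervised matrix with lagged inputs (t−2, t−1, t) and an h-step-ahead target; the function name and defaults are ours for illustration, not taken from the original implementation:

```python
import numpy as np

def make_supervised(series, lags=(2, 1, 0), lead=1):
    """Arrange a series into lagged inputs and an h-step-ahead target.

    lags : lag offsets for the input columns (2, 1, 0 -> x_{t-2}, x_{t-1}, x_t)
    lead : prediction horizon h, so the target is x_{t+lead}
    """
    x = np.asarray(series, dtype=float)
    max_lag = max(lags)
    n_rows = len(x) - max_lag - lead
    X = np.column_stack([x[max_lag - l: max_lag - l + n_rows] for l in lags])
    y = x[max_lag + lead: max_lag + lead + n_rows]
    return X, y

X, y = make_supervised(np.arange(10.0), lead=1)
# First row uses x_0, x_1, x_2 to predict x_3
```

Changing `lead` to 7, 14, 30, 45, or 60 produces the training pairs for the longer horizons, with the same three PACF-selected lags as inputs.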
The root mean square error (RMSE) and Nash-Sutcliffe efficiency (NSE) [28] indicators were used to evaluate the accuracy of the predictions. RMSE is a commonly used performance measure in regression tasks; the smaller the RMSE, the higher the predictive power of the model. The NSE is another performance index used to evaluate the accuracy of ML models and is commonly used in the hydrology domain. NSE can be considered a normalized counterpart of the coefficient of determination (R²) and ranges between −∞ and 1, where an NSE of 1 indicates perfect agreement between the observed and predicted values, while values closer to 0 (or below) indicate poorer performance. The utilized performance indicators are presented in Eqs. (6) and (7).
$$RMSE = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(ETo_i - \widehat{ETo}_i\right)^2} \tag{6}$$

$$NSE = 1 - \frac{\sum_{i=1}^{n}\left(ETo_i - \widehat{ETo}_i\right)^2}{\sum_{i=1}^{n}\left(ETo_i - \overline{ETo}\right)^2} \tag{7}$$

where $ETo_i$ is the actual ETo value, $\widehat{ETo}_i$ is the modeled ETo value, $\overline{ETo}$ is the average of the actual ETo values, and n denotes the number of observations.
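Eqs. (6) and (7) translate directly into code; a minimal sketch:

```python
import numpy as np

def rmse(obs, pred):
    """Root mean square error, Eq. (6)."""
    obs, pred = np.asarray(obs, float), np.asarray(pred, float)
    return float(np.sqrt(np.mean((obs - pred) ** 2)))

def nse(obs, pred):
    """Nash-Sutcliffe efficiency, Eq. (7)."""
    obs, pred = np.asarray(obs, float), np.asarray(pred, float)
    return float(1.0 - np.sum((obs - pred) ** 2)
                 / np.sum((obs - obs.mean()) ** 2))
```

A perfect prediction gives RMSE = 0 and NSE = 1, while simply predicting the mean of the observations gives NSE = 0, which is why NSE values well above 0 are required for a model to be considered useful.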
Methods validation
In this study, two different models were developed for the estimation of reference ET values. In the first model, only the SGB method was used (called 'STD' for differentiation), whereas the second model comprises the hybrid use of SSA and SGB. The predictions were made for six different lead times by utilizing 3 lag values as inputs. The original time series was subjected to partial autocorrelation function (PACF) analysis and the inputs were determined accordingly. Fig. 5 illustrates the PACF results. As can be seen from the figure, the first three lag times had the highest correlation with the original time series, and therefore t-2, t-1, and t were included as inputs to perform the predictions. The predictions were conducted for six different lead times across two horizons, short-term and long-term. Following Lu et al. [14], predictions made for 1, 7, and 14 days ahead were regarded as short-term, whereas the others (i.e., 30, 45, and 60 days ahead) were referred to as long-term predictions.
Fig. 5.
Partial Autocorrelation Function Results.
The train and test prediction performance results of the models are provided in Table 2. With respect to the test set results, acceptable predictions were achieved for the 1-, 7-, and 14-day lead times (i.e., short-term) with the STD models, with NSE values of 0.877, 0.791, and 0.762, respectively. Compared to the short-term predictions, lower performances were obtained for the long-term predictions at the 30-day (NSE: 0.633), 45-day (NSE: 0.442), and 60-day (NSE: 0.264) lead times. As expected, the overall results illustrated that the highest performance was achieved for the 1-day-ahead prediction, while the lowest accuracy was obtained for the 60-day-ahead predictions. Similar outcomes hold for the training set, as the model showed a decreasing accuracy trend as the horizon was extended.
Table 2.
Model performance results.
| Model | Inputs | Outputs | Train RMSE (mm/day) | Train NSE | Test RMSE (mm/day) | Test NSE |
|---|---|---|---|---|---|---|
| STD | t-2, t-1, t | t + 1 | 0.623 | 0.868 | 0.629 | 0.877 |
| STD | t-2, t-1, t | t + 7 | 0.817 | 0.773 | 0.816 | 0.791 |
| STD | t-2, t-1, t | t + 14 | 0.834 | 0.764 | 0.870 | 0.762 |
| STD | t-2, t-1, t | t + 30 | 1.023 | 0.647 | 1.078 | 0.633 |
| STD | t-2, t-1, t | t + 45 | 1.223 | 0.499 | 1.321 | 0.442 |
| STD | t-2, t-1, t | t + 60 | 1.389 | 0.356 | 1.511 | 0.264 |
| SSA | t-2, t-1, t | t + 1 | 0.576 | 0.887 | 0.510 | 0.919 |
| SSA | t-2, t-1, t | t + 7 | 0.611 | 0.873 | 0.608 | 0.884 |
| SSA | t-2, t-1, t | t + 14 | 0.675 | 0.845 | 0.660 | 0.852 |
| SSA | t-2, t-1, t | t + 30 | 0.828 | 0.769 | 0.859 | 0.767 |
| SSA | t-2, t-1, t | t + 45 | 0.987 | 0.674 | 1.020 | 0.667 |
| SSA | t-2, t-1, t | t + 60 | 1.094 | 0.601 | 1.143 | 0.579 |
STD= Standalone Stochastic Gradient Boosting model, SSA= Hybrid Singular Spectrum Analysis and Stochastic Gradient Boosting model, t=lag time 1, t-1= lag time 2, t-2= lag time 3, t + 1=lead time 1, t + 7=lead time 7, t + 14=lead time 14, t + 30=lead time 30, t + 45=lead time 45, t + 60=lead time 60, RMSE= Root Mean Square Error, NSE= Nash-Sutcliffe Efficiency.
Once the SSA analysis was incorporated into the predictions, significant performance improvements were achieved for all lead times. Short-term predictions demonstrated promising performance, with NSE values of 0.919, 0.884, and 0.852 for the 1-day, 7-day, and 14-day lead times. Despite the unsatisfactory long-term performance of the STD model, the proposed hybrid model yielded promising improvements: the NSE results for the long-term lead times (i.e., 30, 45, and 60 days) were calculated as 0.767, 0.667, and 0.579, respectively. The table shows a remarkable similarity between the results achieved for the training and testing sets, indicating that the proposed model is highly resistant to overfitting. This finding confirms the model's robustness and reliability in generalizing to new data. Furthermore, scatter plots are shown in Fig. 6 to visually evaluate the prediction performance of the models. When examining the relationship between observed and predicted values, the best predictions are expected to scatter along the 1:1 line. Based on this graph, the points generated using the SSA method lie closer to the 1:1 line than those attained by the STD model, which is the clearest indication that incorporating the SSA approach into the SGB method improves the prediction performance. As with the statistical performance metrics, the scatter plots also indicate that the model's performance tends to decrease as the lead time increases, as the points show greater variation around the 1:1 line and contain more underestimated and overestimated outcomes. This observation underscores the importance of carefully considering temporal dependencies when modeling time series data. Moreover, while the standalone SGB method produced relatively uniform predictions for all lead times, the SSA-SGB method eliminated the monotony in the predictions of low values.
Fig. 6.
Scatter plots of the models.
Conclusion
In this study, ETo values were estimated using SGB, a robust tree-based ensemble machine learning method. To achieve higher performance, the SGB was further reinforced with a time series decomposition technique, SSA. The original time series was decomposed into different frequency components, and the SGB method was used for the predictions. Compared to the predictions performed with the standalone SGB method, the hybrid SSA-SGB method yielded more accurate outcomes. These results provide insightful contributions to the determination of future ETo values in promoting effective water resource management. Hence, the current research provides a new perspective to the body of knowledge with improved prediction accuracies. Although this study improved prediction performance, it still has some limitations. For instance, the models were built using data from a single station; future research can build new models using data from a large number of stations over a larger area to validate the applicability of the proposed hybrid prediction framework. In addition, different machine learning algorithms and signal processing techniques can be adopted in follow-up studies to explore the utility of other approaches.
CRediT authorship contribution statement
Eyyup Ensar Başakın: Conceptualization, Methodology, Software, Validation, Writing – original draft. Ömer Ekmekcioğlu: Writing – review & editing. Paul C. Stoy: Writing – review & editing. Mehmet Özger: Supervision.
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgments
We would like to thank Meteorological General Institution for providing meteorological data.
This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.
Data availability
The data that has been used is confidential.
References
- 1.Muhammad R., Mostafa R.R., Reza A., Islam T., Kisi O., Kuriqi A., Heddam S. Estimating reference evapotranspiration using hybrid adaptive fuzzy inferencing coupled with heuristic algorithms. Comput. Electron. Agric. 2021;191 doi: 10.1016/j.compag.2021.106541. [DOI] [Google Scholar]
- 2.Kim H., Chandrasekara S., Kwon H., Lima C., Kim T. A novel multi-scale parameter estimation approach to the Hargreaves-Samani equation for estimation of Penman-Monteith reference evapotranspiration. Agric. Water Manag. 2023;275 doi: 10.1016/j.agwat.2022.108038. [DOI] [Google Scholar]
- 3.Dong Y., Hansen H. Development and design of an affordable field scale weighing lysimeter using a microcontroller system. Smart Agric. Technol. 2023;4 doi: 10.1016/j.atech.2022.100147. [DOI] [Google Scholar]
- 4.Zhou X., Bi S., Yang Y., Tian F., Ren D. Comparison of ET estimations by the three-temperature model, SEBAL model and eddy covariance observations. J. Hydrol. 2014;519:769–776. doi: 10.1016/j.jhydrol.2014.08.004. [DOI] [Google Scholar]
- 5.Xiong Y., Chen X., Tang L., Wang H. Comparison of surface renewal and Bowen ratio derived evapotranspiration measurements in an arid vineyard. J. Hydrol. 2022;613 doi: 10.1016/j.jhydrol.2022.128474. [DOI] [Google Scholar]
- 6.R.G. Allen, L.S. Pereira, D. Raes, M. Smith, FAO irrigation and drainage paper No. 56 - Crop evapotranspiration, 1998.
- 7.Adarsh S., Sanah S., Murshida K.K., Nooramol P. Scale dependent prediction of reference evapotranspiration based on Multi-Variate Empirical mode decomposition. Ain Shams Eng. J. 2018;9:1839–1848. doi: 10.1016/j.asej.2016.10.014. [DOI] [Google Scholar]
- 8.Anda A., Soos G., Teixeira J.A., Kozma-bognar V. Regional evapotranspiration from a wetland in Central Europe, in a 16-year period without human intervention. Agric. For. Meteorol. 2015;205:60–72. doi: 10.1016/j.agrformet.2015.02.010. [DOI] [Google Scholar]
- 9.Borges L., França F., Alves R., Oliveira D. Estimation of reference evapotranspiration in Brazil with limited meteorological data using ANN and SVM – a new approach. J. Hydrol. 2019;572:556–570. doi: 10.1016/j.jhydrol.2019.03.028. [DOI] [Google Scholar]
- 10.Mehdizadeh S., Behmanesh J., Khalili K. Using MARS, SVM, GEP and empirical equations for estimation of monthly mean reference evapotranspiration. Comput. Electron. Agric. 2017;139:103–114. doi: 10.1016/j.compag.2017.05.002. [DOI] [Google Scholar]
- 11.L.B. Ferreira, Exploring machine learning and multi-task learning to estimate meteorological data and reference evapotranspiration across Brazil, 259 (2022) 0–1. doi: 10.1016/j.agwat.2021.107281. [DOI]
- 12.Wang S., Lian J., Peng Y., Hu B., Chen H. Generalized reference evapotranspiration models with limited climatic data based on random forest and gene expression programming in Guangxi, China. Agric. Water Manag. 2019;221:220–230. doi: 10.1016/j.agwat.2019.03.027. [DOI] [Google Scholar]
- 13.Kisi O., Alizamir M. Modelling reference evapotranspiration using a new wavelet conjunction heuristic method: wavelet extreme learning machine vs wavelet neural networks. Agric. For. Meteorol. 2018;263:41–48. doi: 10.1016/j.agrformet.2018.08.007. [DOI] [Google Scholar]
- 14.Lu Y., Li T., Hu H., Zeng X. Short-term prediction of reference crop evapotranspiration based on machine learning with different decomposition methods in arid areas of China. Agric. Water Manag. 2023;279 doi: 10.1016/j.agwat.2023.108175. [DOI] [Google Scholar]
- 15.Zheng Z., Ali M., Jamei M., Xiang Y., Karbasi M., Yaseen Z.M., Farooque A.A. Design data decomposition-based reference evapotranspiration forecasting model: a soft feature filter based deep learning driven approach. Eng. Appl. Artif. Intell. 2023;121 doi: 10.1016/j.engappai.2023.105984. [DOI] [Google Scholar]
- 16.Araghi A., Adamowski J., Martinez C.J. Comparison of wavelet-based hybrid models for the estimation of daily reference evapotranspiration in different climates. J. Water Clim. Chang. 2020;11:39–53. doi: 10.2166/wcc.2018.113. [DOI] [Google Scholar]
- 17.Kang Y., Chen P., Cheng X., Zhang S., Song S. Novel hybrid machine learning framework with decomposition–transformation and identification of key modes for estimating reference evapotranspiration. Agric. Water Manag. 2022;273 doi: 10.1016/j.agwat.2022.107882. [DOI] [Google Scholar]
- 18.Bozali N. Assessment of the soil protection function of forest ecosystems using GIS-based Multi-Criteria Decision Analysis: a case study in Adıyaman, Turkey. Glob. Ecol. Conserv. 2020;24:e01271. doi: 10.1016/j.gecco.2020.e01271. [DOI] [Google Scholar]
- 19.Mi X., Zhao S. Wind speed prediction based on singular spectrum analysis and neural network structural learning. Energy Convers. Manag. 2020;216 doi: 10.1016/j.enconman.2020.112956. [DOI] [Google Scholar]
- 20.Wang F., Shen Y., Chen Q., Wang W. Bridging the gap between GRACE and GRACE follow-on monthly gravity field solutions using improved multichannel singular spectrum analysis. J. Hydrol. 2021;594 doi: 10.1016/j.jhydrol.2021.125972. [DOI] [Google Scholar]
- 21.Sulandari W., Subanar S., Lee M.H., Rodrigues P.C. Time series forecasting using singular spectrum analysis, fuzzy systems and neural networks. MethodsX. 2020;7 doi: 10.1016/j.mex.2020.101015. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 22.Wei N., Yin L., Li C., Wang W., Qiao W., Li C., Zeng F., Fu L. Short-term load forecasting using detrend singular spectrum fluctuation analysis. Energy. 2022;256 doi: 10.1016/j.energy.2022.124722. [DOI] [Google Scholar]
- 23.Guo J., Shi K., Liu X., Sun Y., Li W., Kong Q. Singular spectrum analysis of ionospheric anomalies preceding great earthquakes: case studies of Kaikoura and Fukushima earthquakes. J. Geodyn. 2019;124:1–13. doi: 10.1016/j.jog.2019.01.005. [DOI] [Google Scholar]
- 24.Friedman J.H. Greedy function approximation: a gradient boosting machine. Ann. Stat. 2001;29:1189–1232. doi: 10.1214/aos/1013203451. [DOI] [Google Scholar]
- 25.Pasha S.J., Mohamed E.S. Advanced hybrid ensemble gain ratio feature selection model using machine learning for enhanced disease risk prediction. Informatics Med. Unlocked. 2022;32 doi: 10.1016/j.imu.2022.101064. [DOI] [Google Scholar]
- 26.Forkuor G., Hounkpatin O.K.L., Welp G., Thiel M. High resolution mapping of soil properties using Remote Sensing variables in south-western Burkina Faso: a comparison of machine learning and multiple linear regression models. PLoS ONE. 2017;12:1–21. doi: 10.1371/journal.pone.0170478. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 27.Xu Q., Xiong Y., Dai H., Kumari K.M., Xu Q., Ou H.Y., Wei D.Q. PDC-SGB: prediction of effective drug combinations using a stochastic gradient boosting algorithm. J. Theor. Biol. 2017;417:1–7. doi: 10.1016/j.jtbi.2017.01.019. [DOI] [PubMed] [Google Scholar]
- 28.Nash J.E., Sutcliffe J.V. River flow forecasting through conceptual models part I: a discussion of principles. J. Hydrol. 1970;10:282–290. [Google Scholar]