Enhanced streamflow prediction with SWAT using support vector regression for spatial calibration: A case study in the Illinois River watershed, U.S

Lifeng Yuan; Kenneth J Forshay

doi:10.1371/journal.pone.0248489

2021 Apr 12;16(4):e0248489. doi: 10.1371/journal.pone.0248489


Editor: Mou Leong Tan

PMCID: PMC8041176 PMID: 33844687

Abstract

Accurate streamflow prediction plays a pivotal role in hydraulic project design, nonpoint source pollution estimation, and water resources planning and management. However, the highly non-linear relationship between rainfall and runoff makes it difficult to predict streamflow with desirable accuracy. To improve the accuracy of monthly streamflow prediction, a seasonal Support Vector Regression (SVR) model coupled to the Soil and Water Assessment Tool (SWAT) model was developed for 13 subwatersheds in the Illinois River watershed (IRW), U.S. Terrain, precipitation, soil, land use and land cover, and monthly streamflow data were used to build the SWAT model. SWAT streamflow output and the upstream drainage area were used as the two input variables to SVR to build the hybrid SWAT-SVR model. The SWAT Calibration and Uncertainty Programs (SWAT-CUP) with the Sequential Uncertainty Fitting-2 (SUFI-2) algorithm were applied as a benchmark against which the performance of SWAT-SVR was compared. The spatial calibration and leave-one-out sampling methods were used to calibrate and validate the hybrid SWAT-SVR model. The results showed that the SWAT-SVR model had less deviation and better performance than SWAT-CUP simulations. SWAT-SVR predicted streamflow more accurately during the wet season than the dry season. The model worked well when it was applied to simulate medium flows with discharge between 5 m³ s^-1 and 30 m³ s^-1, and its applicable spatial scale fell between 500 and 3000 km². The overall performance of the model on yearly time series was “Satisfactory”. This new SWAT-SVR model not only captures intrinsic non-linear behaviors between rainfall and runoff while considering the mechanism of runoff generation but can also serve as a reliable regional tool for an ungauged or limited-data watershed whose hydrologic characteristics are similar to those of the IRW.

Introduction

Reliable prediction of monthly streamflow can provide crucial information to assist watershed managers with decision making, such as future flood and drought forecasting, water quality evaluation and water resources optimization [1, 2]. However, the rainfall-runoff relationship is highly complex and non-linear because the transformation from rainfall to runoff is influenced by various natural and human factors including precipitation, terrain, soil, land use and land cover (LULC), evapotranspiration, and groundwater, which makes it difficult to simulate and estimate streamflow with desirable accuracy [3–5]. Numerous hydrologic models with varying degrees of complexity have been developed to describe the rainfall-runoff relationship and predict runoff [6]. Hydrologic models can be roughly categorized into three groups: conceptual models (or grey-box models), physically-based models (or white-box models), and data-driven models (or black-box models) [7, 8]. Conceptual models consider primary hydrological components (e.g. precipitation, snow accumulation and melt, soil moisture storage, river routing, and reservoirs) and are built on observed data or empirical formulations relating many hydrological variables [5]. Conceptual models are helpful for understanding the critical physical processes in the hydrological cycle. Physically-based models primarily concern the mathematical description of numerous physical processes in the hydrologic cycle (e.g. various differential equations expressing the physical laws of conservation of mass, energy, and momentum). Physically-based models facilitate the comprehension of hydrological mechanisms but require a considerable amount of spatiotemporal data and model parameter input [3]. Data-driven models include empirically based statistical models (e.g. various regression formulas) and artificial-intelligence-based models (e.g. artificial neural network (ANN), support vector machine (SVM), and other machine learning methods). They possess powerful predictive ability that accurately captures computable relationships between the relevant input and output variables, but they neglect detailed characteristics and processes of watershed systems and simplify the nonlinear rainfall-runoff relationship [7]. In practice, however, there is no clear boundary for assigning a single model to one of the three groups mentioned above, since a hydrological model is often built on multiple methods to broaden its applications.

The Soil and Water Assessment Tool (SWAT) is a conceptual, physically-based, and basin-scale hydrologic model and has been extensively applied worldwide [9, 10]. Like many other physically-based hydrologic models, SWAT requires a large amount of data and parameter inputs to run. However, some data are difficult to collect because of time or economic cost, and the values of many parameters can only be obtained by calibration [11]. The calibration process is typically time-consuming [12] and complicated, as it involves parameterization, the selection of optimization algorithms, and extensive iterative simulations to find optimal parameter combinations and appropriate value ranges [13]. This challenge is most acute where limited data exist for parameterization and calibration [14].

SVM is a data-driven machine learning model that has been widely applied to hydrologic prediction, such as short-term or long-term streamflow and sediment yield forecasting [4, 15–19], water quality prediction [20, 21], precipitation, temperature and evapotranspiration simulation [22, 23], and the process of parameterization [12]. The essential characteristic of the SVM method is its ability to efficiently and accurately predict the nonlinear relationship between input and output variables without considering their internal physical links. Furthermore, SVM, which is based on the structural risk minimization principle, has been shown to be superior to ANN, which is based on the empirical risk minimization principle, in several hydrological prediction applications [3, 4, 12]. SVM uses a kernel function and the maximum margin algorithm to solve a nonlinear problem by projecting the input space to a feature space where the nonlinear problem is converted into a linear one. Additionally, SVM typically applies a grid search method [8] to optimize hyperparameters. Key issues in applying SVM to streamflow prediction include finding the optimal parameter set, improving prediction ability on the test data while keeping high accuracy on the training data, and avoiding overfitting and uncertainty issues.

Although different categories of hydrologic models exist, streamflow prediction in an ungauged watershed or a watershed with limited monitoring data is still a challenging task in hydrology. SWAT can be applied to predict streamflow in an ungauged watershed, but the results are hard to verify because of the lack of on-site data [24], and there are often underlying drivers of variability that are not captured by the typical calibration of a physical model [13]. Similarly, SVM and other machine learning methods will not perform well in streamflow simulation without a large amount of training data. If there are adjacent gauged watersheds (proxy watersheds) with hydrological characteristics similar to those of an ungauged watershed, then we can use these proxy watersheds as donor watersheds, treat the ungauged watershed as the target watershed, and conduct hydrological parameter transferability research [24–26]. This approach rests on the first law of geography [27], under which climate and watershed conditions, and hence parameters in nearby regions, change smoothly over space. By adopting a hybrid SWAT-SVR approach, we can better capture the underlying variability of non-linear drivers while considering the hydrological processes. Hence, it is possible to build a SWAT-SVR hydrological model to predict streamflow in an ungauged target watershed with comparable proxy watersheds. In this article, we hypothesized that coupling SVM to the physically-based SWAT model could help improve model performance. We tested this hypothesis by comparing a common calibration approach, SWAT Calibration and Uncertainty Programs (SWAT-CUP) with the Sequential Uncertainty Fitting version 2 (SUFI2) algorithm, with our hybrid SWAT-SVR method in developing models of streamflow at monthly time scales in the Illinois River watershed (IRW), USA.

Several works have evaluated the performance of SWAT and SVM in streamflow prediction [12, 19, 28]. Zhang et al. [12] applied Artificial Neural Network (ANN) and SVM methods to identify the optimal SWAT parameters, reducing the time cost and improving the efficiency of parameter calibration in two watersheds of the U.S. Jajarmizadeh et al. [28] compared the monthly streamflow predictions from SWAT and SVM, and found that the SVM model gave a closer value for the average flow than the SWAT model. These efforts, however, either applied SVM to search for the optimal calibration parameters or built separate SWAT and SVR models and then compared their performance. Few studies have combined the two methods into a hybrid approach to streamflow prediction. Chiogna et al. [19] developed an SVM with SWAT model to predict hydropeaking in alpine watersheds in northeastern Italy. They used SVM to train on the output of SWAT and found that the SVM model can capture the fluctuation in streamflow. To the best of the authors’ knowledge, no study has coupled SVM and SWAT for streamflow prediction while considering wet-dry change. The objective of this study is to show how a support vector regression (SVR) method that supports SWAT calibration can be used to improve monthly streamflow prediction for different seasons in the IRW.

Materials and methods

Study area

The IRW (35°31’-36°9’N, 94°12’-95°2’W) crosses Arkansas and Oklahoma, USA, separated almost equally by the state border, and has a drainage area of 4200 km². The basin elevation ranges from 121 to 602 meters above mean sea level. The average slope of the IRW is 5.6%, and the slope ranges from 0 to 52.6%. The Illinois River is approximately 230 km long and flows from Arkansas to Oklahoma before entering Tenkiller Ferry Lake in Oklahoma [29]. Other large tributaries within the IRW include the Baron Fork Creek and the Flick Creek. The main soil types are Clarksville (43.8%), Rueter (26.9%) and Enders (18.6%) according to the Soil Survey Geographic Database (SSURGO). The IRW is dominated by deciduous forest (40.7%) and pasture/hay (40.3%) as reported by the 2011 National Land Cover Dataset (NLCD).

The climate in this region is humid, with an average annual temperature of about 16°C. The average yearly precipitation is 1198 mm. The mean annual lake evaporation is about 1270 mm [30]. Thirteen U.S. Geological Survey (USGS) hydrologic stations were selected to develop this new method. Their monthly discharge data can be accessed and downloaded from the USGS official website (https://dashboard.waterdata.usgs.gov/app/nwd/?region=lower48). Daily weather data from five climate stations of the National Climatic Data Center (NCDC) were used as weather input to the SWAT model. Fig 1 shows the spatial distribution of terrain, rivers, hydrologic and meteorological stations, and lakes in this area, and the relative position of the IRW in the U.S.

Although some studies have focused on the IRW [29, 31–34], these efforts paid more attention to water quality and nonpoint source pollution (NSP) evaluation, and few attempted to improve the accuracy of streamflow prediction. However, accurate streamflow simulation is the foundation for subsequent water quality and NSP simulation. In this study, we concentrated on improving the accuracy of streamflow prediction at a monthly time scale by integrating a physically-based SWAT model and a data-driven SVR method.

The SWAT model

SWAT is a continuous, semi-distributed, and physically-based hydrologic model used to simulate water cycles, crop growth, sediment yields, and agricultural chemical transport in a large river basin with varying soils, slopes and land use management conditions [9]. SWAT was developed by the U.S. Department of Agriculture Agricultural Research Service (USDA-ARS), and has been extensively used worldwide [10, 35]. In SWAT, a watershed is initially delineated into multiple sub-watersheds, then each sub-watershed is further divided into one or more hydrological response units (HRUs) in which all land areas have similar land use, soil property, and slope combinations [36]. An HRU is the smallest spatial response unit in which physical processes such as the hydrological cycle, soil erosion, and nutrient and pesticide transport are simulated [37]. Primary input data include a digital elevation model (DEM), land use, soil, and weather (i.e. precipitation, temperature, wind speed, solar radiation, and relative humidity). Water, sediment, and chemical movement in SWAT involves two phases: first, the land areas of each sub-watershed control the water transported to the channels together with sediment, nutrients and pesticides; second, water and the other constituents move through the stream network to the watershed outlet [38]. A more detailed description of the SWAT model is available in the online documentation (https://swat.tamu.edu/docs/).

SWAT model setup

We used ArcSWAT version 2012.10_4.19 within ArcGIS 10.4.1 to build the IRW SWAT model. The digital elevation model (DEM) was obtained from the Shuttle Radar Topography Mission (SRTM) 1 Arc-Second (about 30 m × 30 m) Global Database and downloaded from the USGS website (https://earthexplorer.usgs.gov/, 01-28-2018) (Fig 2a). Land use and land cover (LULC) data were taken from the 2011 NLCD dataset (https://www.mrlc.gov/, 01-31-2018) (Fig 2b), with a spatial resolution of 100 m × 100 m. Soil data came from the SSURGO database (https://websoilsurvey.nrcs.usda.gov/, 02-05-2018) (Fig 2c). Climate data were obtained from the National Climatic Data Center (NCDC) (https://www.ncdc.noaa.gov/, 02-07-2018) (Fig 2d). Because some precipitation and temperature records were missing from the NCDC climate data between Jan. 1990 and Dec. 2013, we downloaded alternative Climate Forecast System Reanalysis (CFSR) data from the SWAT official website (https://globalweather.tamu.edu/, 01-31-2018), then filled the missing NCDC data using climate data from the closest CFSR grid stations (not shown in Fig 2d). Precipitation data from all five climate stations passed the data consistency checks using the double mass curve method [39]. The basic information for the thirteen hydrologic stations is listed in Table 1.

Table 1. Watershed properties of selected USGS stations.

No.	USGS station (Subwatershed No.)	Upstream area (km²)	Simulated upstream area^† (km²)	Data period	Number of data	Average monthly streamflow (m³ s^-1)	Group
1	07195800 (1)	36.8	36.2	1.1995–12.2013	228	0.42	Low flows
2	07195855 (7)	155.0	134.5	1.1995–12.2013	228	1.27	Low flows
3	07195865 (12)	49.5	52.8	1.1997–12.2013	204	0.68	Low flows
4	07196000 (17)	300.7	302.8	1.1995–12.2013	228	3.01	Low flows
5	07195500 (24)	1633.0	1570.2	1.1995–12.2013	228	18.71	Medium flows
6	07195430 (26)	1490.5	1438.0	1.1996–12.2013	216	17.68	Medium flows
7	07196090 (28)	2138.5	2072.8	7.2010–12.2013	42	25.47	Medium flows
8	07196973 (46)	64.8	66.0	1.1995–12.2002	96	0.73	Low flows
9	07196500 (51)	2462.5	2385.8	1.1995–12.2013	228	27.76	Medium flows
10	07197000 (52)	808.7	797.1	1.1995–12.2013	228	9.21	Medium flows
11	07196900 (62)	105.2	105.2	1.1995–12.2013	228	1.31	Low flows
12	07197360 (74)	233.8	228.3	1.1998–12.2013	192	2.41	Low flows
13	07198000 (85)	4186.2	4070.0	1.1995–12.2013	228	44.03	High flows


^†Note: The simulated upstream area is the upstream area delineated by the ArcSWAT program.

The IRW was delineated into 86 subwatersheds with 1023 HRUs under a threshold area of 3000 ha. The multiple land use/soil/slope method was applied to define the HRUs with land use (10%), soil (10%) and slope (5%) thresholds. The surface runoff was estimated using the SCS curve number method [40], and the Penman-Monteith equation [41] was applied to calculate the potential evapotranspiration. The streamflow was routed and calculated according to the variable storage routing method [38]. A five-year warm-up period (1990–1994) was used to initialize the model input and stabilize the SWAT model. The simulation period of the SWAT model runs from Jan-01-1995 to Dec-31-2013.

Streamflow prediction

Dividing dry and wet season

There is evidence that separating the dry and wet seasons can improve SWAT model performance and better reflect the seasonal change of parameters [42–44]. Therefore, we developed the SWAT-SVR model based on the separation of the dry and wet seasons to reflect the impact of seasonal change. In this paper, we used the runoff coefficient (RC) of the subwatersheds and the flow discharge at their outlets to divide the dry and wet seasons. The RC is calculated by dividing the areally averaged total monthly runoff by the areally averaged total monthly rainfall. The areally averaged total monthly runoff is computed by multiplying the flow rate measured at the watershed outlet by time and then dividing by the watershed area. The Thiessen polygons of the NCDC stations in Fig 2d were used to partition the IRW. Daily rainfall from the NCDC stations was aggregated by month to represent the areally averaged total monthly rainfall in each Thiessen polygon region. The data period for each station can be found in Table 1.
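The runoff coefficient calculation described above can be sketched as follows. This is a minimal illustration rather than the authors' code: the function names are hypothetical, the flow and area come from Table 1 (station 07195500), and the 100 mm monthly rainfall is an assumed value used only to show the arithmetic.

```python
import calendar

SECONDS_PER_DAY = 86400.0

def runoff_depth_mm(mean_flow_m3s, year, month, area_km2):
    """Areally averaged monthly runoff depth (mm) from a mean monthly flow rate."""
    days = calendar.monthrange(year, month)[1]
    volume_m3 = mean_flow_m3s * days * SECONDS_PER_DAY  # flow rate x time
    depth_m = volume_m3 / (area_km2 * 1.0e6)            # divide by watershed area in m^2
    return depth_m * 1000.0

def runoff_coefficient(mean_flow_m3s, year, month, area_km2, rainfall_mm):
    """RC = monthly runoff depth / monthly rainfall depth (both in mm)."""
    if rainfall_mm <= 0:
        raise ValueError("rainfall must be positive to define a runoff coefficient")
    return runoff_depth_mm(mean_flow_m3s, year, month, area_km2) / rainfall_mm

# Station 07195500: 18.71 m^3/s average monthly flow, 1633.0 km^2 upstream area (Table 1).
# The month (April 2000) and the 100 mm rainfall are assumptions for illustration only.
depth = runoff_depth_mm(18.71, 2000, 4, 1633.0)
rc = runoff_coefficient(18.71, 2000, 4, 1633.0, 100.0)
print(f"runoff depth = {depth:.1f} mm, RC = {rc:.2f}")
```

With these inputs the runoff depth is about 29.7 mm, so the assumed 100 mm of rainfall gives an RC close to the basin-average value of 0.3 reported in the text.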

Fig 3 shows the distinction between the rainfall-runoff characteristics of the wet and dry seasons. The average monthly RC (AMRC) of the IRW was 0.3. The maximum and minimum AMRC were 0.45 and 0.11, which occurred in January and September, respectively. The AMRC before and after June was 0.39 (purple line in Fig 3) and 0.2 (green line in Fig 3), respectively. The AMRC gradually declines from January to September, then quickly increases afterward (red line in Fig 3). January to April were the months of the highest AMRC, and August to October were the months of the lowest AMRC. The AMRC at subwatershed 28 did not follow the common trend of most subwatersheds because the record at site 07196090 covers only 42 months, far fewer than at the other twelve sites.

To illustrate the distribution of monthly streamflow at the thirteen stations, we plotted the average monthly streamflow hydrograph (Fig 4). Streamflow was categorized into three groups based on the magnitude of discharge: low flows, medium flows, and high flows (Table 1). Low flows, with discharge less than 5 m³ s^-1, come from 07195800, 07195855, 07195865, 07196000, 07196973, 07196900, and 07197360; medium flows, with discharge between 5 m³ s^-1 and 30 m³ s^-1, are from 07195500, 07196090, 07195430, 07196500, and 07197000; high flows, with discharge greater than 30 m³ s^-1, are from 07198000. The maximum and minimum average monthly streamflow occurred at stations 07198000 (the outlet of the IRW) and 07195800 (the subwatershed of the uppermost reach), respectively. The maximum and minimum discharge occurred in April and September, respectively (Fig 4). Streamflow from January to June accounted for 67.39% of the annual total, approximately twice that from July to December. Based on the analysis of RC and flows, we defined January to June as the wet season and July to December as the dry season.
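The grouping and season rules above are simple enough to state as code. The sketch below is illustrative only: the function names are hypothetical, and assigning the exact boundary values (5 and 30 m³ s^-1) to the medium group is an assumption, since the text does not say how ties are handled.

```python
def flow_group(avg_monthly_flow_m3s):
    """Classify a station by its average monthly streamflow (thresholds from the text)."""
    if avg_monthly_flow_m3s < 5.0:
        return "Low flows"
    if avg_monthly_flow_m3s <= 30.0:   # boundary handling is an assumption
        return "Medium flows"
    return "High flows"

def season(month):
    """January to June is the wet season; July to December is the dry season."""
    if not 1 <= month <= 12:
        raise ValueError("month must be in 1..12")
    return "wet" if month <= 6 else "dry"

# Wet-season share of annual streamflow reported in the text.
wet_share = 67.39
wet_to_dry_ratio = wet_share / (100.0 - wet_share)

print(flow_group(0.42), flow_group(18.71), flow_group(44.03))
print(season(4), season(9))
print(round(wet_to_dry_ratio, 2))
```

The three example flows are the Table 1 values for stations 07195800, 07195500, and 07198000, and the ratio confirms that wet-season flow is roughly twice dry-season flow.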

Coupling SWAT with SVR

To improve monthly streamflow prediction, we combined the SWAT model and the SVR method to develop the SWAT-SVR model. In this approach, flow (including baseflow) was first simulated by SWAT with its default parameter combinations, without any calibration procedure. Then, the simulated streamflow in month t from the SWAT model and the upstream drainage area of the station served as the two inputs of the SVR model to predict streamflow in month t. This design reduced the time needed to calibrate and validate the SWAT model as well as the time spent on feature selection in the SVR application. In this design, SWAT was regarded as a comprehensive transfer function that integrates weather, terrain, LULC, and soil data and produces a new flow output that serves as input to the SVR model.

SVM is a black-box mathematical model that searches for an optimal separating hyperplane with the maximal margin between observations and finds the optimal function and parameter sets that fit the observations while avoiding overfitting and achieving better generalization ability [19]. SVR is the application of SVM to regression analysis. A detailed description of SVM theory is beyond the scope of this article and can be found in Vapnik [45], Hastie, Tibshirani [46], Chang and Lin [47], and Smola and Schölkopf [48].

The principle of SVM is rooted in statistical learning and structural risk minimization theory [45]. Briefly, SVM converts a complex nonlinear problem in the original input space (i.e. the space of the observed data) into a simple linear problem in the feature space (i.e. some higher dimensional space) using a kernel function [49]. Commonly used kernel functions include the linear, polynomial, Gaussian radial basis function (RBF), and sigmoid kernels. Among these kernels, the linear kernel is a particular case of RBF, the sigmoid kernel behaves like RBF for certain parameters, and the polynomial kernel has more hyperparameters than the RBF kernel, which causes more computational difficulties [50, 51]. Hence, we chose the Gaussian RBF kernel function, whose mathematical expression is:

$K(x_{i}, x_{j}) = \exp\left(-\gamma \| x_{i} - x_{j} \|^{2}\right)$ (1)
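As a small numerical illustration of Eq 1, the kernel can be evaluated directly. The function name is hypothetical, the input vectors are made-up normalized (streamflow, drainage area) pairs, and γ = 0.4 is the wet-season value reported in the results.

```python
import math

def rbf_kernel(x_i, x_j, gamma):
    """Gaussian RBF kernel: exp(-gamma * squared Euclidean distance)."""
    if len(x_i) != len(x_j):
        raise ValueError("inputs must have the same dimension")
    squared_distance = sum((a - b) ** 2 for a, b in zip(x_i, x_j))
    return math.exp(-gamma * squared_distance)

# Hypothetical normalized inputs: (SWAT streamflow, upstream drainage area).
same_point = rbf_kernel((0.2, 0.5), (0.2, 0.5), gamma=0.4)
far_apart = rbf_kernel((0.0, 0.0), (1.0, 1.0), gamma=0.4)
print(same_point, round(far_apart, 4))
```

Identical inputs give a kernel value of 1, and the value decays toward 0 as the inputs move apart; a larger γ makes that decay faster, which is why γ has to be tuned together with C and ɛ.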

In an SVR ε-regression application based on the RBF kernel, three parameters need to be determined: the penalty parameter of the error term C (C > 0), the Gaussian RBF kernel parameter γ, and the width of the error margin ɛ. Grid search and k-fold cross-validation were used to optimize these parameters by defining upper and lower bounds for each parameter and estimating the prediction accuracy of the model. In k-fold cross-validation, the dataset is subdivided into k subsets of nearly equal size. In each step, k-1 subsets are used to train the model while the remaining subset is used for validation [19]. Each subset is used exactly once for validation. Finally, the average error over all k trials is calculated. In our study, we first chose a coarse numeric range of C, γ, and ɛ to conduct the grid search, then narrowed the search range according to the output of the SVR model. R version 3.4.0 running on RStudio version 1.1.456 and the ‘e1071’ package [52] were used for the development, training and testing of the SWAT-SVR model. Standardizing data prevents variables with larger numeric ranges from dominating those with smaller ranges and reduces calculation complexity [51]. In addition, the scaling tool in the ‘e1071’ package does not work very well for SVR regression analysis. Hence, before building the model, we normalized the two input variables (i.e. streamflow and upstream drainage area) using Eq 2.

$x_{new,i} = \frac{x_{i} - x_{min}}{x_{max} - x_{min}}$ (2)

where x_new,i is the normalized parameter, x_i is observed data series, and x_max and x_min are the maximum and minimum of the observation. Independent seasonal SWAT-SVR models were developed for monthly flow prediction at 13 stations. In each model run, the leave-one-out sampling method was applied to calibrate the SWAT-SVR model spatially. Out of n stations, one station was excluded for testing purposes, and the SWAT-SVR model was trained with the remaining (n-1) stations. This step was repeated until all stations had been removed once [24].
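The normalization in Eq 2 and the leave-one-out station sampling can be sketched together. This is an illustrative outline, not the authors' R code: the function names are hypothetical, the station IDs and upstream areas are taken from Table 1, and normalizing over all thirteen stations at once is an assumption (the text does not state which subset the minimum and maximum were computed from).

```python
def min_max_normalize(values):
    """Eq 2: rescale a series to [0, 1] using its own minimum and maximum."""
    lowest, highest = min(values), max(values)
    if highest == lowest:
        raise ValueError("cannot normalize a constant series")
    return [(v - lowest) / (highest - lowest) for v in values]

def leave_one_station_out(station_ids):
    """Yield (training_stations, held_out_station) until every station was held out once."""
    for held_out in station_ids:
        training = [s for s in station_ids if s != held_out]
        yield training, held_out

# Station IDs and upstream areas (km^2) from Table 1.
stations = ["07195800", "07195855", "07195865", "07196000", "07195500",
            "07195430", "07196090", "07196973", "07196500", "07197000",
            "07196900", "07197360", "07198000"]
areas_km2 = [36.8, 155.0, 49.5, 300.7, 1633.0, 1490.5, 2138.5,
             64.8, 2462.5, 808.7, 105.2, 233.8, 4186.2]

normalized_area = dict(zip(stations, min_max_normalize(areas_km2)))
splits = list(leave_one_station_out(stations))

print(len(splits), "splits, each training on", len(splits[0][0]), "stations")
print("smallest area ->", normalized_area["07195800"],
      "largest area ->", normalized_area["07198000"])
```

Each of the 13 splits trains on 12 stations and tests on the remaining one, which is what allows the held-out station to stand in for an ungauged watershed.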

SWAT-CUP

SWAT-CUP, a standalone SWAT calibration program [13], was used as a benchmark for the SWAT-SVR streamflow predictions. Parameter sensitivity analysis was conducted by the all-at-a-time approach with 1000 SWAT-CUP simulations. SUFI2 was employed for sensitivity analysis, calibration and validation to seek an optimized parameter set because of the high effectiveness of this algorithm [53]. SWAT and SWAT-CUP were run for all stations at one time with three iterations during the wet and dry seasons. After the first two iterations, with 250 simulations each, parameter ranges were narrowed by considering both the physical limitations of the parameters and the ranges suggested by the calibration. We then applied the calibrated parameter ranges and independent data from the station left out to conduct another iteration with 250 simulations for validation. The calibration and validation procedures followed the guidelines of Moriasi et al. [54]. Fig 5 presents a flowchart of the methodology used in this study.

Model performance evaluation

We used R² (Pearson’s coefficient of determination), NSE (Nash-Sutcliffe efficiency), PBIAS (percent bias), RMSE (root mean square error), and RSR (RMSE-observations standard deviation ratio) to evaluate the model performance. R² and NSE are widely used as reliable criteria to evaluate the predictive ability of hydrological models [55]. PBIAS measures the average tendency of the simulated values to be larger or smaller than the observed values. In this study, positive values of PBIAS indicate overestimation bias, and negative values indicate underestimation bias. RMSE shows the discrepancy between the observed and simulated series. RSR indicates the residual variation of the prediction [56]. The lower the RSR, the absolute PBIAS, and the RMSE, and the higher the R² and NSE, the better the model prediction performance. The ‘hydroGOF’ package in R was used to calculate these statistical indicators [57]. Table 2 lists the evaluation indicators and their calculation methods.

Table 2. Evaluation indicators of the model performance and their mathematic expressions^†.

Indicator Name	Calculation Equation	Description
Pearson’s coefficient of determination (R²)	$R^{2} = \frac{(\sum_{i = 1}^{n} (y_{i} - \bar{y}) (y_{i}' - \bar{y'}))^{2}}{\sum_{i = 1}^{n} (y_{i} - \bar{y})^{2} \sum_{i = 1}^{n} {(y_{i}' - \bar{y'})}^{2}}$	Range [0,1], and 1 is the perfect value (p.v.)
Nash-Sutcliffe efficiency (NSE)	$N S E = 1 - \frac{\sum_{i = 1}^{n} (y_{i} - y_{i}^{'})^{2}}{\sum_{i = 1}^{n} (y_{i} - \bar{y})^{2}}$	Range (-∞,1], and 1 is the p.v.
Percent Bias (PBIAS)	$P B I A S = 100 \times \frac{\sum_{i = 1}^{n} ({y_{i}}^{'} - y_{i})}{\sum_{i = 1}^{n} y_{i}}$	Range (-∞, +∞), and 0 is the p.v.
RMSE-observations standard deviation ratio (RSR)	$R S R = \frac{\sqrt{\sum_{i = 1}^{n} (y_{i} - y_{i}')^{2}}}{\sqrt{\sum_{i = 1}^{n} (y_{i} - \bar{y})^{2}}}$	Range [0, +∞), and 0 is the p.v.
Root Mean Square Error (RMSE)	$R M S E = \sqrt{\frac{\sum_{i = 1}^{n} {(y_{i} - y_{i}^{'})}^{2}}{n}}$	Range [0, +∞), and 0 is the p.v.


^†Note: y_i is the observed data series, $y_{i}^{'}$ is the simulated results series, the overbar represents the mean value of data series, and n is the sample number.
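The indicators in Table 2 translate directly into code. The sketch below is a plain-Python restatement of the table's formulas for illustration; it is not the ‘hydroGOF’ implementation the authors used, and the observed and simulated series are made-up numbers.

```python
import math

def _mean(values):
    return sum(values) / len(values)

def r_squared(obs, sim):
    """Squared Pearson correlation between observed and simulated series."""
    mo, ms = _mean(obs), _mean(sim)
    covariance = sum((o - mo) * (s - ms) for o, s in zip(obs, sim))
    return covariance ** 2 / (sum((o - mo) ** 2 for o in obs) *
                              sum((s - ms) ** 2 for s in sim))

def nse(obs, sim):
    """Nash-Sutcliffe efficiency."""
    mo = _mean(obs)
    return 1.0 - (sum((o - s) ** 2 for o, s in zip(obs, sim)) /
                  sum((o - mo) ** 2 for o in obs))

def pbias(obs, sim):
    """Percent bias; positive means overestimation, as defined in the text."""
    return 100.0 * sum(s - o for o, s in zip(obs, sim)) / sum(obs)

def rmse(obs, sim):
    """Root mean square error."""
    return math.sqrt(sum((o - s) ** 2 for o, s in zip(obs, sim)) / len(obs))

def rsr(obs, sim):
    """RMSE divided by the standard deviation of the observations."""
    mo = _mean(obs)
    return (math.sqrt(sum((o - s) ** 2 for o, s in zip(obs, sim))) /
            math.sqrt(sum((o - mo) ** 2 for o in obs)))

# Hypothetical monthly flows (m^3/s), for illustration only.
observed = [1.0, 2.0, 3.0, 4.0, 5.0]
simulated = [1.1, 1.9, 3.2, 3.8, 5.0]
print(round(nse(observed, simulated), 3), round(rsr(observed, simulated), 3),
      round(rmse(observed, simulated), 3), round(r_squared(observed, simulated), 3))
```

One consequence of the definitions in Table 2 is that RSR² = 1 - NSE, so the RSR and NSE ratings of a model usually move together while PBIAS carries independent information about systematic over- or underestimation.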

In this work, we applied a rating metric of hydrologic model evaluation from Moriasi et al. [54] to estimate the model performance (Table 3).

Table 3. Performance ratings of recommended statistics for streamflow simulations (after Moriasi et al. [54]).

Performance Rating	RSR	NSE	PBIAS (%)
Very Good	0 ≤ RSR ≤ 0.5	0.75 < NSE ≤ 1	PBIAS < ±10
Good	0.5 < RSR ≤ 0.6	0.65 < NSE ≤ 0.75	±10 ≤ PBIAS < ±15
Satisfactory	0.6 < RSR ≤ 0.7	0.5 < NSE ≤ 0.65	±15 ≤ PBIAS < ±25
Unsatisfactory	RSR > 0.7	NSE ≤ 0.5	PBIAS ≥ ±25
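The thresholds in Table 3 can be encoded as a small rating function. This is a sketch under stated assumptions: the function names are hypothetical, the ± notation for PBIAS is read as a bound on its absolute value, and the overall rating is taken as the lowest of the three individual ratings, which is the conservative rule applied to Table 4 in the results. The example inputs are rows from Table 4.

```python
RATINGS = ["Very Good", "Good", "Satisfactory", "Unsatisfactory"]

def rate_rsr(rsr):
    if rsr <= 0.5:
        return 0
    if rsr <= 0.6:
        return 1
    if rsr <= 0.7:
        return 2
    return 3

def rate_nse(nse):
    if nse > 0.75:
        return 0
    if nse > 0.65:
        return 1
    if nse > 0.5:
        return 2
    return 3

def rate_pbias(pbias):
    magnitude = abs(pbias)  # assumption: the +/- bounds apply to |PBIAS|
    if magnitude < 10:
        return 0
    if magnitude < 15:
        return 1
    if magnitude < 25:
        return 2
    return 3

def overall_rating(rsr, nse, pbias):
    """Conservative rule: the worst of the three individual ratings wins."""
    worst = max(rate_rsr(rsr), rate_nse(nse), rate_pbias(pbias))
    return RATINGS[worst]

# (RSR, NSE, PBIAS) rows taken from Table 4.
print(overall_rating(0.49, 0.76, -11.0))   # SWAT-SVR, wet season, 07195800
print(overall_rating(0.25, 0.94, -7.2))    # SWAT-CUP, wet season, 07195865
print(overall_rating(0.58, 0.66, -16.1))   # SWAT-SVR, dry season, 07195430
print(overall_rating(0.94, 0.12, 20.6))    # SWAT-CUP, dry season, 07195800
```

The first row shows why the conservative rule matters: its RSR and NSE alone would rate "Very Good", but the PBIAS of -11.0 lowers the overall rating to "Good".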


Results and discussion

Performance comparison between SWAT-SVR and SWAT-CUP

A total of 52 independent SWAT-SVR models were developed for monthly streamflow prediction (i.e. calibration and validation) during the wet and dry seasons at 13 USGS hydrologic stations. The corresponding 52 simulation results from SWAT-CUP served as comparison experiments for assessing model performance. The spatial calibration method was implemented for each site. In each run, streamflow time series data from 12 stations were treated as training data, and the station left out was used for testing. Support vector ɛ-regression based on the Gaussian RBF kernel was applied to develop the SWAT-SVR model. The initial numeric ranges of the parameters in the SVR grid search were: C (begin = 2⁻⁶, end = 2⁸, step = 1), γ (begin = 2⁴, end = 2⁻⁸, step = -1), and ɛ (begin = 2⁻⁸, end = 2⁻¹, step = 0.5). The subsequent fine search used a smaller step and range for these three parameters, depending on the results of the individual models. For the wet season, the final values of C for the 13 SWAT-SVR models range from 53.0156 to 255.0156, γ is 0.4, and ɛ is 0.00390625; for the dry season, C falls between 32.0156 and 255.0156, γ is 1.2, and ɛ is again 0.00390625. The k-value in cross-validation was set to 5 for the SVR simulations.
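The coarse grid and the 5-fold splitting described above can be sketched as follows. Two points are assumptions rather than statements from the text: the quoted steps are interpreted as steps in the exponent of 2, and samples are assigned to folds in an interleaved order (the text does not say how folds were formed). The function names are hypothetical, and no SVR is trained here; the sketch only enumerates the candidates and folds that a tuning routine would loop over.

```python
from itertools import product

def power_of_two_grid(exp_begin, exp_end, exp_step):
    """Values 2**e for e from exp_begin to exp_end in steps of exp_step (assumed meaning)."""
    count = int(round((exp_end - exp_begin) / exp_step)) + 1
    return [2.0 ** (exp_begin + i * exp_step) for i in range(count)]

cost_grid = power_of_two_grid(-6, 8, 1)        # C
gamma_grid = power_of_two_grid(4, -8, -1)      # gamma
epsilon_grid = power_of_two_grid(-8, -1, 0.5)  # epsilon
candidates = list(product(cost_grid, gamma_grid, epsilon_grid))

def k_fold_splits(n_samples, k=5):
    """Yield (training_indices, validation_indices); each fold validates exactly once."""
    folds = [list(range(start, n_samples, k)) for start in range(k)]
    for i, validation in enumerate(folds):
        training = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield training, validation

print(len(cost_grid), len(gamma_grid), len(epsilon_grid), len(candidates))
print(epsilon_grid[0])  # 2**-8, the epsilon value retained after the fine search
for training, validation in k_fold_splits(12, k=5):
    print(len(training), len(validation))
```

Under this reading the coarse search covers 15 × 13 × 15 candidate combinations, each evaluated by the average error over the five validation folds, which makes clear why a narrower fine search follows the coarse one.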

Table 4 shows the calibration results of the SWAT-SVR and SWAT-CUP methods during the wet and dry seasons. Following Moriasi et al. [54], we applied strict criteria for the evaluation of model performance (i.e. when the values of RSR, NSE, and PBIAS give conflicting ratings, the overall performance of the model is conservatively taken as the lowest rating). Table 4 indicates that 100% (13/13) of the SWAT-SVR runs for the wet season and 84.6% (11/13) of the runs for the dry season had “Good” performance ratings in calibration. Based on the values of PBIAS, the SWAT-SVR model slightly underestimated monthly streamflow for each watershed during the wet and dry seasons, whereas the SWAT-CUP method also underestimated wet season streamflow but remarkably overestimated dry season streamflow. For SWAT-CUP, the mean NSE and R² of the 13 stations decreased from 0.92 and 0.92 in the wet season to -0.16 and 0.55 in the dry season, respectively. These results are consistent with the study of Zhang et al. [43], in which the SWAT model produced good simulations for the wet season but poor simulations for the dry season. A possible reason is that R² and NSE are sensitive to extremely large values (i.e. the high flows that took place in the wet season). In contrast to SWAT-CUP, the SWAT-SVR model performed similarly in the wet and dry seasons. We noted that the variation of the statistics is small, and the value of each indicator is close across the different SWAT-SVR models. This is because any two SWAT-SVR models share eleven of their twelve training watersheds in calibration. Although the simulation results from SWAT-CUP had better overall performance (i.e. 53.8% of the runs had “Very Good” ratings) than those of the SWAT-SVR model in the wet season, SWAT-CUP failed to estimate monthly streamflow in the dry season, in which all runs were rated “Unsatisfactory”. 
We are not surprised that the SWAT-SVR model has a good performance in the period of calibration because SVR typically possesses a strong learning ability for the training dataset. In the following section, we focus on the discussion of the model performance in validation and expect that the SWAT-SVR model has better generalization ability and can be applied in an ungauged watershed.

Table 4. Calibration performance of streamflow simulations by SWAT-SVR and SWAT-CUP during the wet and dry seasons.

		SWAT-SVR						SWAT-CUP
Season	Station	RSR	NSE	PBIAS (%)	R²	RMSE (m³ s^-1)	Rating	RSR	NSE	PBIAS (%)	R²	RMSE (m³ s^-1)	Rating
Wet season	07195800	0.49	0.76	-11.0	0.77	13.95	Good	0.39	0.85	-12.5	0.85	11.80	Good
	07195855	0.50	0.75	-11.6	0.76	14.19	Good	0.39	0.85	-12.6	0.85	11.83	Good
	07195865	0.49	0.76	-10.6	0.76	13.91	Good	0.25	0.94	-7.2	0.94	7.62	Very Good
	07196000	0.49	0.76	-10.1	0.77	13.92	Good	0.25	0.94	-7.0	0.94	7.55	Very Good
	07195500	0.49	0.76	-10.8	0.77	13.48	Good	0.29	0.92	-11.8	0.92	8.49	Good
	07195430	0.49	0.76	-11.3	0.77	13.42	Good	0.28	0.92	-10.3	0.93	8.15	Good
	07196090	0.51	0.74	-13.4	0.76	13.65	Good	0.18	0.97	-5.1	0.97	5.08	Very Good
	07196973	0.51	0.74	-13.2	0.76	14.08	Good	0.25	0.94	-7.5	0.94	7.46	Very Good
	07196500	0.50	0.75	-11.7	0.76	13.63	Good	0.27	0.93	-8.8	0.93	7.43	Very Good
	07197000	0.49	0.76	-10.6	0.77	13.85	Good	0.28	0.92	-11.0	0.92	8.61	Good
	07196900	0.50	0.75	-11.5	0.76	14.14	Good	0.25	0.94	-7.5	0.94	7.66	Very Good
	07197360	0.49	0.76	-11.2	0.76	13.96	Good	0.25	0.94	-7.6	0.94	7.60	Very Good
	07198000	0.42	0.83	-12.0	0.84	8.84	Good	0.25	0.94	-12.6	0.94	5.39	Good
Dry season	07195800	0.55	0.69	-12.3	0.70	8.57	Good	0.94	0.12	20.6	0.16	14.54	Unsatisfactory
	07195855	0.55	0.70	-11.9	0.70	8.56	Good	1.02	-0.03	94.4	0.63	15.78	Unsatisfactory
	07195865	0.55	0.69	-12.5	0.70	8.54	Good	0.96	0.08	14.9	0.11	14.84	Unsatisfactory
	07196000	0.55	0.70	-11.7	0.71	8.56	Good	1.01	-0.03	94.1	0.63	15.76	Unsatisfactory
	07195500	0.55	0.69	-12.1	0.70	8.39	Good	1.01	-0.03	97.8	0.63	15.38	Unsatisfactory
	07195430	0.58	0.66	-16.1	0.70	8.82	Satisfactory	1.02	-0.05	102.2	0.64	15.50	Unsatisfactory
	07196090	0.55	0.70	-12.6	0.71	8.24	Good	1.01	-0.01	95.1	0.63	15.13	Unsatisfactory
	07196973	0.55	0.70	-12.9	0.71	8.38	Good	1.18	-0.40	124.1	0.63	17.95	Unsatisfactory
	07196500	0.59	0.65	-16.6	0.69	8.51	Satisfactory	1.15	-0.33	123.6	0.65	16.63	Unsatisfactory
	07197000	0.55	0.70	-12.8	0.71	8.51	Good	1.19	-0.41	122.2	0.61	18.38	Unsatisfactory
	07196900	0.55	0.69	-12.5	0.70	8.58	Good	1.19	-0.41	123.6	0.63	18.44	Unsatisfactory
	07197360	0.55	0.70	-11.7	0.71	8.49	Good	1.18	-0.40	123.6	0.63	18.28	Unsatisfactory
	07198000	0.40	0.84	-12.2	0.85	3.98	Good	1.10	-0.22	114.5	0.63	10.85	Unsatisfactory


Fig 6 shows the performance ratings of the SWAT-SVR and SWAT-CUP models during wet and dry season validation. The NSE values for station 07196000 from both SWAT-SVR and SWAT-CUP are below zero in the validated simulations. Hence, site 07196000 is not shown in the figures for clarity, and the subsequent analysis covers only the 12 valid stations. Fig 6a shows that 75% (9/12) of the SWAT-SVR predictions for the wet season were rated “Good” or “Satisfactory”, and three models were rated “Unsatisfactory”. By comparison, 66.7% of the SWAT-CUP simulations were rated “Good” or “Satisfactory”, and four models were rated “Unsatisfactory”. Although SWAT-SVR and SWAT-CUP produced the same rating for 50% of the models, SWAT-SVR yielded more “Good” and fewer “Unsatisfactory” ratings. In the wet season, the average RSR, NSE, PBIAS, R², and RMSE are 0.65, 0.57, -5.33, 0.65, and 10.45 m³ s^-1 for SWAT-SVR, and 0.63, 0.58, 18.63, 0.65, and 9.88 m³ s^-1 for SWAT-CUP, respectively. We concluded that the SWAT-SVR model has less bias (i.e. a smaller absolute value of PBIAS) than the SWAT-CUP simulations, although the other statistics are very close, and that SWAT-SVR slightly underestimated wet season streamflow.

In the dry season, the SWAT-SVR model performed better than the SWAT-CUP simulations (Fig 6b). Two SWAT-SVR models (07195500 and 07196500) had “Very Good” ratings, and two others (07196090 and 07197000) obtained “Satisfactory” ratings. No SWAT-CUP simulation reached “Satisfactory” or better performance, which is consistent with the performance of SWAT-CUP in calibration. The average RSR, NSE, PBIAS, R², and RMSE for streamflow prediction are 0.70, 0.49, -12.5, 0.62, and 5.08 m³ s^-1 for SWAT-SVR, and 0.36, -0.58, 88.39, 0.36, and 8.57 m³ s^-1 for SWAT-CUP, respectively. Streamflow prediction from SWAT-CUP in the dry season clearly deviated more from observations than the SWAT-SVR simulations. The developed model underestimated dry season streamflow.

Streamflow prediction differed between the wet and dry seasons, and wet season prediction more readily achieved good performance. Low flows in dry seasons are a seasonal phenomenon, and their prediction is a challenging task in hydrology [58]. This difficulty may be attributed to the complexity of groundwater processes and the lack of effective evaluation criteria for low flows. Low flows in the dry season are typically generated from groundwater discharge or surface discharge from lakes, reservoirs, and marshes [58]. However, it is hard to investigate subsurface water discharge from nearby watersheds into a river channel in an unclosed watershed because of the limitations of hydrological measurement methods and the complexity of groundwater flow processes. Often these types of groundwater models are highly site-specific [59] or cover vast areas [60]. Furthermore, there are no effective and suitable statistical indicators to evaluate the performance of low flow simulation. Both R² and NSE are known to put greater emphasis on high flow prediction and are sensitive to the hydrological regime, sample size, or outliers [61]. Pushpalatha et al. [61] suggested using NSE computed on square-root-transformed or log-transformed flows (i.e. NSE of SqrtQ or lnQ) as the objective function for low flow evaluation.
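To make the point about transformed efficiency criteria concrete, the following sketch compares NSE on raw, square-root-transformed, and log-transformed flows. It is only an illustration: the flow values are invented, the function names are hypothetical, and the small offset added before taking logarithms (one hundredth of the mean observed flow) is an assumed choice for avoiding log(0), not a setting reported in this study.

```python
import math


def nse(obs, sim):
    """Nash-Sutcliffe efficiency of simulated against observed values."""
    mean_obs = sum(obs) / len(obs)
    residual = sum((o - s) ** 2 for o, s in zip(obs, sim))
    variance = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - residual / variance


def nse_sqrt(obs, sim):
    """NSE on square-root-transformed flows (flows must be non-negative)."""
    return nse([math.sqrt(o) for o in obs], [math.sqrt(s) for s in sim])


def nse_log(obs, sim, eps=None):
    """NSE on log-transformed flows, with a small offset to avoid log(0)."""
    if eps is None:
        eps = 0.01 * sum(obs) / len(obs)  # assumed offset, see lead-in
    return nse([math.log(o + eps) for o in obs],
               [math.log(s + eps) for s in sim])


# Invented monthly flows (m3/s): the simulation matches the single high flow
# but overestimates every low flow.
obs = [1.0, 2.0, 1.5, 40.0]
sim = [3.0, 4.0, 3.5, 40.0]

scores = {
    "NSE(Q)": nse(obs, sim),
    "NSE(sqrt Q)": nse_sqrt(obs, sim),
    "NSE(ln Q)": nse_log(obs, sim),
}
```

In this invented case the raw NSE is the highest of the three scores because the well-matched high flow dominates the variance, whereas the transformed versions penalize the low flow errors more strongly.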

The flow duration curves of observed versus simulated streamflow by SWAT-SVR are given in Fig 7 for each subwatershed. Fig 7 reveals that the developed model failed to capture extreme high flows with one exception (i.e. 07196090 in the dry season), but it worked well across a range of flow values, especially for most medium flows and some low flows in the dry season. For example, simulations from 07195500, 07195430, 07196500, 07195865, and 07198000 in the wet season, and simulations from 07195800, 07195855, 07195865, 07195500, 07195430, 07196090, 07196500, 07197360, and 07198000 in the dry season matched observations well in medium and low flows. We noted that the flow duration curves of observations from the 07196090 and 07196973 sites are steep. This is because the flow records at these two locations span only 24 and 48 months, and thus reflect only short-term, local characteristics of flow duration.
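For readers unfamiliar with flow duration curves, the sketch below shows one common way to construct them: flows are ranked from highest to lowest and each is assigned an exceedance probability. The Weibull plotting position used here and the flow values are assumptions made for illustration; the study does not state which plotting position was used for Fig 7.

```python
def flow_duration_curve(flows):
    """Return (exceedance probability in %, flow) pairs, highest flow first."""
    ranked = sorted(flows, reverse=True)
    n = len(ranked)
    # Weibull plotting position (assumed): P = rank / (n + 1).
    return [(100.0 * rank / (n + 1), q)
            for rank, q in enumerate(ranked, start=1)]


# Invented monthly flows (m3/s) for a single station.
monthly_flows = [12.0, 3.5, 48.0, 7.2, 1.1, 22.4, 5.0, 9.8, 15.3]
fdc = flow_duration_curve(monthly_flows)
```

Plotting the observed and simulated curves on the same axes shows at which exceedance probabilities (high, medium, or low flows) the two diverge. With a short record, each month moves the curve by a large probability step, which is one reason short records can produce steep or irregular curves.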

Model suitability analysis

To clearly reflect the spatial distribution of SWAT-SVR model performance, we mapped the validation ratings of the different models for the wet and dry seasons (Fig 8). In the wet season, the five models rated “Good” came from the 07195500, 07196090, 07196500, 07197000, and 07196900 sites, where discharge fell in the medium flow range between 5 m³ s^-1 and 30 m³ s^-1, except for 07196900, a low flow site where discharge is less than 5 m³ s^-1. The four “Satisfactory” models came from the 07195800, 07195855, 07197360, and 07195430 sites; the first three belong to the low flows group, whereas 07195430 has medium flows. In the dry season, the models from the 07195500 and 07196500 sites had “Very Good” performance, while the two models from the 07196090 and 07197000 sites were rated “Satisfactory”. All four of these models came from the medium flows group. Of the twelve models, the medium flow sites 07195500 and 07196500 performed best across the wet and dry seasons. SWAT-SVR cannot capture high flows because events with a discharge larger than 30 m³ s^-1 were very rare, accounting for only 10.2% (263 data points) of the total observations (2574 data points). As a result, the high flow data in each seasonal SVR calibration amounted to less than 5.1% of the total observations. SVR therefore could not obtain enough training on the high flows dataset in calibration, although the fact that SWAT generally overestimated these events (i.e. PBIAS is 21) is helpful to SVR training. The problem could be addressed by adding more variables that control the hydrological response, such as precipitation, temperature, and groundwater level, to further train SVR. However, such an analysis is beyond the scope of our work. We also noted that the validation result at the outlet of the IRW (07198000) was unsatisfactory in both the wet and dry seasons. This is because we know little about the operations of the upstream dam near the 07198000 station, and this information was not included in the SWAT simulation. This result also supports the view of Daggupati et al. [25] that calibration at a single site (generally the outlet of a watershed) might not be suitable for simulating a large watershed because of spatial heterogeneity. In this case, spatial calibration that considers multiple sites is a more reliable method.
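The flow grouping used in this analysis can be written down as a small sketch. The 5 and 30 m³ s^-1 thresholds and the counts of 263 and 2574 data points come from the text; the function names are hypothetical, and assigning the boundary values themselves to the medium class is an assumption.

```python
LOW_MAX = 5.0    # m3/s, flows below this value are treated as low flows
HIGH_MIN = 30.0  # m3/s, flows above this value are treated as high flows


def flow_class(q):
    """Classify a monthly discharge (m3/s) as low, medium, or high."""
    if q < LOW_MAX:
        return "low"
    if q > HIGH_MIN:
        return "high"
    return "medium"


def class_shares(flows):
    """Percentage of records falling in each flow class."""
    counts = {"low": 0, "medium": 0, "high": 0}
    for q in flows:
        counts[flow_class(q)] += 1
    return {name: 100.0 * count / len(flows) for name, count in counts.items()}


# Share of high flow records reported in the text: 263 of 2574 data points.
high_share = 100.0 * 263 / 2574
```

Rounding `high_share` to one decimal reproduces the 10.2% quoted above, which illustrates how few high flow samples are available to the SVR during training.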

We plotted the relationship between the evaluation indicators and the upstream drainage area to identify the spatial scale at which the model is applicable (Fig 9). In Fig 9, the y-axis is the value of the NSE, R², and RSR statistics, and the x-axis represents the upstream drainage area of each station. The shaded region is the 95% confidence interval of each indicator. We applied local polynomial regression [62] to the three indicators to find how they change with watershed area. Fig 9a demonstrates that NSE, R², and RSR show similar patterns during the wet and dry seasons. NSE and R² stay at a high level, and RSR stays low, when the watershed area falls in the range of 500 to 3000 km². If these conditions from Fig 9a are met and the PBIAS value is also small, the model would have better performance. PBIAS did not show a distinct pattern in response to watershed size (Fig 9b). Therefore, we conclude that the developed SWAT-SVR model is applicable at sites with medium flows (i.e. 07195500, 07195430, 07196090, 07196500, and 07197000) where the upstream drainage area is between 500 and 3000 km².
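The idea behind the local regression trend can be sketched as follows. This is a simplified local linear variant with tricube weights, written in plain Python for illustration; it is not necessarily the configuration used to draw Fig 9, it does not compute the confidence band, and the drainage areas and indicator values in the example are invented.

```python
import math


def local_linear_fit(x, y, x0, span=0.75):
    """Tricube-weighted local linear estimate of y at x0.

    Assumes at least three points with distinct x values.
    """
    n = len(x)
    k = min(n, max(3, math.ceil(span * n)))   # points in the local window
    dists = sorted(abs(xi - x0) for xi in x)
    h = dists[k - 1]                          # distance to the k-th nearest point
    weights = []
    for xi in x:
        u = abs(xi - x0) / h
        weights.append((1.0 - u ** 3) ** 3 if u < 1.0 else 0.0)
    sw = sum(weights)
    mx = sum(w * xi for w, xi in zip(weights, x)) / sw
    my = sum(w * yi for w, yi in zip(weights, y)) / sw
    sxx = sum(w * (xi - mx) ** 2 for w, xi in zip(weights, x))
    sxy = sum(w * (xi - mx) * (yi - my)
              for w, xi, yi in zip(weights, x, y))
    slope = sxy / sxx if sxx > 0 else 0.0
    return my + slope * (x0 - mx)


# Invented drainage areas (km2) and indicator values for six stations.
areas = [100.0, 500.0, 1000.0, 2000.0, 3000.0, 4000.0]
nse_values = [0.45, 0.70, 0.76, 0.74, 0.68, 0.40]
trend = [local_linear_fit(areas, nse_values, a) for a in areas]
```

Evaluating the fit on a grid of drainage areas gives a smooth trend line of the kind used to judge where an indicator stays high or low.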

Streamflow prediction on yearly time series

To obtain a complete picture of monthly streamflow prediction in the IRW, we combined the validated wet and dry season simulations, recalculated the statistical indicators, and re-estimated overall model performance on the entire time series (i.e. calibration and validation periods are considered together). We summarized the overall performance indicators computed for SWAT-SVR and SWAT-CUP (Table 5). Table 5 shows that 66.7% of the twelve SWAT-SVR models had “Satisfactory” to “Very Good” performance ratings. The average RSR, NSE, PBIAS, R², and RMSE are 0.62, 0.60, -8.34, 0.66, and 8.51 m³ s^-1 for the developed model, respectively. The overall performance of the twelve models on the yearly time series is “Satisfactory”. By comparison, only one site had a “Satisfactory” rating from SWAT-CUP. In most cases, the SWAT-SVR model outperformed the SWAT-CUP method.
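The recombination step described above can be sketched as merging the seasonal series by date and recomputing the indicators on the merged series. The data layout, function names, and numbers below are hypothetical. The PBIAS sign convention (negative values mean the simulation underestimates the observations) is an assumption chosen to match how PBIAS is read in this paper.

```python
import math


def evaluate(obs, sim):
    """RSR, NSE, PBIAS (%), and RMSE of simulated versus observed flows."""
    n = len(obs)
    mean_obs = sum(obs) / n
    sse = sum((s - o) ** 2 for o, s in zip(obs, sim))
    sst = sum((o - mean_obs) ** 2 for o in obs)
    return {
        "RSR": math.sqrt(sse) / math.sqrt(sst),
        "NSE": 1.0 - sse / sst,
        # Assumed convention: negative PBIAS means underestimation.
        "PBIAS": 100.0 * sum(s - o for o, s in zip(obs, sim)) / sum(obs),
        "RMSE": math.sqrt(sse / n),
    }


def combine_seasons(wet, dry):
    """Merge {(year, month): (obs, sim)} records into time-ordered lists."""
    merged = {**wet, **dry}
    keys = sorted(merged)
    obs = [merged[key][0] for key in keys]
    sim = [merged[key][1] for key in keys]
    return obs, sim


# Invented records; which months count as wet or dry is also assumed here.
wet = {(2000, 3): (20.0, 18.0), (2000, 4): (30.0, 27.0)}
dry = {(2000, 8): (4.0, 3.6), (2000, 9): (6.0, 5.4)}
obs, sim = combine_seasons(wet, dry)
stats = evaluate(obs, sim)
```

Because the indicators are nonlinear in the data, statistics recomputed on the merged series are generally not the average of the separate wet and dry season statistics, which is why the recalculation is needed.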

Table 5. Overall performance ratings by SWAT-SVR and SWAT-CUP after combining wet and dry simulations.

Station	SWAT-SVR						SWAT-CUP
	RSR	NSE	PBIAS	R²	RMSE (m³ s^-1)	Ratings	RSR	NSE	PBIAS	R²	RMSE (m³ s^-1)	Ratings
07195800	0.69	0.52	-12.9	0.57	0.31	Satisfactory	0.71	0.50	14.8	0.55	0.31	Unsatisfactory
07195855	0.63	0.60	-4.8	0.61	0.91	Satisfactory	0.88	0.22	46.4	0.49	1.26	Unsatisfactory
07195865	0.77	0.40	-25.4	0.49	0.57	Unsatisfactory	0.83	0.31	29.5	0.44	0.61	Unsatisfactory
07195500	0.49	0.76	-6.2	0.78	9.75	Very Good	0.64	0.58	35.2	0.70	12.89	Unsatisfactory
07195430	0.61	0.62	-25.6	0.74	11.80	Unsatisfactory	0.57	0.68	20.6	0.71	10.91	Satisfactory
07196090	0.49	0.76	-12.6	0.84	16.36	Good	0.46	0.78	32.2	0.84	15.49	Unsatisfactory
07196973	0.70	0.50	-4.7	0.56	0.53	Unsatisfactory	0.96	0.07	60.6	0.45	0.72	Unsatisfactory
07196500	0.52	0.73	-7.4	0.77	15.77	Good	0.66	0.56	37.1	0.68	19.92	Unsatisfactory
07197000	0.54	0.71	8.3	0.77	6.16	Good	0.78	0.40	61.0	0.65	8.87	Unsatisfactory
07196900	0.60	0.63	-0.9	0.67	1.07	Satisfactory	0.96	0.07	90.2	0.58	1.69	Unsatisfactory
07197360	0.63	0.61	-3.0	0.62	1.80	Satisfactory	0.87	0.24	62.9	0.57	2.51	Unsatisfactory
07198000	0.79	0.37	-4.9	0.45	37.10	Unsatisfactory	0.93	0.13	49.5	0.42	43.57	Unsatisfactory


We also plotted monthly streamflow hydrographs for each site in Fig 10 to better show where the developed model performed better than the SWAT-CUP method. In Fig 10, all sites have similar hydrologic characteristics, and all are located in the IRW. The SWAT-SVR model works well for most medium flows and some low flows and can capture the timing and shape of their rising and recession curves, but it failed to capture extreme high flows on a monthly time scale (e.g. in the wet seasons of 2000, 2008, and 2011). We believe there are likely different drivers of hydrologic flow in the wet and dry seasons that are not equally captured or modeled by SWAT, particularly because SWAT was not developed with a focus on flood prediction. As expected, the performance of SWAT-SVR relied heavily on the training data; it did not perform well when predicting high flows because of the small amount of training data for such flows in this study. However, we observed better prediction in the medium flow and some low flow conditions because SVR obtained enough training; another possible reason is that SVR captured a complicated nonlinear pattern from baseflow and groundwater patterns in the system that manifested in the high flow prediction.

Some dry seasons had more discharge than the wet seasons (e.g. the dry seasons of 1996, 2004, and 2009), which was a source of error for the SWAT-SVR model. The simulations from SWAT-CUP can capture extreme high flows (e.g. the wet seasons of 2000, 2008, and 2011), but far overestimated some medium flows and most low flows in the dry season (e.g. 1998, 2001, 2003, 2007, and 2010). There may be nonlinear drivers, arising from other factors, that are difficult to incorporate during typical model calibration but that better represent the system. Overall, the developed model fits observations well for most subwatersheds of the IRW.

In our study, the proposed method reduced the steps needed for SWAT model calibration and parameterization. Output streamflow from SWAT and the upstream drainage area were input into SVR, where only three parameters needed to be determined. This made the parameter transfer of a hydrological model easier and more feasible [63]. Additionally, we did not conduct an uncertainty analysis on the model but used strict criteria to evaluate model performance in order to limit the uncertainty of the SWAT-SVR model. Moreover, we used the spatial calibration and leave-one-out sampling method, meaning that the validation of any test watershed synthesized hydrologic information from the other 12 sub-watersheds. This is helpful for flow prediction at an ungauged or limited data watershed. In this sense, the developed model can serve as a regional tool, as it integrates all information from nearby watersheds.
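The leave-one-out spatial sampling described above can be sketched as follows. Only the data splitting is shown: each station is held out in turn, and the records of all remaining stations are stacked into one training set whose two features are the SWAT streamflow and the upstream drainage area. The data layout and function names are hypothetical, the station IDs are taken from Table 4, and every number is invented. Fitting the SVR itself is left out.

```python
def leave_one_watershed_out(stations):
    """Yield (training_stations, test_station) pairs, one per station."""
    for held_out in stations:
        training = [s for s in stations if s != held_out]
        yield training, held_out


def build_training_rows(data, training_stations):
    """Stack (SWAT flow, drainage area) features and observed flow targets."""
    features, targets = [], []
    for station in training_stations:
        area = data[station]["area_km2"]
        for swat_flow, observed_flow in data[station]["monthly"]:
            features.append((swat_flow, area))
            targets.append(observed_flow)
    return features, targets


# Invented records: each monthly entry is (SWAT flow, observed flow) in m3/s.
data = {
    "07195500": {"area_km2": 1600.0, "monthly": [(10.0, 9.0), (22.0, 20.5)]},
    "07196500": {"area_km2": 2400.0, "monthly": [(15.0, 14.2), (28.0, 25.1)]},
    "07197000": {"area_km2": 800.0, "monthly": [(6.0, 6.4), (12.0, 11.0)]},
}

folds = []
for training, test_station in leave_one_watershed_out(list(data)):
    features, targets = build_training_rows(data, training)
    folds.append((test_station, features, targets))
```

Each fold would then train one regression model on `features` and `targets` and evaluate it on the held-out station, so that the held-out station plays the role of an ungauged watershed.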

Conclusions

This study developed a streamflow prediction model on a monthly time scale based on the SWAT model and the SVR method. Streamflow output from the SWAT simulation and the upstream drainage area served as the two input variables to SVR. The methodology accounted for the various physical processes that influence flow by integrating the SWAT model, and it reduced both the time needed to calibrate and validate SWAT and the time for feature selection in SVR while trying different parameter combinations. The overall performance of the model on the continuous time series is “Satisfactory” based on Table 5. The hybrid model predicted streamflow more accurately during the wet season than the dry season. Also, the model is likely applicable in situations that require better performance under medium flow conditions, for example, in this case, a watershed with medium flows (discharge between 5 m³ s^-1 and 30 m³ s^-1) where the upstream drainage area is between 500 and 3000 km². The strength of the proposed SVR approach is its capability to capture the intrinsic non-linear relationship between rainfall and runoff while considering physical processes by integrating the SWAT model. Moreover, by using the spatial calibration and leave-one-out sampling method, the developed SWAT-SVR model can serve as a good regional tool for an ungauged or limited data watershed that has hydrologic characteristics similar to those of the IRW.

In cases where data are scarce, such as an ungauged watershed, it is reasonable to apply proxy data and combine machine learning techniques like SVM with physically based, spatially distributed models like SWAT to produce high quality hydrologic prediction and, depending on the quantity of data available, describe more of the nonlinear variability that is often lost with conceptually built physical models that are inherently process weak [64]. Even though the calibration process may improve prediction without intrinsically including all physical processes [26], we believe this calibration approach can be incorporated into process model predictions through a hybrid calibration procedure like the one presented here. This approach may be a way to better represent hydrologic heterogeneity that is difficult to model, such as groundwater discharge and the nonlinearity often observed in physically based models, within the constraints of current modeling practice, particularly in ungauged watersheds.

Supporting information

S1 Appendix

(DOCX)


Acknowledgments

This research was performed while Dr. Lifeng Yuan held an NRC Research Associateship award at Robert S. Kerr Environmental Research Center, Ada, OK 74820. This work does not reflect the views of the US EPA, and no official endorsement should be inferred. We thank Dr. Yongping Yuan, Dr. Mohamed Hantush, Katherine Buckler, and Pat Bush for their support of this paper. We are also thankful to Dr. Tibebe B. Tigabu and two other anonymous reviewers for their constructive comments.

Data Availability

The data is available at doi.org/10.23719/1520734.

Funding Statement

This research was funded by the United States Environmental Protection Agency as part of the Office of Research and Development, Safe and Sustainable Water Research Program. Ken Forshay is a Research Ecologist of the Environmental Protection Agency. Lifeng Yuan was a National Research Council, Senior Research Associate, resident at the U.S. EPA working with Dr. Kenneth Forshay.

References

1. Alizadeh MJ, Kavianpour MR, Kisi O, Nourani V. A new approach for simulating and forecasting the rainfall-runoff process within the next two months. Journal of Hydrology. 2017;548:588–97. doi: 10.1016/j.jhydrol.2017.03.032
2. Huo Z, Feng S, Kang S, Huang G, Wang F, Guo P. Integrated neural networks for monthly river flow estimation in arid inland basin of Northwest China. Journal of Hydrology. 2012;420–421:159–70. doi: 10.1016/j.jhydrol.2011.11.054
3. Misra D, Oommen T, Agarwal A, Mishra SK, Thompson AM. Application and analysis of support vector machine based simulation for runoff and sediment yield. Biosystems Engineering. 2009;103(4):527–35. doi: 10.1016/j.biosystemseng.2009.04.017
4. Kalteh AM. Monthly river flow forecasting using artificial neural network and support vector regression models coupled with wavelet transform. Computers & Geosciences. 2013;54:1–8. doi: 10.1016/j.cageo.2012.11.015
5. Yuan L, Zhou Q. Complexity of Soil Erosion and Sediment Yield System in a Watershed. Journal of Chongqing Institute of Technology (Natural Science). 2008;22(4):112–6.
6. Yuan L, Sinshaw T, Forshay KJ. Review of Watershed-Scale Water Quality and Nonpoint Source Pollution Models. Geosciences. 2020;10(1):25. doi: 10.3390/geosciences10010025
7. Devia GK, Ganasri BP, Dwarakish GS. A Review on Hydrological Models. Aquatic Procedia. 2015;4:1001–7. doi: 10.1016/j.aqpro.2015.02.126
8. Londhe SN, Gavraskar S. Stream Flow Forecasting using Least Square Support Vector Regression. Journal of Soft Computing in Civil Engineering. 2018;2(2):56–88.
9. Arnold JG, Moriasi DN, Gassman PW, Abbaspour KC, White MJ, Srinivasan R, et al. SWAT: Model Use, Calibration and Validation. Transactions of the ASABE. 2012;55(4):1491–508.
10. Gassman PW, Balmer C, Siemers M, Srinivasan R, editors. The SWAT Literature Database: Overview of database structure and key SWAT literature trends. SWAT 2014 Conference, Pernambuco, Brazil: Conference Proceedings. http://swat.tamu.edu/conferences/2014/ Accessed on [2018-06-27]; 2014.
11. EPA. A Review of Watershed and Water Quality Tools for Nutrient Fate and Transport. Review. Ada, Oklahoma, USA: Center for Environmental Solutions & Emergency Response | Groundwater Characterization & Remediation Division, Office of Research and Development (EPA), 2019 December. Report No.: EPA/600/R-19/232.
12. Zhang X, Srinivasan R, Van Liew M. Approximating SWAT Model Using Artificial Neural Network and Support Vector Machine. JAWRA Journal of the American Water Resources Association. 2009;45(2):460–74. doi: 10.1111/j.1752-1688.2009.00302.x
13. Abbaspour KC. SWAT-CUP: SWAT Calibration and Uncertainty Programs—A User Manual. Zurich, Switzerland: 2015.
14. Hrachowitz M, Savenije HHG, Blöschl G, McDonnell JJ, Sivapalan M, Pomeroy JW, et al. A decade of Predictions in Ungauged Basins (PUB)—a review. Hydrological Sciences Journal. 2013;58(6):1198–255. doi: 10.1080/02626667.2013.803183
15. Kisi O, Cimen M. A wavelet-support vector machine conjunction model for monthly streamflow forecasting. Journal of Hydrology. 2011;399(1–2):132–40. doi: 10.1016/j.jhydrol.2010.12.041
16. Nourani V, Alizadeh F, Roushangar K. Evaluation of a Two-Stage SVM and Spatial Statistics Methods for Modeling Monthly River Suspended Sediment Load. Water Resources Management. 2015;30(1):393–407. doi: 10.1007/s11269-015-1168-7
17. Zhu S, Zhou J, Ye L, Meng C. Streamflow estimation by support vector machine coupled with different methods of time series decomposition in the upper reaches of Yangtze River, China. Environmental Earth Sciences. 2016;75(6). doi: 10.1007/s12665-016-5337-7
18. Shabri A, Suhartono. Streamflow forecasting using least-squares support vector machines. Hydrological Sciences Journal. 2012;57(7):1275–93. doi: 10.1080/02626667.2012.714468
19. Chiogna G, Marcolini G, Liu W, Perez Ciria T, Tuo Y. Coupling hydrological modeling and support vector regression to model hydropeaking in alpine catchments. Science of the Total Environment. 2018;633:220–9. doi: 10.1016/j.scitotenv.2018.03.162
20. García Nieto PJ, García-Gonzalo E, Alonso Fernández JR, Díaz Muñiz C. Hybrid PSO–SVM-based method for long-term forecasting of turbidity in the Nalón river basin: A case study in Northern Spain. Ecological Engineering. 2014;73:192–200. doi: 10.1016/j.ecoleng.2014.09.042
21. Singh KP, Basant N, Gupta S. Support vector machines in water quality management. Analytica Chimica Acta. 2011;703(2):152–62. doi: 10.1016/j.aca.2011.07.027
22. Kundu S, Khare D, Mondal A. Future changes in rainfall, temperature and reference evapotranspiration in the central India by least square support vector machine. Geoscience Frontiers. 2017;8(3):583–96. doi: 10.1016/j.gsf.2016.06.002
23. Tripathi S, Srinivas VV, Nanjundiah RS. Downscaling of precipitation for climate change scenarios: A support vector machine approach. Journal of Hydrology. 2006;330(3–4):621–40. doi: 10.1016/j.jhydrol.2006.04.030
24. Noori N, Kalin L. Coupling SWAT and ANN models for enhanced daily streamflow prediction. Journal of Hydrology. 2016;533:141–51. doi: 10.1016/j.jhydrol.2015.11.050
25. Daggupati P, Pai N, Ale S, Douglas-Mankin KR, Zeckoski RW, Jeong J, et al. A Recommended Calibration and Validation Strategy for Hydrologic and Water Quality Models. Transactions of the ASABE. 2015;58(6):1705–19. doi: 10.13031/trans.58.10712
26. Klemeš V. Operational testing of hydrological simulation models. Hydrological Sciences Journal. 1986;31(1):13–24. doi: 10.1080/02626668609491024
27. Tobler W. A computer movie simulating urban growth in the Detroit region. Economic Geography. 1970;46(Supplement):234–40.
28. Jajarmizadeh M, Kakaei Lafdani E, Harun S, Ahmadi A. Application of SVM and SWAT models for monthly streamflow prediction, a case study in South of Iran. KSCE Journal of Civil Engineering. 2014;19(1):345–57. doi: 10.1007/s12205-014-0060-y
29. Mittelstet AR, Storm DE, White MJ. Using SWAT to enhance watershed-based plans to meet numeric water quality standards. Sustainability of Water Quality and Ecology. 2016;7:5–21. doi: 10.1016/j.swaqe.2016.01.002
30. Johnson HL, Duchon CE. Atlas of Oklahoma climate. Norman, Oklahoma: University of Oklahoma Press; 1995. 104 p.
31. Olsen RL, Chappell RW, Loftis JC. Water quality sample collection, data treatment and results presentation for principal components analysis—literature review and Illinois River Watershed case study. Water Research. 2012;46(9):3110–22. doi: 10.1016/j.watres.2012.03.028
32. Shepherd SL, Dixon JC, Davis RK, Feinstein R. The effect of land use on channel geometry and sediment distribution in gravel mantled bedrock streams, Illinois River watershed, Arkansas. River Res Appl. 2010:n/a-n/a. doi: 10.1002/rra.1401
33. David MM, Haggard BE. Development of Regression-Based Models to Predict Fecal Bacteria Numbers at Select Sites within the Illinois River Watershed, Arkansas and Oklahoma, USA. Water, Air, & Soil Pollution. 2010;215(1–4):525–47. doi: 10.1007/s11270-010-0497-7
34. Scott JT, Haggard BE, Sharpley AN, Romeis JJ. Change point analysis of phosphorus trends in the Illinois River (Oklahoma) demonstrates the effects of watershed management. Journal of Environmental Quality. 2011;40(4):1249–56. doi: 10.2134/jeq2010.0476
35. Gassman PW, Reyes MR, Green CH, Arnold JG. The Soil and Water Assessment Tool: Historical Development, Applications, and Future Research Directions. Transactions of the ASABE. 2007;50(4):1211–50.
36. Yuan L, Forshay KJ. Using SWAT to Evaluate Streamflow and Lake Sediment Loading in the Xinjiang River Basin with Limited Data. Water. 2019;12(1):39. doi: 10.3390/w12010039
37. Yan B, Fang NF, Zhang PC, Shi ZH. Impacts of land use change on watershed streamflow and sediment yield: An assessment using hydrologic modelling and partial least squares regression. Journal of Hydrology. 2013;484:26–37. doi: 10.1016/j.jhydrol.2013.01.008
38. Neitsch SL, Arnold JG, Kiniry JR, Williams JR. Soil and water assessment tool theoretical documentation version 2009. Texas Water Resources Institute, 2011.
39. Yuan L, Zhang Z, Liu X, Jiang Z. Rainfall time series data consistency test and analysis of Poyang Lake basin in the past 49 years. Journal of Anhui Agricultural Sciences. 2013;41:732–5.
40. Soil Conservation Service. National Engineering Handbook. Washington D.C.: USDA; 1972.
41. Penman HL. Natural evaporation from open water, bare soil and grass. Proceedings of the Royal Society of London Series A Mathematical and Physical Sciences. 1948;193(1032):120–45.
42. Muleta MK. Improving Model Performance Using Season-Based Evaluation. Journal of Hydrologic Engineering. 2011;17(1):191–200. doi: 10.1061/(ASCE)HE.1943-5584.0000421
43. Zhang D, Chen X, Yao H, Lin B. Improved calibration scheme of SWAT by separating wet and dry seasons. Ecological Modelling. 2015;301:54–61. doi: 10.1016/j.ecolmodel.2015.01.018
44. Gao X, Chen X, Biggs T, Yao H. Separating Wet and Dry Years to Improve Calibration of SWAT in Barrett Watershed, Southern California. Water. 2018;10(274):1–13. doi: 10.3390/w10030274
45. Vapnik V. The nature of statistical learning theory. New York, USA: Springer Verlag; 1995.
46. Hastie T, Tibshirani R, Friedman J. The elements of statistical learning: data mining, inference, and prediction. New York, USA: Springer Science & Business Media; 2009.
47. Chang C-C, Lin C-J. LIBSVM: A library for support vector machines. ACM transactions on intelligent systems and technology (TIST). 2011;2(3):1–27.
48. Smola AJ, Schölkopf B. A tutorial on support vector regression. Statistics and computing. 2004;14(3):199–222.
49. Yuan L, Li W, Zhang Q, Zou L. Debris Flow Hazard Assessment Based on Support Vector Machine. IEEE International Symposium on Geoscience and Remote Sensing; Denver, CO, USA; 2006. p. 4221–4.
50. Keerthi SS, Lin C-J. Asymptotic behaviors of support vector machines with Gaussian kernel. Neural computation. 2003;15(7):1667–89. doi: 10.1162/089976603321891855
51. Hsu C-W, Chang C-C, Lin C-J. A practical guide to support vector classification. Taipei, Taiwan: National Taiwan University, 2003.
52. Meyer D, Dimitriadou E, Hornik K, Weingessel A, Leisch F. e1071: Misc Functions of the Department of Statistics, Probability Theory Group (Formerly: E1071), TU Wien. 2019.
53. Yang J, Reichert P, Abbaspour KC, Xia J, Yang H. Comparing uncertainty analysis techniques for a SWAT application to the Chaohe Basin in China. Journal of Hydrology. 2008;358(1–2):1–23. doi: 10.1016/j.jhydrol.2008.05.012
54. Moriasi DN, Arnold JG, Van Liew MW, Bingner RL, Harmel RD, Veith TL. Model evaluation guidelines for systematic quantification of accuracy in watershed simulations. Transactions of the ASABE. 2007;50(3):885–900.
55. Legates DR, McCabe GJ. Evaluating the use of “goodness-of-fit” measures in hydrologic and hydroclimatic model validation. Water Resources Research. 1999;35(1):233–41. doi: 10.1029/1998wr900018
56. ASABE. Guidelines for Calibrating, Validating, and Evaluating Hydrologic and Water Quality (H/WQ) Models. American Society of Agricultural and Biological Engineers (ASABE). 2017;621:1–15.
57. Zambrano-Bigiarini M. hydroGOF: Goodness-of-fit functions for comparison of simulated and observed hydrological time series. 2017.
58. Smakhtin V. Low Flow Hydrology: A Review. Journal of Hydrology. 2001;240:147–86. doi: 10.1016/S0022-1694(00)00340-1
59. Narr CF, Singh H, Mayer P, Keeley A, Faulkner B, Beak D, et al. Quantifying the effects of surface conveyance of treated wastewater effluent on groundwater, surface water, and nutrient dynamics in a large river floodplain. Ecological Engineering. 2019;129:123–33. doi: 10.1016/j.ecoleng.2018.12.030
60. Singh HV, Faulkner BR, Keeley AA, Freudenthal J, Forshay KJ. Floodplain restoration increases hyporheic flow in the Yakima River Watershed, Washington. Ecological Engineering. 2018;116:110–20. doi: 10.1016/j.ecoleng.2018.02.001
61. Pushpalatha R, Perrin C, Moine NL, Andréassian V. A review of efficiency criteria suitable for evaluating low-flow simulations. Journal of Hydrology. 2012;420–421:171–82. doi: 10.1016/j.jhydrol.2011.11.055
62. Cleveland WS, Grosse E, Shyu WM. Local regression models. Statistical models in S: Routledge; 2017. p. 309–76.
63. Patil SD, Stieglitz M. Comparing spatial and temporal transferability of hydrological model parameters. Journal of Hydrology. 2015;525:409–17. doi: 10.1016/j.jhydrol.2015.04.003
64. McDonnell JJ, Sivapalan M, Vaché K, Dunn S, Grant G, Haggerty R, et al. Moving beyond heterogeneity and process complexity: A new vision for watershed hydrology. Water Resources Research. 2007;43(7). doi: 10.1029/2006wr005467

PLoS One. doi: 10.1371/journal.pone.0248489.r001

Decision Letter 0

Mou Leong Tan

25 Nov 2020

PONE-D-20-30823

Enhanced Streamflow Prediction with SWAT Using Support Vector Regression for Spatial Calibration: A Case Study in the Illinois River Watershed, U.S.

PLOS ONE

Dear Dr. Forshay,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Please submit your revised manuscript by Jan 09 2021 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.
A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.
An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: http://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols

We look forward to receiving your revised manuscript.

Kind regards,

Mou Leong Tan

Academic Editor

PLOS ONE

Journal Requirements:

When submitting your revision, we need you to address these additional requirements.

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at

https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and

https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf

2. In your Data Availability statement, you have not specified where the minimal data set underlying the results described in your manuscript can be found. PLOS defines a study's minimal data set as the underlying data used to reach the conclusions drawn in the manuscript and any additional data required to replicate the reported study findings in their entirety. All PLOS journals require that the minimal data set be made fully available. For more information about our data policy, please see http://journals.plos.org/plosone/s/data-availability.

Upon re-submitting your revised manuscript, please upload your study’s minimal underlying data set as either Supporting Information files or to a stable, public repository and include the relevant URLs, DOIs, or accession numbers within your revised cover letter. For a list of acceptable repositories, please see http://journals.plos.org/plosone/s/data-availability#loc-recommended-repositories. Any potentially identifying patient information must be fully anonymized.

Important: If there are ethical or legal restrictions to sharing your data publicly, please explain these restrictions in detail. Please see our guidelines for more information on what we consider unacceptable restrictions to publicly sharing data: http://journals.plos.org/plosone/s/data-availability#loc-unacceptable-data-access-restrictions. Note that it is not acceptable for the authors to be the sole named individuals responsible for ensuring data access.

We will update your Data Availability statement to reflect the information you provide in your cover letter.

3. We note that Figures 1, 2 and 5 in your submission contain map/satellite images which may be copyrighted.

All PLOS content is published under the Creative Commons Attribution License (CC BY 4.0), which means that the manuscript, images, and Supporting Information files will be freely available online, and any third party is permitted to access, download, copy, distribute, and use these materials in any way, even commercially, with proper attribution. For these reasons, we cannot publish previously copyrighted maps or satellite images created using proprietary data, such as Google software (Google Maps, Street View, and Earth). For more information, see our copyright guidelines: http://journals.plos.org/plosone/s/licenses-and-copyright.

We require you to either (a) present written permission from the copyright holder to publish these figures specifically under the CC BY 4.0 license, or (b) remove the figures from your submission:

a. You may seek permission from the original copyright holder of Figures 1, 2 and 5 to publish the content specifically under the CC BY 4.0 license.

We recommend that you contact the original copyright holder with the Content Permission Form (http://journals.plos.org/plosone/s/file?id=7c09/content-permission-form.pdf) and the following text:

“I request permission for the open-access journal PLOS ONE to publish XXX under the Creative Commons Attribution License (CCAL) CC BY 4.0 (http://creativecommons.org/licenses/by/4.0/). Please be aware that this license allows unrestricted use and distribution, even commercially, by third parties. Please reply and provide explicit written permission to publish XXX under a CC BY license and complete the attached form.”

Please upload the completed Content Permission Form or other proof of granted permissions as an "Other" file with your submission.

In the figure caption of the copyrighted figure, please include the following text: “Reprinted from [ref] under a CC BY license, with permission from [name of publisher], original copyright [original copyright year].”

b. If you are unable to obtain permission from the original copyright holder to publish these figures under the CC BY 4.0 license or if the copyright holder’s requirements are incompatible with the CC BY 4.0 license, please either i) remove the figure or ii) supply a replacement figure that complies with the CC BY 4.0 license. Please check copyright information on all replacement figures and update the figure caption with source information. If applicable, please specify in the figure caption text when a figure is similar but not identical to the original image and is therefore for illustrative purposes only.

The following resources for replacing copyrighted map figures may be helpful:

USGS National Map Viewer (public domain): http://viewer.nationalmap.gov/viewer/

The Gateway to Astronaut Photography of Earth (public domain): http://eol.jsc.nasa.gov/sseop/clickmap/

Maps at the CIA (public domain): https://www.cia.gov/library/publications/the-world-factbook/index.html and https://www.cia.gov/library/publications/cia-maps-publications/index.html

NASA Earth Observatory (public domain): http://earthobservatory.nasa.gov/

Landsat: http://landsat.visibleearth.nasa.gov/

USGS EROS (Earth Resources Observatory and Science (EROS) Center) (public domain): http://eros.usgs.gov/#

Natural Earth (public domain): http://www.naturalearthdata.com/

4. Please include your tables as part of your main manuscript and remove the individual files. Please note that supplementary tables should remain as separate "supporting information" files.


Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Partly

Reviewer #2: Yes

Reviewer #3: Partly

**********

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: I Don't Know

**********

3. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: No

Reviewer #2: Yes

Reviewer #3: Yes

**********

4. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: Yes

**********

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: This study proposed a method to improve monthly streamflow prediction performance by coupling a seasonal Support Vector Regression (SVR) model with the Soil and Water Assessment Tool (SWAT) model, and applied it in the Illinois River watershed (IRW), U.S. Overall, this paper presents an interesting approach for improving streamflow predictions. However, I think the following issues should be addressed before the paper is considered for publication.

1) I do not understand why the authors chose the approach to calibrate and validate the SWAT-SVR model by leaving out one station. This means that the authors need to develop 13 SWAT-SVR models, whose final parameter values could be rather different (unfortunately, the authors did not discuss this point in the paper). In this case, what should be the SWAT-SVR model for the entire watershed? In my opinion, the traditional approach that includes all stations but divides the study period into the calibration and validation periods works better here.

2) The SWAT model is a continuous simulation model. I could not fathom how the authors could run SWAT-CUP for dry and wet seasons independently. The authors have not provided any SWAT model parameter calibration results in the paper.

3) There are some logical flaws in the authors’ discussions related to Fig. 9. What is presented in Fig 9 is the evaluation statistics solely for the validated watershed. However, each SWAT-SVR model was developed using the data of the other 12 watersheds of various sizes. Performance at the single validated watershed is not sufficient to judge the model’s overall performance, let alone, to determine the application scope of the SWAT-SVR model. This judgement should be based on the model performance at all 13 watersheds. This is why I suggest the authors drop the “leaving-out-one-watershed” approach for calibration and validation.

4) The authors did not give any reason of including watershed area, but no other variable, in the SWAT-SVR model. Is it sufficient to include this single variable besides SWAT streamflow results in the model?

Reviewer #2: 1) The parameters considered in SWAT calibration and SWAT-SVR calibration are not discussed. Are the calibration parameters chosen the same for both models?

2) The paper mentions that SUFI-2 is used for SWAT calibration; how is the SWAT-SVR calibration conducted?

3) The authors use five statistical measures to assess model accuracy, but Table 3 shows ratings for only three of them; it would be better to include the other two.

4) It is a bit unclear how SWAT-SVR was programmed. Was it via MATLAB? The authors may want to elaborate on the system.

5) Figure 10 shows that some high rainfall peaks are not captured by either model; elaborating on this would help future researchers consider the contributing factors.

6) Overall, this is a good paper with sound analysis and explanation, and it may help future researchers conduct research on hydrological models.

Reviewer #3: The present work “Enhanced Streamflow Prediction with SWAT Using Support Vector Regression for Spatial Calibration: A Case Study in the Illinois River Watershed, U.S.” is interesting and original. Its main point of interest and originality is the development of a hybrid SWAT and Support Vector Regression (SVR) model based on 13 hydrologic gauging stations in the Illinois River watershed, U.S.

However, there are some points that need clarification or re-consideration by the authors.

Introductions:

1. On page 4, lines 68–69, the authors argue that several past studies have evaluated the performance of SWAT and SVM models in streamflow prediction separately, and state that few studies have coupled the two models. However, the authors did not cite those few studies or describe their drawbacks or gaps. The reviewer therefore suggests mentioning the past studies that focused on coupling SWAT and SVM, and the novel idea of the current study.

Methodology:

2. On page 6, lines 124 to 125, it was mentioned that the multiple land use/soil/slope method was applied to define the HRUs in the SWAT model with land use (10%), soil (10%), and slope (5%) thresholds. Is there any justification for selecting these threshold values?

3. In this manuscript, it seems that the SWAT-CUP calibration approach was used and the modelled streamflow results were validated against measured ones. However, the authors did not mention the hydrologic parameters that control streamflow. When the authors discuss model performance, they compare SWAT-CUP against SWAT-SVR, but it is difficult for the reader to understand how the model outcomes were obtained, especially for SWAT-CUP (for example, page 12, lines 251–252). Moreover, the calibration and validation periods are not stated clearly.

4. One of the most important features of SWAT-CUP is its capability to determine the uncertainty level of SWAT model predictions using the Sequential Uncertainty Fitting-2 (SUFI-2) algorithm. However, the current study used SWAT-CUP (SUFI-2) only as a calibration and validation tool; the level of model uncertainty was omitted, and it was not sufficiently explained why it was not included.

Result and discussions:

4. In the manuscript, it was mentioned that the monthly streamflow predicted by SWAT-SVR was more accurate during the wet season than the dry season. A detailed explanation is required of why the model performance differs between the wet and dry seasons.

5. On page 12, lines 251–252, it was mentioned that “SWAT-CUP method also underestimated wet season streamflow but remarkably overestimated dry season streamflow. The SWAT-SVR model has approximately similar performances for the wet and dry seasons”. The reviewer believes that more discussion is required based on the features of the two methods.

**********

6. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

Reviewer #3: Yes: Tigabu, Tibebe B.

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

PLoS One. 2021 Apr 12;16(4):e0248489. doi: 10.1371/journal.pone.0248489.r002

Author response to Decision Letter 0

2 Feb 2021

Response to editor and reviewer comments. Answer indicates the start of our response.

Editor Comments:

Journal Requirements:

When submitting your revision, we need you to address these additional requirements.

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at

https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and

https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf

Answer: We appreciate the Academic Editor’s concern about the format of our initial manuscript. We have formatted our manuscript according to the two template files above. Please let us know if any aspect of our manuscript does not follow the format requirements of PLOS ONE.

We will update your Data Availability statement to reflect the information you provide in your cover letter.

Answer: Yes, we have uploaded the study data to a stable, public repository (DOI: 10.23719/1520734). Anyone can freely access our study data at doi.org/10.23719/1520734 upon publication.

The uploaded data are divided into two groups: spatial data and time-series data. The spatial data comprise a digital elevation model (DEM) obtained from the Shuttle Radar Topography Mission (SRTM) 1 Arc-Second (about 30 m × 30 m) Global Database and downloaded from the USGS website (https://earthexplorer.usgs.gov/, 01-28-2018) (Fig 2a); land use and land cover (LULC) data from the 2011 NLCD dataset (https://www.mrlc.gov/, 01-31-2018) (Fig 2b), with a spatial resolution of 100 m × 100 m; and soil data from the SSURGO database (https://websoilsurvey.nrcs.usda.gov/, 02-05-2018) (Fig 2c). The time-series data include climate and discharge data. Climate data were obtained from the National Climatic Data Center (NCDC) (https://www.ncdc.noaa.gov/, 02-07-2018) (Fig 2d). Because of gaps in the NCDC precipitation and temperature records between Jan. 1990 and Dec. 2013, we downloaded alternative Climate Forecast System Reanalysis (CFSR) data from the SWAT official website (https://globalweather.tamu.edu/, 01-31-2018) and filled the missing NCDC data using climate data from the closest CFSR grid stations (not shown in Fig 2d). The streamflow data came from thirteen U.S. Geological Survey (USGS) hydrologic stations. These monthly discharge data can be accessed and downloaded from the USGS official website (https://dashboard.waterdata.usgs.gov/app/nwd/?region=lower48).

3. We note that Figures 1, 2 and 5 in your submission contain map/satellite images which may be copyrighted.

We require you to either (a) present written permission from the copyright holder to publish these figures specifically under the CC BY 4.0 license, or (b) remove the figures from your submission:

a. You may seek permission from the original copyright holder of Figures 1, 2 and 5 to publish the content specifically under the CC BY 4.0 license.

We recommend that you contact the original copyright holder with the Content Permission Form (http://journals.plos.org/plosone/s/file?id=7c09/content-permission-form.pdf) and the following text:

Please upload the completed Content Permission Form or other proof of granted permissions as an "Other" file with your submission.

The following resources for replacing copyrighted map figures may be helpful:

USGS National Map Viewer (public domain): http://viewer.nationalmap.gov/viewer/

The Gateway to Astronaut Photography of Earth (public domain): http://eol.jsc.nasa.gov/sseop/clickmap/

Maps at the CIA (public domain): https://www.cia.gov/library/publications/the-world-factbook/index.html and https://www.cia.gov/library/publications/cia-maps-publications/index.html

NASA Earth Observatory (public domain): http://earthobservatory.nasa.gov/

Landsat: http://landsat.visibleearth.nasa.gov/

USGS EROS (Earth Resources Observatory and Science (EROS) Center) (public domain): http://eros.usgs.gov/#

Natural Earth (public domain): http://www.naturalearthdata.com/

Answer: We have deleted the Google map from Figure 1 in our manuscript to meet the publication requirements. We used ESRI ArcGIS software to generate Figures 1, 2, and 8, and Microsoft Visio to generate Figure 5. EPA holds licensed versions of ESRI ArcGIS and Microsoft Visio.

Apart from the Google map in the previous Figure 1, all data used in our research are derived from public data in the public domain and produced by the authors. Anyone can download these data from the corresponding public repositories listed.

4. Please include your tables as part of your main manuscript and remove the individual files. Please note that supplementary tables should remain as separate "supporting information" files.

Answer: We have added the tables into the main body of our manuscript and removed the table document in the submitting system.

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?

Reviewer #1: Partly

Reviewer #2: Yes

Reviewer #3: Partly

________________________________________

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: I Don't Know

________________________________________

3. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: No

Reviewer #2: Yes

Reviewer #3: Yes

________________________________________

4. Is the manuscript presented in an intelligible fashion and written in standard English?

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: Yes

________________________________________

5. Review Comments to the Author

1) I do not understand why the authors chose the approach to calibrate and validate the SWAT-SVR model by leaving out one station. This means that the authors need to develop 13 SWAT-SVR models, whose final parameter values could be rather different (unfortunately, the authors did not discuss this point in the paper). In this case, what should be the SWAT-SVR model for the entire watershed? In my opinion, the traditional approach that includes all stations but divides the study period into the calibration and validation periods works better here.

Answer: In the revision, on line 416, we have cited and described our approach. We appreciate the reviewer’s concern about the technical detail of our manuscript. Here we used a spatial proxy approach. Conventional calibration methods for SWAT and other watershed models can be roughly divided into two groups: temporal and spatial calibration. Temporal calibration is typically performed by splitting the available observed data into two datasets by period: one for calibration and another for validation. However, data may also be split spatially, with all available data at a given measured location assigned to the calibration phase and validation performed at one or more other gauges within the watershed. This method is spatial calibration (also called the spatial proxy basin approach). It is useful when users face data-limited or ungauged situations, or when the study area is a large watershed. We present an approach that may ultimately be applied to an ungauged watershed. As Prasad Daggupati et al. (2015) pointed out, a regular temporal calibration method may not work well for a large watershed because of potential spatial variability within the basin. Our study area is over 4,200 km², and the available record lengths from the 13 USGS hydrologic stations are not consistent (stations 07196090 and 07196973 have only 42 and 96 data points, respectively, in this study). Using a temporal method to divide the data into calibration (say 70%) and validation (30%) subsets, albeit more familiar, may not be appropriate when similar numbers of wet, moderate, and dry years do not occur in both periods (Gan et al., 1997).

Spatial calibration and validation approaches have been used in several previous SWAT studies (e.g., Arnold et al., 2001; Van Liew and Garbrecht, 2003; Cao et al., 2006; Daggupati et al., 2015). Leave-one-out is the particular kind of spatial calibration method we used for this manuscript. For example, Navideh Noori et al. (2016) applied the leave-one-out method to develop a hybrid method coupling SWAT with an ANN.

We did develop 26 SWAT-SVR models (for the wet and dry seasons) and 26 corresponding SWAT models calibrated by SWAT-CUP. In the 26 SWAT-SVR models, we used default SWAT parameters (no calibration). For the SVR approach, three parameters (C, γ, ε) needed to be determined. For the wet season, the final values of C range from 53.0156 to 255.0156, γ is 0.4, and ε is 0.00390625; for the dry season, C falls between 32.0156 and 255.0156, γ is 1.2, and ε is again 0.00390625. The value of C varies widely, whereas γ and ε each take a single value within the wet season and within the dry season. This is because any two runs always share 11 common watersheds in SWAT-SVR. We made Fig. 5 (research flowchart) to help readers better understand our research process.
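The leave-one-out spatial split described above can be sketched in a few lines of Python. The station labels are hypothetical placeholders (not USGS identifiers), and the sketch only builds the calibration/validation partitions; it does not run SWAT or SVR. It also checks how many stations the calibration sets of any two runs have in common.

```python
from itertools import combinations

# Hypothetical labels standing in for the 13 gauged subwatersheds.
stations = [f"station_{i:02d}" for i in range(1, 14)]

def leave_one_out(items):
    """Yield (calibration_list, validation_item), holding out one item per run."""
    for held_out in items:
        calibration = [s for s in items if s != held_out]
        yield calibration, held_out

runs = list(leave_one_out(stations))

# Each run calibrates on 12 stations and validates on the remaining one.
print(len(runs), len(runs[0][0]))

# Calibration sets of any two runs share every station except the two held out.
overlaps = {len(set(a) & set(b)) for (a, _), (b, _) in combinations(runs, 2)}
print(overlaps)
```

Repeating the split once for the wet season and once for the dry season gives 2 × 13 = 26 models, consistent with the count reported above.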

Answer: We have added an appendix (e.g., Table 1) to help better describe our approach. We agree with the reviewer that SWAT is a physically based, continuous watershed model. However, that does not mean we cannot calibrate SWAT separately for the dry and wet seasons in SWAT-CUP. In fact, SWAT assumes that model parameters are season-insensitive and attempts to identify ‘optimal’ values that describe watershed behavior during both dry and wet seasons. We considered and evaluated continuous, unseparated calibration, but this assumption would compromise the accuracy of model predictions. Hydrologic models often perform poorly when simulating dry years in areas with large inter-annual variability in precipitation because the temporal variations in model parameters that exist in watersheds are not considered. Another reason relates to objective functions such as R² and NSE, which reflect hydrologic characteristics in wet periods better than those in dry periods. Misgana K. Muleta (2012), Dejian Zhang (2015), and Xin Gao (2018) published papers that discuss in detail how separating wet and dry seasons improves SWAT calibration.
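The point about the objective function can be illustrated with the Nash-Sutcliffe efficiency, NSE = 1 − Σ(obs − sim)² / Σ(obs − mean(obs))². Because the errors are squared, the same relative error costs far more on a high-flow month than on a low-flow month. The sketch below is only an illustration: the flows are made up and are not data from this study.

```python
def nse(observed, simulated):
    """Nash-Sutcliffe efficiency: 1 - squared error / squared spread around the observed mean."""
    mean_obs = sum(observed) / len(observed)
    sse = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    spread = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - sse / spread

# Made-up monthly flows (m^3/s): three low-flow and three high-flow months.
observed = [1.0, 2.0, 1.5, 20.0, 30.0, 25.0]

# Apply the same +50% error to the low-flow months only, then to the high-flow months only.
low_flow_error = [o * 1.5 if o < 5 else o for o in observed]
high_flow_error = [o * 1.5 if o >= 5 else o for o in observed]

print(round(nse(observed, low_flow_error), 3))
print(round(nse(observed, high_flow_error), 3))
```

In this toy case the low-flow errors barely change NSE while the high-flow errors reduce it substantially, which is why an optimizer driven by NSE over a continuous record tends to fit wet-period flows first.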

Technically, it is straightforward to separate the dry and wet seasons in SWAT-CUP because this software supports the leap-day, month, or year calibration. For example, the calibration file for separated wet and dry seasons in SWAT-CUP could be written as follows:

| No. | Wet season | Obs. | No. | Dry season | Obs. |
|---|---|---|---|---|---|
| 3 | FLOW_OUT_3_1995 | 0.89 | 1 | FLOW_OUT_1_1995 | 1.04 |
| 4 | FLOW_OUT_4_1995 | 0.89 | 2 | FLOW_OUT_2_1995 | 0.69 |
| 5 | FLOW_OUT_5_1995 | 1.20 | 6 | FLOW_OUT_6_1995 | 1.22 |
| 9 | FLOW_OUT_9_1995 | 0.16 | 7 | FLOW_OUT_7_1995 | 0.85 |
| 10 | FLOW_OUT_10_1995 | 0.11 | 8 | FLOW_OUT_8_1995 | 0.33 |
| 11 | FLOW_OUT_11_1995 | 0.10 | 12 | FLOW_OUT_12_1995 | 0.34 |
| 15 | FLOW_OUT_3_1996 | 0.27 | 13 | FLOW_OUT_1_1996 | 0.57 |
| 16 | FLOW_OUT_4_1996 | 0.44 | 14 | FLOW_OUT_2_1996 | 0.34 |
| 17 | FLOW_OUT_5_1996 | 0.41 | 18 | FLOW_OUT_6_1996 | 0.44 |
| 21 | FLOW_OUT_9_1996 | 0.69 | 19 | FLOW_OUT_7_1996 | 0.11 |
| 22 | FLOW_OUT_10_1996 | 0.37 | 20 | FLOW_OUT_8_1996 | 0.03 |
| 23 | FLOW_OUT_11_1996 | 1.92 | 24 | FLOW_OUT_12_1996 | 1.07 |
| … | … | … | … | … | … |

Note that these observations must be ranked by time series order in SWAT-CUP to run successfully.
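The numbering in the example above is consistent with a running month index that starts at 1 in January 1995, with months 3, 4, 5, 9, 10, and 11 listed under the wet season. A minimal Python sketch, assuming that indexing rule and using the hypothetical helper name `seasonal_labels`, reproduces the index and label columns in time order (the observed values themselves are not generated):

```python
# Months listed under "Wet season" in the example above; all others are treated as dry.
WET_MONTHS = {3, 4, 5, 9, 10, 11}

def seasonal_labels(start_year, end_year):
    """Return (wet, dry) lists of (index, label) pairs in time order.

    Assumption: the index counts months from 1 at January of start_year,
    which matches the numbers shown in the example listing.
    """
    wet, dry = [], []
    for year in range(start_year, end_year + 1):
        for month in range(1, 13):
            index = (year - start_year) * 12 + month
            label = f"FLOW_OUT_{month}_{year}"
            (wet if month in WET_MONTHS else dry).append((index, label))
    return wet, dry

wet, dry = seasonal_labels(1995, 1996)
for index, label in wet[:3]:
    print(index, label)
```

Because both lists are built by iterating over years and months in order, each one is already sorted by time, which is the ordering requirement noted above.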

In addition, we added the initial ranges of the SWAT parameters used in the SWAT-CUP calibration, as shown below:

Table 1. The initial parameters and their range in calibration.

| No. | Parameter name¹ | Parameter description | Range | Used in the wet season | Used in the dry season |
|---|---|---|---|---|---|
| 1 | R__CN2.mgt | SCS runoff curve number II | −0.25–0.25 | Yes | Yes |
| 2 | V__ALPHA_BF.gw | Baseflow alpha factor (1 day⁻¹) | 0–1 | Yes | Yes |
| 3 | V__GWQMN.gw | Threshold depth of water in the shallow aquifer required for return flow to occur (mm H₂O) | 0–2000 | Yes | Yes |
| 4 | V__GW_REVAP.gw | Groundwater “revap” coefficient | 0.02–0.2 | Yes | Yes |
| 5 | V__EPCO.hru | Plant uptake compensation factor | 0–1 | Yes | Yes |
| 6 | R__SOL_K(1).sol | Saturated hydraulic conductivity at the 1st soil layer (mm h⁻¹) | 30–102 | Yes | Yes |
| 7 | R__SOL_AWC(1).sol | Available water capacity of the 1st soil layer (mm H₂O mm soil⁻¹) | 0.08–0.2 | Yes | No |
| 8 | R__SOL_BD(1).sol | Moist bulk density at the 1st soil layer (g cm⁻³) | 1.3–1.45 | Yes | No |
| 9 | A__OV_N.hru | Manning’s “n” value for overland flow | 0.01–30 | Yes | No |
| 10 | A__CH_K2.rte | Effective hydraulic conductivity in main channel alluvium (mm h⁻¹) | −0.01–500 | Yes | Yes |
| 11 | R__HRU_SLP.hru | Average slope steepness (m m⁻¹) | 0–1 | Yes | Yes |
| 12 | V_RCHRG_DP.gw | Deep aquifer percolation fraction | 0–1 | Yes | Yes |
| 13 | A_CH_K1 | Effective hydraulic conductivity in tributary channel alluvium | 0–300 | Yes | No |
| 14 | V_ESCO.hru | Soil evaporation compensation factor | 0–1 | No | Yes |

¹ Note: “A__”, “V__” and “R__” mean an absolute increase, a replacement, and a relative change to the initial parameter values, respectively.

Answer: We developed 26 SWAT-SVR models for different seasons to represent the corresponding subwatersheds and to verify the effectiveness of our hybrid method, rather than relying on only one or two models. We are thankful for the reviewer’s constructive comments and agree that a single validated watershed is not enough to judge the model’s overall performance. This study attempts to improve monthly streamflow prediction in a large watershed by coupling SWAT with the SVR method. Spatial heterogeneity and temporal variability are intrinsically connected to watershed characteristics, and we believe that separating the wet and dry seasons helps better describe the intrinsic dynamics of watershed hydrology, at least in this system. We do not believe that a single continuous model applied everywhere is appropriate for such a large basin, given the seasonal variation and the likelihood that the drivers of hydrology change substantially between the wet and dry seasons.

Answer: See lines 240-245. Our goal was to develop a model with few input variables. We appreciate the reviewer’s concern about the parameter selection for SWAT-SVR. We worked to develop an innovative method (a hybrid model) to improve the prediction of monthly streamflow in a large watershed. Because SVR is effectively a black-box model, we do not mechanistically consider all the physical parameters that may interact in non-linear or chaotic ways inside the model; SWAT already does much of this. We wished to avoid adding model variables or more complex mechanistic interactions that may not behave predictably, so that the SVR step improves the overall simulation. At first, we hoped to use only the streamflow from SWAT with default parameters as the sole input for SVR, but performance greatly improved when we added the upstream drainage area as a second variable. We believe the upstream drainage area is an important parameter for this empirical model because it plays a critical role in the early stage of the hydrologic model; we describe the parameters on lines 233-239. Other variables, such as terrain, soil, land use and land cover, and weather, are included through SWAT. Here, we regarded the SWAT model essentially as a transfer function and then hybridized it to improve prediction.

Reviewer #2: 1) The parameters considered in SWAT calibration and SWAT-SVR calibration are not discussed. Are the calibration parameters chosen the same for both models?

Answer: We have added Table 1 to the appendix to better describe the model parameters. We agree with the reviewer that the selection, sensitivity analysis, and determination of SWAT parameters are very important for building a SWAT model. In our study, SWAT was first applied with default parameter values to simulate streamflow; we then took the uncalibrated SWAT streamflow output and the upstream drainage area as input variables to the SVR model. For SVR, only three parameters need to be set, C, γ, and ε, which were determined by grid search and cross-validation. For SWAT-CUP, after parameter sensitivity analysis, we selected 13 parameters for the wet season and 10 parameters for the dry season, applied the leave-one-out method, and conducted spatial calibration for 13 stations. We have attached the initial parameter value ranges used in SWAT-CUP before calibration for the wet and dry seasons. Because of the spatial calibration method, the final calibrated parameter values or ranges differ from one station to another, so we do not show the validation results for all 13 stations in the manuscript. We focused on comparing the model performance of SWAT-SVR and SWAT-CUP.
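The leave-one-out spatial split and the grid of (C, γ, ε) candidates described above can be sketched roughly as follows. This is a minimal illustration in plain Python, not the code used in the study: the station labels, candidate values, and the `score_fold` hook are all hypothetical, and the toy scoring function merely stands in for "fit an SVR on the training stations and score it on the held-out station".

```python
from itertools import product
from statistics import mean


def leave_one_station_out(stations):
    """Yield (training_stations, held_out_station), one fold per station."""
    for held_out in stations:
        training = [s for s in stations if s != held_out]
        yield training, held_out


def candidate_grid(c_values, gamma_values, epsilon_values):
    """Enumerate every (C, gamma, epsilon) combination as a dict."""
    return [
        {"C": c, "gamma": g, "epsilon": e}
        for c, g, e in product(c_values, gamma_values, epsilon_values)
    ]


def grid_search(stations, grid, score_fold):
    """Return the candidate with the highest mean held-out score.

    score_fold(params, training, held_out) is a caller-supplied hook
    that returns a skill score (higher is better) for one fold.
    """
    def mean_score(params):
        return mean(
            score_fold(params, training, held_out)
            for training, held_out in leave_one_station_out(stations)
        )
    return max(grid, key=mean_score)


# Hypothetical station labels and candidate values, for illustration only.
stations = [f"station_{i:02d}" for i in range(1, 14)]
grid = candidate_grid([1, 10, 100], [0.01, 0.1], [0.1, 0.5])


def toy_score(params, training, held_out):
    # Placeholder score; a real hook would train and evaluate an SVR here.
    return -abs(params["C"] - 10) - params["gamma"] - params["epsilon"]


best = grid_search(stations, grid, toy_score)
print(len(grid), "candidates;", "best:", best)
```

With 13 stations, each candidate is evaluated on 13 folds, each holding out one station and training on the other 12, so the selected parameters are judged on locations the model has not seen.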