Liquidity effects on oil volatility forecasting: From fintech perspective

Shusheng Ding; Tianxiang Cui; Yongmin Zhang; Jiawei Li

doi:10.1371/journal.pone.0260289

2021 Nov 29;16(11):e0260289.


Editor: Cathy W S Chen³

PMCID: PMC8629280 PMID: 34843538

Abstract

Fintech is an emerging field that is inspiring innovation across finance. It may open a new phase of financial research, in which volatility forecasting is a central topic. The GARCH model is the prevailing tool for forecasting volatility, yet improving on it remains challenging. In this paper, we demonstrate how fintech can contribute to volatility forecasting by employing a metaheuristic procedure called Genetic Programming (GP). On this basis, we develop a new volatility forecasting model that significantly outperforms GARCH family models (including the GARCH, IGARCH and TGARCH models). Since genetic programming is an evolutionary algorithm based on the principles of natural selection, we regard this work as a potential breakthrough point in the financial area. The paper shows how GP technology can be applied in the financial field, exploring volatility forecasting through the combination of new technology and finance known as fintech. More importantly, when the functional form of the volatility forecast is unknown because we introduce a new factor, namely the liquidity factor, we show how the GP method can help determine the specific form of the volatility forecasting model. We thereby document the liquidity effects on the volatility forecasting field from the fintech perspective.

1 Introduction

1.1 Motivations and aims

Fintech is an emerging field that has recently driven major innovations across the financial spectrum. It integrates advanced technology into finance, opening a new phase for the financial industry and for financial research (Buchak et al. 2018 [1]; Chen et al. 2019 [2]). Fintech plays a particularly important role in two areas of financial innovation, derivative trading and high frequency trading, which demand sophisticated mathematical models, computing-intensive algorithms and big data handling techniques in order to capture trading signals within microseconds. In the option markets, volatility is the dominant driving factor for option price moves. In high frequency trading, volatility forecasting is essential for managing portfolio risk. Volatility forecasting is therefore a key issue in the financial field, and volatility forecasting models have extensive applications, such as market timing, portfolio selection and financial risk management (Lien and Wilson 2001 [3]).

GARCH models are prevalent in forecasting the volatility of financial markets (see Andersen et al. 2005 [4]; Tian and Hamori [5]), and oil market volatility has been intensively examined and forecasted with GARCH models (see Wei et al. 2010 [6]; Efimova and Serletis 2014 [7]; Wang and Liu 2016 [8]; Xing and Wang 2019 [9]). However, existing GARCH-type models are unable to capture either market microstructure, such as liquidity information, or high frequency trading signals, because the precision of GARCH volatility estimates depends on the accuracy of the mean estimation models, which operate at low frequency. Incorporating liquidity into GARCH models is therefore necessary in order to estimate volatility more accurately, especially for high frequency trading.

In fact, the relationship between liquidity and volatility has been scrutinized in the existing literature. Fleming and Remolona (1999) [10] investigate the relation between liquidity, volatility and public information in the US Treasury market. They report that volatility and liquidity respond to public information simultaneously. The empirical results of Feng, Hung and Wang (2014) [11] also support the impact of liquidity on volatility. More recently, Collin-Dufresne and Fos (2016) [12] explore the relation between liquidity and noise trading volatility and affirm a significant impact of liquidity on noise trading volatility.

Volatility forecasting using Artificial Intelligence (AI) and Genetic Programming (GP) has been documented in the literature (see Yin et al. 2016 [13]; Ding et al. 2019 [14]; Weng et al. 2021 [15]; Mademlis and Dritsakis 2021 [16]). The goal of this paper is to adopt AI technologies to generate the best model that incorporates the trading liquidity effect into volatility forecasting and that can be integrated into existing fintech systems, such as the high frequency trading platforms and derivative trading platforms used in the hedge fund industry.

1.2 Research contributions

The first contribution is that we adopt GP to identify the model form for forecasting oil volatility with liquidity. It is arguable that liquidity has a considerable effect on market volatility, but there is no unanimous solution for how to integrate the liquidity factor into a volatility forecasting model. Our GP system identifies the specific form of volatility forecasting with liquidity in two steps: (i) model formation search and (ii) model accuracy evolution.

In the first stage, we identify the potential form of the volatility forecasting model. GP is used to search the potential forms of the volatility forecasting model that integrate the liquidity variable. GP has the attractive property that the relevant performance criterion can be built directly into the search procedure.

In the second stage, we apply Darwin’s theory of evolution to model accuracy evolution. To be specific, we adopt crossover and mutation operations during the selection and development of our volatility forecasting model. The main advantage of crossover and mutation is that they let models evolve according to the historical data so as to minimize forecasting errors. As a result, the generated forecasting model can provide far better forecasting results after many generations of model evolution.

Based on this process, our GP system can identify the most relevant terms for predicting oil volatility. The GP system includes the bid-ask spread term in the volatility forecasting model; this term captures the liquidity effect because it carries trading information from the financial market (Deuskar et al. 2011 [17]). Since the form of a volatility forecasting model with liquidity is unknown, our contribution lies in identifying that specific form, which makes accurate forecasting and liquidity sensitivity analysis feasible for the oil market.

Furthermore, our work is closely related to the existing literature in which GP has been adopted in various financial areas. For example, Pimenta et al. (2018) [18] use GP to develop an automated investing method for the stock market. More recently, Michell and Kristjanpoller (2020) [19] employ GP to develop trading rules in the US stock market. Ding et al. (2020) [20] apply GP to forecast future stock returns in different stock markets. Our paper extends the application of GP to volatility forecasting in the oil market, which is a crucial commodity market.

Forecasting oil price volatility is not only useful to commodity traders for speculating and hedging (Ma et al. 2019 [21]), but also important to economists for forecasting macroeconomic variables, such as inflation and industrial production (Yin 2016 [22]; Chen et al. 2019 [23]). After the model is generated by our algorithm, we compare its forecasting ability with that of GARCH models. The empirical results show that the in-sample performance of our model is more accurate than that of traditional GARCH models for the full sample and for most subsamples. Additionally, our model’s out-of-sample volatility prediction is also more precise than that of traditional GARCH models for both full-sample and subsample tests. The accurate volatility prediction of our model can be regarded as a considerable improvement on the existing GARCH models, and our results demonstrate that a multivariable GARCH model can still deliver accurate volatility predictions.

Finally, our research also produces significant empirical contributions. We shed new light on the concept of “fintech”. “Artificial intelligence”, which refers to a machine mimicking cognitive functions that humans associate with other human minds, such as “learning” and “problem solving” (Russell and Norvig 2009 [24]), should be a pivotal ingredient of the fintech concept. However, the prevailing understanding of fintech focuses on the application of technology in the financial service industry, such as online banking and mobile payment (Mackenzie 2015 [25]). This paper adds a further annotation, suggesting that fintech also involves how AI, such as genetic programming, can be integrated with financial modelling, and the paper itself is a demonstration of finance combined with AI, which is a radical innovation in finance. This integration can provide deep insights into financial data analysis, since all models are generated and evaluated purely on the basis of financial data. Furthermore, our algorithm is also capable of dealing with big data as well as other forecasting problems, such as return forecasting. Consequently, this paper can serve as an impetus for developing other fintech applications in hedge fund management, including derivatives trading algorithms and high-frequency trading.

The remainder of the paper is structured as follows. In section 2, we describe the data and variables used in the paper. In section 3, we derive our GARCH model with the GP method. Section 4 presents the empirical results and the model performance in data fitting and volatility forecasting. Section 5 delivers the conclusions and implications of the paper.

2 Data and variable estimation

2.1 The data

The GARCH(1,1) model has been widely used in the financial literature for volatility forecasting (see Bollerslev and Wright 2001 [26]; Hansen and Lunde 2005 [27]), especially in the oil market (see Klein and Walther 2016 [28]; Lux et al. 2016 [29]). Consequently, we use our model to forecast oil return volatility and compare it with the traditional GARCH model and other GARCH family models (see Fig 1 for oil daily returns). We use the univariate GARCH-class models, including GARCH, IGARCH and the GARCH model with asymmetric effects, namely GJR-GARCH, as benchmark models for oil volatility predictions. The liquidity effect is measured by the bid-ask spread (BAS), which has been shown to be positively correlated with price volatility in financial markets (see Bollerslev and Melvin 1994 [30]; Haugom et al. 2014 [31]). We do not include the weekend effect, which is popular in stock market studies. Studies on the weekend and daily effects in the oil futures market are scant, since it has been documented that the weekend effect may not be obvious in this market (Yu and Shin 2011 [32]). One possible reason is that an oil futures contract has a time-to-maturity, which may have a stronger effect than the weekend effect (Geman and Kharoubi, 2008 [33]).

We obtain daily WTI oil futures trading data from Thomson Datastream for January 1, 2001 to December 31, 2019, yielding 4,752 daily observations. This database is widely used in the literature and the trading data are continuous, which is favorable for daily volatility forecasting. We use January 1, 2001 to December 31, 2010 as the in-sample period and January 1, 2011 to December 31, 2019 as the out-of-sample period. By “out-of-sample”, we mean two things. First, we use the previous year’s data to estimate the coefficients of the model that forecasts this year’s volatility; that is, we adopt a one-year rolling window for the test.

For in-sample fitting, the estimation year and prediction year are the same. Second, and more importantly, we use only the period from January 1, 2001 to December 31, 2010 as the sample period for genetic programming to generate our model. We derive the theoretical model from genetic programming and estimate all the model coefficients by regression. Then, we apply the generated volatility forecasting model to the period from January 1, 2011 to December 31, 2019 to test the “out-of-sample” model performance. For the out-of-sample forecasting, we use the previous year’s data to forecast the volatility of the current year. For example, we use the data of 2013 to estimate the model parameters and then use those parameters to forecast the volatility of 2014. Similarly, we use the data of 2014 to estimate the model parameters and then use those parameters to forecast the volatility of 2015. We thereby roll the sample forward by one year.
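The one-year rolling scheme can be written down compactly. The sketch below only enumerates the (estimation year, forecast year) pairs; the function name is hypothetical, and pairing the first out-of-sample year (2011) with 2010 as its estimation year is an assumption drawn from the "previous year" rule stated in the text.

```python
def rolling_year_pairs(first_forecast_year, last_forecast_year):
    """Return (estimation_year, forecast_year) pairs for a one-year rolling window.

    Each forecast year is paired with the calendar year immediately before it,
    whose data would be used to estimate the model coefficients.
    """
    return [(year - 1, year)
            for year in range(first_forecast_year, last_forecast_year + 1)]


# Out-of-sample period described in the text: 2011 to 2019.
pairs = rolling_year_pairs(2011, 2019)
for estimation_year, forecast_year in pairs:
    print(f"estimate on {estimation_year}, forecast {forecast_year}")
```

For in-sample fitting the two years coincide, so no pairing is needed there.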

Furthermore, we use each of the two whole periods for the full-sample volatility forecasting tests and each single year within a period for the subsample forecasting tests. In addition, for both in-sample and out-of-sample tests, we use 1-day-ahead predictions throughout the data period and also apply a statistical test for the differences in forecasting errors.

2.2 Variable estimation

Because of the specific computational technique we have adopted, we may not be able to use two identical time series, even with different time lags. As a result, we adopt two different time series for modeling the variance. The first sequence is the daily simplified realized variance series. Since our data are of relatively low frequency, we approximate the daily simplified realized variance by its one-month rolling average (see Schwert 1989 [34]), which is denoted as SRV_t and defined as:

$$ SRV_{t} = \frac{1}{T} \sum_{i = 1}^{T} (r_{t - i})^{2} \times 10{,}000, \tag{1} $$

since the mean of the daily realized return is close to 0, where the realized return r_t is defined as:

$$ r_{t} = \ln \left( \frac{P_{t}}{P_{t - 1}} \right) \times 100 $$

(Andersen and Bollerslev 1998 [35]), where P_t is the daily settlement price. We scale the SRV measurement by 10,000 and the return by 100 to avoid extremely small values.

The second sequence is the realized daily variance, denoted as RV_t, which we compute from the dispersion of the oil futures daily return around its mean (Christensen and Prabhala 1998 [36]) and define as:

$$ RV_{t} = \frac{1}{T - 1} \sum_{i = 1}^{T} (r_{t - i} - \bar{r})^{2} \times 10{,}000, \tag{2} $$

where

$$ \bar{r} = \frac{1}{T - 1} \sum_{i = 1}^{T} r_{t - i}. $$

We scale the measurement by 10,000 to avoid extremely small values.
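As an illustration of Eqs (1) and (2), the following sketch computes both variance series from a return array with NumPy. The function names and the integer `window` argument (standing in for T, whose exact value for "one month" is not stated in the text) are assumptions made for this example; the mean in `realized_variance` divides by T − 1 only because the definition of r̄ above does so.

```python
import numpy as np


def log_returns(prices):
    """Percentage log returns: r_t = ln(P_t / P_{t-1}) * 100."""
    prices = np.asarray(prices, dtype=float)
    return np.log(prices[1:] / prices[:-1]) * 100.0


def simplified_realized_variance(returns, t, window):
    """Eq. (1): average of the `window` squared returns before index t, times 10,000."""
    past = np.asarray(returns, dtype=float)[t - window:t]  # r_{t-window}, ..., r_{t-1}
    return float(np.mean(past ** 2) * 10_000.0)


def realized_variance(returns, t, window):
    """Eq. (2): squared deviations of the lagged returns around r_bar, times 10,000.

    r_bar divides by (window - 1), following the definition given in the text.
    """
    past = np.asarray(returns, dtype=float)[t - window:t]
    r_bar = past.sum() / (window - 1)
    return float(np.sum((past - r_bar) ** 2) / (window - 1) * 10_000.0)
```

Both functions require `t >= window`, so the first `window` observations cannot be assigned a value.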

For both the realized daily variance and the simplified realized daily variance, we use the one-month rolling average approach, and each daily prediction uses the updated market data rather than iterated forecasts. We define the simplified realized variance as the target series to be forecasted and use three other series to fit the data, namely the Roll spread, ε_t and the realized variance, where ε_t is the residual taken from the GARCH(1,1) model. As the BAS is positively related to volatility, we use the Roll measure to approximate the daily BAS. The Roll spread is an effective spread estimator for the bid-ask spread that suits low-frequency data (Marshall et al. 2012 [37]; Haugom et al. 2014 [31]), so the Roll measure can serve as the market liquidity proxy. Specifically, Roll (1984) [38] assumes that a stock has a fundamental value, denoted as V_t at time t, which follows:

$$ V_{t} = V_{t - 1} + η_{t} \tag{3} $$

Here η_t is the residual term of this random process for the fundamental value; it is the mean-zero, serially uncorrelated public information shock on day t, and it is distinct from ε_t, the residual taken from the GARCH volatility model.

Next, he denotes S_t as the last observed trade price on day t and presumes that S_t follows:

$$ S_{t} = V_{t} + \frac{1}{2} E Q_{t}, $$

where E is the effective spread and Q_t is a buy/sell indicator for the last trade that equals +1 for a buy and -1 for a sell. He further assumes that Q_t is equally likely to be +1 or -1 and Q_t is also serially uncorrelated, and independent of η_t.

Then he takes the first difference of the trade price equation above and substitutes Eq (3) into the result, which yields

$$ Δ S_{t} = \frac{1}{2} E Δ Q_{t} + η_{t} \tag{4} $$

As a result, $cov (Δ S_{t}, Δ S_{t - 1}) = - \frac{1}{4} E^{2}$ or equivalently,

$$ \text{spread} = 2 \sqrt{- cov (Δ S_{t}, Δ S_{t - 1})}. $$
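The covariance result can be verified in a few lines. The derivation below is a sketch that uses only the assumptions stated above: Q_t takes the values +1 and -1 with equal probability (so it has mean zero and unit variance), Q_t is serially uncorrelated, and η_t is serially uncorrelated and independent of Q_t.

```latex
\begin{aligned}
\operatorname{cov}(\Delta S_{t}, \Delta S_{t-1})
  &= \operatorname{cov}\!\left(\tfrac{1}{2} E\, \Delta Q_{t} + \eta_{t},\;
                               \tfrac{1}{2} E\, \Delta Q_{t-1} + \eta_{t-1}\right) \\
  &= \tfrac{1}{4} E^{2}\, \operatorname{cov}(Q_{t} - Q_{t-1},\; Q_{t-1} - Q_{t-2}) \\
  &= -\tfrac{1}{4} E^{2}\, \operatorname{Var}(Q_{t-1})
   = -\tfrac{1}{4} E^{2}.
\end{aligned}
```

The second line drops every term involving η because those covariances are zero; in the third line only the shared term Q_{t-1} survives, entering with opposite signs. Solving for E gives the spread formula.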

When the auto-covariance is positive, this formula is undefined. We therefore use a modified version of the Roll estimator (Goyenko et al. 2009 [39]):

$$ \text{spread} = \begin{cases} 2 \sqrt{- cov (Δ S_{t}, Δ S_{t - 1})}, & cov (Δ S_{t}, Δ S_{t - 1}) \leq 0 \\ 0, & cov (Δ S_{t}, Δ S_{t - 1}) > 0 \end{cases} \tag{5} $$

In order to estimate the spread, we first take the difference of the daily prices and then compute the covariance between the price differences at t and t-1. If this covariance is non-positive, the spread is twice the square root of the negated covariance; otherwise the spread is set to 0.
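The estimation steps for the modified Roll spread can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the function name is hypothetical, and the use of the sample covariance from `np.cov` over the whole input window is an assumption, since the text does not state the estimation window.

```python
import numpy as np


def roll_spread(prices):
    """Modified Roll (1984) spread estimate, as in Eq. (5).

    Returns 2 * sqrt(-cov) when the first-order autocovariance of price
    changes is non-positive, and 0 when it is positive.
    """
    prices = np.asarray(prices, dtype=float)
    dp = np.diff(prices)                      # daily price changes, Delta S_t
    # Sample covariance between Delta S_t and Delta S_{t-1}.
    cov = np.cov(dp[1:], dp[:-1])[0, 1]
    # max(..., 0) maps a positive autocovariance to a zero spread.
    return float(2.0 * np.sqrt(max(-cov, 0.0)))
```

A price path that bounces between two levels produces a negative autocovariance and hence a positive spread, while a steadily accelerating path produces a positive autocovariance and a spread of zero.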

To sum up, the simplified realized variance, the realized variance, the BAS and the residual term are the four main variables concerned by this study and the residual term ε_t contains the noise information that has not been captured by the model. Table 1 presents the statistical summary of the main variables used in the paper.

Table 1. Statistical summary of variables used for the 18 year data.

Variable	Obs	Mean	Std. Dev.	Min	Max
BAS	4,752	0.6518	0.8023	0	8.111
Return	4,752	0.01731	2.312	-16.54	16.40
Simplified realized variance	4,752	535.11	632.18	29.85	6037.48
Realized variance	4,752	21083.9	9649.8	5104.7	79452.2
Price	4,752	63.5942	25.8163	17.45	145.29


3 Model development under genetic programming system

3.1 Preliminaries

In this section, we develop our GARCH model based on the variables estimated in Section 2. For the specific model development, we adopt the Genetic Programming (GP) method from computer science, proposed by Koza (1994) [40]. GP is an evolutionary computation (EC) technique inspired by biological processes (Banzhaf et al. 1998 [41]; Hirsh et al. 2000 [42]; Poli et al. 2008 [43]). Since the form of a volatility forecasting model with liquidity effects is uncertain, the GP method is well suited to the task. One big advantage of GP is that it allows one to be agnostic about the general form of the model. In GP, a population of computer programs is evolved based on the principles of natural selection that originate from Darwin’s theory of evolution. After a certain number of generations, GP can transform populations of programs into new and better programs. As stated in Poli et al. (2008) [43], GP has been very successful at evolving novel and unexpected ways of solving problems.

Our GP system proceeds in the following way. It first generates a random population of functions and then evaluates every function by comparing its forecasts with the target series. We define the performance of a function as its fitness.

Afterwards, genetic operations automatically choose one or two function(s) according to the fitness evaluation. There are two genetic operations, namely crossover and mutation. The crossover operation recombines subitems from two picked functions to produce new functions. The mutation operation modifies subitems of a picked function before recombining them to produce new functions. After the crossover and mutation operations, the GP system automatically re-evaluates the newly generated functions according to their fitness. The probabilities of performing crossover and mutation are pre-determined in the system. The underlying prerequisites are the principles of evolution, and the system stops once pre-determined conditions are met, such as a pre-determined threshold for function fitness. The system automatically chooses the best function as the solution according to function fitness, and this function becomes the new volatility forecasting model.

3.2 Genetic programming system

For our model development, we give our GP system the following target function as its forecasting task, using the data sample period from January 1, 2001 to December 31, 2010:

$$ f (L^{2}, σ^{2}, ε^{2}) = r^{2} \tag{6} $$

where L², σ², ε² are the squared BAS, the realized variance and the squared residuals respectively, and r² denotes the simplified realized variance, known as the squared return term, which is used to approximate the volatility. Our goal is to find the most relevant terms for predicting the simplified realized variance. For robustness, we also apply our model to forecasting the realized variance defined in Section 2. The empirical forecasting results for both the simplified realized variance and the realized variance are presented in Section 4.

Our GP system consists of the following parts:

Terminal Set: L², σ², ε².
Function Set: +, −, ×.
Fitness measure: the error between the value of the individual function and the corresponding desired output (i.e. r²).
GP parameters: population = 10000, the maximum length of the program = 1000, probability of crossover operation = 0.8 and probability of mutation operation = 0.1.
Ending conditions: when the measure of function fitness approximates 0 or the system runs up to 100 generations, the system will automatically discontinue. (For this work, the fitness measure will never reach 0, therefore the system will terminate after 100 generations.)
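To make the components listed above concrete, the following is a heavily scaled-down sketch of a tree-based GP loop using only the Python standard library. It is not the authors' system: the tuple-based tree encoding, tournament selection, elitism, the initial tree depth and all function names are assumptions made for illustration, and the default population, generation count and size limit are far smaller than the settings listed above so that the example runs quickly. The terminal set, function set, MSE-style fitness and the crossover/mutation probabilities of 0.8 and 0.1 follow the text.

```python
import random

TERMINALS = ["L2", "sigma2", "eps2"]   # squared BAS, realized variance, squared residual
FUNCTIONS = {"+": lambda a, b: a + b,
             "-": lambda a, b: a - b,
             "*": lambda a, b: a * b}


def random_tree(rng, depth=3):
    """Grow a random expression: a terminal name or a tuple (op, left, right)."""
    if depth == 0 or rng.random() < 0.3:
        return rng.choice(TERMINALS)
    op = rng.choice(list(FUNCTIONS))
    return (op, random_tree(rng, depth - 1), random_tree(rng, depth - 1))


def evaluate(tree, row):
    """Evaluate an expression on one observation (a dict keyed by terminal name)."""
    if isinstance(tree, str):
        return row[tree]
    op, left, right = tree
    return FUNCTIONS[op](evaluate(left, row), evaluate(right, row))


def fitness(tree, rows, targets):
    """Mean squared error between the expression's output and the target r^2."""
    errors = [evaluate(tree, r) - y for r, y in zip(rows, targets)]
    return sum(e * e for e in errors) / len(errors)


def paths(tree, prefix=()):
    """Yield the position of every node in the tree."""
    yield prefix
    if not isinstance(tree, str):
        yield from paths(tree[1], prefix + (1,))
        yield from paths(tree[2], prefix + (2,))


def subtree(tree, path):
    for index in path:
        tree = tree[index]
    return tree


def replace(tree, path, new):
    """Return a copy of `tree` with the node at `path` replaced by `new`."""
    if not path:
        return new
    node = list(tree)
    node[path[0]] = replace(tree[path[0]], path[1:], new)
    return tuple(node)


def crossover(rng, a, b):
    """Swap a random subtree of `b` into a random position of `a`."""
    return replace(a, rng.choice(list(paths(a))),
                   subtree(b, rng.choice(list(paths(b)))))


def mutate(rng, a):
    """Replace a random subtree of `a` with a freshly grown one."""
    return replace(a, rng.choice(list(paths(a))), random_tree(rng, depth=2))


def evolve(rows, targets, pop_size=30, generations=5,
           p_cross=0.8, p_mut=0.1, max_size=50, seed=0):
    rng = random.Random(seed)
    pop = [random_tree(rng) for _ in range(pop_size)]

    def score(tree):
        return fitness(tree, rows, targets)

    def pick():
        # Tournament selection of size 3 (an assumed selection scheme).
        return min(rng.sample(pop, 3), key=score)

    best = min(pop, key=score)
    for _ in range(generations):
        new_pop = [best]                      # elitism: keep the best so far
        while len(new_pop) < pop_size:
            u = rng.random()
            if u < p_cross:
                child = crossover(rng, pick(), pick())
            elif u < p_cross + p_mut:
                child = mutate(rng, pick())
            else:
                child = pick()                # plain reproduction
            if sum(1 for _ in paths(child)) <= max_size:
                new_pop.append(child)
        pop = new_pop
        best = min(pop, key=score)
    return best, score(best)
```

Because the best individual is carried over unchanged, the best fitness in this sketch can never get worse from one generation to the next; the `max_size` check plays the role of the maximum program length.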

As the number of generations increases, the average error between the values produced by the forecasting functions and the target values may decrease. In this case, GP may produce high-order terms in order to further reduce the error. However, these high-order terms may cause overfitting. We therefore discard forecasting functions with more than 10 terms produced by our GP system in order to prevent overfitting. The parameter settings used above and the choice of 10 terms are based on empirical experience, and we do not claim that they are optimal. Our main purpose is to test the effectiveness of the GP approach on the volatility forecasting task.

3.3 Model development

With the settings stated above, we ran our GP system 50 times in order to acquire a specific model with no ε² term. Since the ε² data series is available to the program, the model found by the program contains the information incorporated in the ε² term. Therefore, the model we devise with no ε² term leads to a comprehensive decomposition of the ε² term. After simplification, the best function with no ε² term that we obtained is the following:

$$ σ_{t}^{2} = α_{0} + α_{1} σ_{t - 1}^{2} + α_{2} K_{t - 1} \tag{7} $$

where $K_{t} = σ_{t}^{2} (1 - L_{t}^{2}) (L_{t}^{2} - σ_{t}^{2} - L_{t}^{4})$ and L represents the BAS. We name the model LGARCH(1,1), which is a liquidity-adjusted GARCH model.

Furthermore, the GARCH(1,1) model has the following form:

$$ \begin{aligned} σ_{t}^{2} &= α_{0} + α_{1} σ_{t - 1}^{2} + α_{2} ε_{t - 1}^{2}, \\ r_{t} &= ϕ + ε_{t}, \end{aligned} \tag{8} $$

where σ_t is the volatility of the target time series, ε_t is the residual term from the return equation, ϕ is the conditional mean, and $ε_{t} \sim N (0, σ_{t}^{2})$.

Since the conditional variance under the GARCH model depends on the past squared residuals of the process, the ε_t term plays a vital role in data fitting and conditional variance prediction. The general decomposition of ε_t provided in the literature is ε_t = σ_t z_t (see Lamoureux and Lastrapes 1990 [44]; Nelson 1990 [45]), where z_t is the noise process. Therefore, $ε_{t}^{2} = σ_{t}^{2} z_{t}^{2}$.

Thus, based on this decomposition, our liquidity-adjusted GARCH model matches the decomposition of the original GARCH model. Like the GARCH-X model, the LGARCH(1,1) model uses K_t to replace $ε_{t}^{2}$, but it further elaborates the z_t term embedded in ε_t by adopting the BAS, which is denoted as L. It can be observed from Figs 2–4 that the BAS has spike clusters similar to those of both the simplified realized variance and the realized variance in late 2007 and early 2016. Therefore, our model implies that $K_{t - 1} = ε_{t - 1}^{2}$. Similarly, the GARCH-X model has the general form:

$$ σ_{t}^{2} = ω + α y_{t - 1}^{2} + β σ_{t - 1}^{2} + γ x_{t}, \tag{9} $$

where x_t is an exogenous variable, such as the interest rate, y_t denotes the residual term with $y_{t}^{2} = σ_{t}^{2} z_{t}^{2}$, σ_t is the volatility of the target time series and z_t is the noise term, defined as z_t ∼ IID(0, 1).

So, $y_{t - 1}^{2} = K_{t - 1}$ for the GARCH-X case. Since $y_{t}^{2} = σ_{t}^{2} z_{t}^{2}$ and $K_{t} = σ_{t}^{2} (1 - L_{t}^{2}) (L_{t}^{2} - σ_{t}^{2} - L_{t}^{4})$, it follows that $z_{t}^{2} = (1 - L_{t}^{2}) (L_{t}^{2} - σ_{t}^{2} - L_{t}^{4})$, which we denote as the liquidity adjustment (LA) on volatility. Compared with both the general GARCH model and the GARCH-X model, we thus obtain a further decomposition of the $ε_{t}^{2}$ term.

This liquidity adjustment uncovers the fundamental elements encompassed in z_t and it has two states:

When $L_{t}^{2} > 1$, the BAS in the market is relatively large and the liquidity of the market is insufficient. As a result, $(1 - L_{t}^{2}) < 0$ and $(L_{t}^{2} - σ_{t}^{2} - L_{t}^{4}) < 0$, so their product is positive. Since the BAS and volatility are positively correlated, a large BAS may indicate higher volatility in the future. Therefore, the adjustment increases the volatility term by adding a positive value to the process to reflect the potential increase of volatility in the near future.
When $L_{t}^{2} < 1$ , it indicates that the BAS in the market is relatively small and the market is liquid.

(i) When $L_{t}^{2} > σ_{t}^{2} + L_{t}^{4}$ , the product is still positive, in which case, the spread is still not small enough. Therefore, the adjustment still has positive value to unveil the potential increase of the future volatility.
(ii) When $L_{t}^{2} < σ_{t}^{2} + L_{t}^{4}$ , the product is negative, in which case, the BAS is trivial and trading frictions in the market are negligible. Therefore, the model may foresee possible volatility decrease in the near future by adding a negative value to the previous volatility.
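The sign behaviour of the liquidity adjustment in the states above can be checked numerically. The sketch below evaluates the LA factor, the K_t term and a one-step LGARCH(1,1) forecast from Eq (7); the function names and all numerical inputs, including the coefficient values, are hypothetical and chosen purely for illustration.

```python
def liquidity_adjustment(L2, sigma2):
    """LA_t = (1 - L_t^2) * (L_t^2 - sigma_t^2 - L_t^4), with L2 = L_t^2."""
    return (1.0 - L2) * (L2 - sigma2 - L2 ** 2)


def k_term(L2, sigma2):
    """K_t = sigma_t^2 * LA_t, the term that replaces eps_t^2 in Eq. (7)."""
    return sigma2 * liquidity_adjustment(L2, sigma2)


def lgarch_forecast(alpha0, alpha1, alpha2, sigma2_prev, L2_prev):
    """One-step forecast: sigma_t^2 = a0 + a1 * sigma_{t-1}^2 + a2 * K_{t-1}."""
    return alpha0 + alpha1 * sigma2_prev + alpha2 * k_term(L2_prev, sigma2_prev)


# State 1: L^2 > 1 (wide spread): both factors negative, product positive.
print(liquidity_adjustment(2.0, 0.5))
# State 2(i): L^2 < 1 and L^2 > sigma^2 + L^4: product still positive.
print(liquidity_adjustment(0.5, 0.1))
# State 2(ii): L^2 < 1 and L^2 < sigma^2 + L^4: product negative.
print(liquidity_adjustment(0.5, 0.5))
```

Whether a positive adjustment raises the forecast also depends on the sign of the estimated α₂, which this sketch leaves as an input.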

In order to test our model’s data fit and volatility prediction ability, we run two tests, each with both in-sample and out-of-sample versions. The first test predicts the simplified realized variance and the second test predicts the realized variance. We are able to show that the detailed decomposition of z_t in the GARCH model substantially improves the forecasting accuracy of the GARCH model.

4 Empirical results

This section gives both empirical results for regression models and model performance of volatility forecasting. In particular, we compare our data fitting results as well as prediction results with three GARCH models, namely, GARCH, IGARCH and GJR-GARCH (denoted as TGARCH in the tables).

For the model performance evaluation, we use Mean Squared Error (MSE) to measure the model performance for both in-sample fitting and out-of-sample forecasting tests since our daily data can be quite noisy (Pong et al. 2004 [46]; Bollerslev et al. 2016 [47]). More importantly, MSE is quite suitable for time-series forecasting accuracy measurement (Ismail et al. 2011 [48]). The periodic averaged MSE can be defined as:

$$ MSE_{T} = \frac{1}{T} \sum_{t = 1}^{T} (\text{Observed}_{t} - \text{Predicted}_{t})^{2}, $$

where T is the number of observations in the forecasting period, Observed_t is the variance observed in the market and Predicted_t is the variance predicted by the model.

Lower MSE indicates higher forecasting accuracy as well as more powerful prediction of the model.
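A minimal sketch of this evaluation is given below. The `mse` function follows the definition above. The `improvement_rate` function is an assumption: the text does not define the improvement rate reported in the tables, and the relative reduction in MSE used here only approximately reproduces the reported figures (for example, 2.54 against 1.89 gives about 25.6%, while Table 3 reports 25.5%).

```python
def mse(observed, predicted):
    """Mean squared error over a forecasting period of T observations."""
    if len(observed) != len(predicted) or len(observed) == 0:
        raise ValueError("observed and predicted must be non-empty and equal in length")
    return sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)


def improvement_rate(mse_benchmark, mse_model):
    """Relative MSE reduction of the model over a benchmark (assumed definition)."""
    return (mse_benchmark - mse_model) / mse_benchmark


print(mse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))
print(improvement_rate(2.54, 1.89))
```

A negative improvement rate under this definition means the benchmark has the lower MSE.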

4.1 Empirical models

Before the model performance evaluation, we first run a series of regressions to verify that the BAS has a significant impact on both the simplified realized variance and the realized variance. The regression models are specified as follows:

$$ SRV_{t} = β_{0} + β_{1} SRV_{t - 1} + β_{2} ε_{t - 1} + β_{3} L_{t - 1} + β_{4} RV_{t - 1} + δ_{t}, \tag{10} $$

$$ RV_{t} = β_{0} + β_{1} RV_{t - 1} + β_{2} ε_{t - 1} + β_{3} L_{t - 1} + β_{4} SRV_{t - 1} + δ_{t}, \tag{11} $$

where SRV_t is the simplified realized variance, RV_t is the realized variance and L_t is the bid-ask spread. We run a series of multivariable regressions and report the results in Table 2. The impact of the BAS on volatility is significant across the specifications shown, after controlling for different variables. The empirical evidence therefore strongly supports the impact of the BAS on volatility, and we incorporate it into the GARCH model for volatility forecasting.
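A regression of the form of Eqs (10) and (11) can be sketched with ordinary least squares. The example below is an assumption-laden illustration: the function name is hypothetical, the estimator is plain OLS via `numpy.linalg.lstsq` (the text does not specify the estimation routine), and no standard errors or significance tests are computed.

```python
import numpy as np


def fit_lagged_regression(target, regressors):
    """OLS of target_t on a constant and the one-period lag of each regressor.

    `target` is a 1-D series (e.g. SRV); `regressors` is a list of 1-D series
    of the same length (e.g. [SRV, eps, L, RV] for Eq. (10)).
    Returns the coefficient vector [beta_0, beta_1, ..., beta_k].
    """
    y = np.asarray(target, dtype=float)[1:]                 # values at time t
    lagged = np.column_stack(
        [np.asarray(x, dtype=float)[:-1] for x in regressors]  # values at t-1
    )
    X = np.column_stack([np.ones(len(y)), lagged])          # add the intercept
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta
```

For Eq (10) the call would be `fit_lagged_regression(srv, [srv, eps, L, rv])`, and for Eq (11) the roles of `srv` and `rv` are swapped.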

Table 2. Regression results summary for simplified realized variance and realized variance.

	SRV_t	SRV_t	SRV_t	RV_t	RV_t	RV_t
SRV_{t−1}	0.175***	0.12***		0.14***	0.082***
	(0.0142)	(0.033)		(0.073)	(0.015)
$L_{t - 1}^{2}$	10.01***	6.27***	14.53**	10.66***	11.12***	20.08***
	(7.71)	(7.05)	(7.25)	(3.57)	(3.39)	(4.57)
$ε_{t - 1}^{2}$		-0.01	0.032**		0.094***	0.002**
		(0.0288)	(0.015)		(0.013)	(0.0009)
RV_{t−1}			0.77***			1.12***
			(0.028)			(0.019)


4.2 In-sample data fitting

For the in-sample modeling, we compare our model with the three benchmark models in fitting both the simplified realized variance and the realized variance. Table 3 shows the in-sample fitting MSE against the simplified realized variance for all four models over the data period 2001-2010. In general, our model outperforms the other three models: the improvement rate is around 25% in the full-sample test and around 20% on average in the subsample tests. The exception is 2005, when the simplified realized variance is quite stable (see Fig 2) and the liquidity effect on volatility may therefore be trivial; in that year our model does not outperform the TGARCH model. In all other cases, our model gives the better in-sample fit.

Additionally, Table 4 presents the in-sample fitting MSE against the realized variance for all four models over the data period 2001-2010. For this autoregressive fitting, our model surpasses all three benchmark models by a wide margin, with an improvement rate of nearly 75% in the full-sample test and around 70% on average in the subsample tests. This result illustrates that liquidity plays a significant role in the autoregressive model fitting for the realized variance.

Table 3. In sample fitting with simplified realized variance.

This table compares the in-sample oil futures volatility fit of the Liquidity-adjusted GARCH (LGARCH) model with that of three other GARCH models against the simplified realized variance, using the Mean Squared Error (MSE). The p-values for statistical differences of the forecasting errors are also presented. The LGARCH model outperforms all three other models in the full sample and in every subsample except 2005, where TGARCH has the lower MSE. Here e-n denotes ×10⁻ⁿ, e.g. e-02 = ×10⁻².

		GARCH	LGARCH	Improve	IGARCH	LGARCH	Improve	TGARCH	LGARCH	Improve
		(MSE)	(MSE)	Rate (p-value)	(MSE)	(MSE)	Rate (p-value)	(MSE)	(MSE)	Rate (p-value)
Full sample		2.54	1.89	25.5% (0.00)	2.56	1.91	25.3% (0.00)	2.52	1.88	25.4% (0.00)
Subsamples:	Year
	2001	4.78	3.25	32.1% (0.00)	4.77	3.25	32.0% (0.00)	4.78	3.25	32.1% (0.00)
	2002	5.23e-01	4.41e-01	15.6% (0.00)	5.23e-01	4.41e-01	15.6% (0.00)	5.25e-01	4.41e-01	15.9% (0.00)
	2003	1.72	1.07	37.7% (0.00)	1.72	1.07	37.7% (0.00)	1.72	1.07	37.7% (0.00)
	2004	6.2e-01	5.69e-01	8.16% (0.00)	6.19e-01	5.69e-01	8.14% (0.00)	6.2e-01	5.69e-01	8.16% (0.00)
	2005	3.97e-01	3.91e-01	1.6% (0.00)	3.98e-01	3.91e-01	1.7% (0.00)	3.87e-01	3.91e-01	-0.1% (0.00)
	2006	2.26e-01	1.84e-01	18.8% (0.00)	2.23e-01	1.84e-01	17.7% (0.00)	2.26e-01	1.84e-01	18.6% (0.00)
	2007	3.93e-01	3.21e-01	18.2% (0.00)	3.92e-01	3.21e-01	17.9% (0.00)	3.86e-01	3.21e-01	16.7% (0.00)
	2008	9.24	8.12	11.9% (0.21)	9.15	8.12	11.1% (0.30)	9.28	8.12	12.3% (0.29)
	2009	7.06	5.41	23.3% (0.00)	7.09	5.41	23.6% (0.00)	7.03	5.41	22.9% (0.00)
	2010	2.22e-01	1.55e-01	29.8% (0.00)	2.19e-01	1.55e-01	28.9% (0.00)	2.16e-01	1.55e-01	27.8% (0.00)
Average of Subsamples		2.52	1.90	19.74%	2.51	1.90	19.48%	2.50	1.90	19.16%

Table 4. In sample fitting with realized variance.

This table presents the in-sample fitting results for oil futures volatility, comparing the Liquidity-adjusted GARCH (LGARCH) model with the other three GARCH models against the realized variance using the Mean Squared Error (MSE). The p-values for statistical differences of the forecasting errors are also presented. The LGARCH model outperforms all three other models in the full sample and in all subsamples. Here e-n denotes ×10⁻ⁿ, e.g. e-02 = ×10⁻².

		GARCH (MSE)	LGARCH (MSE)	Improvement rate (p-value)	IGARCH (MSE)	LGARCH (MSE)	Improvement rate (p-value)	TGARCH (MSE)	LGARCH (MSE)	Improvement rate (p-value)
Full sample		5.74e-01	1.44e-01	74.9% (0.00)	5.61e-01	1.44e-01	74.3% (0.00)	5.65e-01	1.44e-01	74.6% (0.00)
Subsamples:	Year
	2001	3.42	2.47e-01	92.8% (0.00)	3.44	2.48e-01	92.8% (0.00)	3.31	2.48e-01	92.5% (0.00)
	2002	2.63e-01	8.41e-02	96.8% (0.00)	2.65e-01	8.41e-02	96.8% (0.00)	2.68e-01	8.41e-02	96.9% (0.00)
	2003	2.93	1.27e-01	95.6% (0.00)	2.95	1.27e-01	95.6% (0.00)	2.68	1.27e-01	95.5% (0.00)
	2004	2.72	1.5e-01	94.4% (0.00)	2.7	1.5e-01	94.4% (0.00)	2.71	1.5e-01	94.5% (0.00)
	2005	3.22	2.27e-01	92.9% (0.00)	3.15	2.27e-01	92.8% (0.00)	3.04	2.27e-01	92.5% (0.00)
	2006	2.91	6.44e-01	77.8% (0.00)	2.84	6.44e-01	77.3% (0.00)	2.84	6.44e-01	77.5% (0.00)
	2007	3.22e-01	1.11e-01	65.7% (0.00)	3.16e-01	1.11e-01	65.1% (0.00)	3.12e-01	1.11e-01	64.7% (0.00)
	2008	2.12	1.89	10.5% (0.01)	2.01	1.89	5.7% (0.01)	2.07	1.89	8.4% (0.01)
	2009	1.23	2.61e-01	23.3% (0.01)	1.24	2.6e-01	23.6% (0.01)	1.26	2.6e-01	22.9% (0.01)
	2010	2.67	6.42e-01	75.8% (0.00)	2.63	6.42e-01	75.5% (0.00)	2.61	6.42e-01	75.2% (0.00)
Average of Subsamples		5.74e-01	2.48e-01	78.17%	5.6e-01	2.48e-01	77.52%	5.65e-01	2.48e-01	77.72%

4.3 Out-of-sample forecasting

For the out-of-sample forecasting, we again compare our model with the three GARCH models in forecasting both the simplified realized variance and the realized variance, with the prediction errors measured by MSE.

Specifically, we use the normalized MSE loss function proposed by Chen and Watanabe (2019) [49]:

\begin{matrix} \mathrm{MSE}_{m} = \frac{1}{m} \sum_{t = n + 1}^{n + m} \left( \frac{\mathrm{Observed}_{t} - \mathrm{Predicted}_{t}}{2} \right)^{2}, \end{matrix}

(12)

where m is the out-of-sample size.
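
As a minimal sketch of Eq (12), the function below computes the normalized MSE over an out-of-sample window. The function name and the sample values are illustrative assumptions, not the authors' code or data.

```python
from typing import Sequence


def normalized_mse(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Eq (12): mean of ((observed_t - predicted_t) / 2) ** 2 over the window."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    m = len(observed)  # out-of-sample size
    if m == 0:
        raise ValueError("out-of-sample window is empty")
    return sum(((o - p) / 2.0) ** 2 for o, p in zip(observed, predicted)) / m


# Made-up variance values for illustration only.
observed = [0.8, 1.2, 0.5, 2.0]
predicted = [1.0, 1.0, 0.7, 1.6]
mse = normalized_mse(observed, predicted)
print(f"normalized MSE = {mse:.4f}")
```

Because each error is halved before squaring, this loss equals one quarter of the ordinary MSE, so model rankings and percentage improvement rates are unchanged by the normalization.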

Table 5 shows the out-of-sample forecasting MSE against the simplified realized variance for all four models over the period 2011-2019. In general, our model outperforms the other three models, with an improvement rate of around 22% relative to the three GARCH models in the full-sample test and around 17% on average across the subsamples. Additionally, Table 6 presents the out-of-sample forecasting MSE against the realized variance for all four models over the period 2011-2019. The results are again strong for both the full-sample and subsample tests: our model surpasses all three GARCH models, with a full-sample improvement rate of around 73% and an average subsample improvement rate of around 70%. Therefore, it is arguable that our model produces substantially more accurate forecasts than all three GARCH models.

Table 5. Out of sample forecasting with simplified realized variance.

This table presents the out-of-sample forecasting results for oil futures volatility, comparing the Liquidity-adjusted GARCH (LGARCH) model with the other three GARCH models against the simplified realized variance using the Mean Squared Error (MSE). The p-values for statistical differences of the forecasting errors are also presented. The LGARCH model outperforms all three other models in the full sample and in all subsamples. Here e-n denotes ×10⁻ⁿ, e.g. e-02 = ×10⁻².

		GARCH (MSE)	LGARCH (MSE)	Improvement rate (p-value)	IGARCH (MSE)	LGARCH (MSE)	Improvement rate (p-value)	TGARCH (MSE)	LGARCH (MSE)	Improvement rate (p-value)
Full sample		1.86e-01	1.44e-01	22.6% (0.00)	1.87e-01	1.44e-01	22.9% (0.00)	1.85e-01	1.44e-01	22.1% (0.00)
Subsamples:	Year
	2011	4.7e-01	4.03e-01	14.2% (0.00)	4.82e-01	4.03e-01	16.3% (0.00)	4.51e-01	4.03e-01	10.6% (0.00)
	2012	5.17e-02	1.83e-02	64.6% (0.00)	4.14e-02	1.83e-02	55.7% (0.00)	5.02e-02	1.83e-02	63.5% (0.00)
	2013	2.21e-02	2.05e-02	7.24% (0.00)	2.18e-02	2.05e-02	5.96% (0.00)	2.16e-02	2.05e-02	5.09% (0.00)
	2014	3.07e-01	2.55e-01	16.9% (0.01)	2.88e-01	2.55e-01	11.4% (0.05)	2.74e-01	2.55e-01	6.90% (0.06)
	2015	1.26e-01	9.88e-02	21.5% (0.00)	1.14e-01	9.88e-02	13.3% (0.02)	1.05e-01	9.88e-02	5.91% (0.07)
	2016	1.84	1.39	24.4% (0.02)	1.72	1.39	19.1% (0.03)	1.66	1.39	16.7% (0.02)
	2017	1.49e-01	1.03e-01	30.8% (0.00)	1.42e-01	1.03e-01	27.4% (0.00)	1.39e-01	1.03e-01	25.9% (0.00)
	2018	6.21e-01	5.14e-01	17.2% (0.00)	6.22e-01	5.14e-01	17.6% (0.00)	5.87e-01	5.14e-01	12.5% (0.00)
	2019	3.55e-01	2.54e-01	27.3% (0.03)	3.34e-01	2.58e-01	22.7% (0.04)	3.47e-01	2.58e-01	25.6% (0.05)
Average of Subsamples		4.36e-01	3.38e-01	24.95%	4.16e-01	3.38e-01	21.11%	4.03e-01	3.38e-01	19.22%

Table 6. Out of sample forecasting with realized variance.

This table presents the out-of-sample forecasting results for oil futures volatility, comparing the Liquidity-adjusted GARCH (LGARCH) model with the other three GARCH models against the realized variance using the Mean Squared Error (MSE). The p-values for statistical differences of the forecasting errors are also presented. The LGARCH model outperforms all three other models in the full sample and in most subsamples. Here e-n denotes ×10⁻ⁿ, e.g. e-02 = ×10⁻².

		GARCH (MSE)	LGARCH (MSE)	Improvement rate (p-value)	IGARCH (MSE)	LGARCH (MSE)	Improvement rate (p-value)	TGARCH (MSE)	LGARCH (MSE)	Improvement rate (p-value)
Full sample		3.41	9.33e-01	72.6% (0.00)	3.42	9.33e-01	73.1% (0.00)	3.56	9.33e-01	73.9% (0.00)
Subsamples:	Year
	2011	4.73	2.51	44.5% (0.03)	4.51	2.51	44.5% (0.04)	3.92	2.51	36.2% (0.09)
	2012	1.55e-04	9.10e-05	49.1% (0.02)	1.79e-04	9.10e-05	49.6% (0.02)	1.49e-04	9.10e-05	39.1% (0.01)
	2013	1.76e-04	1.43e-05	93.1% (0.00)	2.05e-04	1.43e-05	93.1% (0.00)	1.99e-04	1.43e-05	92.9% (0.00)
	2014	2.74	1.57	42.6% (0.00)	2.86	1.57	45.1% (0.00)	2.71	1.57	41.8% (0.03)
	2015	3.12e-03	4.42e-04	85.8% (0.00)	3.05e-03	4.42e-04	85.4% (0.00)	3.04e-03	4.42e-04	85.4% (0.00)
	2016	2.05e-03	2.51e-04	87.7% (0.00)	2.25e-03	2.51e-04	88.8% (0.00)	1.55e-03	2.51e-04	83.8% (0.00)
	2017	1.87	3.25e-01	82.5% (0.00)	1.76	3.25e-01	81.8% (0.00)	1.82	3.25e-01	82.4% (0.00)
	2018	3.05e-03	5.24e-04	82.8% (0.00)	3.20e-03	5.24e-04	83.6% (0.00)	2.97e-03	5.24e-04	82.3% (0.00)
	2019	4.82e-04	6.40e-05	86.7% (0.00)	4.77e-04	6.40e-05	86.5% (0.00)	4.54e-04	6.40e-05	85.9% (0.00)
Average of Subsamples		1.13	5.04e-01	72.0%	1.12	5.04e-01	73.1%	1.03	5.04e-01	69.9%

Nevertheless, the out-of-sample forecasting results may be less stable than the in-sample fitting results. Between 2013 and 2014, the trend in the oil price changed. In 2013, the oil price followed an increasing trend, with an annual return of 6.90%. In 2014, by contrast, the oil price plummeted, with an annual return of -45.55%, reaching $53.45, its lowest level since 2010. As a result, return volatility may not have followed a similar pattern across these years, which may explain the marked differences in volatility forecasting performance.

4.4 Robustness check

To ensure that our results are robust, we also adopt the Mincer-Zarnowitz regression to verify the performance of our volatility forecasting model relative to the other GARCH models. Following Mincer and Zarnowitz (1969) [50], we run the following two regressions based on our volatility forecasting results:

\begin{matrix} \mathrm{SRV}_{t} = \beta_{0} + \beta_{1} \hat{v}_{\mathrm{model1}, t} + \beta_{2} \hat{v}_{\mathrm{model2}, t} + \epsilon_{t}, \end{matrix}

(13)

\begin{matrix} \mathrm{RV}_{t} = \beta_{0} + \beta_{1} \hat{v}_{\mathrm{model1}, t} + \beta_{2} \hat{v}_{\mathrm{model2}, t} + \epsilon_{t}, \end{matrix}

(14)

where SRV_t and RV_t are the simplified realized variance and the realized variance observed at time t respectively, and $\hat{v}_{\mathrm{model1}, t}$ and $\hat{v}_{\mathrm{model2}, t}$ are the forecasted variances from model 1 and model 2 at time t respectively.

To evaluate the model performance, we first run the single-variable regression on the value predicted by the LGARCH model as model 1 in Eqs (13) and (14). Then, we run the two-variable regression with LGARCH as model 1 and one of the three models, GARCH, IGARCH and TGARCH, as model 2 in Eqs (13) and (14) sequentially. Thus, we can evaluate the model performance in two respects:

Firstly, we compare the adj − R² from the single-variable regression with the adj − R² from the two-variable regression to see whether the three GARCH models add significant explanatory power to our model. Then, for the relative performance, we investigate the significance and magnitude of the coefficients for both model 1 and model 2 in Eqs (13) and (14). Large and significant coefficients demonstrate high performance. The model performance from the Mincer-Zarnowitz regression is shown in Tables 7–10.
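
The two-step comparison described above can be sketched as follows. This is a minimal pure-Python illustration on synthetic data, not the authors' code: the helper names (`solve_linear`, `ols_adj_r2`), the sample size and the data-generating process are all assumptions. It fits the single-variable and two-variable regressions by ordinary least squares and reports the coefficients and the adjusted R² used for the comparison.

```python
import random
from typing import List, Sequence, Tuple


def solve_linear(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting.

    Assumes a is non-singular (regressors are not perfectly collinear).
    """
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def ols_adj_r2(
    y: Sequence[float], regressors: Sequence[Sequence[float]]
) -> Tuple[List[float], float]:
    """Fit y = b0 + sum_k b_k x_k by OLS; return (coefficients, adjusted R^2)."""
    n, k = len(y), len(regressors)
    rows = [[1.0] + [x[t] for x in regressors] for t in range(n)]  # intercept first
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k + 1)] for i in range(k + 1)]
    xty = [sum(r[i] * y[t] for t, r in enumerate(rows)) for i in range(k + 1)]
    beta = solve_linear(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    y_bar = sum(y) / n
    ss_res = sum((yt - ft) ** 2 for yt, ft in zip(y, fitted))
    ss_tot = sum((yt - y_bar) ** 2 for yt in y)
    r2 = 1.0 - ss_res / ss_tot
    return beta, 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)


# Synthetic data only: the target loads on forecast 1, and forecast 2 is a
# noisy copy of forecast 1 that carries no extra information.
rng = random.Random(0)
v1 = [rng.uniform(0.5, 2.0) for _ in range(500)]
v2 = [v + rng.gauss(0.0, 0.3) for v in v1]
srv = [0.1 + 1.0 * v + rng.gauss(0.0, 0.5) for v in v1]

beta_single, adj_r2_single = ols_adj_r2(srv, [v1])
beta_joint, adj_r2_joint = ols_adj_r2(srv, [v1, v2])
print(f"single: beta1={beta_single[1]:.3f}, adj R2={adj_r2_single:.3f}")
print(f"joint:  beta1={beta_joint[1]:.3f}, beta2={beta_joint[2]:.3f}, adj R2={adj_r2_joint:.3f}")
```

In this reading, a small gain in adjusted R² from the joint regression, together with a small coefficient on model 2, suggests that the second forecast adds little beyond the first.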

Table 7. Regression results for simplified realized variance (in-sample).

This table reports the Mincer-Zarnowitz regression results for the model performance comparison regarding the in-sample simplified realized variance. SRV0 denotes the single-variable regression with our model only; SRV-G, SRV-IG and SRV-TG denote the regressions in Eq (13) in which model 1 is our model and model 2 is the GARCH, IGARCH and TGARCH model, respectively. *, **, *** indicate statistical significance at the 10%, 5% and 1% levels, respectively.

	SRV0	SRV-G	SRV-IG	SRV-TG
LGARCH	1.03***	1.17***	1.16***	1.19***
LGARCH	(0.046)	(0.057)	(0.057)	(0.058)
GARCH		-0.31***
GARCH		(0.077)
IGARCH			-0.30***
IGARCH			(0.079)
TGARCH				-0.334***
TGARCH				(0.078)
Adj − R²	0.163	0.168	0.167	0.169

Table 10. Regression results for realized variance (out-of-sample).

This table reports the Mincer-Zarnowitz regression results for the model performance comparison regarding the out-of-sample realized variance. RV0 denotes the single-variable regression with our model only; RV-G, RV-IG and RV-TG denote the regressions in Eq (14) in which model 1 is our model and model 2 is the GARCH, IGARCH and TGARCH model, respectively. *, **, *** indicate statistical significance at the 10%, 5% and 1% levels, respectively.

	RV0	RV-G	RV-IG	RV-TG
LGARCH	0.005***	0.0039***	0.002***	0.005***
LGARCH	(0.00025)	(0.0003)	(0.0006)	(0.00003)
GARCH		0.001***
GARCH		(0.0003)
IGARCH			0.002***
IGARCH			(0.0005)
TGARCH				0.0003*
TGARCH				(0.00019)
Adj − R²	0.628	0.666	0.667	0.63

In particular, Tables 7 and 8 show the in-sample model performance for the simplified realized variance and the realized variance, respectively. In Table 7, the adj − R² is 16.2% for our model, and the additional explanatory power contributed by the three GARCH models is marginal. More importantly, the coefficient of our model is much larger than the coefficients of the other three GARCH models. Similarly, in Table 8, the adj − R² is 58.3% for our model, the explanatory power that the three GARCH models add is small, and the coefficients for the other three GARCH models are all negative. Therefore, our model clearly outperforms the other three GARCH models in the in-sample comparison.

Table 8. Regression results for realized variance (in-sample).

This table reports the Mincer-Zarnowitz regression results for the model performance comparison regarding the in-sample realized variance. RV0 denotes the single-variable regression with our model only; RV-G, RV-IG and RV-TG denote the regressions in Eq (14) in which model 1 is our model and model 2 is the GARCH, IGARCH and TGARCH model, respectively. *, **, *** indicate statistical significance at the 10%, 5% and 1% levels, respectively.

	RV0	RV-G	RV-IG	RV-TG
LGARCH	0.0079***	0.0073***	0.0073***	0.0073***
LGARCH	(0.00013)	(0.00016)	(0.00016)	(0.00017)
GARCH		0.0013***
GARCH		(0.00022)
IGARCH			0.0014***
IGARCH			(0.00021)
TGARCH				0.0013***
TGARCH				(0.00023)
Adj − R²	0.583	0.589	0.588	0.586

On the other hand, Tables 9 and 10 report the out-of-sample model performance for the simplified realized variance and the realized variance, respectively. Table 9 shows that the adj − R² is 9.47% for forecasting the simplified realized variance in the single-variable regression of our model. The additional adj − R² that the other three GARCH models contribute is nearly 2%, and their coefficients are much smaller than that of our model. Similarly, Table 10 shows that the adj − R² is 62.8% for forecasting the realized variance in the single-variable regression of our model. The additional adj − R² that the other three GARCH models contribute is nearly 4%. The coefficient for our model is quite close to that of the IGARCH model, but is still larger than those of the other two GARCH models. As a result, our model in general outperforms the other three GARCH models in out-of-sample volatility forecasting.

Table 9. Regression results for simplified realized variance (out-of-sample).

This table reports the Mincer-Zarnowitz regression results for the model performance comparison regarding the out-of-sample simplified realized variance. SRV0 denotes the single-variable regression with our model only; SRV-G, SRV-IG and SRV-TG denote the regressions in Eq (13) in which model 1 is our model and model 2 is the GARCH, IGARCH and TGARCH model, respectively. *, **, *** indicate statistical significance at the 10%, 5% and 1% levels, respectively.

	SRV0	SRV-G	SRV-IG	SRV-TG
LGARCH	0.68***	0.53***	0.54***	0.55***
LGARCH	(0.053)	(0.058)	(0.057)	(0.058)
GARCH		0.061***
GARCH		(0.098)
IGARCH			0.059***
IGARCH			(0.095)
TGARCH				0.057***
TGARCH				(0.096)
Adj − R²	0.0947	0.115	0.116	0.114

Finally, we employ the QLIKE loss function as a further robustness check. The QLIKE loss function can be written as:

\begin{matrix} \mathrm{QLIKE}_{j} = \frac{1}{T} \sum_{t = 1}^{T} \left( \ln \hat{\sigma}_{t, j}^{2} + \frac{\sigma_{t, j}^{2}}{\hat{\sigma}_{t, j}^{2}} \right), \end{matrix}

(15)

where $\sigma_{t, j}^{2}$ is the actual value of the variance and $\hat{\sigma}_{t, j}^{2}$ is the predicted value of the variance. From Table 11, we find that the QLIKE loss of our LGARCH model is always the lowest among the four models. The QLIKE loss function results are robust for in-sample periods.

Table 11. QLIKE loss function results for four models.

This table reports the QLIKE loss function results for the model performance comparison regarding the four models. RV represents the realized variance, SRV represents the simplified realized variance, IS represents the in-sample results (2001-2010) and OS represents the out-of-sample results (2011-2019).

	RV-IS	SRV-IS	RV-OS	SRV-OS
LGARCH	-12.96	-7.24	10.87	3.11
GARCH	-10.73	-6.54	19.24	3.86
IGARCH	-10.65	-6.55	19.38	3.89
TGARCH	-10.59	-6.56	18.88	3.88

For out-of-sample forecasting, we use the normalized QLIKE loss function proposed by Chen and Watanabe (2019) [49]:

\begin{matrix} \mathrm{QLIKE}_{j} = \frac{1}{m} \sum_{t = n + 1}^{n + m} \left( \frac{\sigma_{t, j}^{2}}{\hat{\sigma}_{t, j}^{2}} - \ln \left( \frac{\sigma_{t, j}^{2}}{\hat{\sigma}_{t, j}^{2}} \right) - 1 \right), \end{matrix}

(16)

where m is the out-of-sample size, $\hat{\sigma}_{t, j}^{2}$ is the predicted value and $\sigma_{t, j}^{2}$ is the actual value.
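
The two QLIKE variants in Eqs (15) and (16) can be sketched in a few lines. The function names, the input check and the sample values below are illustrative assumptions rather than the authors' implementation; both functions require strictly positive variances because of the logarithm and the division.

```python
import math
from typing import Sequence


def _check(actual: Sequence[float], predicted: Sequence[float]) -> None:
    """Validate inputs: equal non-zero length and strictly positive variances."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    if any(v <= 0 for v in actual) or any(v <= 0 for v in predicted):
        raise ValueError("variances must be strictly positive")


def qlike(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Eq (15): mean of ln(predicted_t) + actual_t / predicted_t."""
    _check(actual, predicted)
    return sum(math.log(p) + a / p for a, p in zip(actual, predicted)) / len(actual)


def normalized_qlike(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Eq (16): mean of a/p - ln(a/p) - 1, which is zero for exact forecasts."""
    _check(actual, predicted)
    return sum(a / p - math.log(a / p) - 1.0 for a, p in zip(actual, predicted)) / len(actual)


# Made-up variance values for illustration only.
actual = [0.020, 0.035, 0.015]
predicted = [0.025, 0.030, 0.015]
print(f"QLIKE (Eq 15)            = {qlike(actual, predicted):.4f}")
print(f"normalized QLIKE (Eq 16) = {normalized_qlike(actual, predicted):.4f}")
```

The form in Eq (15) can be negative when variances are small, since the logarithm term dominates, whereas the normalized form in Eq (16) is never negative. The normalized form is also asymmetric: under-predicting the variance by a given factor is penalized more heavily than over-predicting it by the same factor.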

We compare our model with both the ANN-GARCH and SVM-GARCH models, reporting the results in Table 12. The ANN-GARCH model is based on Artificial Neural Networks, a non-parametric method that can be applied to uncover nonlinear associations between the parameters of the GARCH model. Nevertheless, ANN-GARCH is less flexible than other models such as the neural fuzzy inference system. In addition, the backpropagation algorithm of the ANN does not allow the network to learn from its own forecasting error (Kristjanpoller and Minutolo, 2016 [51]).

Table 12. QLIKE loss function results for ANN and SVM models.

This table reports the QLIKE loss function results for the model performance comparison between our model and the GARCH models combined with ANN and SVM. RV represents the realized variance, SRV represents the simplified realized variance, IS represents the in-sample results (2001-2010) and OS represents the out-of-sample results (2011-2019). SVMl, SVMp and SVMg represent the SVM with linear, polynomial and Gaussian kernel, respectively.

	RV-IS	SRV-IS	RV-OS	SRV-OS
LGARCH	-12.96	-7.24	10.87	3.11
ANN-GARCH	-11.04	-7.02	18.11	3.55
SVMl-GARCH	-10.53	-6.22	19.07	3.78
SVMp-GARCH	-10.85	-6.67	18.89	3.69
SVMg-GARCH	-10.96	-6.84	18.75	3.62

For the ANN-GARCH method, each neural network connects a group of input volatility forecasting variables to the output variables through a number of hidden layers. Neurons in adjacent layers are linked by connections that are activated when a threshold is triggered. The input and output groups of variables can be combined through different numbers of neurons in each layer, in which the relations between inputs and outputs are embedded (see Kristjanpoller and Hernandez, 2017 [52]; Bhattacharya and Ahmed, 2018 [53]).

The SVM-GARCH model is based on the support vector machine (SVM), which stems from statistical learning theory (see Cortes and Vapnik, 1995 [54]). The SVM method develops a nonlinear mapping function from the input space to a high-dimensional hidden space. On this basis, SVM estimates a linear regression model in the hidden space, which corresponds to a nonlinear regression in the low-dimensional input space. Theoretically, SVM could approximate any nonlinear mapping relation between the input space and the output space.

Nevertheless, since SVM offers several kernels, for instance the linear, polynomial and Gaussian kernels, the choice of kernel can be difficult. For example, if the residuals of the regression model are Gaussian, the results that SVM delivers may not be reliable: since the kernel is based on a probability distribution function, the residuals generated by SVM may not follow a Gaussian distribution (see Perez-Cruz et al., 2003 [55]; Chen et al., 2010 [56]).
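
To make the kernel choice concrete, the sketch below writes out the three kernel functions in their common textbook forms. The hyperparameters (`degree`, `gamma`, `coef0`) and the two-dimensional feature vectors are hypothetical; the paper does not report the settings used for the SVM-GARCH benchmarks.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def linear_kernel(x: Vector, z: Vector) -> float:
    """Inner product of the two input vectors."""
    return sum(a * b for a, b in zip(x, z))


def polynomial_kernel(
    x: Vector, z: Vector, degree: int = 3, gamma: float = 1.0, coef0: float = 1.0
) -> float:
    """(gamma * <x, z> + coef0) ** degree; all defaults are assumed values."""
    return (gamma * linear_kernel(x, z) + coef0) ** degree


def gaussian_kernel(x: Vector, z: Vector, gamma: float = 1.0) -> float:
    """exp(-gamma * ||x - z||^2); gamma is an assumed bandwidth parameter."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)


# Hypothetical feature vectors, e.g. (lagged variance, lagged squared return).
x = [0.8, 1.5]
z = [1.0, 1.2]
print(f"linear:     {linear_kernel(x, z):.4f}")
print(f"polynomial: {polynomial_kernel(x, z, degree=2):.4f}")
print(f"Gaussian:   {gaussian_kernel(x, z, gamma=0.5):.4f}")
```

Each kernel implies a different similarity measure between observations, which is why the three SVM-GARCH variants in Table 12 can produce different loss values on the same data.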

Based on Table 11, it is thereby arguable that our LGARCH model outperforms the other three GARCH models, and we are confident that our results are robust. Further, in Table 12, we adopt the QLIKE function to compare our model with GARCH models combined with Artificial Neural Network (ANN) and Support Vector Machine (SVM). For SVM-GARCH, we have employed different kernel functions, including linear, polynomial and Gaussian kernels. By comparing our model with those models, we also demonstrate that our results are robust, since our model has the lowest QLIKE values.

5 Conclusion and implications

To sum up, fin-tech has recently become a popular topic in finance, and with the aid of computer science we are able to improve the GARCH family of models, which are widely used in financial time series analysis. This paper proposes an improved GARCH model that integrates the liquidity factor. Our new GARCH model, named LGARCH, incorporates the bid-ask spread term into volatility forecasting, shedding light on the GARCH model as well as on the impact of liquidity on the volatility forecasting process. Since how to integrate liquidity into the volatility forecasting model is open to discussion, we use the GP method to identify the specific model format for the LGARCH model. The result also indicates that the liquidity factor plays a vital role in oil volatility forecasting, since it appears in the final model format. In this vein, fintech facilitates our understanding of oil volatility forecasting, as a new factor has been introduced.

Compared with GARCH, IGARCH and TGARCH, our model generally outperforms all three models in predicting both the simplified realized variance and the realized variance. More importantly, the improvement rate of our model relative to the three GARCH models is around 20% for forecasting the simplified realized variance and 70% for forecasting the realized variance. We also use the Mincer-Zarnowitz regression to demonstrate that our results are robust. The strong empirical results and model performance deliver significant findings on the impact of liquidity on volatility and yield helpful techniques for volatility forecasting. In addition, the more accurate forecasting model can be integrated into existing fintech systems, such as high frequency trading platforms and derivative trading platforms, by providing better volatility estimation. Our sample covers daily oil data from 2001 to 2019, spanning the periods before and after the financial crisis. Moreover, our paper uses two robustness measures for both the in-sample and out-of-sample periods, demonstrating the generality and validity of our results.

Furthermore, our LGARCH model also yields several policy implications. Firstly, compared with traditional GARCH models, our oil volatility forecasting model can provide more reliable volatility forecasts for the crude oil market. Reliable volatility predictions can help oil-importing economies determine their oil reserve levels in order to alleviate the negative impact on the economy. More importantly, market volatility might reflect the fragility of financial markets and the economy (Celik and Ergin, 2014 [57]). Therefore, reliable forecasts of oil price volatility may play a crucial role for macroeconomic policy makers in setting monetary policies to stabilize the economy. Since our results show better volatility prediction for the crude oil market, energy economists, energy policy makers and financial analysts may include the market liquidity effect in their volatility forecasting models and use market liquidity as an indicator of future volatility changes.

Finally, Genetic Programming (GP) is a subdiscipline of evolutionary algorithms, and it optimizes forecasting functions through an evolutionary process in which the evaluation criterion is nested. We emphasize that this paper is only a demonstration of applying GP to volatility forecasting for the oil market, which can serve as a starting point for fintech development. The GP system in volatility forecasting can be further applied to other financial markets, such as the stock market and the foreign exchange market. More importantly, since GP is good at providing novel and unexpected insights, it can also be applied to other financial forecasting problems, such as return forecasting, price forecasting and correlation forecasting. It may therefore mark only the beginning of technology integration in financial modelling for fintech.

Data Availability

The data used in this study are third-party data owned by Thomson Reuters. The authors accessed the data using an account paid for by their university. Others may access these data at the Datastream database of Thomson Reuters [https://www.thomsonone.com/DirectoryServices/2006-04-01/Web.Public/Login.aspx?brandname=datastream]. The data set was extracted from the time series section of the Datastream database for the US crude oil futures market daily data, ranging from January 1, 2001 to December 31, 2019. The authors confirm that they did not have special access privileges that others would not have.

Funding Statement

The author(s) received no specific funding for this work.

References

1. Buchak G, Matvos G, Piskorski T, Seru A. Fintech, regulatory arbitrage, and the rise of shadow banks. Journal of Financial Economics. 2018;130(3):453–483. doi: 10.1016/j.jfineco.2018.03.011
2. Chen MA, Wu Q, Yang B. How valuable is FinTech innovation? The Review of Financial Studies. 2019;32(5):2062–2106. doi: 10.1093/rfs/hhy130
3. Lien D, Wilson BK. Multiperiod hedging in the presence of stochastic volatility. International Review of Financial Analysis. 2001;10(4):395–406. doi: 10.1016/S1057-5219(01)00060-6
4. Andersen TG, Bollerslev T, Meddahi N. Correcting the errors: Volatility forecast evaluation using high-frequency data and realized volatilities. Econometrica. 2005;73(1):279–296. doi: 10.1111/j.1468-0262.2005.00572.x
5. Tian S, Hamori S. Modeling interest rate volatility: A Realized GARCH approach. Journal of Banking & Finance. 2015;61:158–171. doi: 10.1016/j.jbankfin.2015.09.008
6. Wei Y, Wang Y, Huang D. Forecasting crude oil market volatility: Further evidence using GARCH-class models. Energy Economics. 2010;32(6):1477–1484. doi: 10.1016/j.eneco.2010.07.009
7. Efimova O, Serletis A. Energy markets volatility modelling using GARCH. Energy Economics. 2014;43:264–273. doi: 10.1016/j.eneco.2014.02.018
8. Wang Y, Liu L. Crude oil and world stock markets: volatility spillovers, dynamic correlations, and hedging. Empirical Economics. 2016;50(4):1481–1509. doi: 10.1007/s00181-015-0983-2
9. Xing Y, Wang J. Linkages between global crude oil market volatility and financial market by complexity synchronization. Empirical Economics. 2019; p. 1–17.
10. Fleming MJ, Remolona EM. Price formation and liquidity in the US Treasury market: The response to public information. The Journal of Finance. 1999;54(5):1901–1915. doi: 10.1111/0022-1082.00172
11. Feng SP, Hung MW, Wang YH. Option pricing with stochastic liquidity risk: Theory and evidence. Journal of Financial Markets. 2014;18:77–95. doi: 10.1016/j.finmar.2013.05.002
12. Collin-Dufresne P, Fos V. Insider trading, stochastic liquidity, and equilibrium prices. Econometrica. 2016;84(4):1441–1475. doi: 10.3982/ECTA10789
13. Yin Z, O’Sullivan C, Brabazon A. An analysis of the performance of genetic programming for realised volatility forecasting. Journal of Artificial Intelligence and Soft Computing Research. 2016;6(3):155–172. doi: 10.1515/jaiscr-2016-0012
14. Ding S, Zhang Y, Duygun M. Modeling Price volatility based on a genetic programming approach. British Journal of Management. 2019;30(2):328–340. doi: 10.1111/1467-8551.12359
15. Weng F, Zhang H, Yang C. Volatility forecasting of crude oil futures based on a genetic algorithm regularization online extreme learning machine with a forgetting factor: The role of news during the COVID-19 pandemic. Resources Policy. 2021;73:102148. doi: 10.1016/j.resourpol.2021.102148
16. Mademlis DK, Dritsakis N. Volatility Forecasting using Hybrid GARCH Neural Network Models: The Case of the Italian Stock Market. International Journal of Economics and Financial Issues. 2021;11(1):49. doi: 10.32479/ijefi.10842
17. Deuskar P, Gupta A, Subrahmanyam MG. Liquidity effect in OTC options markets: Premium or discount? Journal of Financial Markets. 2011;14(1):127–160. doi: 10.1016/j.finmar.2010.08.003
18. Pimenta A, Nametala CA, Guimarães FG, Carrano EG. An automated investing method for stock market based on multiobjective genetic programming. Computational Economics. 2018;52(1):125–144. doi: 10.1007/s10614-017-9665-9
19. Michell K, Kristjanpoller W. Generating trading rules on US Stock Market using strongly typed genetic programming. Soft Computing. 2020;24(5):3257–3274. doi: 10.1007/s00500-019-04085-1
20. Ding S, Cui T, Xiong X, Bai R. Forecasting stock market return with nonlinearity: a genetic programming approach. Journal of Ambient Intelligence and Humanized Computing. 2020; p. 1–13.
21. Ma F, Liao Y, Zhang Y, Cao Y. Harnessing jump component for crude oil volatility forecasting in the presence of extreme shocks. Journal of Empirical Finance. 2019;52:40–55. doi: 10.1016/j.jempfin.2019.01.004
22. Yin L. Does oil price respond to macroeconomic uncertainty? New evidence. Empirical Economics. 2016;51(3):921–938. doi: 10.1007/s00181-015-1027-7
23. Chen Y, Ma F, Zhang Y. Good, bad cojumps and volatility forecasting: New evidence from crude oil and the US stock markets. Energy Economics. 2019;81:52–62. doi: 10.1016/j.eneco.2019.03.020
24. Russell S, Norvig P. Artificial Intelligence: A Modern Approach. 3rd ed. Upper Saddle River, NJ, USA: Prentice Hall Press; 2009.
25. Mackenzie A. The fintech revolution. London Business School Review. 2015;26(3):50–53. doi: 10.1111/2057-1615.12059
26. Bollerslev T, Wright JH. High-frequency data, frequency domain inference, and volatility forecasting. Review of Economics and Statistics. 2001;83(4):596–602. doi: 10.1162/003465301753237687
27. Hansen PR, Lunde A. A forecast comparison of volatility models: does anything beat a GARCH (1, 1)? Journal of Applied Econometrics. 2005;20(7):873–889. doi: 10.1002/jae.800
28. Klein T, Walther T. Oil price volatility forecast with mixture memory GARCH. Energy Economics. 2016;58:46–58. doi: 10.1016/j.eneco.2016.06.004
29. Lux T, Segnon M, Gupta R. Forecasting crude oil price volatility and value-at-risk: Evidence from historical and recent data. Energy Economics. 2016;56:117–133. doi: 10.1016/j.eneco.2016.03.008
30. Bollerslev T, Melvin M. Bid-ask spreads and volatility in the foreign exchange market: An empirical analysis. Journal of International Economics. 1994;36(3):355–372. doi: 10.1016/0022-1996(94)90008-6
31. Haugom E, Langeland H, Molnár P, Westgaard S. Forecasting volatility of the US oil market. Journal of Banking & Finance. 2014;47:1–14. doi: 10.1016/j.jbankfin.2014.05.026
32. Yu HC, Shin TL. Gold, crude oil and the weekend effect: a probability distribution approach. Investment Management and Financial Innovations. 2011;8(2):39–51.
33. Geman H, Kharoubi C. WTI crude oil futures in portfolio diversification: The time-to-maturity effect. Journal of Banking & Finance. 2008;32(12):2553–2559. doi: 10.1016/j.jbankfin.2008.04.002
34. Schwert GW. Why does stock market volatility change over time? The Journal of Finance. 1989;44(5):1115–1153. doi: 10.1111/j.1540-6261.1989.tb02647.x
35. Andersen TG, Bollerslev T. Deutsche mark–dollar volatility: intraday activity patterns, macroeconomic announcements, and longer run dependencies. The Journal of Finance. 1998;53(1):219–265. doi: 10.1111/0022-1082.85732
36. Christensen BJ, Prabhala NR. The relation between implied and realized volatility. Journal of Financial Economics. 1998;50(2):125–150. doi: 10.1016/S0304-405X(98)00034-8
37. Marshall BR, Nguyen NH, Visaltanachoti N. Commodity liquidity measurement and transaction costs. Review of Financial Studies. 2012;25(2):599–638. doi: 10.1093/rfs/hhr075
38. Roll R. A simple implicit measure of the effective bid-ask spread in an efficient market. The Journal of Finance. 1984;39(4):1127–1139. doi: 10.1111/j.1540-6261.1984.tb03897.x
39. Goyenko RY, Holden CW, Trzcinka CA. Do liquidity measures measure liquidity? Journal of Financial Economics. 2009;92(2):153–181.
40. Koza JR. Genetic programming as a means for programming computers by natural selection. Statistics and Computing. 1994;4(2):87–112. doi: 10.1007/BF00175355
41. Banzhaf W, Francone FD, Keller RE, Nordin P. Genetic programming: an introduction: on the automatic evolution of computer programs and its applications. San Francisco, CA, USA: Morgan Kaufmann Publishers Inc.; 1998.
42. Hirsh H, Banzhaf W, Koza JR, Ryan C, Spector L, Jacob C. Genetic Programming. IEEE Intelligent Systems. 2000;15(3):74–84. doi: 10.1109/5254.846288
43. Poli R, Langdon WB, McPhee NF. A Field Guide to Genetic Programming. Lulu Enterprises, UK Ltd; 2008.
44. Lamoureux CG, Lastrapes WD. Heteroskedasticity in stock return data: volume versus GARCH effects. The Journal of Finance. 1990;45(1):221–229. doi: 10.1111/j.1540-6261.1990.tb05088.x
45. Nelson DB. Stationarity and persistence in the GARCH (1, 1) model. Econometric Theory. 1990;6(3):318–334. doi: 10.1017/S0266466600005296
46. Pong S, Shackleton MB, Taylor SJ, Xu X. Forecasting currency volatility: A comparison of implied volatilities and AR(FI)MA models. Journal of Banking & Finance. 2004;28(10):2541–2563. doi: 10.1016/j.jbankfin.2003.10.015
47. Bollerslev T, Patton AJ, Quaedvlieg R. Exploiting the errors: A simple approach for improved volatility forecasting. Journal of Econometrics. 2016;192(1):1–18. doi: 10.1016/j.jeconom.2015.10.007 [DOI] [Google Scholar]
48. Ismail S, Shabri A, Samsudin R. A hybrid model of self-organizing maps (SOM) and least square support vector machine (LSSVM) for time-series forecasting. Expert Systems with Applications. 2011;38(8):10574–10578. doi: 10.1016/j.eswa.2011.02.107 [DOI] [Google Scholar]
49. Chen CW, Watanabe T. Bayesian modeling and forecasting of Value-at-Risk via threshold realized volatility. Applied Stochastic Models in Business and Industry. 2019;35(3):747–765. doi: 10.1002/asmb.2395 [DOI] [Google Scholar]
50.Mincer JA, Zarnowitz V. The evaluation of economic forecasts. In: Economic forecasts and expectations: Analysis of forecasting behavior and performance. NBER; 1969. p. 3–46.
51. Kristjanpoller W, Minutolo MC. Forecasting volatility of oil price using an artificial neural network-GARCH model. Expert Systems with Applications. 2016;65:233–241. doi: 10.1016/j.eswa.2016.08.045 [DOI] [Google Scholar]
52. Kristjanpoller W, Hernández E. Volatility of main metals forecasted by a hybrid ANN-GARCH model with regressors. Expert Systems with Applications. 2017;84:290–300. doi: 10.1016/j.eswa.2017.05.024 [DOI] [Google Scholar]
53. Bhattacharya S, Ahmed A. Forecasting crude oil price volatility in India using a hybrid ANN-GARCH model. International Journal of Business Forecasting and Marketing Intelligence. 2018;4(4):446–457. doi: 10.1504/IJBFMI.2018.095154 [DOI] [Google Scholar]
54. Cortes C, Vapnik V. Support-vector networks. Machine learning. 1995;20(3):273–297. doi: 10.1007/BF00994018 [DOI] [Google Scholar]
55. Pérez-Cruz F, Afonso-Rodriguez JA, Giner J. Estimating GARCH models using support vector machines. Quantitative Finance. 2003;3(3):163. doi: 10.1088/1469-7688/3/3/302 [DOI] [Google Scholar]
56. Chen S, Härdle WK, Jeong K. Forecasting volatility with support vector machine-based GARCH model. Journal of Forecasting. 2010;29(4):406–433. [Google Scholar]
57. Celik S, Ergin H. Volatility forecasting using high frequency data: Evidence from stock markets. Economic Modelling. 2014;36:176–190. doi: 10.1016/j.econmod.2013.09.038 [DOI] [Google Scholar]

PLoS One. doi: 10.1371/journal.pone.0260289.r001

Decision Letter 0

Cathy W S Chen

12 Jul 2021

PONE-D-21-17939

Liquidity Effects on Oil Volatility Forecasting: From Fintech Perspective

PLOS ONE

Dear Dr. Cui,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

==============================

The authors use the period from January 1, 2001 to December 31, 2010 as the in-sample period and the period from January 1, 2011 to December 31, 2016 as the out-of-sample period. However, the data is too old. The authors are recommended to extend the out-of-sample period to 2019 at least and to update the results.

The authors should normalize both loss functions, MSE and QLIKE, to be the robust and homogeneous loss functions proposed by Patton (2011); see Chen and Watanabe (2019) for details.

==============================

Please submit your revised manuscript by Aug 26 2021 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.
A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.
An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: http://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols.

We look forward to receiving your revised manuscript.

Kind regards,

Cathy W.S. Chen, Ph.D.

Academic Editor

PLOS ONE

Additional Editor Comments (if provided):

The authors should normalize both loss functions, MSE and QLIKE, to be the robust and homogeneous loss functions proposed by Patton (2011); see Chen and Watanabe (2019) for details.

Chen, C.W.S. and Watanabe, T. (2019) Bayesian modeling and forecasting of Value-at-Risk via threshold realized volatility, Applied Stochastic Models in Business and Industry, 35, 747-765.

Patton A.J. (2011) Volatility forecast comparison using imperfect volatility proxies. Journal of Econometrics, 160, 246-256.

Journal Requirements:

When submitting your revision, we need you to address these additional requirements.

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at

https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and

https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Partly

Reviewer #2: Yes

**********

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: No

Reviewer #2: Yes

**********

3. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: Yes

**********

4. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: Yes

**********

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: This paper proposed a liquidity-adjusted GARCH model, denoted by LGARCH, for volatility forecasting by employing the Genetic Programming (GP) technique. An empirical study is conducted to investigate the fitting and prediction performances of the LGARCH model compared with the GARCH, IGARCH, and TGARCH models. The authors claimed that the LGARCH model overwhelmingly surpasses all the three GARCH models in both in-sample volatility fitting and out-of-sample volatility forecasting.

I agree that incorporating AI techniques is an exciting direction for modeling financial time series. This study belongs to this direction and focuses on including liquidity effects on oil volatility forecasting. However, I have several concerns and do not recommend the manuscript for publication in its current state. Please see the details of my report in the attachment.

Reviewer #2: In order to forecast the oil volatility with liquidity, the authors propose the Genetic Programming (GP) method to identify the model format. Based on the GP method, the authors propose the liquidity-adjusted GARCH (LGARCH) model. For comparison, they consider the GARCH, IGARCH, and GJR-GARCH models. The empirical results show that the proposed LGARCH model outperforms the others in both in-sample fitting and out-of-sample forecasting. Robustness is checked using the loss function QLIKE. The results are interesting, but the authors need to revise the paper carefully, for example to address undefined random errors and functions. The comments and questions are as follows, in the hope that they will be helpful to the authors.

See the attached file for the comments.

**********

6. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

Attachment

Submitted filename: PONE-D-21-17939_report.pdf

Attachment

Submitted filename: Plosone_17939.pdf

PLoS One. 2021 Nov 29;16(11):e0260289. doi: 10.1371/journal.pone.0260289.r002

Author response to Decision Letter 0

26 Jul 2021

(Also see attached file)

Comments from Reviewer 1

1. In the GP method, the authors should explain the method in detail. What is the criterion for choosing the best model? For example, in line 226, the definition of r2 should be clarified. The selection basis should also be clarified. For instance, in equation (6), is there any regularity condition for the function f? Is there any penalty function to avoid the overfitting problem?

Response: Thank you. The selection criterion has been further clarified. Normally, an evolutionary computation method terminates when either a desired solution is found (the fitness measure is approximately 0) or the maximum number of generations is reached. In this work the fitness measure never reaches 0, so the system terminates after a pre-defined number of generations (100 in this work). The parameter settings used here are based on empirical experience, and we do not claim that they are optimal. Our main purpose is to test the effectiveness of the GP approach on the volatility forecasting task. r2 is defined as the squared return term, which is used to approximate the volatility. We do not impose a regularity condition on the function f. To prevent overfitting, we eliminate the forecasting functions obtained by GP that have more than 10 terms, which is similar in spirit to dropout in neural network training.
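
The stopping rule and the complexity filter described in this response can be sketched as follows. This is a minimal illustration only, not the authors' implementation: the names `run_gp`, `count_terms`, `toy_fitness` and `toy_evolve`, the representation of a candidate formula as a list of terms, and the tolerance are all assumptions. The `toy_evolve` step is a stand-in random perturbation, not genetic programming's selection, crossover and mutation. Only the 100-generation limit and the 10-term cap come from the response.

```python
import random

MAX_GENERATIONS = 100  # from the response: pre-defined generation limit
MAX_TERMS = 10         # from the response: formulas with more terms are dropped
FITNESS_TOL = 1e-12    # assumption: threshold for "fitness approximately 0"


def count_terms(candidate):
    """Number of additive terms; a candidate is a list of (coefficient, name) pairs."""
    return len(candidate)


def run_gp(population, fitness, evolve, max_generations=MAX_GENERATIONS):
    """Evolve until the best fitness is ~0 or the generation limit is reached.

    Returns (best admissible candidate or None, number of generations run).
    """
    generations = 0
    for generations in range(1, max_generations + 1):
        population = evolve(population)
        if min(fitness(c) for c in population) <= FITNESS_TOL:
            break  # a desired solution was found
    # Complexity filter: discard formulas with more than MAX_TERMS terms.
    admissible = [c for c in population if count_terms(c) <= MAX_TERMS]
    best = min(admissible, key=fitness) if admissible else None
    return best, generations


# Toy ingredients (hypothetical): the fitness is bounded away from 0.
rng = random.Random(0)


def toy_fitness(candidate):
    return abs(sum(coef for coef, _ in candidate) - 1.0) + 0.01


def toy_evolve(population):
    new_population = []
    for cand in population:
        # Perturb the coefficients; occasionally append an extra term.
        cand = [(coef + rng.gauss(0.0, 0.05), name) for coef, name in cand]
        if rng.random() < 0.02:
            cand.append((rng.uniform(-1.0, 1.0), "extra"))
        new_population.append(cand)
    return new_population


start = [[(0.5, "r2_lag1")] for _ in range(20)]
best, generations = run_gp(start, toy_fitness, toy_evolve)
print(generations)  # 100: the toy fitness never reaches 0, so the limit applies
```

Because the toy fitness is always at least 0.01, the loop ends only through the generation limit, which mirrors the situation the response describes.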

2. In the empirical analysis, the authors could explain the results in detail. For example, in Table 5, the MSE for the out-of-sample forecasting with simplified realized variance improves by around 58% in 2012 but by only 2% in 2013. On the other hand, the MSE for the out-of-sample forecasting with realized variance improves by 98% in 2013 but by only around 40% in 2014. Is there any significant difference between the above data?

Response: For the out-of-sample forecasting, the results may not be as stable as the in-sample fitting. Between 2013 and 2014, the oil price trend changed. In 2013, the oil price had an increasing trend, with an annual return of 6.90%. In 2014, however, the oil price plummeted, with an annual return of -45.55%, reaching $53.45, the lowest price since 2010. As a result, the return volatility may not have followed a similar pattern in the two years, which may explain the significant difference in volatility forecasting performance.

3. In addition, in empirical studies, do the authors consider the rolling method for the estimation? The authors should clarify this.

Response: We have considered the rolling-over method. For the out-of-sample forecasting, we use the previous year’s data to forecast the volatility of the current year. For example, we use the data of 2013 to estimate the model parameters and then use those parameters to forecast the volatility of 2014. Similarly, we use the data of 2014 to estimate the model parameters and then use those parameters to forecast the volatility of 2015. We thereby roll the sample forward by one year.
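
The one-year rolling scheme in this response can be sketched as below. The function names, the constant-variance placeholder model and the return values are hypothetical and serve only to show how each forecast year is paired with the preceding estimation year. They are not the LGARCH or GARCH estimators used in the paper.

```python
def rolling_year_splits(first_forecast_year, last_forecast_year):
    """Yield (estimation_year, forecast_year) pairs, moving forward one year at a time."""
    for forecast_year in range(first_forecast_year, last_forecast_year + 1):
        yield forecast_year - 1, forecast_year


def rolling_forecasts(returns_by_year, fit, predict,
                      first_forecast_year, last_forecast_year):
    """Estimate on year y-1, then forecast every day of year y with those parameters."""
    forecasts = {}
    for est_year, fc_year in rolling_year_splits(first_forecast_year,
                                                 last_forecast_year):
        params = fit(returns_by_year[est_year])
        forecasts[fc_year] = predict(params, returns_by_year[fc_year])
    return forecasts


# Placeholder model (assumption): constant variance equal to the mean squared return.
def fit_constant_variance(returns):
    return sum(r * r for r in returns) / len(returns)


def predict_constant_variance(variance, returns):
    return [variance] * len(returns)


returns_by_year = {  # made-up daily returns, two observations per year
    2013: [0.01, -0.02],
    2014: [0.03, -0.01],
    2015: [0.00, 0.02],
}
splits = list(rolling_year_splits(2014, 2015))
forecasts = rolling_forecasts(returns_by_year, fit_constant_variance,
                              predict_constant_variance, 2014, 2015)
print(splits)  # [(2013, 2014), (2014, 2015)]
```

Parameters are re-estimated once per year here, so no observation from a forecast year enters the estimation that produces its own forecast.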

4. In equations (3), (10), and (11), the definition of ηt is missing. The authors should clarify this.

Response: ηt has been defined.

5. On page 5, the sentence from line 171 to line 173 “He further assumes that Qt is equally likely to be 171 +1 first difference of equation and plugs in the result from equation” is not clear. The authors should revise this.

Response: Thank you for your comments, we have corrected the mistake.

6. On page 5, is COV identical to cov? The notations have to be consistent.

Response: Thank you for your comments, we have corrected the mistake.

7. On page 7, in line 245, zt is not defined clearly.

Response: zt has been defined.

8. In equation (14), the authors propose the loss function QLIKE. However, the definition is not clear. The robustness implies that the term σ̂_{t,j}/σ_{t,j} is close to one, but there is no constraint on the first term ln σ̂_{t,j}. Hence, the authors should clarify why a smaller QLIKE indicates better performance.

Response: To further justify the use of QLIKE, we adopted the normalized loss function for QLIKE (see equation (16)), which is mainly used for the out-of-sample test.
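
To illustrate why a smaller normalized QLIKE indicates a better forecast, the sketch below uses one common normalization associated with Patton (2011): for a variance proxy v and a forecast h, the loss is v/h - ln(v/h) - 1, which is never negative and equals zero only when h = v. This form is an assumption made for illustration and may differ in detail from the paper's equation (16), which is not reproduced here. The function names and numbers are hypothetical.

```python
import math


def mse_loss(proxy, forecast):
    """Squared error between the variance proxy and the variance forecast."""
    return (proxy - forecast) ** 2


def qlike_loss(proxy, forecast):
    """Normalized QLIKE: ratio - ln(ratio) - 1, with ratio = proxy / forecast."""
    if proxy <= 0 or forecast <= 0:
        raise ValueError("proxy and forecast must be strictly positive")
    ratio = proxy / forecast
    return ratio - math.log(ratio) - 1.0


proxy = 2.0
exact = qlike_loss(proxy, 2.0)   # forecast equals the proxy: loss is 0
under = qlike_loss(proxy, 1.0)   # forecast is half the proxy
over = qlike_loss(proxy, 4.0)    # forecast is twice the proxy
print(exact, under > over > exact)  # 0.0 True
```

In this form the loss has a single minimum at a perfect forecast, and under-prediction by a given factor costs more than over-prediction by the same factor, whereas the squared error depends only on the size of the difference.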

9. Some notations must be Italic such as in line 170 “E” and in line 350 and line 351 “t”.

Response: Thank you for your comments, we have corrected the mistake.

Comments from Reviewer 2

1. Since volatility cannot be observed directly, this study employs the SRVt in (1) (or RV t in (2)) as a proxy of volatility. The objective of the GP introduced in Section 3.2 is designed to minimize the error between the SRVt (or RVt) and the associated predicted variance. This design is highly related to the evaluation measure MSE T defined in Section 4, where the Observed t is the SRVt (or RVt) and the Predicted t is the predicted variance. However, for the three considered GARCH models, the model parameters are estimated under different criteria. For example, they aim to minimize the L 2 loss between the observed returns and the associated predicted returns. In other words, the three GARCH models are not designed to produce volatilities close to SRV t (or RVt ) but to capture the heteroscedasticity in the return process. The role of the volatility obtained from the three GARCH models is the conditional variance of r t 1 conditional on the information up to t−1. Therefore, it is not surprising that their volatilities are not close to SRV t (or RVt ).

Due to these facts, I would say that the comparison study in this article is not fair, and the conclusion that “the LGARCH model overwhelmingly surpasses all the three GARCH models” is not proper since the LGARCH and the three GARCH models are designed for different targets. Most importantly, the current comparison is difficult to identify whether the good performance of the LGARCH model is due to the liquidity effects or just because of the different estimation targets of the LGARCH and the three GARCH models.
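
The distinction drawn in this comment can be made concrete with a small sketch. A GARCH(1,1) model produces the conditional variance for day t from information up to day t-1 through the standard recursion sigma2_t = omega + alpha * r_{t-1}^2 + beta * sigma2_{t-1}. Scoring that path against a squared-return proxy by MSE is a separate evaluation step. The parameter values, the return series and the function names below are illustrative assumptions, not estimates from the paper.

```python
def garch11_variance(returns, omega, alpha, beta):
    """Conditional variances; sigma2[t] depends only on returns before day t."""
    if omega <= 0 or alpha < 0 or beta < 0 or alpha + beta >= 1:
        raise ValueError("need omega > 0, alpha >= 0, beta >= 0, alpha + beta < 1")
    # Start the recursion at the unconditional variance.
    sigma2 = [omega / (1.0 - alpha - beta)]
    for r_prev in returns[:-1]:
        sigma2.append(omega + alpha * r_prev ** 2 + beta * sigma2[-1])
    return sigma2


def mse(observed, predicted):
    """Mean squared error between a volatility proxy and predicted variances."""
    return sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)


returns = [0.5, -1.2, 0.3, 2.0, -0.7]  # made-up percentage returns
sigma2 = garch11_variance(returns, omega=0.1, alpha=0.1, beta=0.8)
proxy = [r ** 2 for r in returns]       # squared returns as a simple proxy
loss = mse(proxy, sigma2)
```

Nothing in the recursion itself targets the proxy. How close `sigma2` comes to `proxy` depends on how omega, alpha and beta were estimated, which is the point at issue in the comment.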

Response: There are two reasons why the comparison is fair. First, although the GARCH model is designed to capture the heteroscedasticity in the return process, the purpose of capturing the heteroscedasticity is to forecast the volatility accurately. Heteroscedasticity means that the volatility of the random variable varies with time, and the GARCH model is designed to capture this volatility movement pattern in order to predict its future path. Second, our model is based on the GARCH model and its format is similar to that of the original GARCH model, so its estimation target cannot differ greatly from that of the original GARCH model.

Furthermore, for the purpose of capturing the dynamics of volatility, the family of GARCH models has been widely applied to energy markets such as oil market. Sadorsky (2006) for instance, finds that the threshold GARCH (or GJR, Glosten et al., 1993) fits well for heating oil and natural gas volatilities, whereas the standard GARCH(1,1) model fits well for crude oil and unleaded gasoline volatilities. Kang et al. (2009) show that component GARCH (CGARCH) and fractional integrated GARCH (FIGARCH) provide superior performance. Recently, Ma et al. (2021) prove GARCH family models fit oil volatility quite smoothly. Liang et al. (2021) also demonstrate that GARCH-MIDAS-type models provide solid performance for natural gas futures volatility forecasting.

2. In the literature, many other approaches are designed for estimating proxies of volatility similar to SRVt (or RVt), but without considering liquidity effects. For example, Chen et al. (2010) proposed a GARCH-SVM approach for volatility forecasting and adopted a proxy of volatility similar to RVt. Hajizadeh et al. (2012) proposed an ANN-GARCH type model for forecasting the volatility of S&P 500 index return and employed a proxy of volatility similar to RVt. I suggest the authors design a new comparison scenario to investigate whether the LGARCH outperforms these types of approaches and whether including the liquidity effects improves the volatility forecasting.

Response: Thank you for your comments, we have now included the ANN-GARCH and GARCH-SVM model comparison in our robustness check part.

3. Many notations and relationships between random variables are not well defined. Some examples are listed in the following:

(a) Line 151: Pt is not defined.

Response: Pt has been defined.

(b) Line 151 and 169: Do Pt and St represent the same thing?

Response: Not exactly, Pt is the last day's settlement price and St is last observed traded price.

Response: ηt has been defined.

(d) Line 167 and 169: The relationship between η t and Q t is not clear. Are they independent or uncorrelated?

Response: the relationship has been defined.

(e) Line 176: How to estimate the ‘spread’ from data is not clear.

Response: it has been clarified.

(f) Line 178: The ‘residual term’ mentioned here is not clearly defined.

Response: The ‘residual term’ has been clarified.

(g) Line 245: zt is not defined.

Response: zt has been defined

(h) Line 254 and 255: yt−1 is not defined. If yt denotes the day-t return, I suggest to replace it by rt as in (1) and (2) for consistency purposes.

Response: Not exactly, yt denotes the residual term, which contains the noise information that has not been captured by the GARCH-X model. yt-1 has been defined now.

According to the above reasons, I do not recommend the manuscript for publication in its current state

Additional Comments from Editor

Response: Thank you for your comments, we have updated our results to the year of 2019.

The authors should normalize both loss functions, MSE and QLIKE, to be the robust and homogeneous loss functions proposed by Patton (2011); see Chen and Watanabe (2019) for details.

Response: Thank you for your comments. We have used the normalized loss functions for both MSE and QLIKE.

Chen, C.W.S. and Watanabe, T. (2019) Bayesian modeling and forecasting of Value-at-Risk via threshold realized volatility, Applied Stochastic Models in Business and Industry, 35, 747-765.

Patton A.J. (2011) Volatility forecast comparison using imperfect volatility proxies. Journal of Econometrics, 160, 246-256.

Attachment

Submitted filename: Response to Reviewers.pdf

PLoS One. doi: 10.1371/journal.pone.0260289.r003

Decision Letter 1

Cathy W S Chen

16 Sep 2021

PONE-D-21-17939R1

Liquidity Effects on Oil Volatility Forecasting: From Fintech Perspective

PLOS ONE

Dear Dr. Cui,

Please submit your revised manuscript by Oct 31 2021 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

We look forward to receiving your revised manuscript.

Kind regards,

Cathy W.S. Chen, Ph.D.

Academic Editor

PLOS ONE

Additional Editor Comments (if provided):

Thank you for responding to my previous comments. This revision needs to be improved in the following ways before it can be published.

1) All values shown in Tables 3 - 6 are extremely small. This situation can be improved by rescaling the time series {r_t}. Researchers use the following formula for the return.

Line 156: R_t = (ln(p_t) - ln(p_{t-1})) \times 100.

2) Eq. (2) should be rescaled too. Most related papers adopt the following formula for realized volatility in order to avoid small values.

RV*_t = RV_t \times 10,000 or

RV**_t = ln(RV_t \times 10,000).
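
The effect of the two rescalings requested here can be checked with a short sketch. The prices and function names are made up for illustration, and the exact definition of RV_t in Eq. (2) is not reproduced. The only point shown is that multiplying log returns by 100 multiplies any squared-return quantity by 10,000.

```python
import math


def log_returns(prices, scale=1.0):
    """Log returns scale * (ln p_t - ln p_{t-1}); scale=100 gives percentage returns."""
    return [scale * (math.log(p1) - math.log(p0))
            for p0, p1 in zip(prices, prices[1:])]


def rescale_rv(rv):
    """RV*_t = RV_t x 10,000."""
    return rv * 10_000.0


def log_rescale_rv(rv):
    """RV**_t = ln(RV_t x 10,000); requires rv > 0."""
    return math.log(rv * 10_000.0)


prices = [50.0, 51.0, 49.5, 49.9]        # made-up settlement prices
raw = log_returns(prices)                 # decimal log returns
pct = log_returns(prices, scale=100.0)    # percentage log returns

# A squared percentage return equals 10,000 times the squared decimal return.
consistent = all(abs(p * p - rescale_rv(r * r)) < 1e-9 for p, r in zip(pct, raw))
print(consistent)  # True
```

A decimal daily return of about 0.02 becomes about 2 after rescaling, and a variance of 0.0004 becomes 4, which is why values reported on the rescaled series are no longer extremely small.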

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #1: (No Response)

Reviewer #2: All comments have been addressed

**********

2. Is the manuscript technically sound, and do the data support the conclusions?

Reviewer #1: Partly

Reviewer #2: Yes

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: No

Reviewer #2: Yes

**********

4. Have the authors made all data underlying the findings in their manuscript fully available?

Reviewer #1: Yes

Reviewer #2: Yes

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

Reviewer #1: Yes

Reviewer #2: Yes

**********

6. Review Comments to the Author

Reviewer #1: This revision does not satisfactorily address some of my previous questions. Please see my report in the attachment.

Reviewer #2: The paper is now well written. However, there are still several comments as follows. See the attached file.

**********

7. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

Attachment

Submitted filename: Plosone_17939_R1.pdf

Attachment

Submitted filename: PONE-D-21-17939R1_report.pdf

PLoS One. 2021 Nov 29;16(11):e0260289. doi: 10.1371/journal.pone.0260289.r004

Author response to Decision Letter 1

28 Sep 2021

See attached file

Attachment

Submitted filename: Response to Reviewers.pdf

PLoS One. doi: 10.1371/journal.pone.0260289.r005

Decision Letter 2

Cathy W S Chen

18 Oct 2021

PONE-D-21-17939R2

Liquidity Effects on Oil Volatility Forecasting: From Fintech Perspective

PLOS ONE

Dear Dr. Cui,

Please submit your revised manuscript by Dec 02 2021 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

We look forward to receiving your revised manuscript.

Kind regards,

Cathy W. S. Chen, Ph.D.

Academic Editor

PLOS ONE

Additional Editor Comments (if provided):

To be clear and honest, I am not satisfied with the authors’ response. I provided two comments for the previous version. However, the authors do not follow my suggestions for the returns. We can see the range of returns is (-0.1654, 0.1640) in Table 1, which does not follow the formula. Please see the attached file. The authors will have one last chance to revise it and all related results accordingly.

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

Reviewer #1: All comments have been addressed

Reviewer #2: (No Response)

Reviewer #3: (No Response)

**********

2. Is the manuscript technically sound, and do the data support the conclusions?

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: Yes

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: Yes

**********

4. Have the authors made all data underlying the findings in their manuscript fully available?

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: Yes

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: Yes

**********

6. Review Comments to the Author

Reviewer #1: The authors have satisfactorily responded to all my questions in this revision. Nevertheless, I found one typo and have a minor comment on the revision. Please see the details in the attachment. I recommend publication of this work after the authors correct the typo.

Reviewer #2: The paper is well written now. However, there is one more issue with the paper that the authors should clarify. See the attached file.

Reviewer #3: I have no further comment on this revised manuscript. I think this paper is good enough to be published.

**********

7. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

Reviewer #3: No

Attachment

Submitted filename: Plosone_17939_R2.pdf (110.1KB, pdf)

Attachment

Submitted filename: PONE-D-21-17939R2_report.pdf (66.3KB, pdf)

Attachment

Submitted filename: Report_PONE-D-21-17939R2.pdf (113.8KB, pdf)

PLoS One. 2021 Nov 29;16(11):e0260289. doi: 10.1371/journal.pone.0260289.r006

Author response to Decision Letter 2

25 Oct 2021

Thank you for your comments. We have now rescaled the returns and volatility estimation based on the suggestions. Sorry for the mistake in the last manuscript.

Attachment

Submitted filename: Response to Reviewers.pdf (72.2KB, pdf)

PLoS One. doi: 10.1371/journal.pone.0260289.r007

Decision Letter 3

Cathy W S Chen

8 Nov 2021

Liquidity Effects on Oil Volatility Forecasting: From Fintech Perspective

PONE-D-21-17939R3

Dear Dr. Cui,

We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements.

Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication.

An invoice for payment will follow shortly after the formal acceptance. To ensure an efficient process, please log into Editorial Manager at http://www.editorialmanager.com/pone/, click the 'Update My Information' link at the top of the page, and double check that your user information is up-to-date. If you have any billing related questions, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

Kind regards,

Cathy W. S. Chen, Ph.D.

Academic Editor

PLOS ONE

Additional Editor Comments (optional):

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

Reviewer #2: All comments have been addressed

**********

2. Is the manuscript technically sound, and do the data support the conclusions?

Reviewer #2: Yes

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #2: Yes

**********

4. Have the authors made all data underlying the findings in their manuscript fully available?

Reviewer #2: Yes

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

Reviewer #2: Yes

**********

6. Review Comments to the Author

Reviewer #2: The paper is now well written. The authors have fully addressed the points raised in the referee report.

**********

7. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #2: No

PLoS One. doi: 10.1371/journal.pone.0260289.r008

Acceptance letter

Cathy W S Chen

17 Nov 2021

PONE-D-21-17939R3

Liquidity Effects on Oil Volatility Forecasting: From Fintech Perspective

Dear Dr. Cui:

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now with our production department.

If your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information please contact onepress@plos.org.

If we can help with anything else, please email us at plosone@plos.org.

Thank you for submitting your work to PLOS ONE and supporting open access.

Kind regards,

PLOS ONE Editorial Office Staff

on behalf of

Prof. Cathy W. S. Chen

Academic Editor

PLOS ONE

Associated Data

Supplementary Materials

Attachment

Submitted filename: PONE-D-21-17939_report.pdf (90.7KB, pdf)

Attachment

Submitted filename: Plosone_17939.pdf (110.1KB, pdf)

Attachment

Submitted filename: Response to Reviewers.pdf (113.8KB, pdf)

Attachment

Submitted filename: Plosone_17939_R1.pdf (77.3KB, pdf)

Attachment

Submitted filename: PONE-D-21-17939R1_report.pdf (52.2KB, pdf)

Attachment

Submitted filename: Response to Reviewers.pdf (104.7KB, pdf)

Attachment

Submitted filename: Plosone_17939_R2.pdf (110.1KB, pdf)

Attachment

Submitted filename: PONE-D-21-17939R2_report.pdf (66.3KB, pdf)

Attachment

Submitted filename: Report_PONE-D-21-17939R2.pdf (113.8KB, pdf)

Attachment

Submitted filename: Response to Reviewers.pdf (72.2KB, pdf)

Data Availability Statement

References

1. Buchak G, Matvos G, Piskorski T, Seru A. Fintech, regulatory arbitrage, and the rise of shadow banks. Journal of Financial Economics. 2018;130(3):453–483. doi: 10.1016/j.jfineco.2018.03.011

2. Chen MA, Wu Q, Yang B. How valuable is FinTech innovation? The Review of Financial Studies. 2019;32(5):2062–2106. doi: 10.1093/rfs/hhy130

3. Lien D, Wilson BK. Multiperiod hedging in the presence of stochastic volatility. International Review of Financial Analysis. 2001;10(4):395–406. doi: 10.1016/S1057-5219(01)00060-6

4. Andersen TG, Bollerslev T, Meddahi N. Correcting the errors: Volatility forecast evaluation using high-frequency data and realized volatilities. Econometrica. 2005;73(1):279–296. doi: 10.1111/j.1468-0262.2005.00572.x

5. Tian S, Hamori S. Modeling interest rate volatility: A Realized GARCH approach. Journal of Banking & Finance. 2015;61:158–171. doi: 10.1016/j.jbankfin.2015.09.008

6. Wei Y, Wang Y, Huang D. Forecasting crude oil market volatility: Further evidence using GARCH-class models. Energy Economics. 2010;32(6):1477–1484. doi: 10.1016/j.eneco.2010.07.009

7. Efimova O, Serletis A. Energy markets volatility modelling using GARCH. Energy Economics. 2014;43:264–273. doi: 10.1016/j.eneco.2014.02.018

8. Wang Y, Liu L. Crude oil and world stock markets: volatility spillovers, dynamic correlations, and hedging. Empirical Economics. 2016;50(4):1481–1509. doi: 10.1007/s00181-015-0983-2

9. Xing Y, Wang J. Linkages between global crude oil market volatility and financial market by complexity synchronization. Empirical Economics. 2019; p. 1–17.

10. Fleming MJ, Remolona EM. Price formation and liquidity in the US Treasury market: The response to public information. The Journal of Finance. 1999;54(5):1901–1915. doi: 10.1111/0022-1082.00172

11. Feng SP, Hung MW, Wang YH. Option pricing with stochastic liquidity risk: Theory and evidence. Journal of Financial Markets. 2014;18:77–95. doi: 10.1016/j.finmar.2013.05.002

12. Collin-Dufresne P, Fos V. Insider trading, stochastic liquidity, and equilibrium prices. Econometrica. 2016;84(4):1441–1475. doi: 10.3982/ECTA10789

13. Yin Z, O’Sullivan C, Brabazon A. An analysis of the performance of genetic programming for realised volatility forecasting. Journal of Artificial Intelligence and Soft Computing Research. 2016;6(3):155–172. doi: 10.1515/jaiscr-2016-0012

14. Ding S, Zhang Y, Duygun M. Modeling Price volatility based on a genetic programming approach. British Journal of Management. 2019;30(2):328–340. doi: 10.1111/1467-8551.12359

15. Weng F, Zhang H, Yang C. Volatility forecasting of crude oil futures based on a genetic algorithm regularization online extreme learning machine with a forgetting factor: The role of news during the COVID-19 pandemic. Resources Policy. 2021;73:102148. doi: 10.1016/j.resourpol.2021.102148

16. Mademlis DK, Dritsakis N. Volatility Forecasting using Hybrid GARCH Neural Network Models: The Case of the Italian Stock Market. International Journal of Economics and Financial Issues. 2021;11(1):49. doi: 10.32479/ijefi.10842

17. Deuskar P, Gupta A, Subrahmanyam MG. Liquidity effect in OTC options markets: Premium or discount? Journal of Financial Markets. 2011;14(1):127–160. doi: 10.1016/j.finmar.2010.08.003

18. Pimenta A, Nametala CA, Guimarães FG, Carrano EG. An automated investing method for stock market based on multiobjective genetic programming. Computational Economics. 2018;52(1):125–144. doi: 10.1007/s10614-017-9665-9

19. Michell K, Kristjanpoller W. Generating trading rules on US Stock Market using strongly typed genetic programming. Soft Computing. 2020;24(5):3257–3274. doi: 10.1007/s00500-019-04085-1

20. Ding S, Cui T, Xiong X, Bai R. Forecasting stock market return with nonlinearity: a genetic programming approach. Journal of Ambient Intelligence and Humanized Computing. 2020; p. 1–13.

21. Ma F, Liao Y, Zhang Y, Cao Y. Harnessing jump component for crude oil volatility forecasting in the presence of extreme shocks. Journal of Empirical Finance. 2019;52:40–55. doi: 10.1016/j.jempfin.2019.01.004

22. Yin L. Does oil price respond to macroeconomic uncertainty? New evidence. Empirical Economics. 2016;51(3):921–938. doi: 10.1007/s00181-015-1027-7

23. Chen Y, Ma F, Zhang Y. Good, bad cojumps and volatility forecasting: New evidence from crude oil and the US stock markets. Energy Economics. 2019;81:52–62. doi: 10.1016/j.eneco.2019.03.020

24. Russell S, Norvig P. Artificial Intelligence: A Modern Approach. 3rd ed. Upper Saddle River, NJ, USA: Prentice Hall Press; 2009.

25. Mackenzie A. The Fintech Revolution. London Business School Review. 2015;26(3):50–53. doi: 10.1111/2057-1615.12059

26. Bollerslev T, Wright JH. High-frequency data, frequency domain inference, and volatility forecasting. Review of Economics and Statistics. 2001;83(4):596–602. doi: 10.1162/003465301753237687

27. Hansen PR, Lunde A. A forecast comparison of volatility models: does anything beat a GARCH(1,1)? Journal of Applied Econometrics. 2005;20(7):873–889. doi: 10.1002/jae.800

28. Klein T, Walther T. Oil price volatility forecast with mixture memory GARCH. Energy Economics. 2016;58:46–58. doi: 10.1016/j.eneco.2016.06.004

29. Lux T, Segnon M, Gupta R. Forecasting crude oil price volatility and value-at-risk: Evidence from historical and recent data. Energy Economics. 2016;56:117–133. doi: 10.1016/j.eneco.2016.03.008

30. Bollerslev T, Melvin M. Bid-ask spreads and volatility in the foreign exchange market: An empirical analysis. Journal of International Economics. 1994;36(3):355–372. doi: 10.1016/0022-1996(94)90008-6

31. Haugom E, Langeland H, Molnár P, Westgaard S. Forecasting volatility of the US oil market. Journal of Banking & Finance. 2014;47:1–14. doi: 10.1016/j.jbankfin.2014.05.026

32. Yu HC, Shin TL. Gold, crude oil and the weekend effect: a probability distribution approach. Investment Management and Financial Innovations. 2011;8(2):39–51.

33. Geman H, Kharoubi C. WTI crude oil futures in portfolio diversification: The time-to-maturity effect. Journal of Banking & Finance. 2008;32(12):2553–2559. doi: 10.1016/j.jbankfin.2008.04.002

34. Schwert GW. Why does stock market volatility change over time? The Journal of Finance. 1989;44(5):1115–1153. doi: 10.1111/j.1540-6261.1989.tb02647.x

35. Andersen TG, Bollerslev T. Deutsche mark–dollar volatility: intraday activity patterns, macroeconomic announcements, and longer run dependencies. The Journal of Finance. 1998;53(1):219–265. doi: 10.1111/0022-1082.85732

36. Christensen BJ, Prabhala NR. The relation between implied and realized volatility. Journal of Financial Economics. 1998;50(2):125–150. doi: 10.1016/S0304-405X(98)00034-8

37. Marshall BR, Nguyen NH, Visaltanachoti N. Commodity liquidity measurement and transaction costs. Review of Financial Studies. 2012;25(2):599–638. doi: 10.1093/rfs/hhr075

38. Roll R. A simple implicit measure of the effective bid-ask spread in an efficient market. The Journal of Finance. 1984;39(4):1127–1139. doi: 10.1111/j.1540-6261.1984.tb03897.x

39. Goyenko RY, Holden CW, Trzcinka CA. Do liquidity measures measure liquidity? Journal of Financial Economics. 2009;92(2):153–181.

40. Koza JR. Genetic programming as a means for programming computers by natural selection. Statistics and Computing. 1994;4(2):87–112. doi: 10.1007/BF00175355

41. Banzhaf W, Francone FD, Keller RE, Nordin P. Genetic programming: an introduction: on the automatic evolution of computer programs and its applications. San Francisco, CA, USA: Morgan Kaufmann Publishers Inc.; 1998.

42. Hirsh H, Banzhaf W, Koza JR, Ryan C, Spector L, Jacob C. Genetic Programming. IEEE Intelligent Systems. 2000;15(3):74–84. doi: 10.1109/5254.846288

43. Poli R, Langdon WB, McPhee NF. A Field Guide to Genetic Programming. Lulu Enterprises, UK Ltd; 2008.

44. Lamoureux CG, Lastrapes WD. Heteroskedasticity in stock return data: volume versus GARCH effects. The Journal of Finance. 1990;45(1):221–229. doi: 10.1111/j.1540-6261.1990.tb05088.x

45. Nelson DB. Stationarity and persistence in the GARCH(1,1) model. Econometric Theory. 1990;6(3):318–334. doi: 10.1017/S0266466600005296

46. Pong S, Shackleton MB, Taylor SJ, Xu X. Forecasting currency volatility: A comparison of implied volatilities and AR(FI)MA models. Journal of Banking & Finance. 2004;28(10):2541–2563. doi: 10.1016/j.jbankfin.2003.10.015

47. Bollerslev T, Patton AJ, Quaedvlieg R. Exploiting the errors: A simple approach for improved volatility forecasting. Journal of Econometrics. 2016;192(1):1–18. doi: 10.1016/j.jeconom.2015.10.007

48. Ismail S, Shabri A, Samsudin R. A hybrid model of self-organizing maps (SOM) and least square support vector machine (LSSVM) for time-series forecasting. Expert Systems with Applications. 2011;38(8):10574–10578. doi: 10.1016/j.eswa.2011.02.107

49. Chen CW, Watanabe T. Bayesian modeling and forecasting of Value-at-Risk via threshold realized volatility. Applied Stochastic Models in Business and Industry. 2019;35(3):747–765. doi: 10.1002/asmb.2395

50. Mincer JA, Zarnowitz V. The evaluation of economic forecasts. In: Economic forecasts and expectations: Analysis of forecasting behavior and performance. NBER; 1969. p. 3–46.

51. Kristjanpoller W, Minutolo MC. Forecasting volatility of oil price using an artificial neural network-GARCH model. Expert Systems with Applications. 2016;65:233–241. doi: 10.1016/j.eswa.2016.08.045

52. Kristjanpoller W, Hernández E. Volatility of main metals forecasted by a hybrid ANN-GARCH model with regressors. Expert Systems with Applications. 2017;84:290–300. doi: 10.1016/j.eswa.2017.05.024

53. Bhattacharya S, Ahmed A. Forecasting crude oil price volatility in India using a hybrid ANN-GARCH model. International Journal of Business Forecasting and Marketing Intelligence. 2018;4(4):446–457. doi: 10.1504/IJBFMI.2018.095154

54. Cortes C, Vapnik V. Support-vector networks. Machine Learning. 1995;20(3):273–297. doi: 10.1007/BF00994018

55. Pérez-Cruz F, Afonso-Rodriguez JA, Giner J. Estimating GARCH models using support vector machines. Quantitative Finance. 2003;3(3):163. doi: 10.1088/1469-7688/3/3/302

56. Chen S, Härdle WK, Jeong K. Forecasting volatility with support vector machine-based GARCH model. Journal of Forecasting. 2010;29(4):406–433.

57. Celik S, Ergin H. Volatility forecasting using high frequency data: Evidence from stock markets. Economic Modelling. 2014;36:176–190. doi: 10.1016/j.econmod.2013.09.038
