Modeling opening price spread of Shanghai Composite Index based on ARIMA-GRU/LSTM hybrid model

Yuancheng Si; Saralees Nadarajah; Zongxin Zhang; Chunmin Xu

doi:10.1371/journal.pone.0299164

PLoS One. 2024 Mar 13;19(3):e0299164. doi: 10.1371/journal.pone.0299164

Editor: Muhammad Usman Tariq

PMCID: PMC10936816 PMID: 38478502

Abstract

In the dynamic landscape of financial markets, accurate forecasting of stock indices remains a pivotal yet challenging task, essential for investors and policymakers alike. This study is motivated by the need to enhance the precision of predicting the Shanghai Composite Index’s opening price spread, a critical measure reflecting market volatility and investor sentiment. Traditional time series models like ARIMA have shown limitations in capturing the complex, nonlinear patterns inherent in stock price movements, prompting the exploration of advanced methodologies. The aim of this research is to bridge the gap in forecasting accuracy by developing a hybrid model that integrates the strengths of ARIMA with deep learning techniques, specifically Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU) networks. This novel approach leverages the ARIMA model’s proficiency in linear trend analysis and the deep learning models’ capability in modeling nonlinear dependencies, aiming to provide a comprehensive tool for market prediction. Utilizing a comprehensive dataset covering the period from December 20, 1990, to June 2, 2023, the study develops and assesses the efficacy of ARIMA, LSTM, GRU, ARIMA-LSTM, and ARIMA-GRU models in forecasting the Shanghai Composite Index’s opening price spread. The evaluation of these models is based on key statistical metrics, including Mean Squared Error (MSE) and Mean Absolute Error (MAE), to gauge their predictive accuracy. The findings indicate that the hybrid models, ARIMA-LSTM and ARIMA-GRU, perform better in forecasting the opening price spread of the Shanghai Composite Index than their standalone counterparts. This outcome suggests that combining traditional statistical methods with advanced deep learning algorithms can enhance stock market prediction. 
The research contributes to the field by providing evidence of the potential benefits of integrating different modeling approaches for financial forecasting, offering insights that could inform investment strategies and financial decision-making.

1 Introduction

In the stock market, an important phenomenon is price difference, commonly referred to as the opening price difference or morning price difference [1–3]. These terms describe the situation where there is a difference between the opening price of a security or index and its previous day’s closing price. This phenomenon is observed in various financial markets, including stock and commodity markets. When we talk about price difference, it can be further divided into positive price difference and negative price difference. A positive price difference indicates that the opening price is higher than the previous day’s closing price, which may be due to positive information received by the market after the close, leading buyers to be willing to pay a higher price at market open. On the other hand, a negative price difference indicates a lower opening price, which may be attributed to negative information received by the market after the close, causing sellers to be willing to sell at a lower price at market open [3, 4].

In summary, the phenomenon of price differences reflects an important characteristic of financial markets, namely, price volatility and uncertainty. The occurrence of price differences provides a unique perspective that helps us understand and explain the volatility of stock prices. In previous studies, the phenomenon of price differences has received extensive attention. Many researchers [3, 5, 6] have conducted in-depth research on price differences in different financial markets, attempting to understand the causes and impacts of this phenomenon and establish daily price difference models to validate relevant economic hypotheses [3, 7]. Among them, some studies have found that market information efficiency is an important factor affecting price differences [8]. In markets with high information efficiency, price differences may be less common because all information has been fully absorbed by market participants and reflected in prices. However, in markets with low information efficiency, price differences may be more frequent as market participants may not immediately access and react to all information. Some empirical analyses of the US stock market have also shown that adjustments to bad news are faster than those to good news [6, 9]. After a positive opening price difference, stock prices tend to have stronger upward momentum, providing opportunities for profitable trading. Overall, price differences are an important phenomenon in the stock market, and a deep understanding and research of this phenomenon are significant for revealing the volatility of stock prices, understanding and predicting market dynamics, and making investment decisions. The remainder of this document unfolds as follows: Section 2 reviews relevant literature, providing a theoretical foundation and contextual background. Section 3 presents the data collection methodology, including the sources, variables, and descriptive statistics of the dataset utilized. 
Section 4 delves into the rationale behind using overnight returns as a proxy for market sentiment. Section 5 explores the relationship between trading volume ratios and sentiment, offering empirical evidence. Section 6 subjects the study’s findings to a series of robustness checks to verify their validity. Finally, Section 7 concludes the paper, summarizing the key insights and suggesting directions for future research.

2 Literature review

From a theoretical perspective, according to the Efficient Market Hypothesis (EMH) [10, 11], in an efficient market, asset prices will fully and promptly reflect all available information. Thus, there is a close relationship between the Efficient Market Hypothesis and stock index price differences. In a perfectly efficient market, investors cannot obtain excess returns through any means, as all information is fully absorbed by the market and reflected in prices. However, the occurrence of opening price differences suggests a certain degree of market inefficiency. Specifically, if there is a difference between the closing price of the previous trading day and the opening price of the current day, it may indicate that the market did not fully incorporate all information into the closing price of the previous trading day or that there is a delay in the market’s reaction to new information [1, 12]. Although stock index price differences imply possible market inefficiency, it does not mean that investors can easily obtain excess returns. In practice, investors need to consider many other factors, including transaction costs, market volatility, and other factors that may affect investment decisions. Moreover, even if there is a certain degree of market inefficiency, the Efficient Market Hypothesis reminds us that this inefficiency may quickly disappear as market participants react.

From a practical perspective, the study in [3] examines the Ukrainian stock market, specifically analyzing the UX index from 2009 to 2018, to discern patterns and anomalies related to price gaps. Through statistical and regression analyses, it finds no significant evidence of seasonality or abnormal behavior post-gaps, aligning with the Efficient Market Hypothesis, except for a momentum effect on days with negative gaps, which suggests a profitable trading strategy that contradicts market efficiency. Similarly, [13] analyzes price gaps across stock, FOREX, and commodity markets from 2000 to 2015, employing various statistical tests to explore six hypotheses regarding market efficiency. It concludes that while most market behaviors align with efficiency, FOREX markets exhibit an anomaly that allows for the generation of abnormal profits through a specific trading strategy. Adding to the complexity of market efficiency, recent studies have examined the dynamics of stock markets [5, 14, 15], revealing exceptions to the momentum effect. One such examination demonstrates that intraday and overnight returns influence future stock returns in different ways. Investors tend to underreact to intraday information while overreacting to overnight information, which motivates separate intraday momentum and overnight momentum strategies. This dichotomy challenges the traditional understanding of market reactions and illustrates the persistence of profitability, including resilience against momentum crashes. Furthermore, the relationship between overnight returns and investor sentiment on the Taiwan Stock Exchange (TWSE) has been reassessed [15], corroborating the findings by Aboody et al. that overnight returns reflect investor sentiment. 
This study extends the understanding by highlighting how trading activities by different investor types amplify the patterns of overnight returns, with a significant role played by retail trading volume. It elucidates that overnight returns contribute to both short-term persistence and long-term return reversals, driven by investor sentiment. These insights not only validate the use of overnight returns as a measure of investor sentiment in the TWSE but also suggest the influence of market structure and investor behaviors as critical determinants in non-US markets [16]. These recent findings enrich the discourse on market efficiency by illustrating how specific market mechanisms and investor behaviors can lead to anomalies that both challenge and complement the Efficient Market Hypothesis. They underscore the importance of considering intraday and overnight information separately in analyzing market dynamics and formulating trading strategies. To bridge the insights from specific market behaviors and anomalies highlighted in [3, 13] with the broader considerations of market dynamics, it’s imperative to understand the underlying factors contributing to price differences. Factors contributing to price differences can be summarized as follows:

Announcement of macroeconomic information [1, 12]: Important macroeconomic information of a country or other countries may be announced after the trading market closes, such as adjustments in monetary policy or the release of GDP growth data. These announcements can influence investors’ expectations and result in differences between the opening price and the previous day’s closing price.
Leakage of internal corporate information [6, 12]: Significant leaks of corporate information, such as financial reports or major decisions, can have an impact on stock prices. Particularly, when this information is disclosed after the trading market closes or leaked before the market opens, it often leads to significant differences between the opening price and the previous day’s closing price.
Changes in market microstructure information [17]: Changes in market microstructure information, including trading volume and frequency, can affect stock prices. When these changes occur, they may result in differences between the opening price and the previous day’s closing price.
Changes in non-economic market sentiment [7, 18]: Market sentiment is an important factor influencing stock prices. Investor panic or excessive optimism can lead to abnormal price fluctuations. When market sentiment changes significantly after the trading market closes due to non-economic factors such as geopolitical events or natural disasters, it often results in significant differences between the opening price and the previous day’s closing price for specific stocks or sectors.
Liquidity shocks [12]: Liquidity shocks refer to sudden changes in market liquidity, such as a concentration of buy or sell orders at the moment the market opens or a sudden inflow or outflow of funds. Such liquidity shocks can lead to significant differences between the opening price and the previous day’s closing price.

Despite the in-depth exploration of price differences in previous research from different perspectives, considering the complexity and diversity of their influences, as well as the continuously changing market environment and investor behavior, further deepening and expanding the research on this phenomenon is still needed. In the context of price difference modeling, we often encounter several core problems and demands:

Simplicity and effectiveness: Autoregressive models are simple and applicable in many situations. They only require knowledge of the historical data of a single variable to make predictions. This makes the model easy to understand and interpret, especially in the financial field, where stock prices are often believed to be influenced by past prices [19, 20].
Autocorrelation: In many financial time series data, there is autocorrelation between the data at a certain time point and its historical data, meaning that the current value may depend on past values. This dependency is the core assumption of autoregressive models, which often holds true in many financial time series.
Noise in high-frequency data: When using high-frequency financial data (such as data per minute or per second), the data may contain a large amount of noise. Introducing more covariates may make the model too complex and introduce the risk of overfitting. On the other hand, autoregressive models, due to their simplicity, can better handle this noise [21].
Data availability: In practical applications, we may not have access to data for all variables that may influence stock prices, or the cost of obtaining such data may be high.

Meanwhile, obtaining comprehensive datasets that cover all factors influencing stock prices presents significant challenges, including data availability and associated costs. High-quality, extensive financial data often requires substantial investment, limiting accessibility for individual researchers and smaller entities. Moreover, the complexity of accurately analyzing and integrating this data to reflect stock price dynamics poses additional hurdles. The integration of covariates observed at different frequencies (daily to quarterly) into a coherent predictive model requires advanced methodologies to maintain data integrity and avoid analytical bias. Exploring the impact of such covariates on the opening price spread offers a promising research direction, potentially uncovering the underlying mechanisms of stock market movements. This exploration could lead to more accurate stock price predictions and informed trading strategies, but it requires access to detailed, high-frequency data and the application of sophisticated statistical and machine learning techniques. Given these considerations, this study adopts a hybrid approach, combining autoregressive modeling with deep learning, to address the complexities of financial time series analysis effectively. This strategy is designed to harness the strengths of both methodologies, enhancing the ability to predict stock price variations without relying on external covariates. The novelty of this study is reflected in the following aspects:

Comprehensive Data Analysis of Shanghai Composite Index’s Opening Price Spread: This study represents one of the first comprehensive analyses focusing specifically on the opening price spread rate of the Shanghai Composite Index. We delve deep into the historical data, meticulously analyzing patterns and trends over an extensive period. This approach offers a unique perspective on how the opening price spread behaves in one of the world’s largest financial markets, providing new insights into market dynamics.
Hybrid Modeling Approach for Enhanced Forecast Accuracy: We adopt a novel hybrid modeling strategy, combining the strengths of autoregressive models and deep learning techniques. This approach effectively addresses the limitations of traditional time series models in capturing complex, nonlinear relationships inherent in stock prices. By integrating ARIMA with advanced deep learning models like LSTM and GRU, we significantly improve the accuracy of forecasting the opening price spread. This methodological innovation marks a substantial advancement over traditional forecasting models.
Focus on Practical Implications and Application: This study goes beyond theoretical exploration and actively addresses practical implications. We provide insights that are directly applicable for investors, traders, and financial analysts. By improving the accuracy of forecasting models, we offer tools that can aid in more informed decision-making processes in the realm of investment and trading strategies. This practical focus ensures that the research is not only academically relevant but also of tangible value to those operating in financial markets.

These novel elements of this study contribute to the existing body of knowledge in financial time series analysis and forecasting. They demonstrate the effectiveness of this approach in addressing the unique challenges presented by the financial data and highlight the potential for future research in this area, particularly in exploring the impact of various covariates on stock market dynamics.

3 Data and methods

3.1 Data

We downloaded the dataset of the Shanghai Stock Exchange Composite Index from December 20, 1990, to June 2, 2023, comprising a total of 7,927 trading days (excluding non-trading days such as statutory holidays), from Sina Finance (https://finance.sina.com.cn/stock/). The dataset consists of 9 feature columns, and each sample represents one day of stock market trading data. The feature columns and their meanings are as follows:

‘date’: Date
‘pre_close’: Previous day’s closing price
‘open’: Opening price on the current day
‘high’: Highest price on the current day
‘low’: Lowest price on the current day
‘close’: Closing price on the current day
‘changeinprice’: Price change
‘changeinrate’: Price change rate
‘diffrate’: Opening price difference rate

In this dataset, the focus is on the ‘diffrate’ (opening price difference rate) column. ‘diffrate’ represents the percentage difference between the opening price of the day and the previous day’s closing price. By calculating the difference between the daily opening price and the previous day’s closing price, dividing it by the previous day’s closing price, and multiplying by 100, we obtain the opening price difference rate as a percentage.

\begin{matrix} \mathit{diffrate} = \left(\frac{\mathit{open} - \mathit{pre\_close}}{\mathit{pre\_close}}\right) \times 100\% \end{matrix}

(1)
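As a minimal sketch of Eq (1), the function below computes the opening price difference rate from an opening price and the previous close. The function name and the two example prices are hypothetical and chosen only for illustration; they are not taken from the dataset.

```python
def diffrate(open_price: float, pre_close: float) -> float:
    """Opening price difference rate in percent, following Eq (1)."""
    if pre_close == 0:
        raise ValueError("pre_close must be non-zero")
    return (open_price - pre_close) / pre_close * 100.0


# Hypothetical prices, for illustration only (not from the dataset).
up = diffrate(3030.0, 3000.0)    # opening above the previous close: positive rate
down = diffrate(2985.0, 3000.0)  # opening below the previous close: negative rate
print(up, down)
```

A positive result corresponds to a positive price difference and a negative result to a negative one, matching the interpretation given in Section 3.2.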

3.2 Interpretation of opening price difference rate

The opening price difference rate provides information about the relative change between the daily opening price and the previous day’s closing price. A positive value indicates that the opening price is higher than the previous day’s closing price, while a negative value indicates that the opening price is lower. This indicator helps us understand market volatility, trends, and the magnitude of price changes. It reflects market sentiment: the opening price difference rate can be considered as one of the sentiment indicators of market participants. A positive value suggests that market participants have positive expectations for the stock or index, leading them to be willing to buy at a higher price. On the other hand, a negative value may reflect market participants’ concerns or cautious sentiment. Additionally, the opening price difference rate can also reflect market supply and demand dynamics. If the opening price is higher than the previous day’s closing price, it may indicate higher market demand with more buyers, resulting in price increases. Conversely, if the opening price is lower, it may indicate higher market supply with more sellers, leading to price declines. Analyzing the opening price difference rate helps explore market trends and volatility. For example, when the absolute value of the opening price difference rate is large, it may indicate significant market fluctuations or unexpected events that can have a significant impact on the stock market. This is valuable for investors, traders, and analysts in predicting market trends and formulating appropriate investment strategies.

3.3 Exploratory data analysis and unit root test

We conducted exploratory data analysis on the ‘diffrate’ (opening price difference rate) using statistical software to understand its economic and statistical significance. The ‘diffrate’ represents the percentage difference between the daily opening price and the previous day’s closing price. The data span from December 20, 1990, to June 2, 2023, comprising a total of 7,927 samples.

Descriptive statistics of ‘diffrate’ are presented in Table 2. The series has a mean of 0.008907 and a standard deviation of 1.572218. This indicates that, on average, the daily opening price differs only slightly from the previous day’s closing price, but the difference is highly volatile. The minimum value is -21.822000, which occurred on August 12, 1992, during a period when the market fell sharply from its peak of 1429.01 points in May 1992 to around 600 points in September before rebounding to around 700 points. The maximum value is 104.269000, which occurred on May 21, 1992, when the Shanghai Stock Exchange lifted the price limit on the only 15 listed stocks, triggering a market surge; without the limit-up rule, the Shanghai market rose 105% in a single day. These extreme values are excluded from the histogram. Together, they demonstrate the wide range of variation in ‘diffrate’ throughout the study period.

Table 1. Group counts of ‘diffrate’.

Group	Count	Group	Count	Group	Count	Group	Count
[-10.0, -9.5)	1	[-9.5, -9.0)	2	[-9.0, -8.5)	2	[-8.5, -8.0)	2
[-8.0, -7.5)	2	[-7.5, -7.0)	2	[-7.0, -6.5)	3	[-6.5, -6.0)	3
[-6.0, -5.5)	5	[-5.5, -5.0)	1	[-5.0, -4.5)	5	[-4.5, -4.0)	9
[-4.0, -3.5)	14	[-3.5, -3.0)	20	[-3.0, -2.5)	21	[-2.5, -2.0)	48
[-2.0, -1.5)	92	[-1.5, -1.0)	186	[-1.0, -0.5)	528	[-0.5, 0.0)	3099
[0.0, 0.5)	2969	[0.5, 1.0)	549	[1.0, 1.5)	157	[1.5, 2.0)	76
[2.0, 2.5)	37	[2.5, 3.0)	24	[3.0, 3.5)	15	[3.5, 4.0)	8
[4.0, 4.5)	6	[4.5, 5.0)	6	[5.0, 5.5)	4	[5.5, 6.0)	4
[6.0, 6.5)	5	[6.5, 7.0)	4	[7.0, 7.5)	2	[7.5, 8.0)	3
[8.0, 8.5)	2	[9.0, 9.5)	2	[9.5, 10.0)	2	[10.0, 10.5)	1

The histogram of the ‘diffrate’ is shown in Fig 1 to visualize its distribution. The histogram illustrates the frequency distribution of the ‘diffrate’ within different intervals. From the histogram, we observe that the ‘diffrate’ is mainly concentrated around small values, but there are also some extreme values. This indicates that, during the study period, the opening price’s relative change compared to the previous day’s closing price was mostly small, with occasional large changes. Table 1 reports counts of ‘diffrate’ over the intervals with non-zero counts, using a bin width of 0.5, and Table 2 reports its summary statistics.

Table 2. Statistics of ‘diffrate’.

	count	mean	std	min	25%	50%	75%	max
diffrate	7927	0.009	1.572	-21.822	-0.215	-0.007	0.198	104.269
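The binned counts and summary statistics of the kind reported in Tables 1 and 2 can be sketched as follows. This is an illustrative sketch using only the Python standard library, with a small hypothetical sample rather than the paper's data; the helper name `bin_counts` is an assumption, and the bins are half-open intervals [a, a + 0.5) as in Table 1.

```python
import math
import statistics
from collections import Counter


def bin_counts(values, width=0.5):
    """Count values in half-open bins [k * width, (k + 1) * width)."""
    counts = Counter()
    for v in values:
        left = math.floor(v / width) * width  # left edge of the bin containing v
        counts[(left, left + width)] += 1
    return dict(sorted(counts.items()))      # only non-empty bins are kept


# Hypothetical difference rates in percent (not the paper's data).
sample = [-0.7, -0.2, -0.1, 0.0, 0.1, 0.3, 0.6, 1.2]
table = bin_counts(sample)
mean = statistics.mean(sample)
std = statistics.stdev(sample)  # sample standard deviation (divides by n - 1)

for (left, right), count in table.items():
    print(f"[{left}, {right})\t{count}")
print(mean, std)
```

Because empty bins never enter the counter, the output lists only intervals with non-zero counts, which is the same convention Table 1 follows.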

By plotting the line chart of the ‘diffrate’ over time as shown in Fig 1, we further explore its trends and volatility. From the line chart, we can observe that the ‘diffrate’ exhibits significant fluctuations at different time points, displaying clear upward and downward trends. This further confirms the importance of the ‘diffrate’ as a key indicator that can help us predict market trends and formulate corresponding investment strategies.

In order to conduct time series modeling on ‘diffrate’, we first performed a unit root test. The unit root test is a commonly used method in time series analysis to determine whether a series contains a stochastic trend, that is, whether it is non-stationary. For this study, the purpose of the unit root test is to determine whether ‘diffrate’ needs to be differenced to become a stationary time series, which is suitable for ARIMA modeling. We used the Augmented Dickey-Fuller test (ADF test), one of the most commonly used unit root tests. The null hypothesis of the ADF test is that the series has a unit root (non-stationary), while the alternative hypothesis is that the series is stationary. If the test rejects the null hypothesis, there is statistically significant evidence that the series is stationary, and we can proceed with modeling on the original series without differencing. We used the ‘adfuller’ function from the ‘statsmodels’ library to perform the ADF test, and the results are as follows:

The ADF statistic is -17.5854 and the p-value is 0.0000. At a significance level of 0.05, the p-value is less than the significance level, indicating that we can reject the null hypothesis. This means that ‘diffrate’ is a stationary time series rather than having a unit root (non-stationary). Additionally, we compared the ADF statistic with critical values. For a 1% significance level, the critical value is -3.4312; for a 5% significance level, the critical value is -2.8619; and for a 10% significance level, the critical value is -2.5670. Since the ADF statistic is much smaller than the critical values, it further supports the conclusion that ‘diffrate’ is a stationary time series. Based on these results, we can confirm that ‘diffrate’ is a stationary time series. This means that we can directly use ‘diffrate’ as the input data in the modeling process without differencing. Therefore, we can proceed with ARIMA modeling using ‘diffrate’ to predict market trends and formulate related investment strategies.
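The decision rule described above can be illustrated with a simplified Dickey-Fuller regression. The sketch below is not the ‘adfuller’ call used in the study: it omits the lagged-difference (augmentation) terms and does not compute a p-value, and it only compares the t-statistic with the 5% critical value quoted in the text. The simulated white-noise and random-walk series are assumptions made for illustration.

```python
import math
import random


def dickey_fuller_t(x):
    """t-statistic of rho in the regression: x_t - x_{t-1} = c + rho * x_{t-1} + e_t."""
    lagged = x[:-1]
    delta = [b - a for a, b in zip(x[:-1], x[1:])]
    n = len(delta)
    mean_l = sum(lagged) / n
    mean_d = sum(delta) / n
    sxx = sum((l - mean_l) ** 2 for l in lagged)
    sxy = sum((l - mean_l) * (d - mean_d) for l, d in zip(lagged, delta))
    rho = sxy / sxx                      # OLS slope
    c = mean_d - rho * mean_l            # OLS intercept
    ssr = sum((d - c - rho * l) ** 2 for l, d in zip(lagged, delta))
    se = math.sqrt(ssr / (n - 2) / sxx)  # standard error of the slope
    return rho / se


random.seed(0)
noise = [random.gauss(0.0, 1.0) for _ in range(2000)]  # stationary by construction

walk, level = [], 0.0
for e in noise:
    level += e
    walk.append(level)                                  # random walk: has a unit root

CRITICAL_5PCT = -2.8619  # 5% critical value quoted in the text
t_noise = dickey_fuller_t(noise)
t_walk = dickey_fuller_t(walk)
print("white noise rejects unit root:", t_noise < CRITICAL_5PCT)
print("random walk rejects unit root:", t_walk < CRITICAL_5PCT)
```

A statistic far below the critical value, as for the white-noise series here, leads to rejecting the unit root hypothesis, which is the situation reported for ‘diffrate’.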

In summary, ‘diffrate’ has significant economic and statistical significance. It reflects market sentiment and supply-demand dynamics, helping us predict market trends and volatility. Through exploratory data analysis, we gained a deeper understanding of the distribution characteristics and trends of ‘diffrate’, laying the foundation for further research and analysis.

3.4 Modeling and forecasting process

We sorted the entire dataset chronologically, from the earliest to the most recent observation. This ensures that training and evaluation respect the temporal order of the data.

Next, we divided the data into two parts: the first 80% of the data was used as the training set for the entire modeling process, and the remaining 20% was set aside as the validation set to evaluate the predictive performance of the model. To accurately assess the model’s predictive ability, we employed a rolling forecasting approach for validation. Specifically, we trained the model using a fixed window of historical data, then predicted the value for the next time point and compared it with the actual observed value. This approach simulates the forecasting scenario in practical applications and provides a better assessment of the model’s generalization ability on future data. See Fig 2 for a visualization of the process.
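The split and rolling procedure described above can be sketched as follows. This is an illustration under stated assumptions: the window length, the toy series, and the `window_mean` forecaster are hypothetical placeholders (the latter stands in for ARIMA, LSTM, or GRU), and the text does not specify whether the models are refit at every step.

```python
def rolling_forecast(series, train_frac, window, predict_next):
    """One-step-ahead rolling forecasts over the held-out tail of `series`."""
    split = int(len(series) * train_frac)        # first index of the held-out part
    preds, actuals = [], []
    for t in range(split, len(series)):
        history = series[max(0, t - window):t]   # fixed-length window ending at t - 1
        preds.append(predict_next(history))      # forecast for time t
        actuals.append(series[t])                # observed value at time t
    return preds, actuals


def window_mean(history):
    """Placeholder forecaster: the mean of the window (hypothetical stand-in model)."""
    return sum(history) / len(history)


# Hypothetical series already sorted in ascending time order.
series = [0.1, -0.2, 0.3, 0.0, -0.1, 0.2, 0.4, -0.3, 0.1, 0.0]
preds, actuals = rolling_forecast(series, 0.8, 3, window_mean)
print(preds, actuals)
```

Each forecast uses only observations that precede the predicted time point, so no future information leaks into the evaluation.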

In this study, we compare the performance of five models: ARIMA, LSTM, GRU, and the hybrid models ARIMA-LSTM and ARIMA-GRU. These models are evaluated through rolling forecasts on the held-out 20% of the data, and their predictive performance is measured using evaluation metrics such as root mean square error, mean absolute error, and prediction accuracy. We analyze and compare the forecasting results of each model to determine which performs best in predicting the opening price difference rate.
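For reference, the error metrics named above can be written in a few lines. The sketch uses hypothetical actual and predicted values, not results from the study.

```python
import math


def mse(actual, predicted):
    """Mean squared error."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean square error: square root of the MSE."""
    return math.sqrt(mse(actual, predicted))


def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical values for illustration only.
actual = [1.0, 2.0, 3.0]
predicted = [1.5, 2.0, 2.0]
print(mse(actual, predicted), rmse(actual, predicted), mae(actual, predicted))
```

Because MSE and RMSE square the errors, they penalize large misses more heavily than MAE, which matters for a series with occasional extreme values such as ‘diffrate’.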

4 Model composition

4.1 ARIMA model

The Autoregressive Integrated Moving Average (ARIMA) model is a statistical model used for analyzing and forecasting time series data [21]. The ARIMA model consists of three main components: the Autoregressive (AR) component, the Integrated (I) component, and the Moving Average (MA) component. Specifically, the AR component assumes that the current value depends on the past p observations, where p is a parameter to be determined. The MA component assumes that the current value depends on the past q error terms, where q is also a parameter to be determined. The I component is responsible for transforming non-stationary series into stationary ones by differencing. The general form of the ARIMA model can be written as ARIMA(p, d, q), where p is the order of the autoregressive component, d is the degree of differencing, and q is the order of the moving average component.

The mathematical expression of the ARIMA model is as follows:

\begin{matrix} (1 - \sum_{i = 1}^{p} ϕ_{i} L^{i}) {(1 - L)}^{d} X_{t} = (1 + \sum_{i = 1}^{q} θ_{i} L^{i}) ε_{t} \end{matrix}

(2)

where ϕ_i represents the autoregressive coefficients, θ_i represents the moving average coefficients, L is the lag operator, X_t is the time series, ε_t is the error term, and d is the degree of differencing.
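As a worked instance of Eq (2), setting p = 1, d = 0, q = 1 gives X_t = ϕ_1 X_{t−1} + ε_t + θ_1 ε_{t−1}, so the one-step forecast is ϕ_1 X_t + θ_1 ε_t. The sketch below applies this recursion with known coefficients. The coefficient values, the toy series, and the zero pre-sample initial conditions are assumptions for illustration; in practice the coefficients are estimated from data.

```python
def arma11_one_step(x, phi, theta):
    """One-step forecasts for ARIMA(1, 0, 1) with known coefficients.

    Returns the in-sample forecasts of each x_t and the forecast of the next value.
    """
    x_prev = 0.0    # assumed pre-sample observation
    eps_prev = 0.0  # assumed pre-sample error term
    forecasts = []
    for x_t in x:
        pred = phi * x_prev + theta * eps_prev  # forecast of x_t made at time t - 1
        forecasts.append(pred)
        eps_prev = x_t - pred                   # error term implied by the model
        x_prev = x_t
    next_forecast = phi * x_prev + theta * eps_prev
    return forecasts, next_forecast


# Hypothetical coefficients and observations.
forecasts, next_forecast = arma11_one_step([1.0, 0.5], phi=0.5, theta=0.2)
print(forecasts, next_forecast)
```

With d = 0 no differencing is applied, which corresponds to the case of a series that is already stationary.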

Fitting the time series data with ARIMA(p, d, q) means choosing the orders of the AR and MA components, together with the degree of differencing, so as to capture the patterns in the time series and achieve effective forecasting. ARIMA models are commonly used to fit non-stationary financial time series. ARIMA is often chosen as a baseline model in time series forecasting due to its robustness and simplicity. It is particularly useful for establishing a benchmark for comparison with more complex models. Below we detail the reasons for selecting ARIMA as the baseline model in this study:

Simplicity and Interpretability: ARIMA models are straightforward and their parameters have clear interpretations, which is valuable for initial analysis and benchmarking.
Maturity and Stability: The ARIMA model is a well-established method in time series analysis, providing reliable and consistent benchmarks.
Less Data Requirement: ARIMA can perform well even with smaller datasets, making it suitable for situations where data availability is limited.
Benchmarking: It provides a standard against which the performance of more complex models can be compared. This is crucial in assessing whether the additional complexity of newer models translates into better performance.
Theoretical Foundations: ARIMA is grounded in statistical theory, which helps in understanding the underlying processes in the time series data.

Given these reasons, ARIMA serves as the baseline model in this analysis, against which the performance of ARIMA-LSTM, LSTM, GRU, and ARIMA-GRU is evaluated in section 8.

4.2 Deep learning models

4.2.1 Long Short-Term Memory (LSTM) model

Long Short-Term Memory (LSTM) is a special type of Recurrent Neural Network (RNN) proposed by Hochreiter and Schmidhuber in 1997 [22]. It addresses the issue of handling long-term dependencies in traditional RNNs when dealing with long sequences. LSTM is particularly suitable for modeling and predicting important events in time series with relatively long intervals and delays. Many researchers have applied LSTM models to traditional time series forecasting and achieved promising results [19, 23].

LSTM introduces a structure called a “memory cell” to store and access long-term information. Each memory cell is controlled by an input gate, a forget gate, and an output gate. The control signals of these gates are computed using sigmoid functions, which produce values between 0 and 1, indicating the degree of gate opening.

Specifically, the computation process of LSTM can be expressed with the following mathematical equations:

Input Gate: $i_{t} = σ (W_{ii} x_{t} + b_{ii} + W_{hi} h_{t - 1} + b_{hi})$
Forget Gate: $f_{t} = σ (W_{if} x_{t} + b_{if} + W_{hf} h_{t - 1} + b_{hf})$
Output Gate: $o_{t} = σ (W_{io} x_{t} + b_{io} + W_{ho} h_{t - 1} + b_{ho})$
New Cell State: $g_{t} = \tanh (W_{ig} x_{t} + b_{ig} + W_{hg} h_{t - 1} + b_{hg})$
Updated Cell State: $c_{t} = f_{t} ⊙ c_{t - 1} + i_{t} ⊙ g_{t}$
Final Hidden State: $h_{t} = o_{t} ⊙ \tanh (c_{t})$

In these equations, x_t is the input at time step t, h_t is the hidden state at time step t, and c_t is the cell state at time step t. W and b denote weight matrices and bias vectors; the first subscript indicates whether the parameter acts on the input (i) or the hidden state (h), and the second subscript indicates the gate type (i, f, o, g). σ is the sigmoid function, tanh is the hyperbolic tangent function, and ⊙ denotes element-wise multiplication (Hadamard product). i_t, f_t, and o_t are the activations of the input gate, forget gate, and output gate, respectively, and g_t is the candidate value for the cell state.
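The six LSTM equations can be traced by hand with a small amount of code. The following is a minimal pure-Python sketch of a single time step, not the implementation used in this study. The function name `lstm_step`, the `params` layout, and the toy values are assumptions made for illustration only.

```python
import math

def sigmoid(v):
    """Element-wise logistic function on a list of floats."""
    return [1.0 / (1.0 + math.exp(-a)) for a in v]

def tanh(v):
    """Element-wise hyperbolic tangent."""
    return [math.tanh(a) for a in v]

def matvec(W, x):
    """Matrix-vector product; W is a list of rows."""
    return [sum(w * a for w, a in zip(row, x)) for row in W]

def add(*vectors):
    """Element-wise sum of any number of equal-length vectors."""
    return [sum(terms) for terms in zip(*vectors)]

def hadamard(a, b):
    """Element-wise (Hadamard) product."""
    return [p * q for p, q in zip(a, b)]

def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM step. params maps 'i', 'f', 'o', 'g' to (W_x, b_x, W_h, b_h).

    The dictionary layout is a hypothetical convention for this sketch.
    """
    def pre_activation(gate):
        W_x, b_x, W_h, b_h = params[gate]
        return add(matvec(W_x, x_t), b_x, matvec(W_h, h_prev), b_h)

    i_t = sigmoid(pre_activation("i"))   # input gate
    f_t = sigmoid(pre_activation("f"))   # forget gate
    o_t = sigmoid(pre_activation("o"))   # output gate
    g_t = tanh(pre_activation("g"))      # candidate cell state
    c_t = add(hadamard(f_t, c_prev), hadamard(i_t, g_t))
    h_t = hadamard(o_t, tanh(c_t))
    return h_t, c_t

# Toy check with all-zero parameters: every gate evaluates to 0.5 and g_t to 0,
# so the cell state is simply halved.
zero_block = ([[0.0]], [0.0], [[0.0]], [0.0])
params = {gate: zero_block for gate in "ifog"}
h_t, c_t = lstm_step([1.0], [0.0], [2.0], params)
```

With zero parameters the forget gate keeps half of the previous cell state, so `c_t` is `[1.0]` and `h_t` equals half of `tanh(1.0)`. This makes the roles of the forget and output gates easy to verify by hand.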

The development of LSTM was aimed at addressing the challenges that traditional RNNs face when dealing with long sequential data. Traditional RNNs struggle to capture long-term dependencies in sequences, making it difficult to capture dependencies with long gaps. LSTM effectively addresses this issue by introducing memory cells and gate mechanisms.

The memory cell in LSTM allows for the storage and access of long-term information. It maintains a hidden state that can be updated selectively using gate mechanisms. The gate mechanisms include the input gate, forget gate, and output gate, which control the flow of information in and out of the memory cell. The input gate determines how much new information should be stored in the memory cell, while the forget gate determines how much old information should be discarded. The output gate controls the information that is outputted from the memory cell.

By incorporating memory cells and gate mechanisms, LSTM can effectively capture long-term dependencies in sequential data. It can learn and remember information over extended periods, allowing it to model and predict sequences with long gaps between relevant events. This makes LSTM a powerful tool for tasks such as language modeling, machine translation, speech recognition, and time series analysis.

4.2.2 Gated Recurrent Unit (GRU) model

Gated Recurrent Unit (GRU) is a variation of Recurrent Neural Networks (RNNs) used for processing sequential data such as time series, speech, and text [22, 23]. It addresses the issue of vanishing or exploding gradients that can occur in RNNs when dealing with long sequences. These issues arise from the fact that gradients may decrease (vanish) or increase (explode) with the increase in time steps during the backpropagation process, making it difficult for the network to learn and remember long-term information. GRU is designed to effectively store and process information over long sequences. It is a simplified version of LSTM, reducing the number of gates and merging the cell state and hidden state.

A key feature of GRU is its update gate and reset gate. The update gate determines the extent to which the old hidden state should be retained when updating the hidden state, while the reset gate determines the extent to which the old hidden state should be discarded when computing the new candidate hidden state. These gates allow GRU to capture long-term dependencies in time series effectively. See Fig 3 for the graphical representation.

The mathematical expressions for GRU are as follows [22]:

Reset Gate: $r_{t} = σ (W_{r} x_{t} + U_{r} h_{t - 1} + b_{r})$
Update Gate: $z_{t} = σ (W_{z} x_{t} + U_{z} h_{t - 1} + b_{z})$
Candidate Hidden State: ${\tilde{h}}_{t} = ϕ (W_{h} x_{t} + U_{h} (r_{t} ⊙ h_{t - 1}) + b_{h})$
Final Hidden State: $h_{t} = (1 - z_{t}) ⊙ h_{t - 1} + z_{t} ⊙ {\tilde{h}}_{t}$

where the operator ⊙ represents element-wise multiplication (Hadamard product), σ is the sigmoid function, and ϕ is the activation function of the candidate state (commonly the hyperbolic tangent). In these equations, x_t is the input vector, h_t is the output (hidden state) vector, ${\tilde{h}}_{t}$ is the candidate activation vector, z_t is the update gate vector, r_t is the reset gate vector, and W, U, and b are the parameter matrices and vectors.
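To make the four GRU equations concrete, the sketch below evaluates one time step in pure Python. It is an illustration under stated assumptions rather than the code used in this study: the activation ϕ is assumed to be tanh, and the name `gru_step` and the `params` layout are hypothetical.

```python
import math

def sigmoid(v):
    """Element-wise logistic function on a list of floats."""
    return [1.0 / (1.0 + math.exp(-a)) for a in v]

def matvec(W, x):
    """Matrix-vector product; W is a list of rows."""
    return [sum(w * a for w, a in zip(row, x)) for row in W]

def add(*vectors):
    """Element-wise sum of any number of equal-length vectors."""
    return [sum(terms) for terms in zip(*vectors)]

def hadamard(a, b):
    """Element-wise (Hadamard) product."""
    return [p * q for p, q in zip(a, b)]

def gru_step(x_t, h_prev, params):
    """One GRU step. params maps 'r', 'z', 'h' to (W, U, b).

    Assumption: the candidate activation phi is tanh.
    """
    W_r, U_r, b_r = params["r"]
    W_z, U_z, b_z = params["z"]
    W_h, U_h, b_h = params["h"]

    r_t = sigmoid(add(matvec(W_r, x_t), matvec(U_r, h_prev), b_r))  # reset gate
    z_t = sigmoid(add(matvec(W_z, x_t), matvec(U_z, h_prev), b_z))  # update gate
    pre = add(matvec(W_h, x_t), matvec(U_h, hadamard(r_t, h_prev)), b_h)
    h_cand = [math.tanh(a) for a in pre]                            # candidate state
    keep = hadamard([1.0 - z for z in z_t], h_prev)                 # (1 - z_t) * h_{t-1}
    h_t = add(keep, hadamard(z_t, h_cand))
    return h_t

# Toy check with all-zero parameters: both gates equal 0.5 and the candidate is 0,
# so the new hidden state is half of the previous one.
zero_block = ([[0.0]], [[0.0]], [0.0])
params = {name: zero_block for name in ("r", "z", "h")}
h_t = gru_step([1.0], [0.8], params)
```

Unlike the LSTM, there is no separate cell state here: the update gate alone interpolates between the previous hidden state and the candidate.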

GRU has the following advantages, making it more suitable for modeling and analyzing sequence data in certain scenarios compared to Long Short-Term Memory (LSTM) networks:

Simplified structure: GRU has two gates (reset gate and update gate) compared to LSTM’s three gates (forget gate, input gate, and output gate). With fewer parameters, GRU is typically easier to train and faster than LSTM.
Computational efficiency: Due to fewer parameters, GRU is computationally efficient, providing substantial benefits when dealing with large-scale data.
Suitable for short and low-complexity sequences: Studies have shown that GRU performs better than LSTM networks on low-complexity sequences.
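The claim that GRU has fewer parameters can be checked by counting the weights and biases in the equations of Sections 4.2.1 and 4.2.2. The sketch below does this for the equations exactly as written in this paper (two bias vectors per LSTM block, one per GRU block). Deep learning frameworks may count biases differently, and the input dimension of 1 is an assumption made for illustration.

```python
def lstm_param_count(input_dim, hidden_dim):
    """Parameters of the LSTM equations as written: four blocks (i, f, o, g),
    each with an input matrix, a hidden matrix, and two bias vectors."""
    per_block = hidden_dim * input_dim + hidden_dim * hidden_dim + 2 * hidden_dim
    return 4 * per_block

def gru_param_count(input_dim, hidden_dim):
    """Parameters of the GRU equations as written: three blocks (r, z, h),
    each with W, U, and a single bias vector."""
    per_block = hidden_dim * input_dim + hidden_dim * hidden_dim + hidden_dim
    return 3 * per_block

# Assumed setting: one input feature and 32 hidden units.
lstm_total = lstm_param_count(input_dim=1, hidden_dim=32)
gru_total = gru_param_count(input_dim=1, hidden_dim=32)
ratio = gru_total / lstm_total
```

Under these assumptions the LSTM has 4480 parameters and the GRU 3264, so the GRU is roughly three quarters of the size for the same hidden dimension.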

5 Algorithm procedure for hybrid model

Time series generally contain both linear and nonlinear relationships. The ARIMA model performs well in capturing linear relationships in time series data, but it has limitations in modeling nonlinear relationships. The LSTM model, on the other hand, is capable of modeling both linear and nonlinear relationships, but its performance may vary when applied to different datasets. To improve prediction results, researchers have adopted hybrid models that model the linear and nonlinear components of a time series separately. These models have achieved great success in time series analysis and prediction by utilizing various deep learning algorithms to achieve better estimation performance than constructive learning algorithms. Additionally, these models are supervised learning algorithms, which are trained on historical observations and then used for prediction. Predicting price differentials in financial time series with hybrid models is a multi-step process that requires the application of various statistical and financial methods and models. The specific steps involved are as follows (Fig 4):

Perform descriptive statistical analysis on the high-frequency financial time series to observe its distributional characteristics. This step serves as the foundation of the prediction process, as understanding the basic properties of the data allows for a better grasp of its structure and patterns. To meet the requirements of the model, data preprocessing is performed to eliminate the influence of scale and improve the accuracy of predictions. Additionally, the dataset is divided into corresponding training and validation sets.
Construct an ARIMA model based on the preprocessed data. Initially, the order of differencing d is determined through an augmented Dickey-Fuller (ADF) test. The values of p and q are then determined by an iterative search over candidate values: an ARIMA model is constructed for each combination of orders in [0, p] and [0, q], and the grid search selects the combination for which the Akaike information criterion (AIC) reaches its minimum value. The AIC is a widely used model selection criterion that takes into account both the goodness of fit and the complexity of the model. Ultimately, the values of p and q that minimize the AIC are chosen as the parameters of the best model [21].
Calculate the residuals of the optimized ARIMA model and model them using GRU and LSTM models. Taking ARIMA-LSTM as an example, the process involves constructing feature matrices for the training and testing sets based on the time series data. The training set includes historical observations and lagged terms as input features, while the testing set only includes lagged terms as input features. Subsequently, the feature matrices of the training and testing sets are reshaped to meet the input requirements of the LSTM model. An LSTM model with 32 neurons is then built, followed by a fully connected layer. The mean absolute error (MAE) is chosen as the loss function for the model, and the Adam optimizer is used for parameter optimization. During the training phase, the model is trained on the training set for 50 epochs with a batch size of 16. Finally, the trained model is used to make predictions on the testing set. This approach represents the concept of a “hybrid model,” where the strengths of both statistical models and machine learning models are combined. The ARIMA model captures the autocorrelation of the data, while the GRU and LSTM models are used to simulate and predict complex nonlinear relationships [23, 24].
Lastly, the GRU and LSTM models trained on the residuals are combined with the optimized ARIMA model to obtain the final modified model. This approach represents a typical ensemble learning method, which aims to combine the predictions of multiple models to achieve better prediction performance. The advantage of this approach is that it leverages the strengths of multiple models and can reduce bias and variance, thereby improving the accuracy and robustness of predictions [22].
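The data flow of the steps above (linear fit, residuals, lagged residual features, combined forecast) can be sketched in a few lines of pure Python. This is an illustration only: a least-squares AR(1) fit stands in for the selected ARIMA model, and a placeholder function stands in for the trained LSTM or GRU. All function names and the toy series are hypothetical.

```python
def fit_ar1(series):
    """Least-squares fit of y[t] = c + phi * y[t-1]; a stand-in for the ARIMA step."""
    x, y = series[:-1], series[1:]
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    phi = sxy / sxx
    return mean_y - phi * mean_x, phi

def linear_residuals(series, c, phi):
    """In-sample residuals of the linear model: actual minus one-step prediction."""
    return [series[t] - (c + phi * series[t - 1]) for t in range(1, len(series))]

def make_lag_windows(values, n_lags):
    """Supervised pairs: each row of X holds n_lags past values, y the next value."""
    n = len(values) - n_lags
    X = [values[i:i + n_lags] for i in range(n)]
    y = [values[i + n_lags] for i in range(n)]
    return X, y

def mean_of_window(window):
    """Placeholder residual model; in the paper this role is played by LSTM or GRU."""
    return sum(window) / len(window)

def hybrid_forecast(last_value, recent_residuals, c, phi, residual_model):
    """Final forecast = linear one-step forecast + predicted residual."""
    return (c + phi * last_value) + residual_model(recent_residuals)

# Toy, noise-free series generated by y[t] = 1 + 0.5 * y[t-1].
series = [0.0]
for _ in range(9):
    series.append(1.0 + 0.5 * series[-1])

c, phi = fit_ar1(series)
resid = linear_residuals(series, c, phi)
X, y = make_lag_windows(resid, n_lags=3)   # inputs a recurrent model would train on
forecast = hybrid_forecast(series[-1], resid[-3:], c, phi, mean_of_window)
```

Because the toy series is exactly linear, the residuals are numerically zero and the hybrid forecast reduces to the linear one. On real data the residual model would contribute the nonlinear correction.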

6 Evaluation criteria

Several evaluation criteria are commonly used to assess the performance of prediction models. In the formulas below, f_i represents the predicted value for the ith observation, y_i represents the corresponding actual value, n represents the number of sample data points, and L denotes the maximum likelihood function value of the model:

The Akaike Information Criterion (AIC) is a criterion used for model selection [25]. It is calculated as follows:
$\begin{matrix} AIC = 2 k - 2 \ln L (\hat{Θ}) \end{matrix}$ (3)
where k is the number of parameters in the model. A lower AIC indicates a better fit or a lower complexity of the model. Therefore, the goal of model selection is to find the model that minimizes the AIC value.
The Mean Squared Error (MSE) is a commonly used metric for assessing the accuracy of prediction models. It measures the average of the squared differences between predicted values and actual observed values. A smaller MSE value indicates better predictive performance of the model, as it implies a smaller gap between the predicted values and the actual observed values. Conversely, a larger MSE value indicates poorer predictive performance of the model.
$\begin{matrix} MSE = \frac{1}{n} \sum_{i = 1}^{n} {(f_{i} - y_{i})}^{2} \end{matrix}$ (4)
Mean Absolute Error (MAE) is a commonly used metric for evaluating the accuracy of a prediction model. It represents the average of the absolute differences between the predicted values and the actual observations. Compared to Mean Squared Error (MSE), MAE does not heavily penalize large errors in the model’s predictions since it does not involve squaring the differences. MAE therefore weights each error in proportion to its magnitude, which makes it less sensitive to outliers than MSE and often more reflective of typical prediction performance, especially in the presence of noisy data.
$\begin{matrix} MAE = \frac{1}{n} \sum_{i = 1}^{n} | f_{i} - y_{i} | \end{matrix}$ (5)
The Root Mean Squared Error (RMSE) is a commonly used metric for measuring the accuracy of prediction models. It represents the square root of the average of the squared differences between the predicted values and the actual observed values.
$\begin{matrix} RMSE = \sqrt{M S E} \end{matrix}$ (6)
The Mean Absolute Percentage Error (MAPE) calculates the average of the absolute differences between the predicted values and the actual values, divided by the actual values, and multiplied by 100. MAPE has an important limitation, which is that it can produce extremely large errors or undefined results when the actual values are close to or equal to zero, as dividing by zero is not possible. Therefore, MAPE is not suitable for all situations, especially when there are actual values that are zero or close to zero. In such cases, alternative error metrics may be needed.
$\begin{matrix} MAPE = \frac{100}{n} \sum_{i = 1}^{n} | \frac{f_{i} - y_{i}}{y_{i}} | \end{matrix}$ (7)
The Root Mean Square Percentage Error (RMSPE) is a metric used to measure prediction errors, quantifying the root mean square of the relative errors between predicted and actual values. It is the relative-error counterpart of the Root Mean Squared Error (RMSE) and, like MAPE, may produce very large or undefined results when the values in the denominator approach or equal zero.
$\begin{matrix} RMSPE = \sqrt{\frac{1}{n} \sum_{i = 1}^{n} {(1 - \frac{y_{i}}{f_{i}})}^{2}} \end{matrix}$ (8)
The Symmetric Mean Absolute Percentage Error (SMAPE) is a metric that differs from the traditional Mean Absolute Percentage Error (MAPE) in its symmetrical nature. SMAPE assigns equal penalty to cases where the predicted values are higher or lower than the actual values. However, it is important to note that while SMAPE provides fairness in handling overestimation and underestimation of the predictions compared to the actual values, it can become highly sensitive when the actual values approach zero.
$\begin{matrix} SMAPE = \frac{100}{n} \sum_{i = 1}^{n} \frac{| f_{i} - y_{i} |}{(| y_{i} | + | f_{i} |) / 2} \end{matrix}$ (9)
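As a minimal sketch, Eqs (3) to (7) and (9) can be written directly in pure Python. The function names and toy values are assumptions for illustration. The handling of zero denominators (returning infinity for MAPE, skipping 0/0 terms for SMAPE) is one possible convention, not necessarily the one used in this study.

```python
import math

def aic(k, log_likelihood):
    """Eq (3): k parameters, log_likelihood = ln L at the fitted parameters."""
    return 2 * k - 2 * log_likelihood

def mse(f, y):
    """Eq (4): mean of squared differences."""
    return sum((p - a) ** 2 for p, a in zip(f, y)) / len(y)

def rmse(f, y):
    """Eq (6): square root of the MSE."""
    return math.sqrt(mse(f, y))

def mae(f, y):
    """Eq (5): mean of absolute differences."""
    return sum(abs(p - a) for p, a in zip(f, y)) / len(y)

def mape(f, y):
    """Eq (7). Assumed convention: return infinity if any actual value is zero."""
    if any(a == 0 for a in y):
        return math.inf
    return 100 / len(y) * sum(abs((p - a) / a) for p, a in zip(f, y))

def smape(f, y):
    """Eq (9). Assumed convention: a 0/0 term contributes zero."""
    total = 0.0
    for p, a in zip(f, y):
        denom = (abs(a) + abs(p)) / 2
        if denom > 0:
            total += abs(p - a) / denom
    return 100 / len(y) * total

# Toy values; the third actual value is exactly zero, as can happen for a spread.
actual = [0.5, -0.5, 0.0]
predicted = [0.25, -0.75, 0.1]

scores = {
    "MSE": mse(predicted, actual),
    "RMSE": rmse(predicted, actual),
    "MAE": mae(predicted, actual),
    "MAPE": mape(predicted, actual),
    "SMAPE": smape(predicted, actual),
}
```

In this toy example MAPE is infinite because one actual value is zero, while MSE, RMSE, MAE, and SMAPE remain finite. This is the behavior that motivates relying on the scale-dependent metrics when the target series crosses zero.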

7 Results and discussion

The performance metrics are presented in Table 3.

Table 3. Model performance metrics.

Index	ARIMA	ARIMA-LSTM	LSTM	GRU	ARIMA-GRU
MSE	0.462470	0.399819	0.4205262	0.426561	0.399764
RMSE	0.680052	0.632312	0.64848	0.653116	0.632269
MAE	0.383785	0.346311	0.3621097	0.367069	0.346333
MAPE	Inf	Inf	Inf	Inf	Inf
SMAPE	0.352197	0.334971	0.37839344	0.376948	0.335221
RMSPE	Inf	Inf	Inf	Inf	Inf

Note: Inf indicates cases where the MAPE and RMSPE calculations were not feasible due to division by near-zero values. Bold values represent the best performance metrics across the models.

According to the MSE metric, the ARIMA-GRU model exhibits the lowest prediction error, indicating that this model has a smaller average error in forecasting the ‘opening price difference’ compared to other models. Therefore, based on this metric, the ARIMA-GRU model can be considered as the optimal model.

Considering the RMSE metric, the ARIMA-GRU model has the lowest root mean squared error, 0.632269, marginally below the 0.632312 of the ARIMA-LSTM model. Therefore, based on this metric, the ARIMA-GRU model is considered to have the best predictive performance, with predictions that align most closely with the actual observed values.

Using the Mean Absolute Error (MAE) metric to measure the average magnitude of prediction errors, we observe that the ARIMA model has an MAE of 0.383785, ARIMA-LSTM model has 0.346311, LSTM model has 0.3621097, GRU model has 0.367069, and ARIMA-GRU model has 0.346333. Based on the MAE metric, the ARIMA-LSTM model has the smallest average absolute error, indicating that its predictions have a smaller average deviation from the actual observed values. Therefore, it is considered the optimal model based on this metric.

Using the Symmetric Mean Absolute Percentage Error (SMAPE) metric to measure the relative magnitude of prediction errors, we observe that the ARIMA model has an SMAPE value of 0.352197, ARIMA-LSTM model has 0.334971, LSTM model has 0.37839344, GRU model has 0.376948, and ARIMA-GRU model has 0.335221. According to the SMAPE metric, the ARIMA-LSTM model has a relatively lower prediction error, with an SMAPE of 0.334971, indicating smaller relative errors between its predictions and the actual observed values. Therefore, based on this metric, the ARIMA-LSTM model is considered to have the best predictive performance.
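The comparisons above can be reproduced from the values in Table 3. The short sketch below (variable names are illustrative) picks the best model per metric, checks that the reported RMSE values equal the square roots of the reported MSE values, and computes the relative MSE reduction of ARIMA-GRU over the ARIMA baseline.

```python
import math

# Values copied from Table 3; MAPE and RMSPE are omitted because they are Inf.
table3 = {
    "MSE": {"ARIMA": 0.462470, "ARIMA-LSTM": 0.399819, "LSTM": 0.4205262,
            "GRU": 0.426561, "ARIMA-GRU": 0.399764},
    "RMSE": {"ARIMA": 0.680052, "ARIMA-LSTM": 0.632312, "LSTM": 0.64848,
             "GRU": 0.653116, "ARIMA-GRU": 0.632269},
    "MAE": {"ARIMA": 0.383785, "ARIMA-LSTM": 0.346311, "LSTM": 0.3621097,
            "GRU": 0.367069, "ARIMA-GRU": 0.346333},
    "SMAPE": {"ARIMA": 0.352197, "ARIMA-LSTM": 0.334971, "LSTM": 0.37839344,
              "GRU": 0.376948, "ARIMA-GRU": 0.335221},
}

# Lower is better for every metric in the table.
best = {metric: min(scores, key=scores.get) for metric, scores in table3.items()}

# Internal consistency: RMSE should equal sqrt(MSE) up to rounding.
rmse_consistent = all(
    abs(math.sqrt(table3["MSE"][m]) - table3["RMSE"][m]) < 1e-5
    for m in table3["MSE"]
)

# Relative MSE reduction of the best hybrid model over the ARIMA baseline.
mse_gain_vs_arima = 1 - table3["MSE"]["ARIMA-GRU"] / table3["MSE"]["ARIMA"]

for metric, model in best.items():
    print(metric, "->", model)
```

The two hybrid models differ from each other only in the fourth or fifth decimal place, whereas both reduce the MSE of the ARIMA baseline by roughly 13.6 percent, so the ranking between ARIMA-LSTM and ARIMA-GRU should be read with that small margin in mind.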

8 Conclusion

The ranking of the models depends on the evaluation metric. The ARIMA-GRU model exhibits superior predictive performance in terms of MSE and RMSE metrics, while the ARIMA-LSTM model performs better in terms of MAE and SMAPE metrics. The hybrid models, ARIMA-LSTM and ARIMA-GRU, outperform the individual deep learning models (LSTM and GRU) and the time series model (ARIMA) in predicting the opening price difference dataset. This indicates that combining time series models and deep learning models can enhance predictive performance by leveraging their respective strengths. Hybrid models can better capture long-term trends and short-term fluctuations in the data while possessing stronger nonlinear modeling capabilities. Therefore, in this study, the ARIMA-LSTM and ARIMA-GRU models are considered more effective predictive models. This result can be attributed to the following three main factors:

Advantages of the hybrid models: The hybrid models, ARIMA-LSTM and ARIMA-GRU, combine the strengths of time series models and deep learning models, effectively leveraging their distinct characteristics in prediction. The time series model, ARIMA, is capable of capturing trends and seasonal variations in the data, while the deep learning models, LSTM and GRU, possess powerful nonlinear modeling capabilities. By combining them, the hybrid models can better capture the long-term trends and short-term fluctuations in the data, thereby improving prediction accuracy and stability.
Adaptability to data characteristics: In the given dataset, the opening price difference may exhibit certain nonlinear relationships and temporal dependencies. Deep learning models are better at capturing and modeling such nonlinear relationships, while time series models can account for the temporal dependencies in the data. By combining these two types of models, the hybrid models can comprehensively consider the data’s characteristics, thereby enhancing prediction accuracy.
Parameter optimization and tuning: During the model development process, we performed parameter optimization and tuning for each model to achieve optimal performance. For the ARIMA model, we employed automated parameter selection methods to ensure a good fit of the model. For the deep learning models, we optimized the network structure and adjusted hyperparameters to achieve the best predictive results.

In summary, these results are attributed to the ability of the hybrid models, ARIMA-LSTM and ARIMA-GRU, to effectively leverage the advantages of time series models and deep learning models, thereby capturing the trends and fluctuations in the data more accurately. Additionally, the optimization of parameters and the experimental design support the reliability of these conclusions. The findings of this study hold significant value for decision-making and investment strategy formulation in related fields.

In addressing the weaknesses and limitations of this study, it is essential to critically assess the contribution and interpretability of the deep learning models within our hybrid modeling approach. While the integration of ARIMA with LSTM and GRU models has demonstrated enhanced predictive accuracy for the opening price spread, a notable challenge lies in the opaque nature of deep learning models, often referred to as “black boxes.” This aspect can hinder our ability to fully understand and explain the specific factors driving the model’s predictions. Consequently, although our hybrid models capitalize on the strengths of both time series and deep learning techniques to capture market dynamics effectively, the inherent complexity and lack of transparency in the deep learning components may limit the interpretability of the results.

The future research directions include:

Identification of Key Covariates Influencing the Opening Price Spread: Future studies should focus on uncovering the critical covariates that significantly impact the opening price difference rate. This entails conducting comprehensive analyses to isolate and understand the effects of various economic, financial, and socio-political factors that may influence the opening price spread. Identifying these key covariates will not only enhance the understanding of the dynamics at play but also improve the predictive accuracy of models concerning opening price behavior.
More Effective Autoregressive Models for Predicting Opening Price Spread: Building upon the foundation laid by this study, future work should aim at innovating and refining autoregressive models specifically tailored for predicting the opening price spread. This includes exploring advanced statistical techniques, incorporating machine learning algorithms, and integrating high-frequency trading data to capture the nuances of price movements more accurately. The goal is to develop models that are not only robust and reliable but also capable of adapting to the evolving nature of financial markets.
Exploring the Fundamental Mechanisms Behind Opening Price Spread Formation: Beyond predictive modeling, it is imperative to investigate the underlying mechanisms that give rise to the opening price spread from a price formation perspective. This research avenue involves a detailed examination of the market microstructure, the role of investor sentiment, and the impact of overnight news and events on price setting at market open. Understanding the essence of how opening price spreads are formed will contribute significantly to the knowledge of market efficiency, liquidity dynamics, and the broader economic implications tied to these phenomena.

Through a combination of empirical investigation and theoretical exploration, we aim to uncover deeper insights into the opening price spread, thereby aiding investors, market analysts, and policymakers in making more informed decisions.

Supporting information

S1 Data. (CSV)

pone.0299164.s001.CSV (644.2 KB, CSV)

Data Availability

All relevant data are within the paper and its Supporting information files.

Funding Statement

The author(s) received no specific funding for this work.

References

1. Jiang GJ, Zhu KX. Information Shocks and Short-Term Market Underreaction. Journal of Financial Economics. 2017;124:43–64. doi: 10.1016/j.jfineco.2016.06.006
2. Lo AW. Long-Term Memory in Stock Market Prices; 1991.
3. Plastun A, Kozmenko S, Plastun V, Filatova H. Market anomalies and data persistence: the case of the day-of-the-week effect. Journal of International Studies. 2019;12:122–130. doi: 10.14254/2071-8330.2019/12-3/10
4. Changtai L, Huang W, Wei-Siang, Wang, mun Chia W. Price Change, Trading Volume and Heterogeneous Beliefs in Stock Market. 2017.
5. fang Su Z, Bao H, Li Q, Xu B, Cui X. The prediction of price gap anomaly in Chinese stock market: Evidence from the dependent functional logit model. Finance Research Letters. 2022. doi: 10.1016/j.frl.2022.102702
6. Avishay A, Gil C, Vladimir G. Stocks Opening Price Gaps and Adjustments to New Information. Computational Economics. 2023. doi: 10.1007/s10614-023-10363-w
7. Guo K, Sun Y, Qian X. Can investor sentiment be used to predict the stock price? Dynamic analysis based on China stock market. Physica A: Statistical Mechanics and its Applications. 2017;469:390–396. doi: 10.1016/j.physa.2016.11.114
8. Plastun A, Sibande X, Gupta R, Wohar ME. Price gap anomaly in the US stock market: The whole story. North American Journal of Economics and Finance. 2020;52. doi: 10.1016/j.najef.2020.101177
9. Ayyappa Y, Kumar A. A Compact Literature Review on Stock Market Prediction. 2022 4th International Conference on Inventive Research in Computing Applications (ICIRCA). 2022:1336–1347. doi: 10.1109/ICIRCA54612.2022.9985501
10. Fama EF. The Behavior of Stock-Market Prices; 1965. Available from: https://www.jstor.org/stable/2350752.
11. Fama EF. Efficient Capital Markets: A Review of Theory and Empirical Work; 1970.
12. Tetlock PC. Does public financial news resolve asymmetric information? Review of Financial Studies. 2010;23:3520–3557. doi: 10.1093/rfs/hhq052
13. Caporale GM, Plastun A. Price gaps: Another market anomaly? Investment Analysts Journal. 2017;46:279–293. doi: 10.1080/10293523.2017.1333563
14. Ho HW, Hsiao YJ, Lo WC, Yang NT. Momentum investing and a tale of intraday and overnight returns: Evidence from Taiwan. Pacific-Basin Finance Journal. 2023;82:102151. doi: 10.1016/j.pacfin.2023.102151
15. Zhang H, Tsai WC, Weng PS, Tsai PC. Overnight returns and investor sentiment: Further evidence from the Taiwan stock market. Pacific-Basin Finance Journal. 2023;80:102093. doi: 10.1016/j.pacfin.2023.102093
16. Aboody D, Even-Tov O, Lehavy R, Trueman B. Overnight Returns and Firm-Specific Investor Sentiment. Journal of Financial and Quantitative Analysis. 2018;53(2):485–505. doi: 10.1017/S0022109017000989
17. Li C, Huang W, Wang WS, mun Chia W. Price Change and Trading Volume: Behavioral Heterogeneity in Stock Market. Computational Economics. 2021;61:677–713. doi: 10.1007/s10614-021-10224-4
18. Chi L, Zhuang X, Song D. Investor sentiment in the Chinese stock market: An empirical analysis. Applied Economics Letters. 2012;19:345–348. doi: 10.1080/13504851.2011.577003
19. Li AW, Bastos GS. Stock market forecasting using deep learning and technical analysis: a systematic review. IEEE Access. 2020;8:185232–185242. doi: 10.1109/ACCESS.2020.3030226
20. Si Y, Nadarajah S. A Statistical Analysis of Chinese Stock Indices Returns From Approach of Parametric Distributions Fitting. Annals of Data Science. 2023;10:73–88. doi: 10.1007/s40745-022-00421-9
21. Hamilton JD. Time series analysis. Princeton University Press; 2020.
22. Gulli A, Pal S. Deep learning with Keras. Packt Publishing Ltd; 2017.
23. Moghar A, Hamiche M. Stock market prediction using LSTM recurrent neural network. Procedia Computer Science. 2020;170:1168–1173. doi: 10.1016/j.procs.2020.03.049
24. Brownlee J. Deep learning for time series forecasting: predict the future with MLPs, CNNs and LSTMs in Python. Machine Learning Mastery; 2018.
25. Akaike H. A new look at the statistical model identification. IEEE Transactions on Automatic Control. 1974;19(6):716–723. doi: 10.1109/TAC.1974.1100705

PLoS One. doi: 10.1371/journal.pone.0299164.r001

Decision Letter 0

Muhammad Usman Tariq

26 Jan 2024

PONE-D-23-36601

Modeling Opening Price Spread of Shanghai Composite Index Based on ARIMA-GRU/LSTM Hybrid Model

PLOS ONE

Dear Dr. Si,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

==============================

ACADEMIC EDITOR:

The reviewers have provided detailed feedback that highlights several areas needing significant improvement. Below, I have summarised the key points from the reviewers, along with additional guidance on how to address these concerns effectively.

Reviewers #1 and #3 have pointed out the need for a more concise and structured abstract. Ensure that the abstract clearly states the research problem, methodology, main findings, and significance. It should not exceed 250 words.
Reviewer #1 suggests incorporating a data link within the manuscript for transparency and reproducibility. Please provide a direct link or a DOI where the data can be accessed.
There is inconsistency noted in section numbering. Decide on a format and apply it consistently throughout the document.
Table 3 should highlight the most significant values. Consider using bold text or color-coding to draw attention to key data.
Clarify the choice of the ARIMA model in the Model Composition section as recommended by Reviewer #1.
Include more recent studies in your references to ensure the research is grounded in current knowledge, as suggested by Reviewer #1.
Review and correct the citation indexing, starting with [1] as opposed to [13], and ensure all citations are present and correctly numbered.
Reviewer #2 requests a revision of the abstract to include the introduction/significance of the study, research aims, methodology, and main conclusions.
Ensure that acronyms are defined at first mention and used consistently thereafter.
A section outlining the paper's organization should be included at the end of the introduction. Also, add a comprehensive literature review to demonstrate the novelty of your study.
All variables used in equations should be clearly explained for the reader's understanding.
Discuss the implications, limitations, and directions for future research to provide context and potential for further study.
The paper currently lacks a dedicated conclusion section, which is imperative for summarising the study and its findings.
Address the technical issues mentioned by Reviewer #2 and conduct a meticulous proofread to enhance the paper's quality.
Reviewer #3 suggests revising the title to make it more engaging. Also, the abstract should be rewritten to focus more on your research rather than the background.
Include a clear algorithm or process flow to support the discussions in Section 2, as indicated by Reviewer #3.
Ensure that all tables follow the PLoS ONE format, or provide justification for the current format.
Minimize the use of personal pronouns such as "Our" and replace them with an impersonal language that suits academic writing.

==============================

Please submit your revised manuscript by Mar 11 2024 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.
A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.
An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols.

We look forward to receiving your revised manuscript.

Kind regards,

Dr. Muhammad Usman Tariq

Academic Editor

PLOS ONE

Journal requirements:

When submitting your revision, we need you to address these additional requirements.

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at

https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and

https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf

2. When completing the data availability statement of the submission form, you indicated that you will make your data available on acceptance. We strongly recommend all authors decide on a data sharing plan before acceptance, as the process can be lengthy and hold up publication timelines. Please note that, though access restrictions are acceptable now, your entire data will need to be made freely accessible if your manuscript is accepted for publication. This policy applies to all data except where public deposition would breach compliance with the protocol approved by your research ethics board. If you are unable to adhere to our open data policy, please kindly revise your statement to explain your reasoning and we will seek the editor's input on an exemption. Please be assured that, once you have provided your new statement, the assessment of your exemption will not hold up the peer review process.

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes

Reviewer #2: Partly

Reviewer #3: Yes

**********

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: Yes

**********

3. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: No

Reviewer #2: Yes

Reviewer #3: No

**********

4. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: No

**********

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: • Begin with a concise Abstract that addresses the identified issues in stock market forecasting, outlining the objectives aimed at addressing these gaps. Subsequently, provide an overview of the chosen methods, as detailed in the current Abstract. Conclude with a brief summary of the results, emphasizing their significance clearly and straightforwardly.

• It is recommended to incorporate the data link, specifying where it can be downloaded, into the manuscript.

• Please verify the formatting to determine whether each section should be numbered or not

• The author is encouraged to emphasize the highest value in Table 3 through techniques such as highlighting, color-coding, or bold formatting.

• It is suggested that the reasons for selecting ARIMA as the baseline model in Section Results and Discussion, paragraph 1, be elucidated in the Model Composition-ARIMA section.

• Regarding the references provided, the author has 10 recent publications (47%) out of 21 listed. A recent publication is counted from 2019 onwards. Thus, authors are advised to add more recent publications to support their literature.

• The citation indexing seems wrong. It should start with [1] instead of [13]. Please redo the citation.

• Please review all your citations, as the citation index for bullet point number 2 in the Introduction Section appears to be missing.

Reviewer #2: The chapter is in good shape, but it needs to be modified before resubmission. There are a few comments that may help to enhance the quality of the paper.

1. The abstract needs to be revised and expanded: the abstract of a research paper is typically 200 to 400 words in length, and 150 to 300 words for a review paper.

2. The abstract should highlight the objectives of the contribution. Remember, an abstract is often the first and sometimes the only part of a document that people read, so it should effectively convey the main points and encourage further exploration.

3. The abstract is wordy and not informative. The structure of the abstract needs revision. Revise the abstract to provide:

a. the introduction/significance of the study,

b. the aim of the study,

c. the research methodology,

d. the major conclusion of the study

4. To ensure clarity and consistency, Gated Recurrent Unit (GRU) is spelled out initially, and then "GRU" is used as the abbreviation throughout the rest of the chapter.

5. An outline of the paper's organization is missing; it should be added at the end of the introduction section, after the main contributions.

6. The related work (literature review) part is missing. It should be added and should compare the current results with previous studies to show the novelty of the study.

7. Related work or literature review can be highlighted. However, it is difficult to compare the current study with previous studies.

8. The author should explain all the variables used in the equation.

9. Implications, limitations, and future work can be included.

10. The paper lacks a dedicated conclusion section, which is an essential component of scholarly writing.

11. The manuscript should be read more carefully to improve the paper quality because some technical weaknesses are found in it.

Reviewer #3: A. Main title is not attractive – the author should change the main title of the article.

B. Authors should follow the following procedure for abstract writing

Abstract is vague: The abstract needs to be rewritten professionally; there should be some background knowledge of the area, existing problems, and novel ideas that address the problem.

a) Firstly, an abstract should summarize the major aspects of the entire paper:

b) The overall purpose of the study

c) Basic methodology of your research

d) Major findings as a result of your analysis

e) A brief summary of your interpretations and conclusions.

Please adjust your Abstract according to the aforementioned logic and ensure that most of your Abstract is about your research, not the context. Most of your abstract states the inappropriate research background. Please add more content about your research. What's more, an abstract of a research paper is typically 150 to 250 words; please modify it. Moreover, it is suggested to remove the personal pronoun "Our", which is found extensively throughout the paper, and replace it with something like "This paper..." or "This research work...".

C. Contribution statements are vague

D. Motivation is missing

E. Figure 3 doesn’t elaborate the main idea

F. In Section 2, the algorithm is not given. The author must demonstrate the algorithm for which they expressed their views.

G. Whether the tables follow the PLOS ONE format; if not, then tell us what you meant by Table 3

H. Separate conclusion is required.

**********

6. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: Yes: ASRAFUL SYIFAA AHMAD

Reviewer #2: Yes: Samina Amin

Reviewer #3: No

**********

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, log in and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

PLoS One. 2024 Mar 13;19(3):e0299164. doi: 10.1371/journal.pone.0299164.r002

Author response to Decision Letter 0

1 Feb 2024

Response to Reviewers

Response to Reviewer #1: We sincerely appreciate your valuable and constructive feedback on our manuscript. Your insightful suggestions have been instrumental in guiding us to substantially improve the quality of our work. We have carefully considered each of your recommendations and have made corresponding revisions to our manuscript, particularly focusing on enhancing the clarity, depth, and rigor of our abstract.

• It is recommended to incorporate the data link, specifying where it can be downloaded, into the manuscript.

Response to Reviewer #1: Thank you very much for your valuable suggestion regarding the inclusion of a data link in our manuscript. We deeply appreciate your attention to detail and your commitment to ensuring the accessibility and reproducibility of our research.

In response to your recommendation, we have updated the manuscript to include specific information on where the data can be downloaded. We have provided the website address where the dataset for the Shanghai Composite Index, covering the period from December 20, 1990, to June 2, 2023, is publicly available.

Please note that accessing the dataset requires registration on the website or contacting the corresponding author via email to request the data. We believe this procedure ensures the data's integrity while making it accessible to researchers and practitioners interested in replicating or extending our study.

We have made the necessary modifications to the manuscript to reflect this change and hope it meets your approval. Once again, we are grateful for your insightful comments and guidance, which have undeniably enhanced the quality of our work.

• Please verify the formatting to determine whether each section should be numbered or not

Response to Reviewer #1: Thank you very much for your guidance regarding the formatting of our manuscript, specifically the numbering of each section.

Following your suggestion, we have carefully reviewed the formatting guidelines of the journal and adjusted the manuscript accordingly. Each section is now numbered consistently throughout the document to ensure clarity and ease of navigation for readers.

• The author is encouraged to emphasize the highest value in Table 3 through techniques such as highlighting, color-coding, or bold formatting.

Response to Reviewer #1: Thank you very much for your constructive suggestion regarding the presentation of Table 3 in our manuscript. Your advice to emphasize the highest value in each performance metric through highlighting, color-coding, or bold formatting is greatly appreciated and has been instrumental in enhancing the clarity and impact of our findings.

In response to your recommendation, we have revised Table 3 by marking the most significant model figures for each indicator in bold red. This adjustment not only makes it easier for readers to identify the key results at a glance but also accentuates the superior performance of certain models in our analysis.

• It is suggested that the reasons for selecting ARIMA as the baseline model in Section Results and Discussion, paragraph 1, be elucidated in the Model Composition-ARIMA section.

Response to Reviewer #1: Thank you for your valuable suggestion. We recognize the importance of providing a comprehensive rationale for our methodological choices to our readers and appreciate your guidance in this regard.

In response to your recommendation, we have revised the manuscript structure accordingly. The "Model Composition-ARIMA" section now includes a detailed explanation of our decision to utilize ARIMA as the baseline model for our analysis. This amendment ensures a clearer understanding of the model's significance and its foundational role in our research.

Response to Reviewer #1: Thank you sincerely for your valuable suggestion regarding the inclusion of more recent publications in our references to strengthen and update our literature review. We deeply appreciate your attention to the relevance and timeliness of the research we cite, which is crucial for maintaining the academic rigor and currency of our work. In response to your advice, we have carefully reviewed the latest literature and are pleased to inform you that we have incorporated an additional three recent publications into our reference list. Notably, two of these publications focus on the analysis of the Taiwan stock market, with both being published after September 2023. These recent studies provide cutting-edge insights into the dynamics of the Taiwan stock market, further enriching our discussion and supporting our analysis with the most current research findings.

We are grateful for your guidance, which has significantly contributed to enhancing the quality and relevance of our manuscript. Your suggestion has helped us ensure that our literature review reflects the latest developments and scholarly discussions in the field.

• The citation indexing seems wrong. It should start with [1] instead of [13]. Please redo the citation.

• Please review all your citations, as the citation index for bullet point number 2 in the Introduction Section appears to be missing.

Response to Reviewer #1: Thank you for bringing the issues regarding citation indexing and the missing citation in bullet point number 2 of the Introduction Section to our attention. Your meticulous review of our manuscript and detailed feedback are truly invaluable to us.

Following your suggestions, we have thoroughly reviewed and corrected the citation indexing throughout our manuscript, ensuring that it now starts with [1] as it should. Additionally, we have addressed the missing citation in the Introduction Section, ensuring that all statements are appropriately supported by relevant references.

Reviewer #2: The chapter is in good shape, but it needs to be modified before resubmission. There are a few comments that may help to enhance the quality of the paper.

1. The abstract needs to be revised and expanded: the abstract of a research paper is typically 200 to 400 words in length, and 150 to 300 words for a review paper.

3. The abstract is wordy and not informative. The structure of the abstract needs revision. Revise the abstract to provide:

a. the introduction/significance of the study,

b. the aim of the study,

c. the research methodology,

d. the major conclusion of the study

Response to Reviewer #2: Thank you immensely for your constructive comments and suggestions aimed at enhancing the quality of our paper. We are truly appreciative of the time and effort you have devoted to reviewing our work and providing such detailed feedback.

In response to your valuable guidance, we have undertaken a thorough revision of our abstract. Acknowledging your observation that our initial abstract was wordy and lacked clarity, we have expanded and restructured it to ensure it falls within the recommended length of 200 to 400 words for a research paper.

4. To ensure clarity and consistency, Gated Recurrent Unit (GRU) is spelled out initially, and then "GRU" is used as the abbreviation throughout the rest of the chapter.

Response to Reviewer #2: We are grateful for your suggestion regarding the use of abbreviations in our chapter, particularly your advice on the consistent use of "Gated Recurrent Unit (GRU)" and its abbreviation. Your attention to detail and emphasis on clarity and consistency are highly valued and have significantly contributed to enhancing the readability and professionalism of our manuscript.

In line with your recommendation, we have carefully revised our chapter to ensure that "Gated Recurrent Unit (GRU)" is fully spelled out at its first mention, with the abbreviation "GRU" consistently used thereafter. Similarly, we have applied this approach to other key terms within our chapter, such as "Autoregressive Integrated Moving Average (ARIMA)" and "Long Short-Term Memory (LSTM)", to maintain uniformity and clarity throughout the text.

5. An outline of the paper's organization is missing; it should be added at the end of the introduction section, after the main contributions.

Response to Reviewer #2: We are grateful for your suggestion. In response, we have carefully revised the manuscript to include a detailed outline of the paper's organization at the end of the introduction section. This addition aims to enhance the clarity and navigability of the paper for our readers. We sincerely appreciate the reviewer's constructive suggestions and have endeavored to address them thoroughly in our revision. Thank you for the opportunity to improve our work.

6. The related work (literature review) part is missing. It should be added and should compare the current results with previous studies to show the novelty of the study.

Response to Reviewer #2: We sincerely appreciate the reviewer's insightful suggestions and have incorporated a detailed literature review section to address this feedback. This addition not only contextualizes our research within the existing body of knowledge but also highlights the novel contributions of our study. We are grateful for the guidance provided, which has undoubtedly strengthened the quality and depth of our manuscript.

7. Related work or literature review can be highlighted. However, it is difficult to compare the current study with previous studies.

Response to Reviewer #2: We deeply appreciate the reviewer's observations and understand the concern regarding the comparison of our study with previous research. Our investigation targets a relatively new area within financial modeling, employing novel modeling techniques that diverge from traditional approaches. Consequently, literature directly related to our specific research domain is scarce. Most existing studies in this area have focused on exploring the underlying causes of opening price differences, which does not align directly with our research direction of predictive modeling using advanced techniques. We acknowledge this gap and are actively seeking to bridge it with our current and forthcoming research. We have similar works under review and would greatly value any insightful comments and suggestions from the reviewer in the future. This feedback will be instrumental in refining our research and contributing meaningfully to the field.

8. The author should explain all the variables used in the equation.

Response to Reviewer #2: We sincerely appreciate the reviewer's constructive feedback regarding the explanation of variables within our equations. Following your suggestion, we have meticulously reviewed our manuscript and ensured that each variable used in our equations is now clearly defined and explained. This enhancement aims to improve the clarity and comprehensibility of our mathematical modeling, facilitating a better understanding of our research methodology for readers. We are grateful for this opportunity to refine our work and thank the reviewer for their valuable input.

9. Implications, limitations, and future work can be included.

Response to Reviewer #2: In response to the reviewer's valuable feedback, we have meticulously incorporated sections on implications, limitations, and future work into the conclusion of our manuscript. We express our sincere gratitude for the constructive suggestions provided, as they have significantly enriched the depth and scope of our study. Through these additions, we aim to not only highlight the practical relevance of our findings but also acknowledge the constraints of our research approach and outline promising avenues for subsequent investigations. We hope these revisions meet the reviewer's expectations and contribute to a more comprehensive understanding of our study's contributions to the field.

10. The paper lacks a dedicated conclusion section, which is an essential component of scholarly writing.

Response to Reviewer #2: In response to the reviewer's insightful observation, we have now added a dedicated conclusion section to our manuscript. We appreciate the guidance provided, recognizing the importance of a conclusion in encapsulating the key findings, implications, and future directions of our research. This addition aims to succinctly summarize the study's contributions to the field, offering readers a clear understanding of its significance and potential impact. We are grateful for the opportunity to enhance our manuscript based on the reviewer's valuable feedback.

11. The manuscript should be read more carefully to improve the paper quality because some technical weaknesses are found in it.

Response to Reviewer #2: In response to the reviewer's comment, we have conducted a thorough review and revision of our manuscript to address the technical weaknesses identified. We acknowledge that there may still be areas for improvement and sincerely appreciate the feedback provided. We commit to continuous efforts in refining our work and enhancing the quality of our paper. Thank you for bringing these issues to our attention, and we welcome any further suggestions that can aid in our manuscript's improvement.

Reviewer #3: A. Main title is not attractive – the author should change the main title of the article.

Response to Reviewer #3: We sincerely appreciate your valuable suggestion to make the main title of our article more attractive. Currently, our manuscript is undergoing a major revision process. We are uncertain about the PLOS ONE policy regarding title changes at this stage of the revision. Therefore, we have not modified the title as of now. However, we are open to reconsidering and adjusting the title to better reflect the content and appeal of our research, should the opportunity arise during the later stages of the review process or as permitted by the journal's guidelines.

B. Authors should follow the following procedure for abstract writing

Abstract is vague: The abstract needs to be rewritten professionally; there should be some background knowledge of the area, existing problems, and novel ideas that address the problem.

a) Firstly, an abstract should summarize the major aspects of the entire paper:

b) The overall purpose of the study

c) Basic methodology of your research

d) Major findings as a result of your analysis

e) A brief summary of your interpretations and conclusions.

Response to Reviewer #3: We sincerely thank you for your constructive comments and have revised our abstract accordingly. We have ensured that the abstract now succinctly summarizes the major aspects of our paper, including the study's purpose, the basic methodology employed, the major findings from our analysis, and a brief summary of our interpretations and conclusions. We have also adjusted the length of the abstract to fit within the recommended range of 150 to 250 words and carefully replaced personal pronouns such as "Our" with "This paper" or "This research work" to maintain a professional and objective tone throughout the manuscript. We believe these revisions have significantly improved the clarity and professionalism of the abstract, and we are grateful for the guidance provided.

C. Contribution statements are vague

Response to Reviewer #3: In response to the reviewer's feedback regarding the vagueness of our contribution statements, we have taken your suggestions into careful consideration and have made the necessary revisions. We have now explicitly outlined our contribution statements in Section 3 of the paper, ensuring they are clearly defined and articulated. This adjustment aims to highlight the unique contributions of our research more effectively and to provide readers with a concise understanding of the value and novelty of our work. We are grateful for your insightful recommendations and believe that these changes significantly enhance the clarity and impact of our manuscript.

D. Motivation is missing

Response to Reviewer #3: In response to Reviewer #3's feedback regarding the lack of explicit motivation in our manuscript, we express our sincere appreciation for bringing this to our attention. We have now included a section dedicated to elucidating the motivation behind our study. This section highlights the significance of exploring the opening price spread of the Shanghai Composite Index and the potential impact of our findings on financial modeling and forecasting practices. By addressing the challenges associated with traditional and contemporary modeling techniques, we aim to underscore the necessity and relevance of our research approach.

E. Figure 3 doesn’t elaborate the main idea

Response to Reviewer #3: Many thanks for your suggestion. In response to your feedback, we have carefully considered the role of Figure 3 within our chapter and have decided to remove it.

F. In Section 2, the algorithm is not given. The author must demonstrate the algorithm for which they expressed their views.

Response to Reviewer #3: We greatly appreciate your valuable feedback on our manuscript. Following your suggestion, we have now included a comprehensive flowchart to visually depict the algorithm's workflow. Additionally, we have elaborated on the specifics of the algorithm within the text, ensuring a detailed explanation of its functionality and application in our research. These enhancements are aimed at providing a clearer understanding of the algorithmic approach we adopted. We hope that these revisions will adequately address your concerns and improve the manuscript's clarity and depth of technical detail.

G. Whether the tables follow the PLOS ONE format; if not, then tell us what you meant by Table 3

Response to Reviewer #3: We greatly appreciate your attention to detail and guidance regarding the formatting of our tables. Following your recommendation, we have meticulously revised all tables, including Table 3, to ensure full compliance with the PLOS ONE formatting guidelines. This includes adjustments to table structure, captioning, and the addition of necessary legends and footnotes for clarity. We hope these revisions adequately address your concerns and enhance the presentation and readability of our data in alignment with the journal's standards.

H. Separate conclusion is required.

Response to Reviewer #3: We sincerely appreciate the guidance provided and have accordingly incorporated a dedicated conclusion section into our manuscript. We are grateful for the opportunity to improve our work based on your valuable feedback.

We express our deepest gratitude to all reviewers for their meticulous review and insightful comments on our manuscript. Your detailed feedback has been invaluable in guiding the enhancements and revisions of our work. We have taken each suggestion and critique into careful consideration, making dedicated efforts to address the identified issues and improve the overall quality, clarity, and impact of our research. The constructive feedback provided by the reviewers has undeniably enriched our manuscript, making it a more comprehensive and robust contribution to the field. We hold immense respect for the review process and sincerely appreciate the time and expertise that the reviewers have contributed to refining our work. Thank you for your support and guidance, which have been instrumental in elevating the quality of our paper.

Attachment

Submitted filename: Response to Reviewers.docx

pone.0299164.s002.docx (31.2 KB, docx)

PLoS One. doi: 10.1371/journal.pone.0299164.r003

Decision Letter 1

Muhammad Usman Tariq

6 Feb 2024

Modeling Opening Price Spread of Shanghai Composite Index Based on ARIMA-GRU/LSTM Hybrid Model

PONE-D-23-36601R1

Dear Dr. Si,

We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements.

Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication.

An invoice for payment will follow shortly after the formal acceptance. To ensure an efficient process, please log into Editorial Manager at http://www.editorialmanager.com/pone/, click the 'Update My Information' link at the top of the page, and double check that your user information is up-to-date. If you have any billing related questions, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

Kind regards,

Muhammad Usman Tariq

Academic Editor

PLOS ONE

Additional Editor Comments (optional):

Reviewers' comments:

PLoS One. doi: 10.1371/journal.pone.0299164.r004

Acceptance letter

Muhammad Usman Tariq

27 Feb 2024

PONE-D-23-36601R1

PLOS ONE

Dear Dr. Si,

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now being handed over to our production team.

At this stage, our production department will prepare your paper for publication. This includes ensuring the following:

* All references, tables, and figures are properly cited

* All relevant supporting information is included in the manuscript submission

* There are no issues that prevent the paper from being properly typeset

If revisions are needed, the production department will contact you directly to resolve them. If no revisions are needed, you will receive an email when the publication date has been set. At this time, we do not offer pre-publication proofs to authors during production of the accepted work. Please keep in mind that we are working through a large volume of accepted articles, so please give us a few weeks to review your paper and let you know the next and final steps.

Lastly, if your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

If we can help with anything else, please email us at customercare@plos.org.

Thank you for submitting your work to PLOS ONE and supporting open access.

Kind regards,

PLOS ONE Editorial Office Staff

on behalf of

Dr. Muhammad Usman Tariq

Academic Editor

PLOS ONE

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Supplementary Materials

S1 Data

(CSV)

pone.0299164.s001.CSV (644.2KB, CSV)

Attachment

Submitted filename: Response to Reviewers.docx

pone.0299164.s002.docx (31.2KB, docx)

Data Availability Statement

All relevant data are within the paper and its Supporting information files.

1. Jiang GJ, Zhu KX. Information Shocks and Short-Term Market Underreaction. Journal of Financial Economics. 2017;124:43–64. doi: 10.1016/j.jfineco.2016.06.006

2. Lo AW. Long-Term Memory in Stock Market Prices; 1991.

3. Plastun A, Kozmenko S, Plastun V, Filatova H. Market anomalies and data persistence: the case of the day-of-the-week effect. Journal of International Studies. 2019;12:122–130. doi: 10.14254/2071-8330.2019/12-3/10

4. Changtai L, Huang W, Wei-Siang, Wang, mun Chia W. Price Change, Trading Volume and Heterogeneous Beliefs in Stock Market. 2017.

5. fang Su Z, Bao H, Li Q, Xu B, Cui X. The prediction of price gap anomaly in Chinese stock market: Evidence from the dependent functional logit model. Finance Research Letters. 2022. doi: 10.1016/j.frl.2022.102702

6. Avishay A, Gil C, Vladimir G. Stocks Opening Price Gaps and Adjustments to New Information. Computational Economics. 2023. doi: 10.1007/s10614-023-10363-w

7. Guo K, Sun Y, Qian X. Can investor sentiment be used to predict the stock price? Dynamic analysis based on China stock market. Physica A: Statistical Mechanics and its Applications. 2017;469:390–396. doi: 10.1016/j.physa.2016.11.114

8. Plastun A, Sibande X, Gupta R, Wohar ME. Price gap anomaly in the US stock market: The whole story. North American Journal of Economics and Finance. 2020;52. doi: 10.1016/j.najef.2020.101177

9. Ayyappa Y, Kumar A. A Compact Literature Review on Stock Market Prediction. 2022 4th International Conference on Inventive Research in Computing Applications (ICIRCA). 2022:1336–1347. doi: 10.1109/ICIRCA54612.2022.9985501

10. Fama EF. The Behavior of Stock-Market Prices; 1965. Available from: https://www.jstor.org/stable/2350752.

11. Fama EF. Efficient Capital Markets: A Review of Theory and Empirical Work; 1970.

12. Tetlock PC. Does public financial news resolve asymmetric information? Review of Financial Studies. 2010;23:3520–3557. doi: 10.1093/rfs/hhq052

13. Caporale GM, Plastun A. Price gaps: Another market anomaly? Investment Analysts Journal. 2017;46:279–293. doi: 10.1080/10293523.2017.1333563

14. Ho HW, Hsiao YJ, Lo WC, Yang NT. Momentum investing and a tale of intraday and overnight returns: Evidence from Taiwan. Pacific-Basin Finance Journal. 2023;82:102151. doi: 10.1016/j.pacfin.2023.102151

15. Zhang H, Tsai WC, Weng PS, Tsai PC. Overnight returns and investor sentiment: Further evidence from the Taiwan stock market. Pacific-Basin Finance Journal. 2023;80:102093. doi: 10.1016/j.pacfin.2023.102093

16. Aboody D, Even-Tov O, Lehavy R, Trueman B. Overnight Returns and Firm-Specific Investor Sentiment. Journal of Financial and Quantitative Analysis. 2018;53(2):485–505. doi: 10.1017/S0022109017000989

17. Li C, Huang W, Wang WS, mun Chia W. Price Change and Trading Volume: Behavioral Heterogeneity in Stock Market. Computational Economics. 2021;61:677–713. doi: 10.1007/s10614-021-10224-4

18. Chi L, Zhuang X, Song D. Investor sentiment in the Chinese stock market: An empirical analysis. Applied Economics Letters. 2012;19:345–348. doi: 10.1080/13504851.2011.577003

19. Li AW, Bastos GS. Stock market forecasting using deep learning and technical analysis: a systematic review. IEEE Access. 2020;8:185232–185242. doi: 10.1109/ACCESS.2020.3030226

20. Si Y, Nadarajah S. A Statistical Analysis of Chinese Stock Indices Returns From Approach of Parametric Distributions Fitting. Annals of Data Science. 2023;10:73–88. doi: 10.1007/s40745-022-00421-9

21. Hamilton JD. Time series analysis. Princeton University Press; 2020.

22. Gulli A, Pal S. Deep learning with Keras. Packt Publishing Ltd; 2017.

23. Moghar A, Hamiche M. Stock market prediction using LSTM recurrent neural network. Procedia Computer Science. 2020;170:1168–1173. doi: 10.1016/j.procs.2020.03.049

24. Brownlee J. Deep learning for time series forecasting: predict the future with MLPs, CNNs and LSTMs in Python. Machine Learning Mastery; 2018.

25. Akaike H. A new look at the statistical model identification. IEEE Transactions on Automatic Control. 1974;19(6):716–723. doi: 10.1109/TAC.1974.1100705

