A dual-path convolutional neural network combined with an attention-based bidirectional long short-term memory network for stock price prediction

Guiyan Zhao; Yunfei Cheng; Jianhui Yang; Jiayuan Ouyang

doi:10.1371/journal.pone.0319775

PLoS One. 2025 Apr 22;20(4):e0319775. doi: 10.1371/journal.pone.0319775

Editor: Jiwei Tian

PMCID: PMC12014148 PMID: 40262076

Abstract

The complexities of stock price data, characterized by its nonlinearity, non-stationarity, and intricate spatiotemporal patterns, make accurate prediction a substantial challenge. To address this, we propose the DCA-BiLSTM model, which combines dual-path convolutional neural networks with an attention mechanism (DCA) and bidirectional long short-term memory networks (BiLSTM). This model captures deep information and complex dependencies within time-series data. First, wavelet packet decomposition extracts high- and low-frequency features, followed by DCA for robust deep feature extraction, and finally, BiLSTM models bidirectional dependencies. Validated on datasets from Yahoo Finance, including Apple, Google, Tesla stocks, and the Nasdaq index, the model consistently outperforms traditional approaches. The DCA-BiLSTM achieves an $R^{2}$ of 0.9507 for Apple, 0.9595 for Google, 0.9077 for Tesla, and 0.9594 for the Nasdaq index, with significant reductions in error metrics across all datasets. These results demonstrate the model’s robustness and improved predictive accuracy, offering reliable insights for stock price forecasting.

1 Introduction

The financial market, as a complex macro system, is influenced by various macroeconomic, political, and social factors. Stock price fluctuations are widely regarded as effective indicators of overall economic health, and changes in the stock market significantly impact economic activities and corporate development. Consequently, researchers have long sought to develop effective models for predicting stock prices [1,2]. However, stock price prediction is inherently complex due to the multitude of factors affecting prices, which renders traditional models inadequate for capturing dynamic market patterns [3]. With advancements in big data processing technologies and AI algorithms, data science methodologies have achieved notable success in the financial domain. Accordingly, research leveraging diverse AI algorithms is expected to become a vital tool for financial researchers analyzing market dynamics, particularly in overcoming the limitations of traditional approaches.

Stock price prediction often involves intricate nonlinear relationships that traditional linear models struggle to capture effectively. Multiple factors, including company fundamentals, technical indicators, and market sentiment, influence price movements. AI algorithms, with their ability to process vast amounts of data and uncover latent patterns, provide a flexible and powerful solution. Unlike traditional economic models bound by rigid assumptions, these algorithms learn from historical data without constraints, adapting to complex data patterns. This adaptability is particularly valuable for real-time decision-making in financial trading.

In recent years, deep learning-based models for stock prediction have emerged as a focal area of research, with specific emphasis on convolutional neural networks [4–6], recurrent neural networks [7,8], and Transformers [9,10]. Hybrid approaches have garnered considerable attention, aiming to combine the strengths of multiple models to achieve higher predictive accuracy compared to traditional single-model approaches. Jaiswal and Singh introduced a novel hybrid convolutional-recurrent model that integrates a one-dimensional convolutional neural network (CNN) with a gated recurrent unit (GRU) model. In this model, CNN is used for feature extraction, while GRU manages temporal regression, showing superior performance over traditional models [11]. Rostamian and O’Hara proposed a model that incorporates long short-term memory (LSTM) with CNN, effectively combining LSTM’s robustness in temporal prediction with CNN’s strengths in feature extraction, leading to significant improvements over conventional models [12]. Nevertheless, despite the advancements offered by these hybrid models [13], there is still room for improvement. Some research advocates for the use of bidirectional recurrent neural network (RNN) models [14,15] and the integration of attention mechanisms [16] to further enhance feature extraction. Against this background, Lu et al. and Luo et al. presented attention-based models that adopt a CNN-BiLSTM fusion structure, where CNN is utilized for feature extraction and BiLSTM for regression prediction [17,18]. By incorporating attention mechanisms, these models effectively capture the influence of different temporal feature states on stock closing prices, thereby improving prediction accuracy. Wang proposed an alternative approach involving a temporal convolutional network that combines BiLSTM with enhanced Transformers, leveraging Transformers to capture global information and BiLSTM to learn bidirectional short-sequence information. Experimental results reveal its significant advantages over standalone Transformer models [19]. In summary, combining multiple models for prediction research offers distinct advantages over single-model approaches.

In stock prediction tasks, studies suggest that dual-channel or even multi-channel structures can significantly enhance model performance by performing deeper analyses of data features [20–25]. To further improve the effectiveness of predictive models, this paper proposes an enhanced approach based on the traditional hybrid model. This model incorporates a neural network that integrates dual-path convolutional attention mechanisms with BiLSTM. Furthermore, to enhance the dual-path CNN’s ability to extract multi-scale features, we employ a wavelet packet coefficient decomposition method for dataset preprocessing. This allows the dual-channel CNN, operating at multiple scales, to extract more comprehensive features by separating high and low-frequency data through filtering. However, due to the relatively complex multi-scale feature structures extracted by the dual-channel CNN, directly using BiLSTM for regression poses challenges for datasets with longer periods. Therefore, we introduce an improved multi-scale feature fusion approach leveraging attention mechanisms to enhance performance on stock price datasets.

The structure of this paper is as follows: Section 2 provides a detailed overview of the proposed DCA-BiLSTM stock prediction model. Section 3 discusses the experimental validation and results. Section 4 concludes the paper, summarizing key findings.

2 Stock price prediction model

Given the nonlinearity, non-stationarity, and complex spatiotemporal relationships inherent in stock price data, this study introduces a stock price prediction model based on DCA-BiLSTM. The goal is to improve prediction accuracy by extracting and effectively leveraging highly correlated and diverse deep features during the regression process.

This study examines stock price prediction under a scenario involving n observation samples. For each target date i, the closing price $y_{i}$ on day i is treated as the dependent variable in regression analysis. The independent variable data $x_{i}$ includes metrics such as opening price, highest price, lowest price, and price change, among others. By combining the independent variable data $x_{i}$ with the dependent variable data $y_{i}$, we form the observation dataset $\{y_{i}, x_{i}\}_{i = 1}^{n}$.

The stock price prediction model proposed in this paper consists of three main stages:

(1) Decomposing the raw input time series data $x_{i}$ into high-frequency domain signals $x_{j, u}^{2 m}$ and low-frequency domain signals $x_{j, u}^{2 m + 1}$ using wavelet packet decomposition, and transforming them into shallow high-low frequency domain features through a feature extraction layer;
(2) Using a dual-path convolutional neural network with an attention mechanism to integrate the shallow high-low frequency features into deep features with high reliability;
(3) Modeling the extracted deep features from both forward and backward directions using a bidirectional long short-term memory network, and performing regression prediction through a fully connected layer.

The specific algorithmic framework is depicted in Fig 1. Below, we elaborate on the three stages involved in constructing this model.

2.1 Wavelet packet-based feature decomposition

Stock price data often display marked nonlinearity and non-stationarity, resulting in varied statistical characteristics over time. This challenge complicates the ability of traditional linear models to accurately capture and predict stock price fluctuations. To address this, we employ wavelet packet decomposition, which effectively handles non-stationary time series signals and provides multi-scale information to capture the data’s nonlinear properties [26].

Wavelet packet analysis, a signal processing technique based on wavelet transforms, extends the capabilities of traditional methods in terms of time-frequency analysis. While conventional wavelet transforms focus on decomposing signals into approximation and detail components, with a primary emphasis on the low-frequency approximation, wavelet packet analysis provides a more comprehensive decomposition that includes both low- and high-frequency components. This approach extracts features from the entire scale domain, enhancing the scope of feature extraction. In this study, wavelet packet decomposition is applied to initially extract features, decomposing the signals into high- and low-frequency domain information. These features are then further processed by a dual-path convolutional neural network. The architecture of the wavelet packet decomposition layer within the network is illustrated in Fig 2.

As depicted in Fig 2, the wavelet packet decomposition-based shallow feature extraction method uses time series data as input. During processing, multiple high-pass and low-pass linear filtering operations are performed, resulting in the extraction of wavelet packet coefficients from the time series data. These coefficients capture essential signal information and represent the decomposition of signal x at scale j in the wavelet packet function.

The calculation formula is as follows:

\begin{array}{l} x_{j, u}^{2 m} = \sqrt{2} \sum_{k \in Z} g (k - 2 u) \, x_{j - 1, k}^{m}, \\ x_{j, u}^{2 m + 1} = \sqrt{2} \sum_{k \in Z} h (k - 2 u) \, x_{j - 1, k}^{m}, \end{array}

where $x_{j - 1, k}^{m}$ represents the signal at the previous scale (the original signal at the first level), while $x_{j, u}^{2 m}$ and $x_{j, u}^{2 m + 1}$ denote the decomposed wavelet signals. Here, m represents the frequency band, where higher values correspond to higher-frequency bands; u represents the time-domain position, k indicates the time-domain shift of the filtering function, and Z denotes the set of integers. The symbols $g(\cdot)$ and $h(\cdot)$ represent the high-pass and low-pass filtering functions, respectively, which weight the wavelet packet function values across different frequency bands. Their formulas are as follows:

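For the high-pass filtering function, denoted as g(a), adjacent samples are differenced so that rapid changes are retained:

\begin{array}{l} g (a) = \frac{a_{k} - a_{k + 1}}{2} . \end{array}

For the low-pass filtering function, denoted as h(a), adjacent samples are averaged so that the series is smoothed:

\begin{array}{l} h (a) = \frac{a_{k} + a_{k + 1}}{2} . \end{array}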

After passing through the feature extraction layer, the high-frequency time-domain signal $x_{j, u}^{2 m}$ and low-frequency time-domain signal $x_{j, u}^{2 m + 1}$ are transformed into shallow high- and low-frequency features, respectively. While wavelet packet coefficients can effectively reflect the high- and low-frequency features in stock price samples, directly using these coefficients for stock price prediction is challenging [27,28]. The following sections present a model to address the handling of these extracted high- and low-frequency features for more effective application in stock price prediction.
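The decomposition can be illustrated with a minimal sketch in plain Python. It assumes two-tap filters that average or difference adjacent samples, scaled by $\sqrt{2}$ as in the decomposition formula, and it assumes the taps act on samples $2u$ and $2u + 1$. The function names `haar_split` and `wavelet_packet` and the sample prices are hypothetical; the paper does not state the wavelet basis or decomposition depth it actually used.

```python
import math

def haar_split(signal):
    """One decomposition step: return (averaged branch, differenced branch)."""
    if len(signal) % 2:
        raise ValueError("signal length must be even")
    scale = math.sqrt(2.0)
    # Pair adjacent samples; this is the filter followed by downsampling by 2.
    pairs = [(signal[2 * u], signal[2 * u + 1]) for u in range(len(signal) // 2)]
    averaged = [scale * (a + b) / 2.0 for a, b in pairs]     # smooths the series
    differenced = [scale * (a - b) / 2.0 for a, b in pairs]  # keeps local changes
    return averaged, differenced

def wavelet_packet(signal, levels):
    """Split every node at every level, giving 2**levels sub-bands."""
    nodes = [list(signal)]
    for _ in range(levels):
        nodes = [band for node in nodes for band in haar_split(node)]
    return nodes

# Hypothetical closing prices (length must be divisible by 2**levels).
closing = [10.0, 12.0, 11.0, 15.0, 14.0, 13.0, 16.0, 18.0]
bands = wavelet_packet(closing, levels=2)
```

Unlike a plain wavelet transform, which would keep splitting only the averaged branch, this sketch splits both branches at every level. With these filters the transform is orthonormal, so the total energy (sum of squares) of the sub-bands equals that of the input.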

2.2 Dual-path convolutional neural network with attention mechanism

Following wavelet packet decomposition, the input data is transformed into high- and low-frequency time-domain features, necessitating further extraction of deeper features. Given the multifaceted impact of factors such as market sentiment, macroeconomic indicators, and company fundamentals on stock prices, a model capable of effectively capturing these relationships is required. To address these challenges, this paper employs a DCA network architecture. This architecture integrates an attention mechanism into a dual-path convolutional neural network, allowing the extraction of highly reliable deep features from high- and low-frequency features via encoder structures with attention mechanisms [29].

The main contributions of this model are as follows:

(1) Due to the differing feature scales of high- and low-frequency signals, optimizing the hyperparameters of different convolutional layers is essential. Therefore, this paper adopts a multi-path convolutional neural network structure;
(2) To address the increased parameter size during convolution, which can prolong training times and reduce model efficiency, global average pooling is used after each convolutional layer to reduce feature dimensionality;
(3) To enhance model accuracy and convergence rate, group normalization is applied to standardize each channel of the feature maps;
(4) In the attention mechanism, scaled dot-product attention is employed to limit the computational complexity, and the resulting loss of processing speed, caused by multilinearity.
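Points (2) and (3) can be made concrete with a small sketch of global average pooling and group normalization on a feature map stored as a list of channels, each a list of values over time. This is an illustrative assumption about the data layout, not the authors' implementation; the learnable scale and shift that normally follow group normalization are omitted, and the feature values are made up.

```python
import math

def global_average_pool(feature_map):
    """Collapse each channel (values over time) to its mean."""
    return [sum(channel) / len(channel) for channel in feature_map]

def group_norm(feature_map, num_groups, eps=1e-5):
    """Normalize each group of channels to zero mean and unit variance."""
    num_channels = len(feature_map)
    if num_channels % num_groups:
        raise ValueError("channels must divide evenly into groups")
    size = num_channels // num_groups
    normalized = []
    for g in range(num_groups):
        group = feature_map[g * size:(g + 1) * size]
        values = [v for channel in group for v in channel]
        mean = sum(values) / len(values)
        var = sum((v - mean) ** 2 for v in values) / len(values)
        inv_std = 1.0 / math.sqrt(var + eps)
        normalized.extend([[(v - mean) * inv_std for v in channel] for channel in group])
    return normalized

# Hypothetical feature map: 4 channels, 3 time steps.
features = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [10.0, 20.0, 30.0], [40.0, 50.0, 60.0]]
pooled = global_average_pool(features)
normed = group_norm(features, num_groups=2)
```

In PyTorch, which the experiments use, the corresponding modules would be `torch.nn.AdaptiveAvgPool1d(1)` and `torch.nn.GroupNorm`; whether the authors used exactly these modules is not stated in the paper.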

This network architecture, integrating attention mechanisms and dual-channel convolutional networks, aims to process time series inputs more effectively, enhance feature extraction, and support the BiLSTM layer in capturing critical information at various points in sequences. This architecture exhibits impressive deep feature extraction capabilities, as shown in Fig 3.

The multi-head self-attention mechanism includes several self-attention modules, each designed to compute attention weights for every element across positions in the input sequence, resulting in a weighted summation as the output. Illustrated in Fig 4, the architecture unfolds as follows:

(1) First, attention scores are calculated by projecting the input sequence into different spaces using Query $Q_{i}$, Key $K_{i}$, and Value $V_{i}$ matrices. The relevance of each element within the sequence is assessed by computing the attention scores based on the dot product between the query $Q_{i}$ and key $K_{i}$ pairs.
(2) Next, attention weights are obtained by applying the softmax function to the attention scores, creating a distribution that highlights the significance of each part of the sequence.
(3) A weighted summation is then produced by combining the projected values $V_{i}$ of the input sequence with the calculated attention weights.
(4) This weighted sum serves as the output of a single self-attention module.

In a multi-head self-attention mechanism, these steps are repeated independently across multiple attention heads, each with its own parameters. The outputs from each head are then combined, typically by concatenation, to produce the final result, enhancing the model’s ability to capture complex relationships within the sequence.
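The steps above can be sketched in plain Python as follows. The projection matrices `w_q`, `w_k`, `w_v`, the helper names, and the toy input are hypothetical, and the output projection that usually follows concatenation is left out, so this is a sketch of the mechanism rather than the model's actual layer.

```python
import math

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def softmax(row):
    """Numerically stable softmax over one row of scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(x, w_q, w_k, w_v):
    """Scaled dot-product self-attention for one head."""
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    d_k = len(k[0])
    k_t = [list(col) for col in zip(*k)]
    # Step 1: dot products between queries and keys, scaled by sqrt(d_k).
    scores = [[s / math.sqrt(d_k) for s in row] for row in matmul(q, k_t)]
    # Step 2: softmax turns each row of scores into attention weights.
    weights = [softmax(row) for row in scores]
    # Steps 3 and 4: the weighted sum of the values is the head's output.
    return matmul(weights, v), weights

def multi_head(x, heads):
    """Run each head independently and concatenate along the feature axis."""
    outputs = [self_attention(x, *params)[0] for params in heads]
    return [sum((out[t] for out in outputs), []) for t in range(len(x))]

identity = [[1.0, 0.0], [0.0, 1.0]]
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # 3 time steps, 2 features
head_out, head_weights = self_attention(x, identity, identity, identity)
combined = multi_head(x, [(identity, identity, identity)] * 2)
```

Dividing the scores by $\sqrt{d_k}$ keeps the softmax inputs in a moderate range as the key dimension grows, which is the usual motivation for the scaled form.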

2.3 Bidirectional long short-term memory

Stock prices embody information at different time scales, including short-term fluctuations and long-term trends. Simultaneously considering these scales in prediction is complex. Wavelet packet decomposition aids in modeling data across time scales by providing multi-scale information. BiLSTM, an improvement over LSTM networks, captures both long-term and short-term dependencies, thus addressing time scale issues more comprehensively.

BiLSTM combines forward and backward models, allowing it to capture contextual information bidirectionally [30]. Since a single LSTM only captures context in one direction, using a single LSTM may lead to the loss of features related to the opposite direction, particularly in sequential data like stock prices where context is often bidirectional. Hence, using BiLSTM helps in capturing dependencies more effectively. For stock price time-series data, which are sensitive to time and deeply dependent on past and future periods, BiLSTM improves predictive performance by capturing information across different time scales, including both short-term and long-term trends [31,32].

The architecture of a single LSTM unit is illustrated in Fig 5.

The relevant formula definitions are as follows:

\begin{array}{rl} i_{t} & = \mathrm{sigmoid} (W_{i} \cdot [h_{t - 1}, z_{t}] + b_{i}), \\ f_{t} & = \mathrm{sigmoid} (W_{f} \cdot [h_{t - 1}, z_{t}] + b_{f}), \\ o_{t} & = \mathrm{sigmoid} (W_{o} \cdot [h_{t - 1}, z_{t}] + b_{o}), \\ {\tilde{c}}_{t} & = \tanh (W_{c} \cdot [h_{t - 1}, z_{t}] + b_{c}), \\ c_{t} & = f_{t} \times c_{t - 1} + i_{t} \times {\tilde{c}}_{t}, \\ h_{t} & = o_{t} \times \tanh (c_{t}), \end{array}

where $z_{t}$ represents the input data at the current time step t, and $[h_{t - 1}, z_{t}]$ denotes the concatenation of the previous hidden state with that input. $i_{t}$ is the input gate, controlling how much of the candidate state ${\tilde{c}}_{t}$ flows into the memory cell $c_{t}$. $f_{t}$ is the forget gate, deciding how much information from the previous time step t–1 is retained in the memory cell $c_{t}$. $o_{t}$ is the output gate, determining how much of the memory cell information impacts the current hidden state $h_{t}$. $c_{t}$ denotes the memory cell, which embodies the neural state’s memory and gives LSTM units the capacity to store, read, reset, and update historical information. $W_{*}$ and $b_{*}$ are the weights and biases associated with the respective gates, and the products in the last two equations are element-wise. Sigmoid and tanh are the commonly used activation functions [33].
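As a hedged illustration of these equations, the sketch below implements one LSTM step and a bidirectional pass in plain Python. The parameter dictionary layout, the `random_params` helper, the hidden size, and the toy sequence are all hypothetical; a real model would use trained weights and a framework layer such as PyTorch's `torch.nn.LSTM` with `bidirectional=True`.

```python
import math
import random

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def affine(weights, vector, bias):
    """Compute W . vector + b for a weight matrix stored as rows."""
    return [sum(w * v for w, v in zip(row, vector)) + b for row, b in zip(weights, bias)]

def lstm_step(z_t, h_prev, c_prev, p):
    """One application of the gate equations; returns (h_t, c_t)."""
    hz = h_prev + z_t  # list concatenation, i.e. [h_{t-1}, z_t]
    i_t = [sigmoid(v) for v in affine(p["W_i"], hz, p["b_i"])]
    f_t = [sigmoid(v) for v in affine(p["W_f"], hz, p["b_f"])]
    o_t = [sigmoid(v) for v in affine(p["W_o"], hz, p["b_o"])]
    c_hat = [math.tanh(v) for v in affine(p["W_c"], hz, p["b_c"])]
    c_t = [f * c + i * g for f, c, i, g in zip(f_t, c_prev, i_t, c_hat)]
    h_t = [o * math.tanh(c) for o, c in zip(o_t, c_t)]
    return h_t, c_t

def run_lstm(sequence, p, hidden_size):
    """Unroll the cell over a sequence, starting from zero states."""
    h, c = [0.0] * hidden_size, [0.0] * hidden_size
    states = []
    for z_t in sequence:
        h, c = lstm_step(z_t, h, c, p)
        states.append(h)
    return states

def bilstm(sequence, p_fwd, p_bwd, hidden_size):
    """Concatenate forward states with re-aligned backward states."""
    forward = run_lstm(sequence, p_fwd, hidden_size)
    backward = run_lstm(sequence[::-1], p_bwd, hidden_size)[::-1]
    return [f + b for f, b in zip(forward, backward)]

def random_params(input_size, hidden_size, seed):
    """Hypothetical untrained parameters, for illustration only."""
    rng = random.Random(seed)
    p = {}
    for gate in "ifoc":
        p["W_" + gate] = [[rng.uniform(-0.5, 0.5) for _ in range(hidden_size + input_size)]
                          for _ in range(hidden_size)]
        p["b_" + gate] = [0.0] * hidden_size
    return p

sequence = [[0.1, 0.2, 0.3], [0.0, -0.1, 0.4], [0.5, 0.1, -0.2]]
states = bilstm(sequence, random_params(3, 2, seed=0), random_params(3, 2, seed=1), hidden_size=2)
```

Each element of `states` has twice the hidden size: the first half summarizes the sequence up to that step, and the second half summarizes it from that step to the end.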

3 Experiments

To assess the performance of the proposed stock price prediction model based on DCA-BiLSTM, four datasets were selected: Apple stock, Google stock, Tesla stock, and the Nasdaq index. The proposed model was compared with four alternative models: CNN-BiLSTM, DCA, CNN, and BiLSTM [34–36].

3.1 Experimental environment and parameters

The experiments were conducted in a Windows 10 22H2 environment. Python 3 and the PyTorch neural network framework were used to implement the DCA-BiLSTM model. CUDA 11.8 was employed to accelerate model training.

3.2 Experimental datasets

The datasets for this study, sourced from Yahoo Finance (https://finance.yahoo.com/), contain historical daily price data for Apple, Google, Tesla stocks, and the Nasdaq index, with all data collection and usage complying with Yahoo Finance’s Terms of Service. Apple, as a global leader in technology, serves as a key market indicator. Google, one of the largest players in the internet industry, impacts the advertising sector and the broader IT field. Tesla, a pioneer in electric vehicles and renewable energy, offers insights into emerging markets, while the Nasdaq index, representing a range of technology companies, provides a broad view of technology sector trends.

The study spans the following timeframes: Apple stock (2013/12/9-2023/12/7), Google stock (2014/1/13-2024/1/12), Tesla stock (2014/1/13-2024/1/12), and the Nasdaq index (2013/12/9-2023/12/7). The selected features include indicators such as opening price, highest price, lowest price, price change, percentage change, trading volume, turnover, market capitalization, and total market value, among others. These datasets comprise a diverse range of fundamental data, supporting the construction of multifactorial time-series samples and enabling comprehensive predictive analysis. The experimental models primarily utilize closing prices to forecast future market trends.

3.3 Evaluation metrics

In the experiment, we utilize four metrics to assess the predictive performance of the model, namely the coefficient of determination ( $R^{2}$ ), mean absolute error (MAE), mean absolute percentage error (MAPE), and root mean squared error (RMSE). The formulas for calculating these metrics are as follows:

\begin{array}{rl} R^{2} & = 1 - \frac{\sum_{i = 1}^{n_{\mathrm{test}}} {(N_{\mathrm{test}, i} - {\hat{N}}_{\mathrm{test}, i})}^{2}}{\sum_{i = 1}^{n_{\mathrm{test}}} {({\bar{N}}_{\mathrm{test}} - N_{\mathrm{test}, i})}^{2}}, \\ \mathrm{MAE} & = \frac{1}{n_{\mathrm{test}}} \sum_{i = 1}^{n_{\mathrm{test}}} \left| N_{\mathrm{test}, i} - {\hat{N}}_{\mathrm{test}, i} \right|, \\ \mathrm{MAPE} & = \frac{1}{n_{\mathrm{test}}} \sum_{i = 1}^{n_{\mathrm{test}}} \left| \frac{N_{\mathrm{test}, i} - {\hat{N}}_{\mathrm{test}, i}}{N_{\mathrm{test}, i}} \right|, \\ \mathrm{RMSE} & = \sqrt{\frac{1}{n_{\mathrm{test}}} \sum_{i = 1}^{n_{\mathrm{test}}} {(N_{\mathrm{test}, i} - {\hat{N}}_{\mathrm{test}, i})}^{2}}, \end{array}

where $n_{\mathrm{test}}$ represents the sample size of the test set, and $N_{\mathrm{test}, i}$, ${\bar{N}}_{\mathrm{test}}$, and ${\hat{N}}_{\mathrm{test}, i}$ denote the true values, the mean of all true values in the test set, and the predicted values of the test set, respectively. These four metrics individually quantify the disparities between the model’s predicted values and the actual observations. $R^{2}$ indicates the model’s explanatory power, while MAE, MAPE, and RMSE quantify the magnitude of prediction errors. Higher $R^{2}$ values (closer to 1) and lower MAE, MAPE, and RMSE values generally indicate superior predictive performance.
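The four formulas translate directly into code. The sketch below is a plain-Python rendering under the definitions above; the function name `regression_metrics` and the sample values are hypothetical. MAPE is returned as a fraction rather than a percentage, which matches the scale of the MAPE values reported in this paper.

```python
import math

def regression_metrics(y_true, y_pred):
    """Return R2, MAE, MAPE (as a fraction), and RMSE for a test set."""
    n = len(y_true)
    mean_true = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    ss_tot = sum((mean_true - t) ** 2 for t in y_true)          # total sum of squares
    return {
        "R2": 1.0 - ss_res / ss_tot,
        "MAE": sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n,
        "MAPE": sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / n,
        "RMSE": math.sqrt(ss_res / n),
    }

# Hypothetical true and predicted closing prices.
metrics = regression_metrics([100.0, 110.0, 120.0, 130.0], [102.0, 108.0, 123.0, 129.0])
```

Note that MAPE is undefined when a true value is zero, which is not a concern for stock prices, and that $R^{2}$ can be negative when a model fits worse than predicting the test-set mean.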

3.4 Experimental results

To validate the proposed DCA-BiLSTM model, we compared it against four alternative models: CNN-BiLSTM, DCA, CNN, and BiLSTM. To enhance the predictive precision of the model, we allocated 90% of the dataset for training to refine parameter tuning, while the remaining 10% was reserved for testing to assess the model’s performance.
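A 90/10 split of a time series can be sketched as follows. The paper does not say whether the split is chronological; the sketch assumes it is, so that the test set contains only the most recent observations and no future data leaks into training. The function name and the placeholder samples are hypothetical.

```python
def chronological_split(samples, train_percent=90):
    """Split time-ordered samples into an earlier train part and a later test part."""
    cut = len(samples) * train_percent // 100  # integer arithmetic avoids float rounding
    return samples[:cut], samples[cut:]

# Placeholder for 100 time-ordered samples.
train, test = chronological_split(list(range(100)))
```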

Fig 6 shows the loss function curves of these models, with the horizontal axis indicating the iteration count and the vertical axis representing the loss value. Figs 7 to 10 illustrate the closing price predictions of these models, with the horizontal axis representing the number of iterations and the vertical axis representing the closing price.

In Fig 6, the proposed DCA-BiLSTM model demonstrates relatively lower initial loss values across all datasets compared to the other models. With increasing iterations, all models exhibit a gradual reduction in loss values; however, the DCA-BiLSTM model maintains a more stable loss curve, highlighting its robustness and ability to achieve consistently lower loss levels across datasets.

Observing Figs 7 to 10 reveals that in the early iterations, the DCA-BiLSTM model trends closer to actual closing prices than the other models. As iterations increase, this proximity becomes more pronounced, while the predictions of the other models display larger deviations.

Table 1 presents the predictive results for each dataset.

Table 1. Comparison of evaluation metrics for different models.

Datasets	Models	$R^{2}$	MAE	MAPE	RMSE
Apple stock	DCA-BiLSTM	0.9507	3.5970	0.0192	4.1158
	CNN-BiLSTM	0.8868	4.8763	0.0305	6.2350
	DCA	0.6996	9.3914	0.0566	10.1557
	CNN	0.1905	15.9490	0.1401	24.1788
	BiLSTM	0.3356	13.5497	0.0944	16.6720
Google stock	DCA-BiLSTM	0.9595	2.4250	0.0207	3.1841
	CNN-BiLSTM	0.7427	6.2594	0.0554	8.0303
	DCA	0.9567	2.5022	0.0212	3.2921
	CNN	0.8418	4.9632	0.0417	6.2955
	BiLSTM	0.8796	4.3224	0.0374	5.4932
Tesla stock	DCA-BiLSTM	0.9077	9.1389	0.0428	11.4288
	CNN-BiLSTM	0.6614	17.0838	0.0752	21.8854
	DCA	0.8111	14.0728	0.0650	16.3476
	CNN	0.3350	23.6805	0.1098	30.6716
	BiLSTM	0.6498	16.9560	0.0777	22.2567
Nasdaq index	DCA-BiLSTM	0.9594	183.0246	0.0146	230.0247
	CNN-BiLSTM	0.8064	396.4562	0.0314	502.0048
	DCA	0.7276	498.9665	0.0388	595.5876
	CNN	0.2620	828.9291	0.0623	980.2409
	BiLSTM	0.5169	700.4817	0.0544	793.1271

The analysis of predictive performance across the Apple, Google, Tesla, and Nasdaq datasets reveals that the DCA-BiLSTM model consistently outperforms other models. For Apple stock, it achieves an $R^{2}$ of 0.9507, with superior accuracy across MAE, MAPE, and RMSE metrics. On the Google stock dataset, the model stands out with an $R^{2}$ of 0.9595 and the lowest MAE, RMSE, and MAPE, demonstrating precise price predictions. Similarly, for Tesla stock, the DCA-BiLSTM model achieves the highest $R^{2}$ at 0.9077, indicating strong predictive performance across all metrics. Finally, on the Nasdaq index, the model attains an $R^{2}$ of 0.9594 and outperforms in error metrics, underscoring its robustness and accuracy across varied stock datasets.
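To put the size of the error reductions in concrete terms, the sketch below computes, for each dataset, the relative RMSE reduction of DCA-BiLSTM against the strongest baseline. The RMSE values are copied from Table 1; the function name `reduction_vs_best_baseline` and the choice of RMSE as the comparison metric are illustrative assumptions.

```python
# RMSE values taken from Table 1.
rmse = {
    "Apple stock": {"DCA-BiLSTM": 4.1158, "CNN-BiLSTM": 6.2350, "DCA": 10.1557,
                    "CNN": 24.1788, "BiLSTM": 16.6720},
    "Google stock": {"DCA-BiLSTM": 3.1841, "CNN-BiLSTM": 8.0303, "DCA": 3.2921,
                     "CNN": 6.2955, "BiLSTM": 5.4932},
    "Tesla stock": {"DCA-BiLSTM": 11.4288, "CNN-BiLSTM": 21.8854, "DCA": 16.3476,
                    "CNN": 30.6716, "BiLSTM": 22.2567},
    "Nasdaq index": {"DCA-BiLSTM": 230.0247, "CNN-BiLSTM": 502.0048, "DCA": 595.5876,
                     "CNN": 980.2409, "BiLSTM": 793.1271},
}

def reduction_vs_best_baseline(scores, proposed="DCA-BiLSTM"):
    """Return (best baseline name, fractional RMSE reduction of the proposed model)."""
    baselines = [(name, value) for name, value in scores.items() if name != proposed]
    best_name, best_value = min(baselines, key=lambda item: item[1])
    return best_name, (best_value - scores[proposed]) / best_value

reductions = {dataset: reduction_vs_best_baseline(scores) for dataset, scores in rmse.items()}
for dataset, (baseline, fraction) in reductions.items():
    print(f"{dataset}: {fraction:.1%} lower RMSE than {baseline}")
```

The strongest baseline differs by dataset (CNN-BiLSTM for Apple and the Nasdaq index, DCA for Google and Tesla), which is why the comparison is made per dataset rather than against a single fixed model.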

4 Conclusion and future work

This paper proposes a stock price prediction model based on the DCA-BiLSTM approach. The method begins with decomposing the time-series data using wavelet packet decomposition to extract high- and low-frequency features. Subsequently, a neural network architecture integrating convolutional layers and attention mechanisms performs deep feature extraction. A bidirectional LSTM model is then applied to capture forward and backward context within the data. Experimental results demonstrate the DCA-BiLSTM model’s accuracy and stability in stock price prediction, underscoring its capacity to capture complex temporal relationships and improve prediction precision.

Nonetheless, several challenges remain for future research. First, stock markets are influenced by numerous external factors such as political events, natural disasters, and economic policies, which existing models may inadequately address. Future studies should explore integrating these factors to enhance model robustness and applicability. Additionally, although our model performs well in short-term predictions, its accuracy and stability diminish in long-term forecasting due to factors such as economic cycles and industry trends that are difficult to capture. Therefore, future research should focus on addressing long-term prediction challenges through specialized modeling techniques. Lastly, current models face overfitting risks, especially when handling large datasets. Future studies should explore strategies to prevent overfitting and improve model generalization and robustness.

In conclusion, while this study has achieved promising results, stock price prediction remains a complex and evolving field that demands continuous exploration and refinement of predictive techniques. Future research will continue to investigate innovative methodologies to better capture stock market patterns, providing investors with more accurate and reliable forecasting insights.

Data Availability

The complete minimal anonymized dataset required to replicate the findings of this study has been deposited in the Zenodo public repository (doi: 10.5281/zenodo.14915253).

Funding Statement

Fujian Provincial Department of Science and Technology Innovation Strategic Project, grant no. 2022R0066.

References

1.Gandhmal DP, Kumar K. Systematic analysis and review of stock market prediction techniques. Comput Sci Rev. 2019;34:100190. doi: 10.1016/j.cosrev.2019.08.001 [DOI] [Google Scholar]
2.Jiang W. Applications of deep learning in stock market prediction: recent progress. Expert Syst Appl. 2021;184:115537. doi: 10.1016/j.eswa.2021.115537 [DOI] [Google Scholar]
3.Obthong M, Tantisantiwong N, Jeamwatthanachai W, Wills G. A survey on machine learning for stock price prediction: algorithms and techniques. In: 2nd International Conference on Finance, Economics, Management and IT Business, 5 May 2020–6 May 2020 [Google Scholar]
4.Mehtab S, Sen J. Stock price prediction using convolutional neural networks on a multivariate timeseries. arXiv preprint. arXiv:2001.09769. 2020. [Google Scholar]
5.Wu JMT, Li Z, Srivastava G, Frnda J, Diaz VG, Lin JCW. A CNN-based stock price trend prediction with futures and historical price. In: 2020 International Conference on Pervasive Artificial Intelligence (ICPAI). IEEE; 2020: pp. 134–9. [Google Scholar]
6.Cheung L, Wang Y, Lau ASM, Chan RMC. Using a novel clustered 3D-CNN model for improving crop future price prediction. Knowledge-Based Systems. 2023; 260:110133. [Google Scholar]
7.Yang F, Chen J, Liu Y. Improved and optimized recurrent neural network based on PSO and its application in stock price prediction. Soft Comput 2023;27(6):3461–76. doi: 10.1007/s00500-021-06113-5 [DOI] [Google Scholar]
8.Fayeem A, Kumar A, Sagar R, Aggarwal A, Jain D. Stock price prediction: recurrent neural network in financial market. Int J Mod Trends Sci Technol. 2022;8:259–64. [Google Scholar]
9.Muhammad T, Aftab AB, Ibrahim M, Ahsan MM, Muhu MM, Khan SI et al. Transformer-based deep learning model for stock price prediction: a case study on Bangladesh stock market. Int J Comput Intell Appl. 2023:2350013. doi: 10.48550/arXiv.2208.08300 [DOI] [Google Scholar]
10.Costa LD, Machado AMC. Prediction of stock price time series using transformers. In: Anais do II Brazilian Workshop on Artificial Intelligence in Finance. SBC; 2023: pp. 85–95. [Google Scholar]
11.Jaiswal R, Singh B. A hybrid convolutional recurrent (CNN-GRU) model for stock price prediction. In: 2022 IEEE 11th International Conference on Communication Systems and Network Technologies (CSNT). IEEE; 2022: pp. 299–304. [Google Scholar]
12.Rostamian A, O’Hara JG. Event prediction within directional change framework using a CNN-LSTM model. Neural Comput Applic 2022;34(20):17193–205. doi: 10.1007/s00521-022-07687-3 [DOI] [Google Scholar]
13.Putra R A, Nurmawati E. Prediction-based stock portfolio optimization using bidirectional long short-term memory (BiLSTM) and LSTM. Sci J Inform 2024;11(3):609–20. doi: 10.15294/sji.v11i3.5941 [DOI] [Google Scholar]
14.Woźniak M, Wieczorek M, Siłka J. BiLSTM deep neural network model for imbalanced medical data of IoT systems. Future Gener Comput Syst. 2023;141:489–99. doi: 10.1016/j.future.2022.12.004 [DOI] [Google Scholar]
15.Masalimov K, Muslimov T, Munasypov R. Real-time monitoring of parameters and diagnostics of the technical condition of small unmanned aerial vehicle’s (UAV) units based on deep BiGRU-CNN models. Drones 2022;6(11):368. doi: 10.3390/drones6110368 [DOI] [Google Scholar]
16. Guo MH, Xu TX, Liu JJ, Liu ZN, Jiang PT, Mu TJ. Attention mechanisms in computer vision: a survey. Comput Vis Media 2022;8(3):331–68. doi: 10.1007/s41095-022-0271-y
17. Lu W, Li J, Wang J, Qin L. A CNN-BiLSTM-AM method for stock price prediction. Neural Comput Applic. 2021;33:4741–53. https://link.springer.com/article/10.1007/s00521-020-05532-z
18. Luo A, Zhong L, Wang J, Wang Y, Li S, Tai W. Short-term stock correlation forecasting based on CNN-BiLSTM enhanced by attention mechanism. IEEE Access. 2024;12:29617–32. doi: 10.1109/ACCESS.2024.3369419
19. Wang S. A stock price prediction method based on BiLSTM and improved transformer. IEEE Access. 2023;11:104211–23. doi: 10.1109/ACCESS.2023.3296308
20. Jiang X, Zhu Y, Liu Y, Wang N, Yi L. MC-DC: an MLP-CNN based dual-path complementary network for medical image segmentation. Comput Methods Programs Biomed. 2023;242:107846. doi: 10.1016/j.cmpb.2023.107846
21. Tong J, Wang C. A dual tri-path CNN system for brain tumor segmentation. Biomed Signal Process Control. 2023;81:104411. doi: 10.1016/j.bspc.2022.104411
22. Zhang Y, Yang S, Sun Y, Liu S, Li X. Attention-guided multi-path cross-CNN for underwater image super-resolution. SIViP 2022;16:155–63.
23. Dang F, Chen H, Zhang P. DPT-FSNet: dual-path transformer based full-band and sub-band fusion network for speech enhancement. In: ICASSP 2022-2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE; 2022: pp. 6857–61.
24. Sun Y, Zheng H, Zhang G, Ren J, Xu H, Xu C. DP-ViT: a dual-path vision transformer for real-time sonar target detection. Remote Sensing 2022;14(22):5807. doi: 10.3390/rs14225807
25. Lee Y, Kim J, Willette J, Hwang SJ. MPViT: multi-path vision transformer for dense prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2022: pp. 7287–96.
26. Ferkous K, Guermoui M, Bellaour A, Boulmaiz T, Bailek N. Enhancing photovoltaic energy forecasting: a progressive approach using wavelet packet decomposition. Clean Energy 2024;8(3):7287–96. doi: 10.1093/ce/zkae027
27. Franco de Godói L, de Oliveira Nóbrega EG. Unsupervised machinery prognostics approach based on wavelet packet decomposition and variational autoencoder. J Braz Soc Mech Sci Eng 2024;46(2):97. doi: 10.21203/rs.3.rs-3332750/v1
28. Taranenko YK, Oliinyk OY. Optimizing the algorithm of wavelet packet signal filtering. Cybern Syst Anal. 2024;60(1):135–45.
29. Halder R, Chatterjee R. CNN-BiLSTM model for violence detection in smart surveillance. SN Comput Sci 2020;1(4):201. https://link.springer.com/article/10.1007/s42979-020-00207-x
30. El Zaar A, Benaya N, Bakir T, Mansouri M, El Allati A. Prediction of US 30-years-treasury-bonds movement and trading entry point using the robust 1DCNN-BiLSTM-XGBoost algorithm. Expert Syst 2024;41(1):e13459. doi: 10.22541/au.168079685.52841217/v1
31. Pandey V, Lilhore UK, Walia R, Alroobaea R, Alsafyani M, Baqasah AM, et al. Enhancing heart disease classification with M2MASC and CNN-BiLSTM integration for improved accuracy. Sci Rep 2024;14(1):24221. doi: 10.1038/s41598-024-74993-2
32. Dalal S, Lilhore UK, Simaiya S, Radulescu M, Belascu L. Improving efficiency and sustainability via supply chain optimization through CNNs and BiLSTM. Technol Forecast Soc Change. 2024;209:123841. doi: 10.1016/j.techfore.2024.123841
33. Tian J, Shen C, Wang B, Ren C, Xia X, Dong R, et al. EVADE: targeted adversarial false data injection attacks for state estimation in smart grid. IEEE Trans Sustain Comput. 2024;(1):1–13. doi: 10.1109/TSUSC.2024.3492290
34. Anantanasuwong K, Chaivisuttangkun S. Do investors benefit from DCA? Evidence from the stock exchange of Thailand. Chula Bus Rev. 2019;41(2):84–101.
35. Hoseinzade E, Haratizadeh S. CNNpred: CNN-based stock market prediction using a diverse set of variables. Expert Syst Appl. 2019;129:273–85. doi: 10.1016/j.eswa.2019.03.029
36. Pramesti MI, Indikawati FI, Prahara A. Multivariate time series stock price data prediction in the banking sector in Indonesia using Bidirectional Long Short-Term Memory (biLSTM). Signal Image Process Lett. 2022;4(2):28–37.

Associated Data

Data Availability Statement

The complete minimal anonymized dataset required to replicate the findings of this study has been deposited in the Zenodo public repository (doi: 10.5281/zenodo.14915253).
