PLOS One
. 2022 Dec 1;17(12):e0278064. doi: 10.1371/journal.pone.0278064

An attention-based recurrent learning model for short-term travel time prediction

Jawad-ur-Rehman Chughtai 1,2,*, Irfan Ul Haq 1,2, Muhammad Muneeb 3
Editor: Xiyu Liu4
PMCID: PMC9714702  PMID: 36454768

Abstract

With the advent of Big Data technology and the Internet of Things, Intelligent Transportation Systems (ITS) have become inevitable for future transportation networks. Travel time prediction (TTP) is an essential part of ITS and plays a pivotal role in congestion avoidance and route planning. Novel data sources such as smartphones and in-vehicle navigation applications allow traffic conditions in smart cities to be analyzed and forecast more reliably than ever. Such a massive amount of geospatial data provides a rich source of information for TTP. The Gated Recurrent Unit (GRU) has been successfully applied to traffic prediction problems due to its ability to handle long-term traffic sequences. However, the existing GRU does not consider the relationship between various historical travel time positions in the sequences for traffic prediction. To cope with this problem, we propose an attention-based GRU model for short-term travel time prediction, enabling the GRU to learn the relevant context in historical travel time sequences and update the weights of hidden states accordingly. We evaluated the proposed model using FCD data from Beijing. To demonstrate the generalization of our proposed model, we performed a robustness analysis by adding noise obeying a Gaussian distribution. The experimental results on test data indicated that our proposed model performed better than the existing deep learning time-series models in terms of Root Mean Square Error (RMSE), Mean Absolute Error (MAE), Mean Absolute Percentage Error (MAPE), and Coefficient of Determination (R2).

Introduction

Recent years have witnessed a drastic movement of people from rural to urban areas. In 2017, 4.1 billion people were living in urban areas, comprising 55% of the total global population [1]. According to the Population Reference Bureau report, the current population will have grown by 14% by 2050 [2]. Urbanization has significantly improved the quality of life of individuals [3]; on the other hand, it has brought new challenges and raised new concerns [4].

Advancement in information and communication technology has brought about a significant rise in the availability of mobility data collected through multiple data sources, including FCD (Floating Car Data), detectors, cameras, etc. Research groups and companies analyze this data using big data and machine learning to improve people’s living standards [5]. Researchers have used data from multiple sources to improve traffic-related operations with applications in traffic congestion prediction [6], traffic flow prediction [7], traffic speed estimation [8], traffic demand prediction [9], traffic signal control [10], parking space forecasting [11], stay point detection [12], traffic accident prediction [13], accident severity analysis [14], and many others.

One of the essential components of an Intelligent Transportation System (ITS) is Travel Time Prediction (TTP). Accurate TTP helps commuters and travelers make wise decisions about departure time and route selection which, in turn, leads to congestion avoidance. Moreover, it assists logistic operators in improving service quality and reducing transportation costs by avoiding congested routes. Furthermore, TTP helps traffic managers and decision-makers make traffic-related strategies and improve existing operations [15].

Various approaches, including statistical (e.g., Historical Average (HA)), classical time-series (e.g., Auto-Regressive Moving Average (ARIMA) and variants), machine learning (e.g., Random Forest (RF), Support Vector Regression (SVR)), and deep learning-based approaches (Multi-Layer Perceptron (MLP), Convolutional Neural Network (CNN), Recurrent Neural Network (RNN), and variants), have been proposed to predict travel time [16–23]. Deep learning-based approaches outperformed their counterparts in prediction tasks because of their ability to deal with non-linearities, traffic trends, and long-term sequences [24].

Recurrent Neural Networks (RNNs) are specialized models developed for sequence learning problems. A simple RNN performs well for short-term sequences. However, it suffers from exploding and vanishing gradient problems when dealing with long-term sequences. The Gated Recurrent Unit (GRU) and Long Short-Term Memory (LSTM) were developed to resolve these issues. Both models have shown state-of-the-art performance on various sequence learning tasks with applications ranging from Natural Language Processing (NLP) to traffic prediction [25].

Existing RNN architectures like LSTM and GRU struggle to explicitly model the context in historical travel time sequences: they give equal weight to all hidden states when used for the TTP task. We introduce an attention mechanism that re-weights the hidden states by leveraging the hidden relationships between distinct positions in the Travel Time (TT) sequence [26].

In this paper, we propose an attention-based GRU model for TTP. We selected GRU due to its simple architecture and faster training time. The experimental results on the Q-Traffic dataset show significant improvement in short-term traffic prediction compared to baseline approaches.

The main contribution of the paper can be summarized as follows:

  • This paper proposes a deep learning model based on GRU for short-term travel time prediction. We introduced self-attention in GRU to address the limitation of GRU in finding the relation across various travel time positions in the input (past) sequences. To the best of our knowledge, no attempt has been made to forecast travel time using traffic flow as input with attention-based GRU.

  • We compared our proposed model with baseline state-of-the-art statistical, classical time-series, Machine Learning (ML), and Deep Learning (DL) approaches. The comparative results on the Beijing-based FCD dataset show considerable improvement in Root Mean Square Error (RMSE), Mean Absolute Error (MAE), Mean Absolute Percentage Error (MAPE), and Coefficient of Determination (R2).

  • Moreover, we performed perturbation analysis by adding noise to our data which validated the generalization abilities of our proposed model.

We organized the remainder of this paper as follows: Section II provides the historical background of our studied area. Section III explains the proposed methodology. Section IV presents the findings and results of our work. Section V concludes the paper.

Related work

The literature on TTP is grouped into two broad categories: traditional approaches and advanced approaches. Traditional approaches include classical and machine learning-based approaches, while advanced approaches include deep learning-based, ensemble learning-based, and attention-based approaches.

Traditional approaches for TTP

Classical approaches

Earlier travel time prediction approaches employed statistical theory-based modeling and classical time-series approaches. HA was one of the first statistical theory-based modeling approaches used in TTP studies. In this approach, travel times over a historical period are averaged to obtain the prediction [16]. HA is computationally fast and does not require any assumptions for prediction. However, HA does not consider temporal variations and features, resulting in lower prediction precision. ARIMA was another widely used classical time-series model [17]. ARIMA treats traffic data as a stationary time series to predict future travel time, and this assumption hampers its ability to predict TT in uncertain or changing traffic conditions. Despite ARIMA's widespread use, this simple linear model falls short of accurately forecasting nonlinear traffic data.

Machine learning-based approaches

SVR is one of the most widely used approaches for TTP. Some studies used SVR with nonlinear transformations to handle data complexities; the authors in [18] proposed SVR for freeway TTP. Compared to the classical Support Vector Machine (SVM), an SVM model optimized using the artificial fish swarm approach [27] or a least-squares loss function with an equality constraint [28] has been found to improve model precision. k-Nearest Neighbors (k-NN), an example-based or pattern-matching model, has also been widely employed for similarity pattern matching in travel time problems on urban roads and highways [29]. Typically, k-NN uses Euclidean distance to find the k most similar patterns and then applies a weighted algorithm to obtain the final result. Myung et al. [30] employed k-NN on data collected through automatic toll collection and vehicle detector systems to predict highway travel time.

Advanced approaches

Deep learning-based approaches

DL has attracted researchers' attention for TTP and remains an active area. Different DL approaches, including MLP, auto-encoders, CNN, and RNN, have been applied to TTP. MLP is one of the earliest and most widely used. The authors in [21] proposed a multi-step deep learning approach for TTP: extensive feature engineering is performed using geospatial feature analysis, principal component analysis, and k-means clustering, followed by a deep stacked auto-encoder. The findings revealed that the approach performed well for general traffic dynamics but failed to predict travel time for rare events. Fu et al. [31] used MLP as the final predictor on top of wide deep recurrent modules to predict travel time. Yuan et al. [32] employed MLP for spatiotemporal learning of travel time by exploiting periodicity in daily and weekly patterns and the road network structure. Researchers have also used CNN to capture the spatial aspects of TTP data. The authors in [20] proposed a global-level representation for CNN to better capture the relationship between the predicted information and historical data points, overcoming local receptive field limitations; however, the approach is validated only on a single highway link. A new local receptive field is proposed in [33] to model nonlinear spatiotemporal relationships in travel time data over multiple highway links. Shen et al. [34] combined CNN with RNN to learn both spatial and temporal features and improve prediction on FCD. Likewise, the authors in [35] proposed a graph convolutional network with LSTM to capture spatiotemporal features in urban road travel time data. A graph-based deep learning approach is presented in [36]; the model gives promising results compared to baselines, but only short trajectories are considered, with no external features incorporated.
Unlike MLPs and CNNs, which are feed-forward neural networks and take data all at once, RNNs act on data sequentially and are frequently employed in the NLP domain. RNNs have also been widely adopted for TTP. Zhao et al. [37] employed GRU on integrated data from remote transportation microwave sensors and dedicated short-range communications to predict travel time. The experiment used two freeway segments, yielding better results with data fusion. To cope with the data sparsity that arises as the spatial scale increases, a neighboring-segments-based strategy is proposed in [23], which employs GRU to predict travel time for the entire trajectory path; adjacent road segment information addresses the trajectory data sparseness caused by longer trips. In [22], the authors compared an LSTM-based RNN model with a Back Propagation Neural Network (BPNN) and a Deep Belief Network (DBN) using multi-factor data for TTP.

Ensemble learning-based approaches

Researchers have also employed ensemble approaches for TTP. The most extensively used are Gradient Boosting Machine (GBM), eXtreme Gradient Boosting (XGBoost), and RF. The authors in [19] implemented GBM, RF, and ARIMA for multi-step-ahead prediction using freeway data and demonstrated better performance of GBM over RF and ARIMA. Verkehr In Städten-SIMulationsmodell (VISSIM) freeway data is used in [38] for multi-step-ahead TT prediction, where the Gradient Boosting Decision Tree (GBDT) model performed better than SVM and BPNN. Chen et al. [39] proposed an XGBoost model to predict freeway TT, with results showing better performance than GBM. The authors in [40] implemented decision tree, RF, XGBoost, and LSTM models and demonstrated better performance of RF on freeway data. However, as the data volume increases, the performance of these approaches begins to deteriorate, so some studies combined multiple algorithms for TTP. For example, Ting et al. [41] proposed an ensemble based on XGBoost and GRU to improve prediction accuracy using freeway data, and the results of an ensemble of a light gradient boosting machine and MLP are combined using a decision tree in [42]. Recently, an ML-based ensemble has been proposed in [43] that uses a Kalman filter, k-NN, and BPNN as base learners, combining their predictions with fuzzy soft set theory to improve accuracy on freeway data. Although this study improves on the individual models' accuracy, the criteria for choosing the base learners are not discussed.

Attention-based approaches

Attention mechanisms, or attention-based models, have proven to be very powerful and adaptable in a wide range of transportation applications [6, 8, 44]. Due to its success in related applications, the attention mechanism has recently been introduced in TTP to learn only the relevant context. The studies in [26, 45] implemented an attention mechanism with LSTM to enhance performance. Ran et al. [46] introduced an attention mechanism with CNN on freeway data for better results. The authors in [47] employed attention with LSTM for joint prediction of travel time and next location.

To summarize, various approaches have been developed to improve TTP performance. Different studies use different datasets: some are conducted on freeway data, while others use urban road networks, so it is difficult to conclude that one approach is better in every scenario. Generally, deep learning handles data complexities and non-linearities better. For traffic data, RNNs have shown promising results due to the nature of the problem (i.e., traffic conditions at the current or near-future timestamp depend on past timestamps).

Therefore, we propose a GRU variant that improves prediction performance by adding an attention mechanism, allowing the model to learn only the relevant context rather than weighting all historical timestamps (positions) equally. The attention mechanism enriches the feature space, yielding better results.

Proposed methodology

Problem definition

Travel time prediction can be formalized as forecasting future travel time given historical travel time. Let T_τ^i denote the travel time of the i-th segment during the τ-th time period. Given the historical travel time sequence T_τ^i (τ = t − mδ, …, t − δ, t, and i ∈ S, where S is the set of segments in the considered study area), the task is to predict the segment travel time at time interval t + fδ for prediction step f and interval length δ. In this work, we consider δ = 15 minutes, m = 4, and f = 1, 2, 3, 4, which means that the previous one hour of observations is used to predict the travel time of the next 15, 30, 45, and 60 minutes.
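As a sketch (not the authors' code), the windowing described above can be implemented as follows; `make_windows` and its arguments are illustrative names:

```python
import numpy as np

def make_windows(tt, m=4, f=1):
    """Build (input, target) pairs from one segment's travel-time series.

    tt : 1-D array of travel times aggregated at delta = 15-min intervals.
    m  : number of past steps before t used as input (m=4 -> one hour with t).
    f  : prediction step in intervals (f=1 -> 15 min ahead, f=4 -> 60 min).
    """
    X, y = [], []
    for t in range(m, len(tt) - f):
        X.append(tt[t - m:t + 1])   # m+1 observations: t-m*delta, ..., t
        y.append(tt[t + f])         # target at t + f*delta
    return np.array(X), np.array(y)
```

Each horizon f = 1, 2, 3, 4 yields its own set of training pairs from the same series.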

GRU

RNNs were proposed to process sequential data efficiently. However, standard RNNs suffer from exploding and vanishing gradient problems as the input sequence lengthens. To overcome these limitations, two specialized variants of RNNs, GRU [48] and LSTM [49], were developed. These models use a gated mechanism to handle long-term time-series sequences. LSTM comprises three gates: input, forget, and output. GRU uses two gates (update and reset), speeding up training with fewer parameters than LSTM.

In this study, a GRU with an attention mechanism was employed to forecast future short-term TT using past travel time sequences. Fig 1 shows the structure of a GRU cell. The reset gate decides how much past information the model needs to forget at each timestamp, while the update gate determines how much past information the model passes along. Eqs (1)–(4) show how the two gates govern the flow of information within the GRU network.

u_t = σ(W_u x_t + U_u h_{t−1})    (1)
r_t = σ(W_r x_t + U_r h_{t−1})    (2)
h̃_t = μ(W x_t + r_t ⊙ U h_{t−1})    (3)
h_t = u_t ⊙ h_{t−1} + (1 − u_t) ⊙ h̃_t    (4)

where r_t and u_t denote the reset gate and update gate, respectively, h̃_t and h_t denote the current (candidate) and final memory content at time t, σ is the sigmoid activation function, and μ denotes the tanh activation function. W_u, U_u, W_r, and U_r are the respective weight matrices of the two gates, whereas ⊙ denotes element-wise multiplication.
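A minimal sketch of one GRU step following Eqs (1)–(4); bias terms are omitted, as in the equations above, and `gru_cell`/`params` are illustrative names, not the authors' implementation:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_cell(x_t, h_prev, params):
    """One GRU step; params holds the six weight matrices of Eqs (1)-(4)."""
    Wu, Uu, Wr, Ur, W, U = (params[k] for k in ("Wu", "Uu", "Wr", "Ur", "W", "U"))
    u_t = sigmoid(Wu @ x_t + Uu @ h_prev)             # update gate, Eq (1)
    r_t = sigmoid(Wr @ x_t + Ur @ h_prev)             # reset gate, Eq (2)
    h_cand = np.tanh(W @ x_t + r_t * (U @ h_prev))    # candidate memory, Eq (3)
    h_t = u_t * h_prev + (1.0 - u_t) * h_cand         # final memory, Eq (4)
    return h_t
```

Running this cell over a travel-time window and keeping every h_t produces the hidden-state sequence H that the attention layer consumes.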

Fig 1. Structure of a GRU cell [48].

Fig 1

The architectural diagram and data flow are illustrated in Figs 2 and 3.

Fig 2. Layers of the proposed GRU model.

Fig 2

Fig 3. Data flow of proposed GRU model.

Fig 3

Attention mechanism

The attention mechanism enhances the learning ability of predictive models by focusing on relevant information. The authors in [50] improved weight assignment by giving different weights to different text fragments, thereby enhancing the encoding process in neural machine translation. The attention mechanism has since been successfully applied to document classification [51], image caption generation [52], tabular learning [53], and many more tasks. In the context of ITS, attention has recently been applied to traffic congestion prediction [6], traffic speed prediction [8], traffic flow prediction [44], and travel time prediction [26]. Because standard RNN models such as LSTM and GRU cannot explicitly identify the relevant positions in historical travel time sequences, we implemented an attention mechanism to learn global trends in travel time sequences. The attention mechanism works in three steps. First, the GRU computes the hidden states at different timestamps, H = (h_1, h_2, …, h_n). Second, a weight for each hidden state h_i is computed using a scoring function (a two-layer deep neural network in our case). Third, the context vector A_t, which is used to obtain the final prediction, is extracted with an attention function. We illustrate the attention mechanism in Fig 4, where the relation between the predicted value (X_{t+1}) and the historical values (X_{t−3}, X_{t−2}, X_{t−1}, X_t) is shown by the thickness of the arrows. We adapted the attention mechanism implemented for traffic speed prediction in [54], given by Eqs (5)–(7).

a_i = W^{(h2)} (W^{(h1)} H + b^{(h1)}) + b^{(h2)}    (5)
α_i = exp(a_i) / Σ_{k=1}^{n} exp(a_k)    (6)
A_t = Σ_{i=1}^{n} α_i · h_i    (7)

where W^{(h1)}, W^{(h2)}, b^{(h1)}, and b^{(h2)} denote the weights and biases of the two hidden layers, respectively, and α_i captures the dependency between h_t (the current position at time t) and h_{t′} (a previous position at time t′) in H.
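The three steps (score, softmax, weighted sum) can be sketched as follows; this is an illustrative reading of Eqs (5)–(7), not the authors' code, and the max-subtraction is only a standard numerical-stability trick:

```python
import numpy as np

def attention_context(H, Wh1, bh1, Wh2, bh2):
    """Attention over GRU hidden states H of shape (n, d), per Eqs (5)-(7).

    A two-layer feed-forward scorer gives one score a_i per hidden state,
    softmax turns scores into weights alpha_i, and the context vector A_t
    is the weighted sum of the hidden states.
    """
    scores = H @ Wh1.T + bh1                 # first scorer layer, per timestamp
    scores = (scores @ Wh2.T + bh2).ravel()  # second layer -> one a_i each, Eq (5)
    weights = np.exp(scores - scores.max())
    weights = weights / weights.sum()        # softmax, Eq (6)
    context = weights @ H                    # context vector A_t, Eq (7)
    return context, weights
```

The context vector then feeds the final prediction layer in place of the last hidden state alone.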

Fig 4. An illustration of the attention mechanism.

Fig 4

Results

Dataset

We evaluated our model on the Q-Traffic dataset presented in [55]. The dataset contains 15,073 road segments spanning 738.91 km, from April 1, 2017, to May 31, 2017. All the data is collected around the most crowded area of Beijing (i.e., around the 6th Ring Road). The dataset also incorporates events happening in that period, such as the Summer Palace May Day holiday, the Fish Leong concert, the Chou Chuan-huing concert, the 106th anniversary of THU, and spring outings, which cause much heavier congestion than usual. The data is aggregated at 15-minute intervals on every road. The training-test split is 80-20%, and the data is normalized to the interval [0, 1]. Travel time for the next 15, 30, 45, and 60 minutes is predicted in this experiment.
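A sketch of the preprocessing described above. The paper does not state whether the scaling statistics are computed on the training split only; this sketch assumes they are, to avoid leaking test-period information:

```python
import numpy as np

def normalize_and_split(series, train_frac=0.8):
    """Chronological 80/20 split, then min-max scaling to [0, 1].

    Scaling parameters (min, max) are taken from the training portion
    only, so test values may fall slightly outside [0, 1].
    """
    n_train = int(len(series) * train_frac)
    train, test = series[:n_train], series[n_train:]
    lo, hi = train.min(), train.max()
    scale = lambda x: (x - lo) / (hi - lo)
    return scale(train), scale(test), (lo, hi)
```

Keeping (lo, hi) allows predictions to be mapped back to seconds before computing the error measures.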

Performance metrics

We used four measures to evaluate our proposed model: RMSE, MAE, MAPE, and R2. The RMSE can be computed using Eq (8); these equations are taken from [56].

RMSE = √((1/n) Σ_{i=1}^{n} (TT̂_i − TT_i)²)    (8)

where TT_i denotes the actual travel time and TT̂_i denotes the predicted travel time. MAE is the average absolute error between TT_i and TT̂_i and is shown in Eq (9).

MAE = (1/n) Σ_{i=1}^{n} |TT̂_i − TT_i|    (9)

MAPE denotes the percentage difference between the actual and predicted values and is given in Eq (10). A lower MAPE indicates higher prediction accuracy.

MAPE = (1/n) Σ_{i=1}^{n} |TT̂_i − TT_i| / TT_i    (10)

Eq (11) shows R2, which reflects how much of the variation the model explains.

R² = 1 − Σ_{i=1}^{n} (TT_i − TT̂_i)² / Σ_{i=1}^{n} (TT_i − TT_m)²    (11)

Here TT_m denotes the mean travel time. For optimal prediction, RMSE and MAE should be zero (or close to zero), and R2 should be close to one.
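The four measures can be computed as follows; scaling MAPE by 100 to express a percentage is an assumption made to match the magnitudes reported in Table 1, since Eq (10) omits the factor:

```python
import numpy as np

def evaluate(y_true, y_pred):
    """Compute RMSE, MAE, MAPE, and R2 per Eqs (8)-(11)."""
    err = y_pred - y_true
    rmse = np.sqrt(np.mean(err ** 2))
    mae = np.mean(np.abs(err))
    mape = np.mean(np.abs(err) / y_true) * 100.0   # assumes y_true > 0
    r2 = 1.0 - np.sum(err ** 2) / np.sum((y_true - y_true.mean()) ** 2)
    return {"RMSE": rmse, "MAE": mae, "MAPE": mape, "R2": r2}
```

A perfect prediction yields RMSE = MAE = MAPE = 0 and R2 = 1.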

Hyperparameters setting

In our experiments, we set the training epochs to 600 and the learning rate to 0.001. One of the important hyperparameters is the number of hidden units, which greatly affects the prediction output. We tested our model with 8, 16, 32, 64, and 128 hidden units and varying batch sizes (16, 32, and 64) and chose the values with the best results. The results of the finalized model with a batch size of 32 are illustrated in Fig 5. When the hidden units are set to 32, we obtained the smallest values for RMSE, MAE, and MAPE and the highest value for R2. Choosing values smaller or greater than 32 for the batch size and hidden units makes the evaluation measures either increase or diverge from the minima. As a result, we set the hidden units to 32 in our experiments. To avoid overfitting, a norm-based regularization term is added to the loss computation, as shown in Eq (12).

Loss = |TT̂_i − TT_i| + C · L_norm    (12)

where L_norm is the norm (regularization) term and C is a coefficient whose value is set to 0.0015 for this experiment.
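A sketch of Eq (12); the paper does not specify which norm L_norm uses, so an L2 penalty on the model weights is assumed here:

```python
import numpy as np

def regularized_loss(y_true, y_pred, weights, C=0.0015):
    """Absolute-error data term plus a weighted norm term, as in Eq (12).

    weights : list of weight arrays of the model; L_norm is assumed to be
              the sum of their squared entries (L2 penalty).
    """
    data_term = np.mean(np.abs(y_pred - y_true))
    l_norm = sum(np.sum(w ** 2) for w in weights)   # assumed L2 norm
    return data_term + C * l_norm
```

With C = 0.0015 the penalty stays small relative to the data term on [0, 1]-normalized travel times.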

Fig 5. Comparison of RMSE, MAE, MAPE, and R Squared error results for different hidden units values.

Fig 5

Baselines

  • HA [57]: Historical Average is a simple mathematical model which takes the mean of traffic values in the historical interval as the final prediction.

  • ARIMA [58]: ARIMA is a widely used time-series model that predicts future traffic data by fitting a parametric model to the historical time series. We used ARIMA from the statsmodels Python package with order (2, 0, 1).

  • SVR [59]: SVR is a well-known machine learning model trained on the training data to learn a relationship between the explanatory variables and the target variable. In this experiment, we used the RBF kernel, ϵ = 0.1, and C = 1.

  • XGBoost [60]: XGBoost is a state-of-the-art model from the decision tree family that employs an ensemble of decision tree regressors for travel time prediction. In our work, we have set max_depth to 7.

  • MLP [59]: MLP is a feed-forward artificial neural network consisting of fully connected (dense) layers. In this study, we used a three-layer network with 64 hidden units and the ReLU activation function.

  • GRU [48]: GRU is an improved variant of the recurrent neural network (see Section III for details). In our experiment, a single-layer GRU model with 32 hidden units and the ReLU activation function is used.
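For illustration, the simplest baseline above, HA, reduces to a few lines (`historical_average` is a hypothetical helper name, not from the paper):

```python
import numpy as np

def historical_average(history, horizon_steps):
    """HA baseline: every future step is predicted as the plain mean
    of the travel times observed in the historical interval."""
    return np.full(horizon_steps, history.mean())
```

Because the prediction ignores the horizon entirely, HA produces identical error figures for the 15-, 30-, 45-, and 60-minute rows of Table 1.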

Performance comparison with baselines

A comparison of the proposed model and baseline approaches for prediction horizons of 15, 30, 45, and 60 minutes is shown in Table 1. The symbol ⋇ marks entries omitted because of the model's poor performance.

Table 1. Performance evaluation of baselines & proposed (Overall).

Prediction Horizon Model RMSE MAE MAPE R2
15-min HA 2.73136 0.60377 6.61174 0.87217
ARIMA 4.25444 1.48836 9.24698 ⋇
SVR 2.71936 0.58616 6.58210 0.87379
XGBoost 2.69701 0.58133 6.56837 0.87706
MLP 2.67871 0.57979 6.54887 0.87915
GRU 2.63252 0.55150 6.48860 0.88855
Proposed 2.62162 0.54244 6.36716 0.89974
Improvement 0.41% 1.64% 1.87% 1.26%
30-min HA 2.73136 0.60377 6.61174 0.87217
ARIMA 4.05522 1.29382 9.49330 ⋇
SVR 2.73877 0.59465 6.65536 0.87095
XGBoost 2.70256 0.58505 6.59215 0.87494
MLP 2.69742 0.58291 6.56015 0.87514
GRU 2.65508 0.57152 6.50139 0.88298
Proposed 2.64053 0.56109 6.38909 0.89177
Improvement 0.55% 1.82% 1.73% 0.99%
45-min HA 2.73136 0.60377 6.61174 0.87217
ARIMA 4.02042 1.27711 9.51899 ⋇
SVR 2.75216 0.61031 6.68810 0.86794
XGBoost 2.72788 0.60527 6.61623 0.87028
MLP 2.70909 0.59954 6.58186 0.87136
GRU 2.67372 0.59137 6.52952 0.87966
Proposed 2.66211 0.57983 6.40188 0.88548
Improvement 0.43% 1.95% 1.95% 0.66%
60-min HA 2.73136 0.60377 6.61174 0.87217
ARIMA 3.99637 1.26543 9.52668 ⋇
SVR 2.77696 0.62892 6.70206 0.86491
XGBoost 2.74182 0.62154 6.64527 0.86725
MLP 2.73427 0.61899 6.60636 0.86842
GRU 2.69637 0.61543 6.54968 0.87395
Proposed 2.68005 0.59523 6.42337 0.87893
Improvement 0.61% 3.28% 1.93% 0.57%
Overall HA 2.73136 0.60377 6.61174 0.87217
ARIMA 4.08161 1.33118 9.44649 ⋇
SVR 2.74681 0.60501 6.65691 0.86940
XGBoost 2.71732 0.59830 6.64527 0.87238
MLP 2.70237 0.59531 6.57431 0.87352
GRU 2.66442 0.58246 6.51730 0.88129
Proposed 2.65108 0.56965 6.39538 0.88898
Improvement 0.50% 2.20% 1.87% 0.87%

The results demonstrate that SVR with a non-linear kernel performed better than HA and ARIMA. For example, SVR reduces the RMSE from 2.73136 (HA) and 4.25444 (ARIMA) to 2.71936, a 0.44% and 36.08% reduction, respectively. The ensemble model XGBoost performed better than HA, ARIMA, and SVR. For example, XGBoost shows a reduction of 1.26%, 36.61%, and 0.82% against HA, ARIMA, and SVR in RMSE for the 15-minute prediction horizon. Similarly, there is a reduction of 3.72%, 60.94%, and 0.82% in MAE when comparing XGBoost with HA, ARIMA, and SVR for the same horizon. The same trend is visible for MAPE in Table 1. As with RMSE, MAE, and MAPE, we observed an improvement in R2 over HA, ARIMA, and SVR; for instance, an improvement of 0.56% is reported when comparing XGBoost with HA.

The results in Table 1 show that neural networks such as MLP, GRU, and our proposed attention-based GRU performed better than traditional machine learning and time-series models. For example, there is a reduction of 1.93%, 37.04%, 1.49%, and 0.68% in RMSE when comparing MLP with HA, ARIMA, SVR, and XGBoost for the 15-minute prediction horizon. Likewise, GRU reduced RMSE by 3.62%, 38.12%, 3.19%, 2.39%, and 1.72% compared to HA, ARIMA, SVR, XGBoost, and MLP, respectively. Our proposed attention-based GRU shows a decrease in RMSE, MAE, and MAPE of 4.02%, 10.16%, and 3.7% compared to HA, and an improvement of about 3.16% in R2. With the attention mechanism, we also improved the prediction precision of the plain GRU: RMSE, MAE, and MAPE decreased by 0.41%, 1.64%, and 1.87%, respectively, for the 15-minute horizon.

Our proposed model achieves better prediction performance regardless of how the horizon varies, and its results are more stable as the horizon grows. Compared to GRU, there is an overall reduction of 0.50%, 2.20%, and 1.87% in RMSE, MAE, and MAPE, respectively, demonstrating that the proposed model can be used for both short- and longer-term prediction without a significant drop in performance.

To demonstrate the model's performance, we selected a single road and plotted the actual and predicted travel time values for the four prediction horizons. The visualization results on test data for the 15-min, 30-min, 45-min, and 60-min horizons are shown in Figs 6–13. These results show that our proposed model captures the traffic dynamics regardless of the prediction horizon. However, taking into account both the spatial and temporal dimensions could improve the results further, particularly around local minima and maxima.

Fig 6. Prediction results for 15 minutes horizon on test data (overall).

Fig 6

Fig 7. Prediction results for 15 minutes horizon on test data (two days).

Fig 7

Fig 8. Prediction results for 30 minutes horizon on test data (overall).

Fig 8

Fig 9. Prediction results for 30 minutes horizon on test data (two days).

Fig 9

Fig 10. Prediction results for 45 minutes horizon on test data (overall).

Fig 10

Fig 11. Prediction results for 45 minutes horizon on test data (two days).

Fig 11

Fig 12. Prediction results for 60 minutes horizon on test data (overall).

Fig 12

Fig 13. Prediction results for 60 minutes horizon on test data (two days).

Fig 13

Robustness analysis

Noise is unavoidable during the data collection process in real-world circumstances. We have performed a robustness analysis to test the generalization of our proposed model in the presence of noise.

We added a common type of noise obeying a Gaussian distribution (i.e., N(0, σ²), where σ varies from 0.2 to 2) to our dataset after normalizing it to the interval [0, 1]. Fig 14 shows that the change in the error measures is small, demonstrating the proposed model's ability to generalize and perform well even on noisy traffic data.
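The perturbation can be reproduced as follows; clipping the noisy values back to [0, 1] is an assumption, since the paper does not say how out-of-range values are handled:

```python
import numpy as np

def add_gaussian_noise(data, sigma, seed=0):
    """Perturb [0, 1]-normalized travel times with noise ~ N(0, sigma^2)."""
    rng = np.random.default_rng(seed)
    noisy = data + rng.normal(0.0, sigma, size=data.shape)
    return np.clip(noisy, 0.0, 1.0)   # assumed: keep values in the normalized range
```

Sweeping sigma over, e.g., np.arange(0.2, 2.01, 0.2) and re-evaluating the trained model reproduces the robustness curves of Fig 14.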

Fig 14. Robustness analysis after adding Gaussian noise.

Fig 14

Conclusion

Travel time is becoming an attractive research area in the traffic prediction domain compared to other traffic variables, as it is more interpretable and understandable for those unfamiliar with transportation terms. With cutting-edge traffic data collection technologies in recent years, data-driven approaches have been extensively applied to traffic prediction problems. In this article, we implemented a GRU model to capture temporal relations in travel time data and added an attention mechanism to help it learn the relevant information and enhance prediction precision. Experimental results show improved performance of the proposed attention-based GRU model compared to classical time-series models. Furthermore, we conducted a robustness test and found that the proposed model performed well even in the presence of noise in the traffic data, which is inevitable in practice, with only a minor degradation in performance.

In the future, we plan to extend our work by incorporating graph-based neural networks to cater to the spatial dimension along with the temporal one on the same dataset. To improve prediction accuracy, we also plan to combine exogenous factors such as weather, peak/off-peak hours, and other elements with the traffic data.

Data Availability

All the implementation details of our work can be found at https://github.com/jawadchughtai/Att_GRU_TTP. The repository is public, and a minimal anonymized dataset is provided in the Dataset folder to replicate our work.

Funding Statement

The author(s) received no specific funding for this work.

References

  • 1.Ritchie H, Roser M. Urbanization 2018. [Online]. Available: https://ourworldindata.org/urbanization
  • 2.Bureau PR. 2018 World population data. 2018. [Online]. Available: https://interactives.prb.org/wpds/2018/index.html
  • 3. Kato T, Uchida K. A study on benefit estimation that considers the values of travel time and travel time reliability in road networks. Transportmetrica A: transport science. 2018, 14(1-2):89–109. doi: 10.1080/23249935.2017.1321695 [DOI] [Google Scholar]
  • 4. Schrank D, Eisele B, Lomax T. Urban mobility report 2019. Texas Transportation Institute, 2019. [Google Scholar]
  • 5. Quasim MT, Khan MA, Algarni F, Alshahrani MM. Fundamentals of Smart Cities. In: Smart Cities: A Data Analytics Perspective. Springer, 2021:3–16. [Google Scholar]
  • 6. Zheng C, Fan X, Wang C, Qi J. Gman: A graph multi-attention network for traffic prediction. AAAI Conference on Artificial Intelligence. 2020:1234–1241. doi: 10.1609/aaai.v34i01.5477 [DOI] [Google Scholar]
  • 7. Ma C, Dai G, Zhou J. Short-Term traffic flow prediction for urban road sections based on time series analysis and LSTM BILSTM method. IEEE Transactions on Intelligent Transportation Systems. 2021. [Google Scholar]
  • 8. Abdelraouf A, Abdel-Aty M, Yuan J. Utilizing Attention-Based Multi-Encoder-Decoder Neural Networks for Freeway Traffic Speed Prediction. IEEE Transactions on Intelligent Transportation Systems. 2021. [Google Scholar]
  • 9. Roy KC, Hasan S, Culotta A, Eluru N. Predicting traffic demand during hurricane evacuation using real-time data from transportation systems and social media. Transportation research part C: emerging technologies. 2021, 131:1033–1039. doi: 10.1016/j.trc.2021.103339 [DOI] [Google Scholar]
  • 10. Astarita V, Giofre VP, Festa DC, Guido G, Vitale A. Floating Car Data Adaptive Traffic Signals: A Description of the First Real-Time Experiment with Connected Vehicles. Electronics. 2020, 9(1):114. doi: 10.3390/electronics9010114 [DOI] [Google Scholar]
  • 11. Yang S, Ma W, Pi X, Qian S. A deep learning approach to real-time parking occupancy prediction in transportation networks incorporating multiple spatio-temporal data sources. Transportation Research Part C: Emerging Technologies. 2019, 107:248–265. doi: 10.1016/j.trc.2019.08.010 [DOI] [Google Scholar]
  • 12. Chen J, Xiao Z, Wang D, Long W, Bai J, Havyarimana V. Stay time prediction for individual stay behavior. IEEE Access. 2019, 7:130085–130100. doi: 10.1109/ACCESS.2019.2940545 [DOI] [Google Scholar]
  • 13. Lin DJ, Chen MY, Chiang HS, Sharma PK. Intelligent Traffic Accident Prediction Model for Internet of Vehicles With Deep Learning Approach. IEEE Transactions on Intelligent Transportation Systems. 2021. doi: 10.1109/TITS.2021.3074987 [DOI] [Google Scholar]
  • 14. Rahim MA, Hassan HM. A deep learning based traffic crash severity prediction framework. Accident Analysis & Prevention. 2021, 154:106090. doi: 10.1016/j.aap.2021.106090 [DOI] [PubMed] [Google Scholar]
  • 15. Chiabaut N, Faitout R. Traffic congestion and travel time prediction based on historical congestion maps and identification of consensual days. Transportation Research Part C: Emerging Technologies. 2021, 124:102920. doi: 10.1016/j.trc.2020.102920 [DOI] [Google Scholar]
  • 16. Schmitt EJ, Jula H. On the limitations of linear models in predicting travel times. 2007 IEEE Intelligent Transportation Systems Conference. IEEE, 2007:830–835. [Google Scholar]
  • 17.Billings D, Yang JS. Application of the ARIMA models to urban roadway travel time prediction-a case study. 2006 IEEE International Conference on Systems, Man and Cybernetics. IEEE, 2006:2529–2534.
  • 18. Wu CH, Ho JM, Lee DT. Travel-time prediction with support vector regression. IEEE transactions on intelligent transportation systems. 2004, 5(4):276–281. doi: 10.1109/TITS.2004.837813 [DOI] [Google Scholar]
  • 19. Zhang Y, Haghani A. A gradient boosting method to improve travel time prediction. Transportation Research Part C: Emerging Technologies. 2015, 58:308–324. doi: 10.1016/j.trc.2015.02.019 [DOI] [Google Scholar]
  • 20. Ran X, Shan Z, Fang Y, Lin C. Travel time prediction by providing constraints on a convolutional neural network. IEEE Access. 2018, 6:59336–59349. doi: 10.1109/ACCESS.2018.2874399 [DOI] [Google Scholar]
  • 21. Abdollahi M, Khaleghi T, Yang K. An integrated feature learning approach using deep learning for travel time prediction. Expert Systems with Applications. 2020, 139:112864. doi: 10.1016/j.eswa.2019.112864 [DOI] [Google Scholar]
  • 22. Wang M, Li W, Kong Y, Bai Q. Empirical evaluation of deep learning-based travel time prediction. Pacific Rim Knowledge Acquisition Workshop. Springer, 2019:54–65. doi: 10.1007/978-3-030-30639-7_6 [DOI] [Google Scholar]
  • 23. Qiu J, Du L, Zhang D, Su S, Tian Z. Nei-TTE: intelligent traffic time estimation based on fine-grained time derivation of road segments for smart city. IEEE Transactions on Industrial Informatics. 2019, 16(4):2659–2666. doi: 10.1109/TII.2019.2943906 [DOI] [Google Scholar]
  • 24. Yuan H, Li G. A survey of traffic prediction: from spatio-temporal data to intelligent transportation. Data Science and Engineering. 2021, 6(1):63–85. doi: 10.1007/s41019-020-00151-z [DOI] [Google Scholar]
  • 25.Lipton ZC, Berkowitz J, Elkan C. A critical review of recurrent neural networks for sequence learning. arXiv preprint arXiv:150600019. 2015.
  • 26. Ran X, Shan Z, Fang Y, Lin C. An LSTM-based method with attention mechanism for travel time prediction. Sensors. 2019, 19(4):861. doi: 10.3390/s19040861 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 27. Long K, Yao W, Gu J, Wu W, Han LD. Predicting freeway travel time using multiple-source heterogeneous data integration. Applied Sciences. 2019, 9(1):104. doi: 10.3390/app9010104 [DOI] [Google Scholar]
  • 28. Bing Q, Qu D, Chen X, Pan F, Wei J. Arterial travel time estimation method using SCATS traffic data based on KNN-LSSVR model. Advances in Mechanical Engineering. 2019, 11(5):1687814019841926. doi: 10.1177/1687814019841926 [DOI] [Google Scholar]
  • 29. Zhao J, Gao Y, Tang J, Zhu L, Ma J. Highway travel time prediction using sparse tensor completion tactics and k-nearest neighbor pattern matching method. Journal of Advanced Transportation. 2018. [Google Scholar]
  • 30. Myung J, Kim DK, Kho SY, Park CH. Travel time prediction using k nearest neighbor method with combined data from vehicle detector system and automatic toll collection system. Transportation Research Record. 2011, 2256(1):51–59. doi: 10.3141/2256-07 [DOI] [Google Scholar]
  • 31.Fu K, Meng F, Ye J, Wang Z. Compacteta: A fast inference system for travel time prediction. 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. 2020:3337–3345.
  • 32.Yuan H, Li G, Bao Z, Feng L. Effective Travel Time Estimation: When Historical Trajectories over Road Networks Matter. 2020 ACM SIGMOD International Conference on Management of Data, 2020:2135–2149.
  • 33. Ran X, Shan Z, Shi Y, Lin C. Short-term travel time prediction: a spatiotemporal deep learning approach. International Journal of Information Technology & Decision Making. 2019, 18(04):1087–1111. doi: 10.1142/S0219622019500202 [DOI] [Google Scholar]
  • 34. Shen Y, Jin C, Hua J. TTPNet: A neural network for travel time prediction based on tensor decomposition and graph embedding. IEEE Transactions on Knowledge and Data Engineering. 2020. [Google Scholar]
  • 35. Li X, Wang H, Sun P, Zu H. Spatiotemporal Features—Extracted Travel Time Prediction Leveraging Deep-Learning-Enabled Graph Convolutional Neural Network Model. Sustainability. 2021, 13(3):1253. doi: 10.3390/su13031253 [DOI] [Google Scholar]
  • 36. Jin G, Wang M, Zhang J, Sha H, Huang J. STGNN-TTE: Travel time estimation via spatial–temporal graph neural network. Future Generation Computer Systems. 2022, 126:70–81. doi: 10.1016/j.future.2021.07.012 [DOI] [Google Scholar]
  • 37. Zhao J, Gao Y, Qu Y, Yin H, Liu Y, Sun H. Travel time prediction: Based on gated recurrent unit method and data fusion. IEEE Access. 2018, 6:70463–70472. doi: 10.1109/ACCESS.2018.2878799 [DOI] [Google Scholar]
  • 38. Cheng J, Li G, Chen X. Research on travel time prediction model of freeway based on gradient boosting decision tree. IEEE access. 2018, 7:7466–7480. doi: 10.1109/ACCESS.2018.2886549 [DOI] [Google Scholar]
  • 39. Chen Z, Fan W. A Freeway Travel Time Prediction Method Based on an XGBoost Model. Sustainability. 2021, 13(15):8577. doi: 10.3390/su13158577 [DOI] [Google Scholar]
  • 40. Qiu B, Fan WD. Machine Learning Based Short-Term Travel Time Prediction: Numerical Results and Comparative Analyses. Sustainability. 2021, 13(13):7454. doi: 10.3390/su13137454 [DOI] [Google Scholar]
  • 41. Ting PY, Wada T, Chiu YL, Sun MT, Sakai K, Ku WS, et al. Freeway Travel Time Prediction Using Deep Hybrid Model–Taking Sun Yat-Sen Freeway as an Example. IEEE Transactions on Vehicular Technology. 2020, 69(8):8257–8266. doi: 10.1109/TVT.2020.2999358 [DOI] [Google Scholar]
  • 42. Zou Z, Yang H, Zhu AX. Estimation of Travel Time Based on Ensemble Method With Multi-Modality Perspective Urban Big Data. IEEE Access. 2020, 8:24819–24828. doi: 10.1109/ACCESS.2020.2971008 [DOI] [Google Scholar]
  • 43. Li H, Xiong S. Time-varying weight coefficients determination based on fuzzy soft set in combined prediction model for travel time. Expert Systems with Applications. 2022, 189:115998. doi: 10.1016/j.eswa.2021.115998 [DOI] [Google Scholar]
  • 44. Do LN, Vu HL, Vo BQ, Liu Z, Phung D. An effective spatial-temporal attention based neural network for traffic flow prediction. Transportation research part C: emerging technologies. 2019, 108:12–28. doi: 10.1016/j.trc.2019.09.008 [DOI] [Google Scholar]
  • 45. Wu J, Wu Q, Shen J, Cai C. Towards attention-based convolutional long short-term memory for travel time prediction of bus journeys. Sensors. 2020, 20(12):3354. doi: 10.3390/s20123354 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 46. Ran X, Shan Z, Fang Y, Lin C. A convolution component-based method with attention mechanism for travel-time prediction. Sensors. 2019, 19(9):2063. doi: 10.3390/s19092063 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 47. Sun J, Kim J. Joint prediction of next location and travel time from urban vehicle trajectories using long short-term memory neural networks. Transportation Research Part C: Emerging Technologies. 2021, 128:103114. doi: 10.1016/j.trc.2021.103114 [DOI] [Google Scholar]
  • 48.Cho K, Van Merriënboer B, Gulcehre C, Bahdanau D, Bougares F, Schwenk H, et al. Learning phrase representations using RNN encoder-decoder for statistical machine translation. arXiv preprint arXiv:14061078. 2014.
  • 49. Hochreiter S, Schmidhuber J. Long short-term memory. Neural computation. 1997, 9(8):1735–1780. doi: 10.1162/neco.1997.9.8.1735 [DOI] [PubMed] [Google Scholar]
  • 50. Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, et al. Attention is all you need. Advances in neural information processing systems. 2017:30. [Google Scholar]
  • 51. Zhao W, Fang D, Zhang J, Zhao Y, Xu X, Jiang X, et al. An effective framework for semistructured document classification via hierarchical attention model. International Journal of Intelligent Systems. 2021, 36(9):5161–5183. doi: 10.1002/int.22508 [DOI] [Google Scholar]
  • 52. Li X, Ye Z, Zhang Z, Zhao M. Clothes image caption generation with attribute detection and visual attention model. Pattern Recognition Letters. 2021, 141:68–74. doi: 10.1016/j.patrec.2020.12.001 [DOI] [Google Scholar]
  • 53.Arık SO, Pfister T. Tabnet: Attentive interpretable tabular learning. AAAI Conference on Artificial Intelligence. 2021:6679–6687.
  • 54. Bai J, Zhu J, Song Y, Zhao L, Hou Z, Du R, et al. A3t-gcn: Attention temporal graph convolutional network for traffic forecasting. ISPRS International Journal of Geo-Information. 2021, 10(7):485. doi: 10.3390/ijgi10070485 [DOI] [Google Scholar]
  • 55.Liao B, Zhang J, Wu C, McIlwraith D, Chen T, Yang S, et al. Deep sequence learning with auxiliary information for traffic prediction. 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining; 2018:537–546.
  • 56. Xu X, Liu C, Zhao Y, Lv X. Short-term traffic flow prediction based on whale optimization algorithm optimized BiLSTM Attention. Concurrency and Computation: Practice and Experience. 2022:6782. doi: 10.1002/cpe.6782 [DOI] [Google Scholar]
  • 57. Liu J, Guan W. A summary of traffic flow forecasting methods. Journal of highway and transportation research and development. 2004, 21(3):82–85. [Google Scholar]
  • 58. Ahmed MS, Cook AR. Analysis of freeway traffic time-series data by using Box-Jenkins techniques. 1979. [Google Scholar]
  • 59. Smola AJ, Schölkopf B. A tutorial on support vector regression. Statistics and Computing. 2004, 14(3):199–222. doi: 10.1023/B:STCO.0000035301.49549.88 [DOI] [Google Scholar]
  • 60.Chen T, Guestrin C. Xgboost: A scalable tree boosting system. 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2016:785–794.

Decision Letter 0

Xiyu Liu

29 Aug 2022

PONE-D-22-18546An Attention Based Recurrent Learning Model for Short-term Travel Time PredictionPLOS ONE

Dear Dr. Chughtai,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Please submit your revised manuscript by Oct 13 2022 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols.

We look forward to receiving your revised manuscript.

Kind regards,

Xiyu Liu

Academic Editor

PLOS ONE

Journal Requirements:

When submitting your revision, we need you to address these additional requirements.

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at

https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf  and

https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf

2. Please note that PLOS ONE has specific guidelines on code sharing for submissions in which author-generated code underpins the findings in the manuscript. In these cases, all author-generated code must be made available without restrictions upon publication of the work. Please review our guidelines at https://journals.plos.org/plosone/s/materials-and-software-sharing#loc-sharing-code and ensure that your code is shared in a way that follows best practice and facilitates reproducibility and reuse.

3. Thank you for stating the following in the Acknowledgments Section of your manuscript:

“The funding for this research is provided by the Khalifa University of Science and Technology.”

We note that you have provided additional information within the Acknowledgements Section that is not currently declared in your Funding Statement. Please note that funding information should not appear in the Acknowledgments section or other areas of your manuscript. We will only publish funding information present in the Funding Statement section of the online submission form.

Please remove any funding-related text from the manuscript and let us know how you would like to update your Funding Statement. Currently, your Funding Statement reads as follows:

“The author(s) received no specific funding for this work.”

Please include your amended statements within your cover letter; we will change the online submission form on your behalf.

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: Yes

**********

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: Yes

**********

3. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: Yes

**********

4. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: Yes

**********

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: Thank you very much for the beautiful manuscript. I think the paper is in good condition!

The idea presented in the attention section is very practical and logical.

Manuscripts and forms are written in beautiful formats.

Accept

Reviewer #2: Paper summary:

This paper proposed an attention-based GRU model for short-term travel time prediction, enabling GRU to learn the relevant context in historical time slots and update the weights of hidden states accordingly. The authors evaluated the proposed model using FCD data from Beijing. To demonstrate the generalization of the proposed model, the authors performed a robustness analysis by adding noise obeying Gaussian distribution. The experimental results on test data indicated that the proposed model performed better than the existing deep learning time-series models in terms of RMSE, MAE, MAPE, and R^2.

Strengths:

1. This proposal is the first to use traffic flow as input with attention-based GRU to forecast travel time.

2. This proposal is well-structured, and the experiments are detailed and convincing.

Weaknesses:

1. I am wondering whether the overfitting of the original model will affect the results of this proposal.

2. The related work section is suggested to be divided into 2-3 subsections to make the structure of this section clearer.

Other comments:

1. It seems that attention based is missing a hyphen in “attention based”.

2. To overcome these limitations, two specialized variants of RNNs, Gated Recurrent Unit (GRU) [48] and Long Short-Term Memory (LSTM) [49] are developed. Consider inserting a comma to separate the elements.

3. RNNs, unlike MLPs and CNNs (feed-forward neural networks and take data all at once), act on data sequentially and are frequently employed in the Natural Language Processing (NLP) domain. It seems that there is a grammar mistake.

4. “This paper employed GRU with attention mechanism to process travel time sequences and forecast future TT.”. It seems that there is an article usage problem.

Reviewer #3: The authors proposed an attention-based GRU model for short-term travel time prediction to cope with this problem enabling GRU to learn the relevant context in historical time slots and update the weights of hidden states accordingly. This scheme thus solves the problem that the existing GRU does not consider the relationship between various historical travel time slots for traffic prediction. The main message, background and figures are generally clear and concise, but some comparisons with the state-of-the-art algorithms still need to be added in the experimental results section. I recommend improving the following points.

==Major concern==

1.The algorithm proposed in the article integrates the attention mechanism into the GRU to improve the prediction ability. And the comparison results with GRU and several other traditional algorithms are given in the article. In fact, in recent years, attention mechanism has been introduced into DNN to solve Travel Time Prediction, such as [1]. Therefore, it is hoped that the author can add new experiments to compare with such algorithms.

[1] Wu J, Wu Q, Shen J, Cai C. Towards attention-based convolutional long short-term memory for travel time prediction of bus journeys. Sensors. 2020;20(12):3354

==Minor concern==

1. On line 203 of the article, too many commas are printed.

2. References are poorly written. For example, the representation of page numbers, some are p.785-794, some are: 785-794. In addition, the reference is ended with a period, and some ends with a semicolon and a period. and many more.

**********

6. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

Reviewer #3: No

**********

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

PLoS One. 2022 Dec 1;17(12):e0278064. doi: 10.1371/journal.pone.0278064.r002

Author response to Decision Letter 0


21 Oct 2022

1. Reviewer#2, Concern # 1: I am wondering whether the overfitting of the original model will affect the results of this proposal.

Author response: Thank you very much for the valuable suggestion.

Author action: As discussed in the Dataset section, we have used holdout cross-validation (an 80/20 split) to validate the results of our proposed approach. Furthermore, we have used a normalization term in the loss function calculation to regularize the training and improve model generalization.
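The holdout validation described in this response can be sketched as follows. Note the assumptions: the 80/20 split is taken to be chronological (the paper does not state here whether the data are shuffled), and the variable names are illustrative.

```python
import numpy as np

def holdout_split(series, train_frac=0.8):
    """Chronological holdout split for a time series.

    Splitting by time rather than shuffling avoids leaking future
    travel-time observations into the training set.
    """
    cut = int(len(series) * train_frac)
    return series[:cut], series[cut:]

# Stand-in for a travel-time series of 100 observations
data = np.arange(100.0)
train, test = holdout_split(data)
```

With `train_frac=0.8`, the first 80% of observations form the training set and the remaining 20% the test set, matching the 80/20 holdout described above.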

2. Reviewer#2, Concern # 2: The related work section is suggested to be divided into 2-3 subsections to make the structure of this section clearer.

Author response: Thank you very much for the valuable suggestion.

Author action: We have updated the Related Work section by organizing the literature under suitable headings as suggested by the reviewer.

3. Reviewer#2, Concern # 3: It seems that attention based is missing a hyphen in “attention based”.

Author response: Thank you very much for pointing it out.

Author action: We have updated the manuscript and added a hyphen in “attention based”.

4. Reviewer#2, Concern # 4: To overcome these limitations, two specialized variants of RNNs, Gated Recurrent Unit (GRU) [48] and Long Short-Term Memory (LSTM) [49] are developed. Consider inserting a comma to separate the elements.

Author response: Thank you very much for the valuable suggestion.

Author action: We have inserted a comma to separate the elements as suggested by the reviewer.

5. Reviewer#2, Concern # 5: RNNs, unlike MLPs and CNNs (feed-forward neural networks that take data all at once), act on data sequentially and are frequently employed in the Natural Language Processing (NLP) domain. It seems that there is a grammar mistake.

Author response: Thank you very much for pointing it out.

Author action: We have corrected the identified grammar mistake and updated the manuscript.

6. Reviewer#2, Concern # 6: “This paper employed GRU with attention mechanism to process travel time sequences and forecast future TT.”. It seems that there is an article usage problem.

Author response: Thank you very much for pointing it out.

Author action: We have corrected the highlighted issue in the manuscript.

1. Reviewer#3, Concern # 1: The algorithm proposed in the article integrates the attention mechanism into the GRU to improve the prediction ability. And the comparison results with GRU and several other traditional algorithms are given in the article. In fact, in recent years, attention mechanism has been introduced into DNN to solve Travel Time Prediction, such as [1]. Therefore, it is hoped that the author can add new experiments to compare with such algorithms.

[1] Wu J, Wu Q, Shen J, Cai C. Towards attention-based convolutional long short-term memory for travel time prediction of bus journeys. Sensors. 2020;20(12):3354

Author response: Thank you very much for the valuable suggestion.

Author action: We have cited the reference suggested by the reviewer in our paper and some other references in the Related Work section. This research work is different from the predictions provided by [1] and [2]. [1] proposed an attention-based convolutional long short-term memory for predicting journey trip time of selected bus routes at current time. [2] proposed attention-based LSTM for predicting travel time of freeway at current time. In our research work, we have used urban network data of Beijing which is different from freeways or bus data (selected routes). Secondly, we proposed an attention-based GRU model for short-term travel time prediction. This is the reason we have compared our approach with GRU and several other traditional algorithms from various families as discussed in the Related Work section. A fair comparison of real-time (current time) prediction approaches with short-term travel time prediction approaches is not possible. Once again, we are thankful to the respectable reviewer for the valuable suggestion.

[1] Wu J, Wu Q, Shen J, Cai C. Towards attention-based convolutional long short-term memory for travel time prediction of bus journeys. Sensors. 2020;20(12):3354

[2] Ran X, Shan Z, Fang Y, Lin C. An LSTM-based method with attention

mechanism for travel time prediction. Sensors. 2019;19(4):861.

2. Reviewer#3, Concern # 2: On line 203 of the article, too many commas are printed.

Author response: Thank you very much for pointing it out.

Author action: We have corrected the identified error in the manuscript.

3. Reviewer#3, Concern # 3: References are poorly written. For example, the representation of page numbers, some are p.785-794, some are: 785-794. In addition, the reference is ended with a period, and some ends with a semicolon and a period. and many more.

Author response: Thank you very much for pointing it out.

Author action: We have updated the manuscript by updating all the references.

Attachment

Submitted filename: Response-to-Reviewers.docx

Decision Letter 1

Xiyu Liu

9 Nov 2022

An Attention-based Recurrent Learning Model for Short-term Travel Time Prediction

PONE-D-22-18546R1

Dear Dr. Chughtai,

We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements.

Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication.

An invoice for payment will follow shortly after the formal acceptance. To ensure an efficient process, please log into Editorial Manager at http://www.editorialmanager.com/pone/, click the 'Update My Information' link at the top of the page, and double check that your user information is up-to-date. If you have any billing related questions, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

Kind regards,

Xiyu Liu

Academic Editor

PLOS ONE

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #2: (No Response)

Reviewer #3: All comments have been addressed

**********

2. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #2: (No Response)

Reviewer #3: Yes

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #2: (No Response)

Reviewer #3: Yes

**********

4. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #2: (No Response)

Reviewer #3: Yes

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #2: (No Response)

Reviewer #3: Yes

**********

6. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #2: (No Response)

Reviewer #3: The authors proposed an attention-based GRU model for short-term travel time prediction, enabling the GRU to learn the relevant context in historical time slots and update the weights of hidden states accordingly. They have answered all my questions. I have no other comments.

**********

7. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #2: No

Reviewer #3: No

**********

Acceptance letter

Xiyu Liu

18 Nov 2022

PONE-D-22-18546R1

An Attention-based Recurrent Learning Model for Short-term Travel Time Prediction

Dear Dr. Chughtai:

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now with our production department.

If your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information please contact onepress@plos.org.

If we can help with anything else, please email us at plosone@plos.org.

Thank you for submitting your work to PLOS ONE and supporting open access.

Kind regards,

PLOS ONE Editorial Office Staff

on behalf of

Professor Xiyu Liu

Academic Editor

PLOS ONE

Associated Data

    This section collects any data citations, data availability statements, or supplementary materials included in this article.

    Supplementary Materials

    Attachment

    Submitted filename: Response-to-Reviewers.docx

    Data Availability Statement

    All the implementation details of our work can be found at https://github.com/jawadchughtai/Att_GRU_TTP. We have made the repository publicly visible. A minimal anonymized dataset is also provided in the Dataset folder to replicate our work.


    Articles from PLOS ONE are provided here courtesy of PLOS
