Procedia Computer Science. 2020 Oct 2;176:3009–3018. doi: 10.1016/j.procs.2020.09.202

Predictive analytics on open big data for supporting smart transportation services

Paul Patrick F Balbin a, Jackson CR Barker a, Carson K Leung a, Marvin Tran a, Riley P Wall a, Alfredo Cuzzocrea b
PMCID: PMC7531986  PMID: 33042316

Abstract

In the current era of big data, huge quantities of valuable data, which may be of different levels of veracity, are being generated at a rapid rate. Embedded in these big data are implicit, previously unknown, and potentially useful information and knowledge that can be discovered by data science solutions, which apply techniques such as data mining. More and more collections of these big data are being made openly available by science, government, and non-profit organizations so that people can collaboratively study and analyze them. In this article, we focus on open big data for public transit because public transit (e.g., bus) is a vital part of many people’s lives. As time is a precious resource, bus delays can negatively affect commuters’ plans. Unfortunately, such delays are inevitable. Hence, many existing works have focused on predicting bus delays. However, predicting on-time or early buses is also important. For instance, commuters who arrive at a bus stop on time may still miss their buses if the buses leave early. So, in this article, we examine open big data about bus performance (e.g., early, on-time, and late stops). We analyze the data with frequent pattern mining and make predictions with decision-tree-based classification. For illustration, we perform predictive analytics on real-life open big data about bus performance from Winnipeg Transit, available on the Winnipeg Open Data Portal. The results show the benefits of predictive analytics on open big data for supporting smart transportation services.
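To make the two analysis steps named in the abstract more concrete, the following is a minimal sketch, not the authors' implementation. It runs a level-wise (Apriori-style) frequent pattern search and a small ID3-style decision tree over a handful of made-up stop records. The field names (route, day, period, status), the record values, and all function names are assumptions chosen for illustration; they do not reflect the actual Winnipeg Transit schema or the paper's results.

```python
import math
from collections import Counter
from itertools import combinations

# Hypothetical toy records (illustrative only, not Winnipeg Transit data).
rows = [
    {"route": "11", "day": "weekday", "period": "am_peak", "status": "late"},
    {"route": "11", "day": "weekday", "period": "am_peak", "status": "late"},
    {"route": "11", "day": "weekday", "period": "midday", "status": "on_time"},
    {"route": "16", "day": "weekend", "period": "midday", "status": "early"},
    {"route": "16", "day": "weekday", "period": "am_peak", "status": "late"},
    {"route": "16", "day": "weekend", "period": "evening", "status": "early"},
]
# Each record becomes a transaction of "attribute=value" items.
transactions = [frozenset(f"{k}={v}" for k, v in r.items()) for r in rows]


def frequent_itemsets(transactions, min_support, max_size=3):
    """Level-wise search: return {itemset: count} with count >= min_support."""
    counts = Counter(item for t in transactions for item in t)
    current = {frozenset([i]): n for i, n in counts.items() if n >= min_support}
    result, size = {}, 1
    while current:
        result.update(current)
        if size == max_size:
            break
        size += 1
        items = sorted({i for s in current for i in s})
        # Keep a candidate only if all of its (size-1)-subsets are frequent.
        candidates = [
            frozenset(c) for c in combinations(items, size)
            if all(frozenset(s) in current for s in combinations(c, size - 1))
        ]
        counts = Counter(c for t in transactions for c in candidates if c <= t)
        current = {c: n for c, n in counts.items() if n >= min_support}
    return result


def entropy(labels):
    total = len(labels)
    return -sum(n / total * math.log2(n / total) for n in Counter(labels).values())


def build_tree(rows, features, target="status"):
    """ID3-style tree: split on the feature with the lowest remaining entropy."""
    labels = [r[target] for r in rows]
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or not features:
        return majority  # leaf: a class label

    def remainder(feature):
        groups = {}
        for r in rows:
            groups.setdefault(r[feature], []).append(r[target])
        return sum(len(g) / len(rows) * entropy(g) for g in groups.values())

    best = min(features, key=remainder)
    rest = [f for f in features if f != best]
    branches = {
        value: build_tree([r for r in rows if r[best] == value], rest, target)
        for value in {r[best] for r in rows}
    }
    return {"feature": best, "branches": branches, "default": majority}


def predict(tree, row):
    """Walk the tree; fall back to the node's majority class for unseen values."""
    while isinstance(tree, dict):
        tree = tree["branches"].get(row[tree["feature"]], tree["default"])
    return tree


patterns = frequent_itemsets(transactions, min_support=2)
tree = build_tree(rows, ["route", "day", "period"])
print(patterns[frozenset({"period=am_peak", "status=late"})])
print(predict(tree, {"route": "16", "day": "weekend", "period": "evening"}))
```

In this sketch, the frequent patterns summarize which attribute combinations co-occur with each performance status, while the tree turns the same attributes into a prediction for an unseen stop. A production pipeline would more likely rely on an established library and a far larger dataset.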

Keywords: Predictive analytics, open data, Winnipeg open data, big data, transportation data, on-time performance, frequent patterns, software engineering, large-scale systems


Articles from Procedia Computer Science are provided here courtesy of Elsevier
