Abstract
This study presents a methodology for estimating passengers’ spatio-temporal trajectories with personalization and timeliness by using incomplete Wi-Fi probe data in an urban rail transit network. Unlike automatic fare collection data, which only record passengers’ entries and exits, Wi-Fi probe data can capture more detailed passenger movements, such as riding a train or waiting on a platform. However, the estimation of spatio-temporal trajectories remains a challenging task because several unfavorable situations can result in deficient data. To address this problem, we first describe the Wi-Fi probe data and summarize their common defects. Then, an n-gram method is developed to infer missing spatio-temporal location information. Next, an estimation algorithm is designed to generate feasible spatio-temporal trajectories for each individual passenger by integrating multiple data sources, i.e., urban rail transit network topology, Wi-Fi probe data, and train schedules. The proposed method is tested on both simulated data in blind experiments and real-world data from a complex urban rail transit network. The results of the case study show that a unique physical route can be estimated for 93% of passengers, and that for 80% of passengers the number of feasible spatio-temporal trajectories can be reduced to one or two. Potential applications of the trajectory estimation approach are also identified.
Keywords: Urban rail transit, Trajectory estimation, Spatio-temporal network, n-gram method, Wi-Fi probe data
Highlights
• This study extends personalization and timeliness into the STT estimation.
• A new algorithm is developed by integrating structurally heterogeneous data sources.
• The estimated trajectory describes detailed information both on-board and at stations.
• Real-world analyses are conducted for the URT network with the largest route length in China.
• This is the first attempt to find out all the routes and trains at the network level.
1. Introduction
A spatio-temporal trajectory (STT) of a passenger in an urban rail transit (URT) network is the record of that passenger’s motion, i.e., the route and train choices of the passenger at specific timestamps. Estimating passengers’ route and train choices within a complex URT network with seamless transfers is a challenging task due to the rapid expansion of network operations and high departure frequencies. In Shanghai, China, there were 415 stations on 16 URT lines as of October 2019, 53 of which were transfer stations [1]. In general, there are multiple routes and trains between each origin–destination (OD) pair. For example, from Disney Resort to Tongji University, there are at least 8 possible routes and the minimum train headway is 2 min 30 s [2]. It is critical for rail transit operators to understand how passengers choose routes in order to better plan and operate services, especially on dense networks with complex structures. For instance, route choice is essential for allocating passenger flows to different rail transit lines, which in turn determines how fare revenues are distributed. The estimation results can also help operators estimate train loads over time and space. Furthermore, real-time demand management and application evaluation, e.g., personalized dynamic route guidance, flexible pricing strategies, and real-time passenger flow control, can be conducted more efficiently if passenger flows are assigned at the granularity of trains. Therefore, despite its technical intricacy, STT estimation is required to support the day-to-day operations of large URT systems.
In previous studies [3], [4], automatic fare collection (AFC) data have been used to infer passenger route choices in URT systems. However, such transaction data contain only tap-in and tap-out timestamps and stations, without any information between the two trip ends. Therefore, it is very challenging to estimate the most likely route a passenger took based on the limited AFC data. Moreover, although various complex methods have been developed, such as the stochastic assignment principle [5], [6] and passenger flow assignment models [7], [8], [9], they do not pay enough attention to personalization and timeliness when dealing with STT estimation. As a result, neither the number of passengers left behind nor the real-time train load can be accurately calculated. It is desirable to introduce new technologies that reveal detailed information about individual passengers’ STTs during their trips, thus greatly reducing the set of feasible routes and trains.
Considering the very large and complex URT networks in East Asia, especially in major Chinese metropolises, it is necessary to explore new datasets and develop new approaches for estimating rail transit route choices efficiently and with reasonable accuracy. In this study, we use the emerging Wi-Fi probe data made available by the recent deployment of Wi-Fi networks in major rail transit systems in China. However, several unfavorable situations can result in incomplete Wi-Fi probe data in a URT network. Therefore, a spatio-temporal location inference model based on the n-gram method is developed to analyze how an individual passenger travels on different rail transit lines and trains. Next, we present an estimation algorithm that generates feasible STTs for each individual passenger by considering multiple data sources, i.e., metro network topology, Wi-Fi probe data, train schedules, and passenger walking times. In general, this paper contributes to the URT route estimation literature by presenting the first study to estimate the STT of individual passengers using Wi-Fi probe data at the network level. The main contributions are summarized as follows:
(1) This study extends personalization and timeliness into the STT estimation by integrating structurally heterogeneous data sources. The space–time network topology, which integrates the URT network’s physical topology and the train diagram, sets the boundary conditions of an individual passenger’s STT, and the real-time Wi-Fi probe data are used to reduce the STT search space.
(2) A new algorithm is developed for estimating an individual passenger’s STT that describes detailed information both on-board and at stations during the trip. This methodology consists of three sub-tasks, i.e., constructing the network topology, inferring spatio-temporal locations, and matching Wi-Fi probe data to the space–time network topology by considering the node–node, node–link and link–link relationships.
(3) Real-world analyses are conducted for the URT network with the largest route length in China. This is also the first attempt to find out all the routes and trains that passengers actually choose using Wi-Fi probe data at the complex real-world URT network level.
This paper is organized as follows. Section 2 reviews route choice studies using AFC data and analyses of Wi-Fi probe data in URT networks. Section 3 describes the Wi-Fi probe data and outlines the algorithm. Section 4 explains the proposed STT estimation methodology in detail. Section 5 presents the results of the blind experiments and the real-world case study. Finally, Section 6 gives the conclusions and future work.
2. Related work
The estimation of STT in URT networks is difficult, especially in the case of complex URT networks. Previous studies on STT estimation can be divided into two categories, i.e., route choice modeling using AFC transaction data and Wi-Fi probe data used in route-choice estimation.
2.1. Route choice model using AFC data
The AFC system collects fares when passengers tap in to or out of the URT system. This system also records station and time information, which can be used to infer passengers’ route choice behavior [10], [11]. Using AFC data, some researchers tackled this problem based on the stochastic assignment principle [5], [6] and passenger flow assignment models [7], [8], [9]. Xu et al. [5] proposed a deletion algorithm for available routes based on the depth-first principle and calculated the corresponding proportions; however, their model is static. Moreover, train schedule connection networks (TSCNs) have mainly been used to estimate STTs, that is, to assign passengers to individual trains. Zhou and Xu [12] connected the rail network with train schedules. Sun et al. [13] named this network the TSCN and established a space–time trajectory estimation model, formulated as a set generation and weighted assignment problem for feasible TSCN routes. The optimal trajectory problem using AFC data has also been studied in recent years. Chen et al. [14], Chen et al. [15] and Zhu et al. [16] concentrated on optimizing route estimation methods and established models to estimate the most likely space–time trajectory. A comparison of previous studies on estimating metro passengers’ trajectories is given in Table 1.
Table 1.
Comparison of previous studies on estimating metro passengers’ trajectories.
| Studies | Years | Data | Methods | Applications |
|---|---|---|---|---|
| Zhu et al. [3] | 2017 | AFC, automated vehicle location (AVL) | Maximum likelihood estimation and Bayesian inference methods | Estimate the probability of passengers being left behind |
| Zhang et al. [17] | 2018 | AFC, self-reported revealed preference data | Integrating an expectation–maximization algorithm and a nested logit model estimation method | Estimate metro passengers’ route choices |
| Ma et al. [4] | 2019 | AFC, AVL | Mixture model consisting of clustering and an event-based assignment model | Estimate denied boarding in urban rail systems |
| Hänseler et al. [18] | 2020 | AFC, AVL | A passenger–pedestrian model | Describe pedestrian movements in train stations and vehicle-specific train ridership distributions |
| Luo et al. [19] | 2018 | AFC, AVL, general transit feed specification data | A matching methodology with four steps | Obtain load profiles of transit vehicles |
| This research | 2020 | Wi-Fi probe data, AVL, URT network’s physical topology | Network topology construction, spatio-temporal location inference and trajectory estimation | Estimate detailed STT information both on-board and at stations for individual passenger in real time |
Although the modeling is considered rather comprehensive in theory, the input data and model parameters seriously restrict the validity of the models. On the one hand, AFC data [3], [4], [17], [18], [19] provide only passengers’ origin and destination information, which makes the estimation of the STT, and even of the route, very difficult on a complex URT network. On the other hand, there is no intermediate information between origin and destination, which means most constraints need to be established by setting model parameters. However, these parameters are obtained through manual surveys and hardly take individual differences into consideration. In addition, denied boarding at stations is a critical element that is neglected in [17]. Likewise, Hänseler et al. [18] assumed that passengers are able to board any desired train. This is a critical limitation for estimating route choice using solely AFC data in congested metro systems. Without further information, there is very little basis for telling whether an observed 20 min journey results from denied boarding on a short route or from no denied boarding on a long route. Therefore, the routes and STTs estimated by such models could be inaccurate, and the characteristics of individuals cannot be reflected. Tang et al. [20] proposed a time-geographic method to estimate the most likely space–time routes of passengers at the vehicle level. Luo et al. [19] obtained load profiles of transit vehicles in a transit network consisting of 12 tram lines and 8 bus lines. The above-mentioned methods estimate metro passenger dynamics at the station and vehicle levels separately. Thus, there is a need for a new algorithm that describes detailed STT information both on-board and at stations.
2.2. Analyses of Wi-Fi probe data in URT networks
Wi-Fi probe technology has a more concentrated detection range than GPS [21] and a fast detection speed, so short-term passenger route reconstruction and monitoring can be realized. At present, most passengers are willing to turn on the Wi-Fi function of their cell phones when taking the metro, so the sampling rate can be ensured and the analysis results are representative. With the comprehensive coverage of Wi-Fi signals in the URT network, the quantity and positional accuracy of the data are sufficient to support STT estimation for individual passengers.
Wi-Fi probe data contain the interaction information between access points (APs) and access devices (e.g., mobile phones, computers), so the route of a person carrying a Wi-Fi access device can be tracked dynamically. Studies of Wi-Fi technology have mainly addressed indoor positioning [22], [23], [24]. Zhuang et al. [25] studied the AP localization problem for smartphones by using automatic AP positioning and propagation parameter estimation with inertial navigation. Ma et al. [10] proposed a pedestrian trajectory estimation method that matches the Wi-Fi indoor positioning system with the building map. Shang et al. [26] designed a space–time-state hyper network-based assignment approach to estimate the passenger flow state by integrating multiple data sources (i.e., AFC data, Wi-Fi data and flow data). Some researchers used media access control (MAC) addresses, which are unique identifiers of electronic devices, to estimate passengers’ travel times or waiting times at urban rail or bus stations [27], [28]. Abedi et al. [29] took pedestrians and cyclists as research objects and combined Wi-Fi with Bluetooth to track people in train stations, airports, etc. Shlayan et al. [30] estimated time-dependent OD demands and station waiting times of bus and subway users with Bluetooth and Wi-Fi technologies to explore the potential benefits of improving the level of public transport services. Li et al. [31] focused on the Received Signal Strength Indication (RSSI) technique to predict the trajectories of passengers within URT stations. To the best of our knowledge, none of the existing studies has used Wi-Fi probe data to estimate passengers’ STTs on large-scale and complex networks.
3. Overview
3.1. Wi-Fi probe data
Over the past few years, indoor Wi-Fi technology has developed rapidly in China. Extensive Wi-Fi coverage in rail transit systems has been achieved in many cities. One of the largest URT networks in China, referred to as X, has had Wi-Fi AP devices installed in 277 stations and 464 trains across 12 URT lines. On each workday, about 70 million Wi-Fi probe data records are collected. The primary purpose of these Wi-Fi devices is to provide better internet service for passengers. This AP infrastructure can also be harnessed to capture and archive bulk positioning data while passengers’ mobile devices are connected to the corresponding AP devices. After removing unrelated information, the Wi-Fi probe data attributes used in this study are listed in Table 2.
Table 2.
Attributes in Wi-Fi probe data in URT network.
| Attribute | Explanation | Type | Sample |
|---|---|---|---|
| AD_MAC | Anonymous MAC address | String | 34-80-B3-3C-XX-XX |
| CAPTURE_TIME | Time of detection | Datetime | 2017-08-18 21:03:12 |
| STATION | Station name | String | Sipinglu Station |
| POS | Detected position | String | 0: station hall; 1: platform; 2: train |
| LN_ID | URT line number | Number | |
| TRAIN_ID | Train number | Number | 203 |
| CAR_ID | Carriage number | String | |
It is worth noting that the Wi-Fi probe data of this URT system is regulated by the public security department. The dataset used in this paper is only for research purposes. It has been authorized by the metro transit police department. What is more, the personal information has been desensitized through the privacy-preserving authentication protocol [32]. In an ideal situation, once an anonymous passenger with an access device enters the URT system, this passenger’s location information will be continuously collected over time. The location can be classified into station and link. Fig. 1 illustrates the Wi-Fi probe data collection process when one passenger travels from station A to station B via transfer station C.
Fig. 1.
Ideal case of passenger’s location identification based on Wi-Fi probe data.
As shown in Fig. 1, the blue arrows indicate the complete STT of this passenger. Depending on the positions of the APs, the passenger’s location can be detected in many places (such as the station hall, the platform, and the train). If a passenger’s locations are detected regularly and frequently, e.g., every 30 s, it is not difficult to reconstruct the actual route and identify the trains the passenger chose. However, preliminary analyses of the Wi-Fi probe data show that complete routes can be reconstructed for only 5% of passengers. For the rest of the passengers, routes cannot be determined solely from Wi-Fi probe data, because such data usually suffer from the following issues: (1) only one data record is collected for an AD_MAC, (2) either the origin or the destination station is missing, or (3) although both the origin and destination stations are identifiable, certain positioning data (e.g., transfer station, LN_ID and TRAIN_ID) are lost during the trip.
Therefore, this study aims to identify the most likely routes and trains taken by each individual passenger, identified by AD_MAC. In particular, missing station information can be complemented according to the historical trip information of the individual passenger, which transforms the problem into the third category of issue listed above. Because the information about a passenger’s trains or transfer stations is partially lost, additional data, including walking times and actual train schedules, are used in this paper to eliminate infeasible STTs between the identified origins and destinations.
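The three data issues listed above can be illustrated with a short sketch. The record layout follows Table 2, but the time encoding, the class name `ProbeRecord`, the function name `classify_trip`, and the rule that a trip end counts as identifiable only when the first or last record was captured in a station hall or on a platform are all assumptions made for illustration, not the procedure used in this study.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ProbeRecord:
    ad_mac: str                # anonymous MAC address (AD_MAC)
    capture_time: int          # assumed encoding: seconds since midnight
    station: Optional[str]     # None when the record was captured on a train
    pos: int                   # 0: station hall, 1: platform, 2: train


def classify_trip(records: List[ProbeRecord]) -> str:
    """Assign one trip to a coarse data-quality category (hypothetical labels)."""
    if len(records) <= 1:
        return "single_record"                 # issue (1)
    ordered = sorted(records, key=lambda r: r.capture_time)
    first, last = ordered[0], ordered[-1]
    # Assumption: a trip end is identifiable only from a hall or platform record.
    origin_known = first.pos in (0, 1) and first.station is not None
    dest_known = last.pos in (0, 1) and last.station is not None
    if not (origin_known and dest_known):
        return "missing_trip_end"              # issue (2)
    # Both ends known; intermediate data may still be missing (issue (3)).
    return "ends_known"
```

Only trips in the last category can be passed directly to the matching step; under the approach described above, trips with a missing end would first have their station information complemented from historical trips.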
3.2. Algorithm outline
The STT estimation process for an individual passenger is composed of three main steps: Wi-Fi probe data collection, path estimation and STT estimation. Fig. 2 illustrates the overall STT estimation process on a real-world URT network.
Fig. 2.
STT estimation process. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
The points in Fig. 2(a) are the Wi-Fi data points for one journey. First, an initial Wi-Fi probe data network is generated by sequentially connecting adjacent Wi-Fi probe data points. Then, this Wi-Fi data network is coordinated with the URT network topology. The purpose of path estimation is to identify the crucial stations that the passenger passed through, namely the origin station, the destination station and the transfer stations. As shown in Fig. 2(b), there are two feasible paths from station A to station B, namely ACB and ADEB. In the Wi-Fi data network, no data are recorded at station D, at station E or on URT line 3, so it can be concluded that this passenger traveled from station A to station B via path ACB. To estimate the passenger’s STT along this estimated path, the train timetable and transfer time parameters are used to generate the times at which the passenger arrived at four further important positions: the station hall where the passenger enters the URT system, the platform where the passenger boards a train, the platform where the passenger alights from a train, and the station hall where the passenger exits the URT system or transfers to another URT line. The estimated STT is connected with arrowheads in Fig. 2(c), and these four kinds of important positions are marked in red, yellow, green and blue, respectively. Finally, the STT of each individual passenger can be represented as a combination of the route and train choices at specific timestamps.
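The path estimation step for the two-path example can be sketched as a consistency filter. This is a conservative illustration only: it discards a candidate path when an observed station or line does not lie on it, whereas the reasoning in the text also uses the absence of records at D, E and line 3. The dictionary `CANDIDATE_PATHS`, the function `consistent_paths`, and the assignment of lines 1 and 2 to path ACB are assumptions.

```python
from typing import Dict, List, Set

# Candidate physical paths from A to B in the toy network.
# The line numbers per path are assumed for illustration.
CANDIDATE_PATHS: Dict[str, Dict[str, Set]] = {
    "ACB": {"stations": {"A", "C", "B"}, "lines": {1, 2}},
    "ADEB": {"stations": {"A", "D", "E", "B"}, "lines": {1, 2, 3}},
}


def consistent_paths(
    observed_stations: Set[str],
    observed_lines: Set[int],
    candidates: Dict[str, Dict[str, Set]] = CANDIDATE_PATHS,
) -> List[str]:
    """Return the names of candidate paths containing every observed station and line."""
    return sorted(
        name
        for name, path in candidates.items()
        if observed_stations <= path["stations"] and observed_lines <= path["lines"]
    )
```

With records at stations A, C and B on lines 1 and 2, only ACB survives; with records at A and B alone, both paths remain and the timetable-based steps must resolve the ambiguity.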
4. Methodology
The passenger STT estimation algorithm developed for this study includes three subtasks: (1) extracting a route skeleton from the collected Wi-Fi probe data, (2) inferring the spatio-temporal locations for each individual passenger according to the route skeleton and historical trajectories, and (3) estimating the STT based on the URT network’s physical topology and train diagram.
4.1. Network topology construction
(1) Space–time network topology
The space–time network topology integrates the URT network’s physical topology and the train diagram. An abstract physical topology G of a URT network, consisting of stations and trains, is shown in Fig. 3(a). The route from station A to station B via path ACB can be extended into an STT, as shown in Fig. 3(b).
Fig. 3.
A simple URT network and one STT. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
The four kinds of important positions referred to above can be treated as nodes without length or area. They are defined as the entry node, boarding node, alighting node, and exit node, respectively. These four kinds of nodes share the same spatio-temporal attributes: the metro station and the timestamp. Boarding and alighting nodes have an extra attribute: the train set. Thus, according to the category of node, the spatial attribute is the entry, boarding, alighting, or exit station, and the time attribute is the entry time, the boarding time (train departure time), the alighting time (train arrival time), or the exit time. The train set attributes of boarding and alighting nodes come from the train timetable data and indicate the possible trains that passengers may board or alight from. The four processes can then be converted into links A in this network. Four types of links are defined, namely the access link, egress link, train link, and transfer link. Concretely, in the space–time network topology in Fig. 3(b), the route ACB can be described as an access link at station A, a train link from A to C, a transfer link at C, a train link from C to B, and an egress link at B.
(2) Wi-Fi probe data network
One passenger’s Wi-Fi probe data network is generated by connecting the key Wi-Fi probe data points collected within a journey, shown as the dashed line connecting the red pentagrams in Fig. 3(b). The four corresponding processes of the passenger can also be treated as nodes. These nodes are defined as the origin node, train node, transfer node, and destination node, respectively. Their spatial attributes are the origin station, the line number of the train, the transfer station, and the destination station. Their time attributes each represent a period of time within the corresponding process: the time data are of the same type, but several different values may be collected within each process. These values are important for route estimation, specifically for locating transfer stations and narrowing the range of alternative trains. We then split these nodes into smaller discrete nodes that each have only one time point, and extract the nodes with critical time data: the last record at the origin station, the first and last records on a train, the first and last records at a transfer station, and the first record at the destination station.
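Extracting the critical time records can be sketched as collapsing consecutive records at the same place into a segment with its first and last timestamps. The record encoding (a time in seconds plus a place label, with on-train records labelled by line) and the function name `critical_records` are assumptions for illustration.

```python
from itertools import groupby
from typing import List, Tuple

# Assumed encoding: (capture_time, place), where place is a station name for
# hall/platform records or a label such as "line1" for on-train records.
Record = Tuple[int, str]


def critical_records(records: List[Record]) -> List[Tuple[str, int, int]]:
    """Collapse consecutive records at one place into (place, first_time, last_time)."""
    ordered = sorted(records)  # sorts by capture time first
    segments = []
    for place, group in groupby(ordered, key=lambda r: r[1]):
        times = [t for t, _ in group]
        segments.append((place, times[0], times[-1]))
    return segments
```

In the resulting list, the last time of the first segment corresponds to the last record at the origin station, and the first time of the final segment to the first record at the destination station.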
4.2. Spatio-temporal location inference
Wi-Fi probe data are collected every 30 s while a passenger is within the detection area. This makes it possible to recover missing spatio-temporal location attributes from a historical perspective. The n-gram model is commonly used to estimate the probability of different word sequences, and it has supported a range of natural language processing applications such as speech recognition and machine translation. According to the chain rule, the n-gram model infers the next location attribute (i.e., timestamp and location) in a mobility sequence in the form of a Markov model whose order is determined by n. This paper proposes a mobility n-gram model for location attribute inference. The joint probability can be factorized into the product of two conditional probabilities, and the target joint probability can be rewritten as:
| (1) |
| (2) |
| (3) |
where is the joint probability of the th location and timestamp according to the last trajectory attribute for passenger . indicates the target timestamp variable. shows the number of timestamps. denotes the target location variable at the timestamp . represents the context location space variable of the n-gram. expresses the prior probability of target location according to the previous location information. indicates the context time variable of the n-gram. means the prior probability of target location’s time according to the previous time information. expresses the number of times that occurs immediately after location sequences . denotes the number of times that location sequences in historical trajectory dataset of passenger . means the number of times that occurs immediately after time sequences . indicates the number of times that time sequences .
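For orientation, one standard way to write a factorization of this kind, consistent with the verbal definitions above, is sketched below. The notation is assumed and is not the original: $u$ denotes the passenger, $l_k$ and $t_k$ the $k$-th location and timestamp, $h^{l}_{k}$ and $h^{t}_{k}$ the preceding location and time contexts of the n-gram, and $C_u(\cdot)$ the number of occurrences in the historical trajectories of passenger $u$.

```latex
% Assumed notation, not the original symbols of Eqs. (1)-(3).
\begin{align}
P_u\left(l_k, t_k \mid h^{l}_{k}, h^{t}_{k}\right)
  &= P_u\left(l_k \mid h^{l}_{k}\right)\, P_u\left(t_k \mid h^{t}_{k}\right), \\
P_u\left(l_k \mid h^{l}_{k}\right)
  &= \frac{C_u\left(h^{l}_{k},\, l_k\right)}{C_u\left(h^{l}_{k}\right)}, \\
P_u\left(t_k \mid h^{t}_{k}\right)
  &= \frac{C_u\left(h^{t}_{k},\, t_k\right)}{C_u\left(h^{t}_{k}\right)}.
\end{align}
```

Each conditional probability is a relative frequency: the number of times the target follows the context divided by the number of times the context occurs.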
4.3. Trajectory estimation algorithm
Given the URT network’s physical topology, the Wi-Fi data network and the train schedules, this method aims to estimate all the feasible STTs of the passengers. First, constraints at boarding nodes and alighting nodes are proposed to generate the feasible train sets. Then, node–link relationships are used to narrow the range of optional trains on each line. Finally, link–link relationships are considered to form a complete feasible trajectory. The main purpose of this method is to estimate the actual route and the precise trains that passengers take. Thus, the entry and exit processes are simplified: it is assumed that the last time recorded at the origin station and the first time recorded at the destination station are equal to the entry time and the exit time, respectively. The detailed assumptions are as follows:
| (4) |
| (5) |
| (6) |
| (7) |
Furthermore, Eqs. (8) and (9) must hold to avoid loops.
| (8) |
| (9) |
where and indicate the first time and last time of passenger on a train. and mean the first and last time of passenger on a transfer station.
(1) Boarding process
Boarding node is the tail node of access link or transfer link , and the head node of train link . Combined with Wi-Fi probe data, the boarding time must be later than or equal to the last time or on the station or and earlier than or equal to the first time on the train . In all circumstances, station constraint should be established to satisfy Eq. (10).
| (10) |
For time constraint, four circumstances are proposed.
| (11) |
| (12) |
Assuming that is the tail node of . Both and are probed. Eqs. (11), (12) should be established and , .
Assuming that is the tail node of and is lost. Then, can be used to replace . In Eq. (11), .
Assuming that is lost. Then, is used to replace and in Eq. (12).
Assuming that both and are lost, the origin and destination station are used to replace them respectively. In train timetable, is the train departure time on station , so all the trains that satisfy Eqs. (10)–(12) at node can form the train set .
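The boarding-time window described above can be sketched as a filter over the timetable of the boarding station. The function name `feasible_boarding_trains`, the timetable layout, and the rule of falling back to the origin and destination times whenever a bound is missing are assumptions for illustration; the station constraint is assumed to have been applied already by selecting the departures of the correct station.

```python
from typing import List, Optional, Tuple

# Assumed timetable entry: (train_id, departure_time_at_boarding_station), in seconds.
Departure = Tuple[int, int]


def feasible_boarding_trains(
    departures: List[Departure],
    last_station_time: Optional[int],   # last record at the station, if probed
    first_train_time: Optional[int],    # first record on the train, if probed
    origin_time: int,                   # fallback lower bound (assumed rule)
    destination_time: int,              # fallback upper bound (assumed rule)
) -> List[int]:
    """Trains departing between the last station record and the first on-train record."""
    lower = last_station_time if last_station_time is not None else origin_time
    upper = first_train_time if first_train_time is not None else destination_time
    return [train for train, dep in departures if lower <= dep <= upper]
```

For example, with departures at 1000, 1360 and 1720 s, a last station record at 1100 s and a first on-train record at 1500 s leave only the train departing at 1360 s.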
(2) Alighting process
Alighting node is the tail node of train link and the head node of egress link or transfer link . Combined with Wi-Fi probe data, the alighting time must be later than or equal to the last time on the train and earlier than or equal to the first time or on the station or . Station constraint is shown in Eq. (13).
| (13) |
Time constraint could also be divided into four circumstances.
| (14) |
| (15) |
Assuming that both and are recorded, Eqs. (11), (12) are established and , .
Assuming that is lost. can be used to replace and in Eq. (14).
Assuming that is lost. is used to replace and in Eq. (15).
Assuming that both the train link and transfer link are lost in Wi-Fi probe data. and are replaced by and respectively according to Eqs. (11), (12). In train timetable, is the train arriving time on station , so all the trains that satisfy Eqs. (10)–(12) at node will form the train set .
(3) Train link process
According to the above two steps, the feasible train set and at each and are formed. The final feasible train set of each train link is named as . Then, the boarding node and alighting node of the same train link can be represented as and . For the same line, trains in the intersection of the two train sets are kept, as expressed in Eq. (16).
| (16) |
When generating a complete STT, Eq. (17) must hold to ensure that the train attributes of the boarding node and the alighting node of a train link are the same.
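The train link step can be sketched as follows: a feasible alighting set is built from arrival times, and only trains present in both the boarding and alighting sets of the same link are kept. The function names and the timetable layout are assumptions for illustration.

```python
from typing import Iterable, List, Tuple

# Assumed timetable entry: (train_id, arrival_time_at_alighting_station), in seconds.
Arrival = Tuple[int, int]


def feasible_alighting_trains(
    arrivals: List[Arrival], last_train_time: int, first_station_time: int
) -> List[int]:
    """Trains arriving between the last on-train record and the first station record."""
    return [t for t, arr in arrivals if last_train_time <= arr <= first_station_time]


def train_link_candidates(
    boarding_set: Iterable[int], alighting_set: Iterable[int]
) -> List[int]:
    """Keep trains feasible at both ends of a train link, in boarding order."""
    alighting = set(alighting_set)
    return [t for t in boarding_set if t in alighting]
```

Because a train link is then traversed on a single train taken from this intersection, the boarding and alighting nodes of the link carry the same train attribute by construction.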
| (17) |
(4) Transfer link process
For transfer links, and represent the same station, and satisfy Eq. (15). while and should satisfy transfer time constraint Eqs. (18), (19) to avoid train set errors.
| (18) |
| (19) |
| (20) |
The main idea of passenger route generation is to estimate, in real time, the locations (i.e., hall, platform, train, etc.) where a passenger may be during the whole route. It should consider the relationship between the times recorded in the Wi-Fi probe data and the time of each node in the URT space–time network. When Wi-Fi probe data are lost in some sections, the time at the origin station and the time at the destination station are used as replacements. The pseudocode of the trajectory estimation algorithm is provided in Algorithm 1.
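As a rough sketch of how link–link relationships can combine per-link train sets into complete trajectories, the code below enumerates train combinations along a path and keeps those in which every connecting train departs no earlier than the previous arrival plus a minimum transfer time. This is not Algorithm 1; the function name `enumerate_trajectories`, the candidate layout, and this specific form of the transfer time constraint are assumptions.

```python
from itertools import product
from typing import List, Tuple

# Assumed candidate on one train link:
# (train_id, departure_at_boarding_node, arrival_at_alighting_node), in seconds.
Candidate = Tuple[int, int, int]


def enumerate_trajectories(
    links: List[List[Candidate]], min_transfer: int
) -> List[Tuple[int, ...]]:
    """Return all train-id sequences that respect the transfer time between links."""
    feasible = []
    for combo in product(*links):
        # Each connecting train must leave after arrival plus the transfer time.
        connects = all(
            nxt[1] >= prev[2] + min_transfer
            for prev, nxt in zip(combo, combo[1:])
        )
        if connects:
            feasible.append(tuple(c[0] for c in combo))
    return feasible
```

Several sequences can remain feasible for one passenger, for instance when a later connecting train is also compatible with the records, which is why the number of feasible STTs is reported rather than a single trajectory.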
5. Experiment and analysis
5.1. Blind experiments
A so-called blind experiment [33] is used to test the proposed method. In such an experiment, “certain information which could introduce bias or otherwise skew the result is withheld from the participants, but the experimenter will be in full possession of the facts”. Two types of routes are obtained using the proposed method: (1) the physical network route without any temporal attributes, and (2) the STT.
The blind experiment is conducted as follows: an experimenter generates a route with complete Wi-Fi probe data and thus has full knowledge of the experiment. The experimenter then reveals partial Wi-Fi data to the observer according to different data missing rates. The main purpose of this experiment is to find the impact of the data missing rate on both route estimation and STT estimation.
The network topology used in this blind experiment is shown in Fig. 3(a). The train schedule of URT line 1 is as follows: the first train leaves Station A at 37 957 and the last at 48 397, with a fixed headway of 360 s during this time period. The travel time from Station A to Station D is fixed at 1719 s and from Station A to Station C at 2652 s. The first train of URT line 2 leaves Station C at 38 298 and the last at 51 138, with a fixed headway of 240 s. The travel time from Station C to Station B is 1479 s and from Station E to Station C is 946 s. The train schedule of URT line 3 starts at 37 513 and ends at 50 233, with a fixed headway of 270 s. The travel time from Station D to Station E is 1070 s. First, we generate a sequence of Poisson arrival times in the interval [37 959, 48 397] at Station A. Then, a trajectory for each passenger is selected at random from routes ACB and ADEB. Finally, time data are interpolated into the generated trajectory at 120 s intervals to form a complete Wi-Fi data group for each passenger. On average, each trajectory is composed of 40 Wi-Fi probe records.
Within the experiment, each trajectory corresponds to one passenger. In order to derive feasible routes and trajectories, the experimenter presents all the train schedules, the minimum transfer time and the different sample rates to the observer. Using the method proposed above, at least one feasible route/trajectory can be found for every passenger, regardless of the data missing rate. The detailed estimation results under different data missing rates are given in Table 3.
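The data generation for the blind experiment can be sketched as follows. The line 1 timetable values come from the text; the arrival rate, the start time of the example trace, the fixed random seed, and the choice to withhold each record independently with probability equal to the missing rate are assumptions for illustration.

```python
import random
from typing import List


def timetable(first: int, last: int, headway: int) -> List[int]:
    """Departure times from `first` up to `last` (inclusive) at a fixed headway, in seconds."""
    return list(range(first, last + 1, headway))


def poisson_arrivals(start: int, end: int, rate_per_s: float, rng: random.Random) -> List[float]:
    """Arrival times of a homogeneous Poisson process on [start, end]."""
    times: List[float] = []
    t = float(start)
    while True:
        t += rng.expovariate(rate_per_s)  # exponential inter-arrival gap
        if t > end:
            return times
        times.append(t)


def drop_records(times: List[int], missing_rate: float, rng: random.Random) -> List[int]:
    """Withhold each record independently with probability `missing_rate` (assumed mechanism)."""
    return [t for t in times if rng.random() >= missing_rate]


rng = random.Random(0)                                   # fixed seed for repeatability
line1 = timetable(37957, 48397, 360)                     # URT line 1 departures at Station A
arrivals = poisson_arrivals(37959, 48397, 0.01, rng)     # 0.01 passengers/s is an assumed rate
full_trace = list(range(38000, 38000 + 40 * 120, 120))   # 40 records at 120 s spacing
observed = drop_records(full_trace, 0.3, rng)            # what the observer would receive
```

With these values the line 1 timetable contains 30 departures, since the last departure at 48 397 is exactly 29 headways after the first.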
Table 3.
Route and trajectory estimation results under different data missing rates.
| Wi-Fi data missing rate | Share of passengers with a unique route | Share of passengers with a unique trajectory |
|---|---|---|
| 0.10 | 1.00 | 0.96 |
| 0.20 | 1.00 | 0.90 |
| 0.30 | 1.00 | 0.84 |
| 0.40 | 1.00 | 0.76 |
| 0.50 | 1.00 | 0.69 |
| 0.60 | 0.99 | 0.57 |
| 0.70 | 0.98 | 0.40 |
| 0.80 | 0.89 | 0.27 |
| 0.90 | 0.62 | 0.10 |
Table 3 shows that in this experiment, if the data missing rate does not exceed 0.6, a unique route can be determined for at least 99% of passengers and a unique STT can be determined for at least 57% of passengers. Therefore, the method proposed in this paper is suitable for estimating feasible routes and trajectories unless the data missing rate is very high, e.g., 0.7 or above. The method will be further demonstrated in a much more complex real-world network in the next section.
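The way partial data are revealed to the observer, and the way the shares in Table 3 are tallied, can be illustrated as follows. This is a sketch under stated assumptions: the text does not specify the dropping mechanism, so records are dropped independently with probability equal to the missing rate, and the helper names `reveal` and `unique_share` are hypothetical.

```python
import random


def reveal(records, missing_rate, rng):
    """Keep each record independently with probability 1 - missing_rate.

    Independent dropping is an assumption; the order of records is preserved.
    """
    return [r for r in records if rng.random() >= missing_rate]


def unique_share(candidate_counts):
    """Share of passengers whose estimation yields exactly one candidate.

    `candidate_counts` holds, per passenger, the number of feasible routes
    (or feasible trajectories) returned by the estimation step.
    """
    if not candidate_counts:
        return 0.0
    return sum(1 for c in candidate_counts if c == 1) / len(candidate_counts)


full_group = list(range(40))  # stand-in for 40 time-ordered probe records
observed = reveal(full_group, missing_rate=0.6, rng=random.Random(1))
```

Applying `unique_share` once to the per-passenger route counts and once to the per-passenger trajectory counts gives the two columns of a table such as Table 3 for one missing rate.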
5.2. Case study
(1) Results of spatio-temporal location inference
This section presents a case study using the proposed spatio-temporal location inference method based on real-world URT network data. The Wi-Fi probe data is obtained from the Technology Center of Metro Co. Ltd. Outliers have been removed from the Wi-Fi probe data by following the approach described in [34]. The dataset contains records covering a period of 64 days (from November 29, 2017 to January 31, 2018). Each record contains the attributes listed in Table 2. We divide the dataset into two subsets. The training set consists of the records for the first 60 days, from November 29, 2017 to January 27, 2018. The records of the last 4 days, from January 28, 2018 to January 31, 2018, are used as the testing dataset. In the training model, first, the training samples of each passenger are generated from his/her historical trajectories for n equal to 1, 2 and 3, which correspond to the Bayes, 2-gram and 3-gram approaches respectively. Secondly, the intermediate quantities for each passenger are calculated. Thirdly, the two terms given by Eqs. (2) and (3) are obtained. Finally, the value defined by Eq. (1) is computed. In the test model, given the real-time sub-trajectory of each passenger as input, the inferred spatio-temporal location and its associated value are calculated in accordance with the training model. Furthermore, the training model parameters are updated as new individual trajectory data are generated.
Note that accuracy refers to the frequency with which the true next spatio-temporal location occurs in the list of inferred locations. With an indicator equal to 1 if it does and 0 otherwise, the accuracy is the average of this indicator over the number of locations of each individual passenger. To examine how many preceding locations affect the next location, this paper studies the effect of the n-gram model by varying n from 1 to 3 (i.e., Bayes, 2-gram and 3-gram models). Example results of STT inference with the top-k rule when k equals 1, 3 and 5 are shown in Fig. 4.
Fig. 4.
Spatio-temporal location inference results.
As can be observed from Fig. 4, the accuracy improves markedly when the value of n increases from 1 to 2, and starts to decline when n equals 3. The 2-gram model thus achieves the best performance among the three models. The best accuracy values are marked in bold. This suggests that the next location may be affected mainly by the two preceding locations. The reason is that the travel behavior of individuals is quite complex and passengers do not always repeat their historical trips. In addition, the accuracy values improve as the value of k increases. Given a set or sequence of locations, the proposed algorithm retrieves from the existing Wi-Fi trajectories the top-k trajectories that best connect the given locations. This helps to narrow the feasible scope of the individual trajectory estimation in the next step.
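The top-k inference and the accuracy measure discussed above can be illustrated with a plain frequency-count n-gram model. This sketch is not the model of Eqs. (1) to (3): it uses simple maximum-likelihood counts, treats n = 1 as an unconditional frequency model (an assumption standing in for the Bayes variant), and uses hypothetical function names and toy location labels.

```python
from collections import Counter, defaultdict


def train_ngram(trajectories, n):
    """Count next-location frequencies given the previous n-1 locations."""
    counts = defaultdict(Counter)
    for traj in trajectories:
        for i in range(n - 1, len(traj)):
            context = tuple(traj[i - (n - 1):i])  # empty tuple when n == 1
            counts[context][traj[i]] += 1
    return counts


def top_k(counts, history, n, k):
    """Return the k most frequent next locations for the given history."""
    context = tuple(history[-(n - 1):]) if n > 1 else ()
    return [loc for loc, _ in counts.get(context, Counter()).most_common(k)]


def top_k_accuracy(counts, test_trajectories, n, k):
    """Fraction of test steps whose true next location is in the top-k list."""
    hits = total = 0
    for traj in test_trajectories:
        # Start where at least one preceding location and a full context exist.
        for i in range(max(n - 1, 1), len(traj)):
            hits += traj[i] in top_k(counts, traj[:i], n, k)
            total += 1
    return hits / total if total else 0.0


# Toy historical trajectories (labels are illustrative, not real stations).
history = [["A", "B", "C"], ["A", "B", "C"], ["A", "B", "D"]]
model = train_ngram(history, n=2)
```

On this toy data, a passenger observed at A and then B gets C as the top-1 inference and C, D as the top-2 list, which shows why accuracy can only stay the same or rise as k grows.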
(2) Results of trajectory estimation
One case consists of 7 stations from Siping Road to Xinjiangwancheng on URT line 10 and 3 stations from Xiuyan Road to Disney Resort on line 11. We treat the 7 stations on line 10 as a station group and consider that passengers entering from these stations have the same route-choice behavior, because Siping Road is the only station where they can transfer to other lines. The 3 stations on URT line 11 are regarded as the other station group for the same reason.
In theory, all the feasible routes between these two station groups should be set as the initial feasible routes. However, this sub-network is quite complex and many feasible routes exist between these two station groups. In practice, passengers are highly sensitive to travel time and the number of transfers, which means that most passengers' route choices will be concentrated within a limited route set. Taking the Wi-Fi probe data collected at a transfer station with rail transit lines A and B as an example, the probability distributions of passengers' transfer time are shown in Fig. 5.
Fig. 5.
Probability distribution of transfer time in one transfer station.
In Fig. 5, the top-4 best-fit distributions of passengers' transfer time between line A and line B are listed. The average transfer time can be obtained from the best-fit distribution. Then, we treat the 8 paths as a set of feasible routes to improve the efficiency of our algorithm. In particular, the train timetables and minimum transfer times used in this case study are obtained from the real-world URT system. Fig. 6 shows the spatial topology of the 8 paths in the real-world URT network of X.
Fig. 6.
Set of available routes.
As shown in Fig. 6, three typical circumstances occur within the estimation results: (1) a unique STT can be extracted; (2) a unique route can be found while two or more trajectories are possible; (3) more than one feasible route is found.
A typical result for circumstance (2) is as follows. The passenger whose AD_MAC is 90-3C-**-**-13-AD enters the URT network at Xiuyan Road and leaves at Guoquan Road. URT lines 11, 2 and 10 are recorded in the Wi-Fi probe data, while none of the transfer stations is captured. Fig. 7(b) illustrates the detailed information collected during the journey.
Fig. 7.
Typical feasible route-choice estimation results about circumstance (2). (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
The red pentagrams marked in Fig. 7(b) represent the key Wi-Fi data we need. After route estimation, a unique route is determined, as shown in Fig. 7(a): the passenger first transfers to line 16 at Luoshan Road, then transfers to line 2 at Longyang Road. The third transfer occurs at East Nanjing Road, where the passenger finally enters line 10 and travels to Guoquan Road. Because the information on line 16 is lost, both No. 163 train and No. 16 431 train could have been taken, so two feasible STTs are extracted in this process, as shown in Fig. 7(c).
Another typical route-choice estimation result, for circumstance (3), is shown in Fig. 8.
Fig. 8.
Typical feasible route-choice estimation results about circumstance (3).
According to the original Wi-Fi probe data of the passenger whose AD_MAC is D8-63-**-**-4B-7F, he/she enters the URT network at Disney Resort, and the last time recorded at this station is 20:10:18. The destination station is Guoquan Road, where the first recorded time is 21:27:31. During the journey, one record on No. 1140 train is captured, so the first and last times recorded on this train are calibrated to the same time, 20:32:00. Three records on No. 1020 train are captured, with the first time calibrated as 21:09:44 and the last time as 21:19:39. The records from 20:32:00 to 21:09:44 are lost and need to be estimated by our method. In this case, two feasible routes are found: (1) transfer directly to line 10 at JiaoTong University; (2) transfer to line 8 at Oriental Sport Center, then transfer to line 10 at Laoximen. Fig. 8(a) shows the route estimation results in the real-world URT network. For route 1, two trajectories are found, while for route 2, six trajectories are found. Fig. 8(b) shows the detailed STT estimation results for each route.
After data reprocessing, a total of 345 valid data groups of passenger trajectories are recorded by Wi-Fi sensors. Their origin and destination information is complete and falls within the scope of the OD groups that we study. The route and STT estimation results are summarized in Fig. 9.
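The reason several trajectories can remain for one route is that any combination of trains that fits between the observed times and respects the minimum transfer time is feasible. The enumeration can be sketched as below. This is a simplified illustration rather than the estimation algorithm of this paper: the `Run` type, the function name and the toy timetable are hypothetical, and the sketch only bounds the journey by an earliest and a latest observed time.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Run:
    train_id: str
    depart: int  # departure from the boarding station of the leg (seconds)
    arrive: int  # arrival at the alighting station of the leg (seconds)


def feasible_trajectories(legs, earliest, latest, min_transfer):
    """Enumerate one Run per leg such that the whole chain fits the window.

    legs: list of lists of Run, one list per leg of the route.
    earliest: time from which the passenger can board the first leg.
    latest: time by which the last leg must have arrived.
    min_transfer: minimum transfer time after each leg except the last.
    """
    results = []

    def extend(i, ready, chosen):
        if i == len(legs):
            results.append(tuple(r.train_id for r in chosen))
            return
        for run in legs[i]:
            if run.depart >= ready and run.arrive <= latest:
                # Time at which the passenger can board the next leg.
                wait = min_transfer[i] if i < len(legs) - 1 else 0
                extend(i + 1, run.arrive + wait, chosen + [run])

    extend(0, earliest, [])
    return results


# Toy two-leg route with one transfer (all values are illustrative).
leg1 = [Run("T1", 100, 400), Run("T2", 460, 760)]
leg2 = [Run("U1", 500, 900), Run("U2", 700, 1100), Run("U3", 1000, 1400)]
tight = feasible_trajectories([leg1, leg2], 90, 1200, [120])
loose = feasible_trajectories([leg1, leg2], 90, 1400, [120])
```

With the tighter window only one train pair remains, while relaxing the latest time admits three. Any on-board record, such as the single record on No. 1140 train above, plays the same role as a tighter window by pinning one leg to a specific train.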
Fig. 9.
Sum of route and trajectory estimation results.
Of all data groups, 97 original Wi-Fi data groups directly reflect the real route, constituting 28.12% of the total. After using our method, 343 data groups output feasible route estimation results, accounting for 99.4% of the input data. As shown in Fig. 9, 316 data groups receive a unique feasible route, constituting 92.9% of the output data. We therefore conclude that the route and STT estimation method using Wi-Fi probe data is well suited to determining the unique route choice in complex network structures. Within the 316 data groups, 152 acquire only one feasible STT, which means that 44.6% of passengers' travel trajectories can be accurately restored using Wi-Fi probe data and the proposed method. A further 117 passengers obtain two feasible trajectories and the remaining passengers obtain more than two; further study is still needed to estimate the most likely trajectory.
5.3. Applications
The main purpose of this study is to design a methodology to reconstruct an individual passenger's actual route and train choice from incomplete Wi-Fi probe data. The proposed approach can be considered one of the first steps toward the use of Wi-Fi probe data for STT estimation at the network level. The solution to this problem facilitates a number of applications. For instance, route choice is essential to allocating passenger flows to different rail transit lines, which in turn determines how fare revenues are distributed. The estimation results can also help operators to estimate train loads over time and space. Furthermore, real-time demand management can be conducted in a more refined manner, e.g., detailed estimation of platform usage and of the number of passengers left behind for safety analysis, personalized dynamic route guidance, crowding information systems, real-time passenger flow control, and disruption management. Real-time application evaluation, such as the analysis of car-specific on-board accumulation for comfort assessment, flexible pricing strategies and real-time train rescheduling strategies, can be optimized according to the estimated STT results. The approach can also provide comprehensive and effective input data for trajectory tracking [35], trajectory pattern mining [36], [37], [38] and trajectory prediction [39]. In particular, the proposed approach can significantly benefit real-time monitoring of the STTs of suspected COVID-19 patients in the URT network.
6. Conclusions
This paper contributes to route estimation research by presenting a general modeling framework for estimating passengers' STTs with personalization and timeliness using Wi-Fi probe data at the URT network level. After reviewing relevant studies on route-choice models and Wi-Fi technology, several gaps in the literature are identified. First, at the present stage, most route-choice models take AFC transaction data as model input. Since no intermediate information between origin and destination is recorded, the accuracy of the estimation is difficult to determine. Second, most studies use Wi-Fi technology to position and calculate time parameters in a very limited area, such as a station or an airport. Third, the details of Wi-Fi probe data have not been described specifically. Therefore, in this study, the structure and problems of Wi-Fi probe data in the URT system are introduced and useful Wi-Fi data are integrated into the estimation method. Blind experiments show that our method can find feasible routes and STTs in a simple network at all tested data missing rates. In the case study, we reconstruct the real-world URT network as a topology network and choose a more complex OD pair with 8 feasible routes. Results show that (1) for 93% of passengers, a unique physical route can be estimated and (2) for 80% of passengers, the number of feasible STTs is reduced to one or two. The STT estimation is capable of supporting the day-to-day operations of large URT systems. The route estimation results can be useful for metro corporation revenue allocation, the estimation of train loads over time and space, real-time demand management and application evaluation.
The proposed approach can be considered a first step toward the utilization of Wi-Fi probe data for STT estimation at the network level. This paper could thus stimulate further research in three directions. First, combining data from more sources, such as metro video surveillance, would help to improve the real-time STT estimation accuracy for individual passengers. Second, semantic STT pattern mining [40] is useful for finding the most likely path and trajectory among the multiple feasible paths estimated, which is also the main focus of our future study. Third, the approach is of great theoretical and practical significance for backtracking COVID-19 spreading trajectories and tracking close contacts of infected passengers in real time.
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgments
We greatly appreciate the very constructive comments made by the anonymous reviewers, who have helped significantly improve the quality and presentation of this paper. This work is supported by the National Natural Science Foundation of China (Grant No. 61473210), the Shanghai Science and Technology Committee (Grant No. 18DZ1201404) and the Shanghai Shentong Metro Group Co., Ltd. (Grant Nos. CX-GL20R014-WT-20037, JS-KY19R024-WT-19005). The authors are grateful to these funding agencies. All authors approved the version of the manuscript to be published.
References
- 1. Shanghai Metro. 2020. Wikipedia. https://en.wikipedia.org/wiki/Shanghai_Metro Accessed on January 26, 2020.
- 2. Route & Fare Inquiry. 2020. Shanghai Metro website. http://service.shmetro.com/en/ Accessed on June 8, 2020.
- 3. Zhu Y., Koutsopoulos H.N., Wilson N.H. Inferring left behind passengers in congested metro systems from automated data. Transp. Res. Proc. 2017;23:362–379.
- 4. Ma Z., Koutsopoulos H.N., Chen Y., Wilson N.H. Estimation of denied boarding in urban rail systems: alternative formulations and comparative analysis. Transp. Res. Rec. 2019;2673(11):771–778.
- 5. Xu R.H., Luo Q., Gao P. Passenger flow distribution model and algorithm for urban rail transit network based on multi-route choice. J. China Railw. Soc. 2009;31:110–114.
- 6. Gao S.G., Zhong W.U. Modeling passenger flow distribution based on travel time of urban rail transit. J. Transp. Syst. Eng. Inf. Technol. 2011;11:124–130.
- 7. Hurk E.V.D., Kroon L., Maróti G. Deduction of passengers’ route choices from smart card data. IEEE Trans. Intell. Transp. Syst. 2015;16:430–440.
- 8. Sun L.J., Lu Y., Jin J.G. An integrated Bayesian approach for passenger flow assignment in metro networks. Transp. Res. C. 2015;52:116–131.
- 9. Ben-Akiva M.E., Gao S., Wei Z., Wen Y. A dynamic traffic assignment model for highly congested urban networks. Transp. Res. C. 2012;24:62–82.
- 10. Ma L., Fan Y., Xu Y., Cui Y. IEEE International Conference on Communications. 2017. Pedestrian dead reckoning trajectory matching method for radio map crowdsourcing building in WiFi indoor positioning system; pp. 1–6.
- 11. Zhao Z., Koutsopoulos H.N., Zhao J. Individual mobility prediction using transit smart card data. Transp. Res. C. 2018;89:19–34.
- 12. Zhou F., Xu R. Model of passenger flow assignment for urban rail transit based on entry and exit time constraints. Transp. Res. Rec. 2012;2284:57–61.
- 13. Sun Y., Schonfeld P. Schedule-based rail transit route-choice estimation using automatic fare collection data. J. Transp. Eng. 2016;142
- 14. X. Chen, L. Zhou, J. Tang, H. Zhou, Estimating the most likely space–time route by mining automatic fare collection data, in: Presented at 97th Annual Meeting of Transportation Research Board, Washington, D. C. 2018.
- 15. Chen X., Zhou L., Yue Y., Zhou Y., Liu L. Data-driven method to estimate the maximum likelihood space–time trajectory in an urban rail transit system. Sustainability. 2018;10:1752.
- 16. Zhu Y., Koutsopoulos H.N., Wilson N.H. A probabilistic passenger-to-train assignment model based on automated data. Transp. Res. B. 2017;104:522–542.
- 17. Zhang Y., Yao E.J., Zhang J.Y., Zheng K.N. Estimating metro passengers’ route choices by combining self-reported revealed preference and smart card data. Transp. Res. C. 2018;92:76–89.
- 18. Hänseler F.S., van den Heuvel J.P., Cats O., Daamen W., Hoogendoorn S.P. A passenger-pedestrian model to assess platform and train usage from automated data. Transp. Res. A. 2020;132:948–968.
- 19. Luo D., Bonnetain L., Cats O., van Lint H. Constructing spatiotemporal load profiles of transit vehicles with multiple data sources. Transp. Res. Rec. 2018;2672(8):175–186.
- 20. Tang J.J., Song Y., Miller H.J., Zhou X.S. Estimating the most likely space–time routes, dwell times and route uncertainties from vehicle trajectory data: A time geographic method. Transp. Res. C. 2016;66:176–194.
- 21. Luo L.B., Hou X.T., Cai W.T., Guo B. Incremental route inference from low-sampling GPS data: an opportunistic approach to online map matching. Inform. Sci. 2020;512:1407–1423.
- 22. Alvarez-Alvarez A., Alonso J.M., Trivino G. Human activity recognition in indoor environments by means of fusing information extracted from intensity of WiFi signal and accelerations. Inform. Sci. 2013;233(Complete):162–182.
- 23. Kotaru M., Joshi K., Bharadia D., Katti S. ACM Conference on Special Interest Group on Data Communication. ACM; 2018. Spotfi: decimeter level localization using wifi; pp. 269–282.
- 24. Lu Q., Liao X., Xu S., Zhu W. IEEE International Symposium on Personal, Indoor, and Mobile Radio Communications. 2016. A hybrid indoor positioning algorithm based on WiFi fingerprinting and pedestrian dead reckoning; pp. 1–6.
- 25. Zhuang Y., Syed Z., Georgy J., El-Sheimy N. Autonomous smartphone-based WiFi positioning system by using access points localization and crowdsourcing. Pervasive Mob. Comput. 2015;18:118–136.
- 26. Shang P., Li R.M., Guo J.F., Xian K., Zhou X.S. Integrating Lagrangian and Eulerian observations for passenger flow state estimation in an urban rail transit network: a space–time-state hyper network-based assignment approach. Transp. Res. B. 2019;121:135–167.
- 27. A. Lesani, L.F. Miranda-Moreno, Development and testing of a real-time WiFi-Bluetooth system for pedestrian network monitoring and data extrapolation, in: Presented at 95th Annual Meeting of Transportation Research Board, Washington, D. C. 2016.
- 28. El-Tawab S., Oram R., Garcia M., Poster B.B. Vehicular Networking Conference. IEEE; 2017. Monitoring transit systems using low cost WiFi technology; pp. 1–2.
- 29. Abedi N., Bhaskar A., Chung E., Miska M. Assessment of antenna characteristic effects on pedestrian and cyclists travel-time estimation based on Bluetooth and WiFi MAC addresses. Transp. Res. C. 2015;60:124–141.
- 30. Shlayan N., Kurkcu A., Ozbay K. IEEE International Conference on Intelligent Transportation Systems. 2016. Exploring pedestrian Bluetooth and WiFi detection at public transportation terminals; pp. 229–234.
- 31. S. Li, W. Zhu, L. Guo, Characterizing travel space–time trajectory on urban rail transit network using Wi-Fi data, in: Presented at 95th Annual Meeting of Transportation Research Board, Washington, D. C. 2016.
- 32. Zeng S.K., Mu Y., Zhang H.J., He M.X. A practical and communication-efficient deniable authentication with source-hiding and its application on Wi-Fi privacy. Inform. Sci. 2020;516:331–345.
- 33. Sun Y., Xu R. Rail transit travel time reliability and estimation of passenger route choice behavior: Analysis using automatic fare collection data. Transp. Res. Rec. 2012;2275:58–67.
- 34. Gu J.J., Jiang Z.B., Fan W., Wu J.M., Chen J.J. Real-time passenger flow anomaly detection considering typical time series clustered characteristics at metro stations. J. Transp. Eng. A. 2020;146(4)
- 35. Wu T.-S., Karkoub M., Weng C.-C., Yu W.-S. Trajectory tracking for uncertainty time delayed-state self-balancing train vehicles using observer-based adaptive fuzzy control. Inform. Sci. 2015;324:1–22.
- 36. Zhu J., Huang C., Yang M., Cheong Fung G.P. Context-based prediction for road traffic state using trajectory pattern mining and recurrent convolutional neural networks. Inform. Sci. 2019;473:190–201.
- 37. Bermingham L., Lee I. Mining distinct and contiguous sequential patterns from large vehicle trajectories. Knowl.-Based Syst. 2020;189
- 38. Lv M.Q., Chen L., Chen T., Zeng D., Cao B. Discovering individual movement patterns from cell-id trajectory data by exploiting handoff features. Inform. Sci. 2019;474:18–32.
- 39. Wang P.H., Sun F.Y., Wang D., Tao J., Guan X.H., Albert B. Predicting attributes and friends of mobile users from ap-trajectories. Inform. Sci. 2018;463–463:110–128.
- 40. Zhang D.Z., Lee K., Lee I. Semantic periodic pattern mining from spatio-temporal trajectories. Inform. Sci. 2019;502:164–189.










