CrowdTelescope: Wi-Fi-positioning-based multi-grained spatiotemporal crowd flow prediction for smart campus

Shiyu Zhang; Bangchao Deng; Dingqi Yang

doi:10.1007/s42486-022-00121-6

2022 Dec 12;5(1):31–44.

PMCID: PMC9742642

Abstract

Crowd flow prediction is one of the key problems in human mobility modeling, forecasting the crowd flows of locations based on historical human mobility traces. Traditional human mobility traces (collected via telecommunication companies, online social media platforms, or field studies/experiments, etc.) suffer from severe data quality issues such as low precision, data sparsity, and insufficient coverage. In this paper, we investigate crowd flow prediction using Wi-Fi connection records on the campus of a university, which imply comprehensive, large-scale, high-coverage, and multi-grained (building/floor/room level) human mobility traces. However, we face not only non-trivial noise in the raw Wi-Fi connection data when extracting human mobility traces, but also a trade-off between location granularity and mobility patterns when modeling multi-grained crowd flow. Against this background, we propose CrowdTelescope, a Wi-Fi-positioning-based multi-grained spatiotemporal crowd flow prediction framework. We design a systematic approach for robust human mobility trace extraction from the noisy Wi-Fi connection records and adopt spatiotemporal Graph Neural Networks to model multi-grained crowd flow under a unified graph model for the three-level location hierarchy. We also develop a prototype system of CrowdTelescope, providing interactive visualization of crowd flows on campus. We evaluate CrowdTelescope by collecting a Wi-Fi connection dataset on the campus of the University of Macau. Results show that CrowdTelescope can effectively extract informative human mobility traces from the noisy Wi-Fi connection records with an improvement of 3.3% over baselines, and also accurately predict on-campus crowd flow across different location granularities with 1.5%–24.1% improvements over baselines.

Keywords: Mobility, Crowd flow, Wi-Fi positioning, Smart campus

Introduction

Crowd flow prediction forecasts the crowd flows of locations based on historical human mobility traces (Luca et al. 2021), which can benefit both authorities and residents. For example, it can provide insights to authorities and organizations for decision-making in various aspects, such as risk assessment (Liang et al. 2021), resource management (Chen et al. 2020), predictive policing (Yang et al. 2018), etc. Meanwhile, it can also benefit residents by helping them better schedule their daily activities. In particular, during the recent COVID-19 pandemic, social distancing (i.e., avoiding crowdedness) has been suggested as an effective measure by many countries worldwide (Chang et al. 2021); in this context, accurately forecasting crowd flow can significantly help the implementation of social distancing in practice (Swain et al. 2021).

Toward the goal of accurate crowd flow prediction, it is indispensable to collect human mobility traces. On one hand, Outdoor Positioning Systems (OPS) such as global positioning systems (GPS) or cell identification (CID)-based systems have been widely deployed, providing real-time positioning services to users in an outdoor environment. Due to the low positioning precision of CID and GPS in an indoor environment, existing work using OPS mobility traces usually focuses on macroscopic mobility, such as global level (Yang et al. 2016), country level (Fan et al. 2015), or urban level (Liang et al. 2021). On the other hand, Indoor Positioning Systems (IPS) such as Wi-Fi-based or Bluetooth-based positioning systems provide fine-grained indoor and outdoor localization (depending on the density of the deployed access points/hotspots). Due to the implementation constraints that service providers place on these Wireless Local Area Networks (WLAN), the scale of publicly available mobility traces is often small, usually limited to hundreds of users (e.g., about 700 students in the Copenhagen Networks Study (Stopczynski et al. 2014), about 200 users in the Lausanne Mobile Data Challenge (Laurila et al. 2013)).

Against this background, in this study, we focus on Wi-Fi-positioning-based human mobility traces on the campus of the University of Macau. On-campus Wi-Fi connection records are unique in that they capture comprehensive, large-scale, high-coverage, and multi-grained human mobility traces. For example, on the campus of the University of Macau, over 7,000 Wi-Fi Access Points (APs) have been deployed, covering over 80% of the campus (both indoor and outdoor areas) and providing Internet services to over 10,000 students and staff, as well as to guests. With the ubiquity of smartphones and wearable devices (e.g., smart watches or bracelets), individuals carrying mobile devices leave spatiotemporal “digital footprints” when moving on campus, which are recorded in the (automatic) connection logs between the devices and Wi-Fi APs. For example, on a typical weekday during the semester (1 March 2021), we observed 2,321,420 records from 28,551 devices and 7,096 APs. However, crowd flow prediction from such data sources faces the following two issues:

How to extract informative human mobility traces from noisy Wi-Fi connection records. Although Wi-Fi connection records serve as a powerful crowdsensing source of mobility traces, such a data source contains two types of non-trivial noise. First, connection records from some devices (such as Wi-Fi-equipped desktops and smart home equipment) that cannot reflect human mobility need to be filtered out. Second, connection records from multiple mobile devices carried by the same user (such as the smartphone, tablet, and smartwatch of the same individual) cause over-sampled mobility traces from the user, which need to be merged to alleviate the over-sampling bias. In this context, it is indispensable to account for this intrinsic noise and design a robust method to extract informative human mobility traces.
How to model multi-grained crowd flow from on-campus mobility traces. Wi-Fi connection records reflect user mobility traces at different levels of granularity, usually following a three-level hierarchy of “building-floor-AP”. Modeling such mobility traces faces a trade-off between location granularity and mobility patterns, where finer-grained crowd flow usually has weaker mobility patterns and vice versa. For example, mobility transitions between APs are usually less pronounced than those between buildings. At the same time, understanding such multi-grained mobility patterns is crucial for accurate crowd flow prediction, which can serve as a “crowd telescope” to analyze crowd flows at different granularities. It is therefore challenging to model multi-grained crowd flow with varying mobility patterns.

To address these two issues, we propose CrowdTelescope, a Wi-Fi-positioning-based multi-grained spatiotemporal crowd flow prediction framework for smart campus. To address the first issue of noisy Wi-Fi connection records, we design a robust human mobility trace extraction method, which firstly uses a heuristic-based noisy data filter to remove those devices that cannot reflect human mobility and then learns to integrate mobility traces from devices carried by the same user using cross-grained features. To address the second issue of multi-grained crowd flow modeling, we adopt spatiotemporal Graph Neural Networks (GNNs) to model multi-grained crowd flow, by formulating the location graphs of different granularities under a unified graph model considering the three-level location hierarchy (“building-floor-AP”). Finally, we develop a Web-based prototype system visualizing both historical and predicted crowd flows via an interactive map. To evaluate our CrowdTelescope, we collect an in-house Wi-Fi connection dataset on the campus of the University of Macau and perform evaluation on two tasks, i.e., human mobility trace extraction and crowd flow prediction tasks. Results show that in the human mobility trace extraction task, CrowdTelescope can effectively integrate mobility traces from devices carried by the same user, with an improvement of 3.3% over baselines; in the crowd flow prediction task, it can also make accurate predictions of crowd flow across different location granularities, yielding 1.5%–24.1% improvements over baselines without using location graphs.

Related work

Human mobility data sources

In the early stages, human mobility studies mainly relied on demographic data (Ravenstein 1885), which required significant human effort in data collection. With recent advances in wireless sensing and communication technologies, various positioning systems have been used to monitor and collect human mobility traces, which mainly fall into two categories, i.e., outdoor and indoor positioning systems.

First, Outdoor Positioning Systems (OPS) such as global positioning systems (GPS) or cell identification (CID)-based systems have been widely deployed, providing real-time positioning services to users in an outdoor environment. However, these systems are intrinsically limited in localization precision in an indoor environment. More precisely, while CID-based positioning systems (mapping to a nearby cell tower location) have an intrinsic drawback in localization precision (about 50 m) [14], GPS (using satellite signals and trilateration) is known to have poor indoor localization precision due to signal attenuation caused by construction materials (del Peral-Rosado et al. 2017). Consequently, existing work using OPS mobility traces usually focuses on macroscopic mobility, such as global level (Yang et al. 2016), country level (Fan et al. 2015), or urban level (Liang et al. 2021). Moreover, these data sources often suffer from sparse mobility traces (Yang et al. 2020) and from low, insufficient coverage of the population in the target area, as the data is usually collected by a telecommunication service provider (CID-based positioning (Blondel et al. 2012)) or an urban transportation company (e.g., taxi mobility traces (Yuan et al. 2010)), or is crawled from an online social network platform (e.g., Foursquare/Twitter (Yang et al. 2015, 2019, 2020)), etc.

Second, Indoor Positioning Systems (IPS) such as Wi-Fi-based or Bluetooth-based positioning systems provide fine-grained indoor and outdoor localization (depending on the density of the deployed access points/hotspots). However, due to the implementation constraints of Wireless Local Area Networks (WLAN) and their privacy sensitivity, the scale of publicly available mobility traces is often small, usually limited to hundreds of users (e.g., about 700 students in the Copenhagen Networks Study (Stopczynski et al. 2014), about 200 users in the Lausanne Mobile Data Challenge (Laurila et al. 2013)). To the best of our knowledge, the only work in this category using large-scale Wi-Fi-positioning-based mobility traces is from the Georgia Institute of Technology (Swain et al. 2021), which involves about 40K students and 7K Wi-Fi Access Points. The scale of this data is comparable to that of the mobility traces at the University of Macau. However, this large-scale dataset is not publicly accessible due to privacy protection regulations. Therefore, we collect an in-house Wi-Fi connection dataset on the campus of the University of Macau to study on-campus human mobility.

Human mobility modeling methodology

According to the problem settings (Luca et al. 2021), human mobility modeling techniques generally fall into two types of tasks (i.e., predictive and generative tasks) with two types of data representation (i.e., mobility trajectory and flow). First, predictive tasks on mobility trajectories are known as next-location prediction problems, forecasting the location of an individual based on her historical mobility traces (Wu et al. 2018; Yang et al. 2020). Second, predictive tasks on crowd flow forecast the crowd flows (the number of individuals or vehicles) of locations based on historical human mobility traces (Lin et al. 2019; Lv et al. 2014). Third, generative tasks with mobility trajectories try to generate synthetic trajectories that are similar to real-world human mobility traces in terms of statistical patterns (Liu et al. 2018; Feng et al. 2020). Finally, generative tasks with mobility flow generate synthetic flows among locations, mimicking the real-world mobility flow (Shin et al. 2020; Simini et al. 2020). These mobility modeling tasks have been widely studied to support various smart city applications, such as urban event organization (Chen et al. 2016; Yu et al. 2018), location recommendation (Yang et al. 2013; Yu et al. 2015), crowdsensing (Yu et al. 2021, 2015), and urban resource allocation (Chen et al. 2016; Wang et al. 2022), etc.

This paper studies the crowd flow prediction problem using on-campus Wi-Fi connection records, which imply comprehensive mobility traces on campus. Accurate crowd flow prediction requires subtly capturing the spatiotemporal dynamics and dependencies of crowd flow. Traditional solutions to this problem use time series prediction algorithms based on autoregression, such as the AutoRegressive Integrated Moving Average (ARIMA) model (Shumway et al. 2000). However, autoregression models often ignore spatial dependencies and also fail to capture complex temporal dynamics, resulting in unsatisfactory results (Luca et al. 2021). Recently, deep learning models have been widely used for crowd flow prediction. Specifically, Recurrent Neural Networks (RNNs), such as vanilla RNN (Zhang et al. 2014), Long Short-Term Memory (LSTM) (Hochreiter and Schmidhuber 1997), and Gated Recurrent Unit (GRU) (Cho et al. 2014), are designed to capture sophisticated temporal dynamics over time series and sequences. On the other hand, spatial dependencies have been modeled by applying Convolutional Neural Networks (CNNs) (Gu et al. 2018) to a crowd flow matrix (where the matrix represents the targeted geographical region and each entry represents the flow in a spatial grid cell), or by adopting Graph Neural Networks (GNNs) (Wu et al. 2020) on a crowd flow graph (where the graph represents the spatial connections between locations and each node represents the flow at a specific location). Recent approaches combine RNNs and CNNs/GNNs into a unified model that jointly captures spatiotemporal dynamics and dependencies. For example, STGCN (Yu et al. 2017) combines two temporal gated convolution layers and a spatial graph convolution layer in a “sandwich” structure; STTN (Xu et al. 2020) integrates a spatial and a temporal transformer to capture dynamic directed spatial dependencies and long-range temporal dependencies, respectively; GWNET (Wu et al. 2019) designs a learnable adaptive dependency matrix to capture hidden spatial dependencies; MTGNN (Wu et al. 2020) learns to extract uni-directional relations among the variables of a multivariate time series to capture spatial dependencies. In this paper, we explore spatiotemporal GNNs for crowd flow prediction using Wi-Fi-positioning-based human mobility traces.

CrowdTelescope

Figure 1 shows the overview of our CrowdTelescope framework. First, from the large-scale and noisy Wi-Fi connection records, we design a robust human mobility trace extraction method to obtain informative human mobility traces. Second, based on the extracted human mobility traces, we adopt spatiotemporal GNNs to model multi-grained crowd flow, by formulating the location graphs of different granularities under a unified graph model considering the three-level location hierarchy. Finally, we develop a prototype system of CrowdTelescope visualizing both historical and predicted crowd flow via an interactive map on the Web.

Fig. 1 — CrowdTelescope overview with three steps: (1) Robust human mobility trace extraction, (2) Multi-grained crowd flow modeling and (3) Prototype development

Robust human mobility trace extraction

With the ubiquity of smartphones and wearable devices, the raw Wi-Fi connection records imply comprehensive human mobility traces on campus. However, these records contain two types of non-trivial noises: (1) records from some devices (such as Wi-Fi-equipped desktops and smart home equipment) that cannot reflect human mobility, and (2) records from multiple mobile devices carried by the same user (such as smartphones, tablets, smartwatches of the same individual) which cause over-sampled mobility traces from the user. To extract informative human mobility traces from such noisy data, we design a robust human mobility trace extraction pipeline as follows.

Wi-Fi connection log preprocessing

According to the configuration of the Wi-Fi AP provider (Aruba Networks), the Wi-Fi connection records contain six types of connection events between devices and APs, i.e., Authentication request, Authentication success, Deauthentication from station, Association request, Association success, and Disassociation from station. A device is uniquely identified by its MAC address when accessing Wi-Fi (we do not have any device metadata due to privacy protection regulations). When a device connects to an AP for the first time, the log trace is “Authentication request → Authentication success → Association request → Association success”; the authentication process requires a valid user account registered at the university to reach a success state for the internal Wi-Fi services. When a connected device moves from one AP to another, if the device supports fast roaming, the log trace is “Association request → Association success”; otherwise, the system records the full set of logs, the same as when the device connects to an AP for the first time. Therefore, to extract the mobility traces of devices, we keep only the “Association success” events, which are the final step of all these log traces and indicate the presence of a device at an AP at a certain timestamp.

Note that we ignore “Deauthentication from station” and “Disassociation from station” events because these two events are often significantly delayed in the log traces. For example, when a device moves from one AP to another, the “Disassociation from station” event at the former AP is often recorded with a timestamp later than the “Association success” event at the latter AP; consequently, the “Disassociation from station” event cannot be used to record one’s presence at an AP.
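The preprocessing step above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors’ code: the `(timestamp, mac, ap_id, event)` row layout and the function name `extract_device_trajectories` are hypothetical, since the actual Aruba log schema is not described here. Only the event label “Association success” is taken from the text.

```python
from collections import defaultdict

# Hypothetical row layout: (timestamp, mac, ap_id, event).
KEEP_EVENT = "Association success"

def extract_device_trajectories(log_rows):
    """Keep only 'Association success' events and group them by device.

    Returns {mac: [(timestamp, ap_id), ...]} sorted by timestamp.
    Authentication events, association requests and the lagged
    'Disassociation from station' events are dropped.
    """
    trajectories = defaultdict(list)
    for ts, mac, ap_id, event in log_rows:
        if event == KEEP_EVENT:
            trajectories[mac].append((ts, ap_id))
    return {mac: sorted(points) for mac, points in trajectories.items()}


rows = [
    (100, "aa:01", "E11-GF-22", "Authentication request"),
    (101, "aa:01", "E11-GF-22", "Authentication success"),
    (102, "aa:01", "E11-GF-22", "Association request"),
    (103, "aa:01", "E11-GF-22", "Association success"),
    (400, "aa:01", "E11-1F-03", "Association request"),          # fast roaming
    (401, "aa:01", "E11-1F-03", "Association success"),
    (405, "aa:01", "E11-GF-22", "Disassociation from station"),  # lagged event
]
print(extract_device_trajectories(rows))
# {'aa:01': [(103, 'E11-GF-22'), (401, 'E11-1F-03')]}
```

Only the two “Association success” events survive; the lagged “Disassociation from station” record at the former AP is discarded, in line with the reasoning above.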

Noisy device trace filtering

Based on the “Association success” events of a device, we can extract a trajectory of the device, represented as a sequence of AP-timestamp pairs. However, records from some devices usually fail to reflect human mobility. Through our empirical analysis, we identify three types of such noisy devices.

Non-(or low-)mobile devices, such as Wi-Fi-equipped desktops, lab equipment or smart home equipment, cannot reflect human mobility. Such a device often attaches to a small number of different APs over a long period of time. We define these devices as those that have connected to fewer than 10 different APs during a week.
Publicly shared devices, such as shared handsets for campus management and security staff, do not reflect individuals’ mobility traces. A shared device often connects to a large number of APs, such as a shared handset for security patrolling. We define these devices as those that have connected to over 500 different APs during a week.
Devices from irregular user accounts cannot reflect real human mobility traces. Specifically, the authentication process requires a valid user account. We observe a few user accounts that are associated with many mobile devices, implying that one user account is shared by many users. In this case, the device mobility traces cannot reflect a single user’s mobility traces. We empirically define irregular user accounts as those that have been used by over five different devices.

We filter out these noisy devices according to the above criteria. As shown in Fig. 2, noisy data defined above are mostly the outliers from the data distribution. Note that the above three types of noisy devices may overlap. For example, multiple publicly shared handsets for security patrolling may use the same user account for Wi-Fi authentication.
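The three filtering criteria can be expressed as a small per-week check. The sketch below assumes hypothetical inputs: `week_trajectories` maps each MAC address to its AP-timestamp pairs for one week, and `account_of` maps authenticated devices to an account identifier. The thresholds (fewer than 10 APs, more than 500 APs, more than five devices per account) are the ones stated above; over which time window devices per account are counted is our assumption.

```python
from collections import defaultdict

MIN_APS = 10                  # fewer distinct APs in a week: non-(or low-)mobile device
MAX_APS = 500                 # more distinct APs in a week: publicly shared device
MAX_DEVICES_PER_ACCOUNT = 5   # more devices on one account: irregular account

def noisy_devices(week_trajectories, account_of):
    """Return the set of MAC addresses to drop for one week.

    week_trajectories: {mac: [(timestamp, ap_id), ...]}
    account_of: {mac: account_id} for authenticated devices (hypothetical input).
    The three criteria may overlap; a device is dropped if any of them applies.
    """
    noisy = set()
    for mac, points in week_trajectories.items():
        n_aps = len({ap for _, ap in points})
        if n_aps < MIN_APS or n_aps > MAX_APS:
            noisy.add(mac)
    devices_per_account = defaultdict(set)
    for mac, account in account_of.items():
        devices_per_account[account].add(mac)
    for macs in devices_per_account.values():
        if len(macs) > MAX_DEVICES_PER_ACCOUNT:
            noisy |= macs
    return noisy
```

Keeping the result as a set makes the overlap between criteria harmless: a shared patrol handset that also uses a shared account is simply removed once.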

Integration of the mobility traces of the devices of the same user

Based on the filtered device mobility traces, we need to extract human mobility traces. Specifically, if a user has only one device, the device’s mobility traces represent the user’s mobility traces. However, when a user has multiple devices (such as a smartphone and a smart watch/bracelet), these devices’ traces cause over-sampled mobility traces from the user. Therefore, it is critical to overcome this bias by integrating the mobility traces of the multiple devices of the same user. To this end, we need to identify the devices of the same user. Note that device ownership information is not always available, for several reasons. First, authentication is not necessarily conducted with an account registered at the University of Macau (UM); devices can also access Wi-Fi via eduroam or public Wi-Fi networks, in which case we cannot link these devices to any UM-registered account. Second, the user account information used to access Wi-Fi may be protected for privacy reasons.

Against this background, we design a novel method to learn to identify whether a pair of devices belong to the same user. When two mobile devices are carried by the same user, their spatiotemporal mobility traces should be very similar (not necessarily identical due to the stochasticity of the wireless network connection). For example, from the spatial perspective, the two devices may connect to two neighboring APs, respectively; from the temporal perspective, the two devices may connect one after the other to APs. Subsequently, it is necessary to consider these aspects to tolerate such stochasticity when measuring the similarity between two devices’ trajectories. In this study, instead of manually defining a fixed spatiotemporal tolerance to accommodate such stochasticity, we define three levels of spatial granularities and four levels of temporal granularities and design a convolutional neural network to learn to classify whether two devices belong to the same user.

Figure 3 shows our defined spatiotemporal granularities for tolerating the connection stochasticity. We consider three levels of spatial granularity, i.e., building, floor, and AP, and four levels of temporal granularity, i.e., 10 mins, 5 mins, 1 min, and 1 sec. Each device trajectory can thus be transformed into these 12 spatiotemporal granularities. To compute the similarity between two devices’ trajectories, we borrow ideas from the Jaccard similarity between two sets. However, instead of using the Jaccard similarity directly, we use the size of the intersection and the size of the union as two separate features. A toy example of feature extraction for one granularity (AP, 1 min) is shown at the top of Fig. 4, where we extract two features, i.e., the size of the intersection and the size of the union. In total, we extract 24 features under the 12 spatiotemporal granularities. Based on these features, we design a Convolutional Neural Network (CNN) model to learn to classify whether two devices belong to the same user, as shown at the bottom of Fig. 4. Specifically, we reshape the extracted 24 features into an “image” of size $4 \times 3$ (temporal granularities by spatial granularities) with 2 channels (the size of the intersection and the size of the union). The “image” is first fed to a convolutional layer with $n_{t}$ temporal filters of size $4 \times 1$, and then to another convolutional layer with $n_{s}$ spatial filters of size $1 \times 3$, followed by a fully connected layer that outputs the predicted score (the probability that the input pair of devices belongs to the same user). The key idea behind this design is to let the CNN model learn to tolerate the connection stochasticity across different spatiotemporal granularities when predicting whether devices belong to the same user.
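The 24 pair features can be sketched as below. The sketch assumes, hypothetically, that AP names follow the “building-floor-AP” pattern of the AP ID “E11-GF-22” used elsewhere in this paper and that timestamps are integer seconds. Each trajectory is mapped to a set of (location, time bucket) cells at every granularity, and the intersection and union sizes of the two sets form the two channels. The function names are illustrative.

```python
TEMPORAL_SECONDS = [600, 300, 60, 1]  # 10 min, 5 min, 1 min, 1 sec (rows of the "image")

def spatial_key(ap_id, level):
    """Level 0 = building, 1 = floor, 2 = AP; assumes IDs like 'E11-GF-22'."""
    building, floor, _ = ap_id.split("-", 2)
    return [building, f"{building}-{floor}", ap_id][level]

def footprint(trajectory, level, seconds):
    """Set of (location, time bucket) cells visited at one granularity."""
    return {(spatial_key(ap, level), ts // seconds) for ts, ap in trajectory}

def pair_features(traj_a, traj_b):
    """Return [intersection, union], each a 4x3 grid (temporal x spatial)."""
    inter = [[0] * 3 for _ in TEMPORAL_SECONDS]
    union = [[0] * 3 for _ in TEMPORAL_SECONDS]
    for t, seconds in enumerate(TEMPORAL_SECONDS):
        for s in range(3):
            a = footprint(traj_a, s, seconds)
            b = footprint(traj_b, s, seconds)
            inter[t][s] = len(a & b)
            union[t][s] = len(a | b)
    return [inter, union]


a = [(0, "E11-GF-22"), (70, "E11-1F-03")]
b = [(5, "E11-GF-21"), (65, "E11-1F-03")]
print(pair_features(a, b))
# [[[1, 2, 1], [1, 2, 1], [2, 2, 1], [0, 0, 0]], [[1, 2, 3], [1, 2, 3], [2, 2, 3], [4, 4, 4]]]
```

In this toy pair the two devices look identical at the building and floor levels for coarse time buckets, diverge at the AP level (neighboring APs), and share no cells at 1-second resolution, which illustrates why no single fixed tolerance is adequate.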

Fig. 3 — Spatiotemporal granularities tolerating the connection stochasticity

Fig. 4 — Our proposed CNN model for the classification of device pairs
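A forward pass of the device-pair classifier described above can be sketched with NumPy. The layer shapes follow the text ($n_t$ temporal filters of size $4 \times 1$ over the 2-channel $4 \times 3$ input, $n_s$ spatial filters of size $1 \times 3$, then a fully connected output). The ReLU activations, sigmoid output, absence of padding, and random initialization are our assumptions, and training is omitted.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def init_params(n_t, n_s, seed=0):
    """Randomly initialised weights; shapes follow the architecture in the text."""
    rng = np.random.default_rng(seed)
    return {
        "W1": rng.normal(0.0, 0.1, size=(n_t, 2, 4)),    # n_t temporal 4x1 filters, 2 input channels
        "b1": np.zeros(n_t),
        "W2": rng.normal(0.0, 0.1, size=(n_s, n_t, 3)),  # n_s spatial 1x3 filters, n_t channels
        "b2": np.zeros(n_s),
        "w3": rng.normal(0.0, 0.1, size=n_s),            # fully connected output layer
        "b3": 0.0,
    }

def same_user_score(x, p):
    """x: features of shape (2, 4, 3) = (channel, temporal, spatial).
    Returns the predicted probability that both devices belong to one user."""
    # A 4x1 convolution without padding collapses the temporal axis: (n_t, 3).
    h1 = relu(np.einsum("fct,cts->fs", p["W1"], x) + p["b1"][:, None])
    # A 1x3 convolution without padding collapses the spatial axis: (n_s,).
    h2 = relu(np.einsum("gfs,fs->g", p["W2"], h1) + p["b2"])
    logit = p["w3"] @ h2 + p["b3"]
    return 1.0 / (1.0 + np.exp(-logit))
```

Because the temporal filters span all four temporal granularities and the spatial filters span all three spatial levels, each layer mixes evidence across one axis of the granularity grid, which is how the network can learn its own tolerance instead of relying on a hand-set threshold.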

The model is trained using the devices for which we have user ownership information. For each (positive) pair of devices belonging to the same user, we randomly sample a negative pair of devices not belonging to the same user; the negative pairs are resampled in each epoch during training. Finally, the trained model can be used to identify the devices belonging to the same user. To extract a user’s mobility trace from her devices, we take, within each week, the trace of her most active device as the user’s mobility trace.
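The training-pair sampling and the weekly aggregation rule can be sketched as follows. The helper names and data layout are hypothetical; the sketch only mirrors the two rules stated above, namely one random negative pair per positive pair, redrawn every epoch, and keeping the trace of the user’s most active device in each week. Measuring “most active” by the number of connection records is our assumption.

```python
import random

def sample_negatives(n_pairs, owner_of, rng):
    """Draw n_pairs random device pairs whose owners differ.

    Call once per epoch so that the negatives are resampled each epoch.
    owner_of: {device: user} for devices with known ownership.
    """
    if len(set(owner_of.values())) < 2:
        raise ValueError("need devices from at least two users")
    devices = list(owner_of)
    negatives = []
    while len(negatives) < n_pairs:
        a, b = rng.sample(devices, 2)
        if owner_of[a] != owner_of[b]:
            negatives.append((a, b))
    return negatives

def most_active_device(week_trajectories, user_devices):
    """Return the user's device with the most connection records this week."""
    return max(user_devices, key=lambda d: len(week_trajectories.get(d, [])))


owner_of = {"phone1": "u1", "watch1": "u1", "phone2": "u2", "tablet3": "u3"}
positives = [("phone1", "watch1")]
rng = random.Random(0)
for epoch in range(3):
    negatives = sample_negatives(len(positives), owner_of, rng)  # fresh each epoch
    # ... train on positives (label 1) and negatives (label 0) ...
```

Redrawing negatives every epoch exposes the classifier to many more distinct negative pairs than a fixed sample would, at no extra labeling cost.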

Multi-grained crowd flow prediction

Based on the extracted user mobility traces, we adopt spatiotemporal GNN models for predicting multi-grained crowd flow. Specifically, Wi-Fi connection records reflect user mobility traces at different levels of granularity, following a three-level hierarchy of “building-floor-AP”. In this context, mobility modeling faces a trade-off between location granularity and mobility patterns, where finer-grained crowd flow usually has weaker mobility patterns and vice versa (as evidenced by our experiments later). Therefore, we train three independent spatiotemporal GNNs for the respective location granularities, under a unified graph model capturing spatial dependencies between locations. In the following, we first present crowd flow estimation from the extracted mobility traces, followed by our crowd flow prediction models using spatiotemporal GNNs.

Crowd flow estimation from human mobility traces

Based on the filtered and aggregated human mobility traces, we estimate the crowd flow as follows. Considering practical use cases (e.g., the campus loop shuttle runs every 10–15 min), we set the targeted temporal granularity for crowd flow prediction to 10 min and assume that a user can contribute to only one AP’s flow in this period of time. The crowd flow of an AP is then estimated as the total number of users contributing to the AP in each 10-min time slot. However, this crowd flow estimation method has to consider the following practical issues raised by our empirical analysis of the dataset.

An extracted human mobility trace may connect to multiple APs in one time slot. In this case, we select the AP with the most connection records as the contributed AP. In contrast, if a user mobility trace has no connection in a time slot, we estimate her associated APs as follows.
When a user stays at the same place for a long time (e.g., attending a 45-min class or sleeping in the dormitory), her devices may enter sleep mode and produce no connection records. In this case, if the two consecutively associated APs of the user are the same, we assume that the user contributes to that AP’s flow throughout the intervening time period.
If the two consecutively associated APs are different and the time interval is less than one hour (considering the campus size of 1.09 km$^{2}$), we assume that the user is moving from the first AP to the second one. The user thus contributes to the flow of the first/second AP in the first/second half of the time interval, respectively.
If the two consecutively associated APs are different and the time interval is greater than one hour, we assume that the user is away from the campus and thus does not contribute to any APs’ flow during that time period.
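The heuristics above can be combined into a per-user slot assignment, sketched below in Python. The sketch assumes integer timestamps in seconds and measures the gap between consecutive observed slots in whole 10-minute slots; these choices, and the function names, are illustrative rather than taken from the paper.

```python
from collections import Counter, defaultdict

SLOT_SECONDS = 600       # 10-minute time slots
MAX_GAP_SECONDS = 3600   # longer gaps between different APs: user assumed off campus

def assign_slots(records):
    """Map one user's (timestamp, ap_id) records to {slot: ap_id}."""
    per_slot = defaultdict(Counter)
    for ts, ap in records:
        per_slot[ts // SLOT_SECONDS][ap] += 1
    # Several APs in one slot: keep the AP with the most connection records.
    observed = {s: c.most_common(1)[0][0] for s, c in per_slot.items()}
    assigned = dict(observed)
    slots = sorted(observed)
    for s0, s1 in zip(slots, slots[1:]):
        a, b = observed[s0], observed[s1]
        gap = s1 - s0
        if a == b:
            # Same AP on both sides (e.g. a class or sleep): the user stays there.
            for s in range(s0 + 1, s1):
                assigned[s] = a
        elif gap * SLOT_SECONDS < MAX_GAP_SECONDS:
            # Short move: first half at the former AP, second half at the latter.
            midpoint = s0 + gap / 2
            for s in range(s0 + 1, s1):
                assigned[s] = a if s < midpoint else b
        # Otherwise the user is assumed away from campus; the gap stays empty.
    return assigned

def estimate_flows(user_records):
    """Return {slot: Counter({ap_id: number of users})}."""
    flows = defaultdict(Counter)
    for records in user_records.values():
        for slot, ap in assign_slots(records).items():
            flows[slot][ap] += 1
    return flows


users = {
    "u1": [(0, "A"), (30, "A"), (100, "B"), (2400, "C")],  # A wins slot 0, then moves to C
    "u2": [(0, "A"), (7200, "A")],                           # long stay at A
    "u3": [(0, "A"), (7200, "B")],                           # long gap, different APs
}
flows = estimate_flows(users)
```

Because each user is assigned at most one AP per slot, summing users per AP directly yields the per-slot AP flow under the one-AP-per-slot assumption stated above.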

Following these heuristics, we estimate the crowd flow for each AP in each time slot. Subsequently, we can further compute the crowd flow for other location granularities. Specifically, the locations of APs follow a three-level hierarchy of “building-floor-AP”. For example, an AP of ID “E11-GF-22” is located at the building E11, on the ground floor (“GF”). Using this information, the flow in one floor/building is computed as the sum of flows of all APs located in the floor/building, respectively.
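Rolling AP flows up the hierarchy amounts to parsing each AP ID and summing. The sketch assumes that every AP ID has the form “building-floor-suffix”, as in the “E11-GF-22” example; the other AP IDs and floor names in the usage example (e.g., “1F”, “N6”) are hypothetical. A floor is keyed by its (building, floor) pair so that, for example, the ground floors of different buildings remain distinct nodes.

```python
from collections import Counter

def roll_up(ap_flow):
    """Sum one time slot's AP flows into floor-level and building-level flows.

    ap_flow: {ap_id: flow}, with AP IDs assumed to look like "E11-GF-22".
    """
    floor_flow, building_flow = Counter(), Counter()
    for ap_id, flow in ap_flow.items():
        building, floor, _ = ap_id.split("-", 2)
        floor_flow[(building, floor)] += flow
        building_flow[building] += flow
    return floor_flow, building_flow


floors, buildings = roll_up({"E11-GF-22": 3, "E11-GF-05": 2, "E11-1F-01": 4, "N6-2F-10": 1})
print(buildings)  # Counter({'E11': 9, 'N6': 1})
```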

Crowd flow modeling using spatiotemporal GNNs

Crowd flow modeling requires capturing the spatiotemporal dynamics and dependencies of the input flow. To this end, we adopt spatiotemporal GNNs, which have been shown to be a powerful technique for solving various crowd flow modeling problems (Luca et al. 2021; Wang et al. 2021). Specifically, spatiotemporal GNNs combine a GNN model that leverages a graph structure to encode spatial dependencies with a temporal component (mostly RNN models) that learns the temporal dynamics of the flow. In the following, we first present our unified graph modeling process for the three-level hierarchy of locations, followed by the spatiotemporal GNN models.

The graph topology uniquely defines the spatial dependencies between locations in GNNs. Figure 5 shows our graph model for the three-level location hierarchy of “building-floor-AP”; the hierarchy is represented by the red dashed edges on the right panel. First, for the building level, we adopt Delaunay triangulation (Delaunay 1934), a widely used method for surface morphology studies in Geographic Information Systems (GIS). Specifically, it treats each building as a node (with its GPS coordinates) and connects the nodes to form a triangulated irregular network, ensuring that no node lies within the interior of the circumcircle of any triangle in the network. Note that the Delaunay triangulation is the dual graph of the Voronoi diagram (Boots et al. 2009), which is also a popular spatial tessellation method in GIS. The left panel of Fig. 5 shows the Delaunay triangulation of the buildings on the map of the University of Macau, which is an unweighted and undirected graph (all edges have the same distance of one). Second, based on the building graph, we use the “building-floor” hierarchy to construct a floor graph, following three principles: (1) the floor nodes belonging to the same building are fully connected with a distance of one; (2) the floor nodes of two neighboring buildings (on the building graph) are connected with a distance computed by traversing the building graph, which is three in this case ($\text{floor} \to \text{building} \to \text{building} \to \text{floor}$), where all the hierarchy edges also have a distance of one; (3) the floor nodes of non-neighboring buildings are not connected. Finally, following the same logic of constructing a graph from the graph of the higher hierarchy level, we construct an AP graph based on the floor graph. Note that the edges in the floor graph have a distance of either one or three; the AP nodes of neighboring floors (in the floor graph) are therefore connected via $\text{AP} \to \text{floor} \to \text{floor} \to \text{AP}$ with a distance of either three (when the $\text{floor} \to \text{floor}$ edge has a distance of one) or five (when it has a distance of three).
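The hierarchical construction can be written as one rule applied twice: children of the same parent are connected with distance one, and children of two adjacent parents inherit the parent edge distance plus two (one hierarchy edge on each side). The sketch below takes the building edges as given; in practice they could be derived from a Delaunay triangulation of the building coordinates (e.g., the `simplices` attribute of `scipy.spatial.Delaunay`). All node names and the function name `child_graph` are hypothetical.

```python
from itertools import combinations

def child_graph(parent_edges, children_of):
    """Build the next hierarchy level's weighted, undirected graph.

    parent_edges: {frozenset({p, q}): distance} between adjacent parents.
    children_of: {parent: [child, ...]}; child names must be globally unique.
    Returns {frozenset({u, v}): distance}.
    """
    edges = {}
    # (1) Children of the same parent are fully connected with distance one.
    for kids in children_of.values():
        for u, v in combinations(kids, 2):
            edges[frozenset((u, v))] = 1
    # (2) Children of adjacent parents: child->parent, parent->parent, parent->child.
    for pair, distance in parent_edges.items():
        p, q = tuple(pair)
        for u in children_of.get(p, []):
            for v in children_of.get(q, []):
                edges[frozenset((u, v))] = distance + 2
    # (3) Children of non-adjacent parents stay unconnected.
    return edges


# Hypothetical building graph (e.g. from a Delaunay triangulation): only E11 - N6.
building_edges = {frozenset(("E11", "N6")): 1}
floors_of = {"E11": ["E11-GF", "E11-1F"], "N6": ["N6-GF"], "S2": ["S2-GF"]}
floor_edges = child_graph(building_edges, floors_of)      # distances 1 or 3
aps_of = {"E11-GF": ["E11-GF-22"], "E11-1F": ["E11-1F-01"], "N6-GF": ["N6-GF-10"]}
ap_edges = child_graph(floor_edges, aps_of)               # distances 1, 3 or 5
```

Applying the same function to the floor graph reproduces the distances of three and five described above, since floor edges of distance one or three each gain two hierarchy edges.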

Fig. 5 — Graph modeling with the three-level hierarchy of locations

Based on the constructed graphs, we adopt spatiotemporal GNNs to model the spatiotemporal dynamics and dependencies of crowd flows. We train three independent spatiotemporal GNNs for the respective location granularities. Figure 6 shows the spatiotemporal GNNs with the building graph as an example. Specifically, we have a flow graph for each time slot, where each node (building) in this graph is associated with the computed crowd flow as its attribute. Given a sequence of such flow graphs in the past, the crowd flow prediction tries to forecast the attribute (crowd flow) of each node in this graph in a future time slot. Note that the topological structure of this graph is the same over time, while the node attribute (crowd flow) evolves. Following this problem formulation and settings, we will experiment with a sizeable collection of state-of-the-art spatiotemporal GNN models (see our experiments below) and adopt the best-performing one in our CrowdTelescope. Note that CrowdTelescope as a general framework can flexibly integrate with any spatiotemporal GNN models.

Fig. 6 — Crowd flow prediction using spatiotemporal GNNs. We use the building graph as a toy example

Prototype development

We develop a prototype system “CrowdTelescope” as a smart campus application. The prototype system provides an interactive user interface that visualizes both historical and forecasted crowd flows on campus, offering decision support to a wide range of users, including the campus management team, students, staff, and visiting guests. Specifically, Fig. 7 shows a snapshot of our Web user interface built on top of Mapbox². We use heat maps to visualize the crowd flow, so that hotspots can be easily identified by their color. A few control options make the visualization interactive. Users can switch between historical and forecasted crowd flow. For the historical crowd flow, users can specify a date and click the start/pause button to play the crowd flow of the selected date as a video. The progress bar also allows users to flexibly select (by sliding along it) the time of the crowd flow they want to visualize. For the forecasted crowd flow, users can visualize the predicted crowd flow for the current day through the same control panel as for the historical crowd flow.

Fig. 7 — The Web user interface of CrowdTelescope prototype https://pursue1221.github.io/CrowdTelescope/. The screenshot shows the visualization of the historical crowd flow on 01 Mar. 2021. We observe active crowd flow transition from residential colleges to central teaching buildings right before 10 am

Experiment

We evaluate CrowdTelescope using a Wi-Fi connection record dataset on two tasks: human mobility trace extraction and crowd flow prediction. We present the experimental setup below, followed by the results and discussion.

Experiment setup

Dataset

We collect Wi-Fi connection records on the campus of the University of Macau for four consecutive weeks in March 2021. Table 1 shows the statistics of the dataset across different data processing steps. From the raw data, we first filter device traces using the criteria discussed in Sect. 3.1.2. We observe that 63% of the devices and 53% of the connection records are removed, which implies the raw data contains a large amount of noisy data, including non-(or low-)mobile devices, publicly shared devices, and devices from irregular user accounts. Afterward, we extract human mobility traces from the filtered device traces by identifying and integrating the mobility traces of the devices of the same user, using our proposed method in Sect. 3.1.3. We observe that in the final integrated user traces, the number of devices is slightly higher than the number of users, which implies that for a few users, the most active devices are different across different weeks. In other words, most of the users have a unique active device across the four weeks of the data collection period.

Table 1.

Dataset statistics in different data processing steps

Data processing steps	Raw data	Device traces (Sect. 3.1.2)	User traces (Sect. 3.1.3)
#Device	48,565	18,136	13,621
#User	29,743	13,593	13,593
#Record	52,174,535	24,266,224	21,027,546
#Record per device	1,074	1,338	1,544
#Device per user	1.63	1.33	1.00
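The derived rows of Table 1 and the removal percentages quoted in the text follow directly from the raw counts. The short check below recomputes them; it is a reader-side sanity check on the published numbers, not part of the processing pipeline.

```python
# Table 1 counts per processing step: (devices, users, records).
steps = {
    "raw": (48_565, 29_743, 52_174_535),
    "device_traces": (18_136, 13_593, 24_266_224),
    "user_traces": (13_621, 13_593, 21_027_546),
}

# Share of devices and records dropped by the noisy-data filter (Sect. 3.1.2).
devices_removed = 1 - steps["device_traces"][0] / steps["raw"][0]
records_removed = 1 - steps["device_traces"][2] / steps["raw"][2]

for name, (devices, users, records) in steps.items():
    print(f"{name}: {records / devices:.0f} records/device, {devices / users:.2f} devices/user")
print(f"removed by filtering: {devices_removed:.0%} of devices, {records_removed:.0%} of records")
```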


Evaluation protocol and baselines

We evaluate our CrowdTelescope in both human mobility trace extraction and crowd flow prediction tasks. We present our evaluation protocol and baselines for each task below.

For the human mobility trace extraction task, the key problem is formulated as classifying whether two devices belong to the same user, as discussed in Sect. 3.1.3. To evaluate our proposed method, we first consider a Single-Grained Feature, i.e., the Jaccard similarity between the mobility traces of two devices at a single spatiotemporal granularity, and learn a classification threshold using a decision tree algorithm. In addition, based on our Cross-Grained Features, i.e., 24 features consisting of the intersection and union features at each of the 12 spatiotemporal granularities shown in Fig. 4, we consider several popular classification techniques as baselines: Multi-Layer Perceptron, Logistic Regression, Naive Bayes, Decision Tree, and Random Forest. For our proposed method, CrowdTelescope (CNNs), we set the numbers of temporal and spatial filters to $n_{t} = 16$ and $n_{s} = 64$, respectively. To evaluate classification performance, we collect a set of positive device pairs (belonging to the same user) and randomly sample the same number of negative device pairs (not belonging to the same user). We split them into 80% training and 20% test data, keeping positive and negative pairs balanced in both. We report the accuracy of each method averaged over 10 repeated trials, with negative pairs resampled in each trial.
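The single-grained and cross-grained features can be sketched as follows. Several details are assumptions not specified in this section: each connection record is a dict with an epoch timestamp `ts` in seconds and globally unique `building`, `floor`, and `ap` identifiers; a device trace at a given granularity is the set of (location, time slot) pairs it visits; and the 12 granularities are the 3 spatial levels crossed with the 4 time slots of Table 2. All function names are hypothetical. The threshold search is a hand-written depth-one decision tree (a decision stump) on a single score, not any particular library's implementation, and the authors' CNN classifier is not shown.

```python
from itertools import product

SPATIAL = ("building", "floor", "ap")
SLOT_SECONDS = (600, 300, 60, 1)  # 10 mins, 5 mins, 1 min, 1 sec


def trace_set(records, spatial, slot_seconds):
    """Discretize a device trace into a set of (location, time-slot) pairs."""
    return {(r[spatial], r["ts"] // slot_seconds) for r in records}


def jaccard(rec_a, rec_b, spatial, slot_seconds):
    """Single-grained feature: Jaccard similarity at one granularity."""
    a = trace_set(rec_a, spatial, slot_seconds)
    b = trace_set(rec_b, spatial, slot_seconds)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cross_grained_features(rec_a, rec_b):
    """Cross-grained features: intersection and union sizes at all 12 granularities (24 values)."""
    feats = []
    for spatial, secs in product(SPATIAL, SLOT_SECONDS):
        a = trace_set(rec_a, spatial, secs)
        b = trace_set(rec_b, spatial, secs)
        feats.extend([len(a & b), len(a | b)])
    return feats


def fit_threshold(scores, labels):
    """Decision stump: the observed score that maximizes training accuracy.

    Pairs with score >= threshold are predicted to belong to the same user (label 1).
    """
    best_t, best_acc = None, -1.0
    for t in sorted(set(scores)):
        acc = sum((s >= t) == bool(y) for s, y in zip(scores, labels)) / len(labels)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t, best_acc


# Two hypothetical devices carried together but connecting to different APs.
phone = [{"ts": 0, "building": "E11", "floor": "E11-1", "ap": "ap1"},
         {"ts": 30, "building": "E11", "floor": "E11-1", "ap": "ap2"}]
laptop = [{"ts": 5, "building": "E11", "floor": "E11-1", "ap": "ap2"}]
```

With these toy traces, the two devices are identical at (Building, 10 mins) but share nothing at (AP, 1 sec), which mirrors the connection stochasticity discussed in the results.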

For the crowd flow prediction task, we follow the problem setting illustrated in Fig. 6 and consider the following baselines. First, traditional time series prediction methods: Historical Average (HA), AutoRegressive Integrated Moving Average (ARIMA) (Shumway et al. 2000), and Support Vector Regression (SVR) (Platt 1999). Second, deep sequence models: Recurrent Neural Networks (RNN) (Zhang et al. 2014), Long Short-Term Memory (LSTM) (Hochreiter and Schmidhuber 1997), and Gated Recurrent Unit (GRU) (Cho et al. 2014). For these two types of baselines, the flow of each node (building, floor, or AP) is treated as an independent time series, and predictions are made from these time series alone, without using the graph structure. Finally, we consider the following spatiotemporal GNNs, all of which can be integrated into CrowdTelescope: DCRNN (Li et al. 2017), which captures spatial dependencies using bidirectional random walks on the graph and temporal dependencies using an encoder-decoder architecture; STGCN (Yu et al. 2017), which combines two temporal gated convolution layers and a spatial graph convolution layer in a “sandwich” structure; STTN (Xu et al. 2020), which combines a spatial and a temporal transformer to capture dynamic directed spatial dependencies and long-range temporal dependencies, respectively; HGCN (Guo et al. 2021), which considers the hierarchical structure of location networks; GWNET (Wu et al. 2019), which uses a learnable adaptive dependency matrix to capture hidden spatial dependencies; and MTGNN (Wu et al. 2020), which learns uni-directed relations among multivariate variables through a graph learning process to better capture spatial dependencies. For each location granularity (building/floor/AP), we train each method on the first three weeks of data and evaluate it on the last week. We use the algorithm implementations from LibCity (Wang et al. 2021).
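To make the non-graph setting and the chronological split concrete, the sketch below treats one node's flow as an independent series, trains on the first three weeks, and forecasts the last week with a Historical Average. The text does not define HA precisely; the variant assumed here averages the training flow at the same phase of a repeating cycle (e.g., the same time-of-day or time-of-week slot), and the function names and toy slot counts are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def chronological_split(series: List[float], slots_per_week: int,
                        train_weeks: int = 3) -> Tuple[List[float], List[float]]:
    """Train on the first `train_weeks` weeks and test on everything after them."""
    cut = slots_per_week * train_weeks
    return series[:cut], series[cut:]


def historical_average(train: List[float], period: int) -> Dict[int, float]:
    """Mean training flow at each phase of the period (e.g. each time-of-day slot)."""
    sums: Dict[int, float] = defaultdict(float)
    counts: Dict[int, int] = defaultdict(int)
    for t, value in enumerate(train):
        sums[t % period] += value
        counts[t % period] += 1
    return {phase: sums[phase] / counts[phase] for phase in sums}


def predict_ha(profile: Dict[int, float], start: int, steps: int, period: int) -> List[float]:
    """Forecast slots start, start+1, ... by looking up the matching phase."""
    return [profile[(start + i) % period] for i in range(steps)]


# Toy node series: 2 slots per "day", 2 "days" per "week", 4 weeks in total.
series = [1, 3, 3, 5] * 3 + [2, 4, 2, 4]
train, test = chronological_split(series, slots_per_week=4)
profile = historical_average(train, period=2)
forecast = predict_ha(profile, start=len(train), steps=len(test), period=2)
```

The same split applies to every method; only the graph-based models additionally receive the location graph.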

Performance on human mobility trace extraction

Table 2 shows the results comparing different features and methods for device pair classification.

Table 2.

Accuracy of identifying devices of the same user. Best performing results in each category of the methods are highlighted in bold

Feature type	Method / granularity	Accuracy
Single Grained Feature	Building, 10 mins	0.8811
	Floor, 10 mins	0.8889
	AP, 10 mins	0.8651
	Building, 5 mins	0.8844
	Floor, 5 mins	0.8901
	AP, 5 mins	0.8668
	Building, 1 min	0.8807
	Floor, 1 min	0.8847
	AP, 1 min	0.8764
	Building, 1 sec	0.8055
	Floor, 1 sec	0.8007
	AP, 1 sec	0.7671
Cross Grained Features	Multi-Layer Perceptron	0.8884
	Logistic Regression	0.8895
	Naive Bayes	0.8756
	Decision Tree	0.8591
	Random Forest	0.9019
	CrowdTelescope (CNNs)	0.9037


First, comparing single-grained features across different spatiotemporal granularities, we observe that performance varies considerably. In particular, the finest spatiotemporal granularity (AP, 1 sec) yields the worst performance because it cannot accommodate the stochasticity of network connections: at this granularity, even two devices always carried by the same user do not have similar traces. Moving to coarser spatiotemporal granularities improves performance, with the best result at (Floor, 5 mins). Coarsening the granularity further slightly decreases performance due to a higher false positive rate, since the devices of different users also have similar traces at coarse spatiotemporal granularities.

Second, compared with single-grained features, cross-grained features achieve better performance on average, with a relative improvement of 3.3% (average accuracy of 0.8864 for cross-grained features versus 0.8576 for single-grained features). This implies that cross-grained features are more informative than single-grained features for identifying devices that belong to the same user. Furthermore, our proposed method achieves the best performance among all classification techniques, showing the advantage of our CNN architecture in learning to tolerate connection stochasticity across different spatiotemporal granularities.
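The averages quoted above can be recomputed from Table 2. The relative gain comes out at about 3.35%, which the text reports as 3.3%; in absolute terms the gap is about 2.9 accuracy points.

```python
# Accuracies from Table 2.
single = [0.8811, 0.8889, 0.8651, 0.8844, 0.8901, 0.8668,
          0.8807, 0.8847, 0.8764, 0.8055, 0.8007, 0.7671]   # 12 single-grained settings
cross = [0.8884, 0.8895, 0.8756, 0.8591, 0.9019, 0.9037]    # 6 classifiers on cross-grained features

mean_single = sum(single) / len(single)
mean_cross = sum(cross) / len(cross)
relative_gain = (mean_cross - mean_single) / mean_single  # relative, not absolute, improvement
print(f"cross {mean_cross:.4f} vs single {mean_single:.4f}: {relative_gain:.2%} relative gain")
```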

Performance on crowd flow prediction

To evaluate crowd flow prediction performance, we report three metrics: Mean Absolute Error (MAE), Root Mean Squared Error (RMSE), and Weighted Mean Absolute Percentage Error (WMAPE). Smaller values indicate better performance. The greater the gap between RMSE and MAE, the greater the variance of the individual errors in the test set. Unlike MAE and RMSE, WMAPE normalizes away the absolute scale of the flows (e.g., the different scales of flow values at the building, floor, and AP levels) and is robust to flows with very small values (e.g., zero flow at some APs in some time slots), since it normalizes by the total flow rather than by each individual value; it therefore supports performance comparison across different location granularities. Table 3 shows the results.
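For reference, the three metrics can be written as below. This sketch assumes the standard definitions, with WMAPE computed as the sum of absolute errors divided by the sum of absolute true flows, pooled over all nodes and time slots in the test set; the pooling scope is our assumption, since the text does not state it.

```python
import math
from typing import Sequence


def mae(y: Sequence[float], y_hat: Sequence[float]) -> float:
    """Mean absolute error."""
    return sum(abs(a - b) for a, b in zip(y, y_hat)) / len(y)


def rmse(y: Sequence[float], y_hat: Sequence[float]) -> float:
    """Root mean squared error; weights large individual errors more heavily than MAE."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y))


def wmape(y: Sequence[float], y_hat: Sequence[float]) -> float:
    """Total absolute error divided by total absolute flow (one pooled denominator)."""
    total = sum(abs(a) for a in y)
    if total == 0:
        raise ValueError("WMAPE is undefined when all true flows are zero")
    return sum(abs(a - b) for a, b in zip(y, y_hat)) / total


y_true = [10, 0, 5]  # the zero flow would make a per-slot MAPE divide by zero
y_pred = [8, 1, 5]
print(mae(y_true, y_pred), rmse(y_true, y_pred), wmape(y_true, y_pred))
```

Scaling both the true and predicted flows by ten multiplies MAE and RMSE by ten but leaves WMAPE unchanged, which is why WMAPE is the metric used to compare building, floor, and AP levels.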

Table 3.

Crowd flow prediction performance. Best performing results are highlighted in bold for each metric. Note that “–” denotes cases where the method ran out of GPU memory on our test PC (NVIDIA GeForce RTX 3090 with 24 GB of memory)

Method	Building MAE	Building RMSE	Building WMAPE	Floor MAE	Floor RMSE	Floor WMAPE	AP MAE	AP RMSE	AP WMAPE
ARIMA	10.585	18.173	0.253	2.920	4.909	0.677	0.483	1.135	1.499
HA	13.387	29.422	0.162	3.677	9.568	0.319	0.704	1.766	1.130
SVR	11.149	28.394	0.135	2.569	8.286	0.223	0.385	1.246	0.617
RNN	16.236	33.492	0.197	4.231	11.010	0.368	0.555	1.817	0.894
GRU	15.286	31.598	0.185	4.012	10.618	0.349	0.556	1.833	0.895
LSTM	14.941	31.464	0.181	3.858	10.380	0.336	0.556	1.826	0.895
DCRNN	8.800	19.164	0.107	2.502	7.849	0.218	–	–	–
STGCN	8.246	18.137	0.100	2.489	6.447	0.217	0.406	1.230	0.653
STTN	11.421	25.721	0.139	3.292	8.687	0.287	–	–	–
HGCN	8.876	16.876	0.109	2.414	5.466	0.212	0.317	1.096	0.515
GWNET	7.870	17.877	0.095	2.256	6.474	0.196	0.321	1.225	0.517
MTGNN	7.722	15.481	0.094	2.228	5.967	0.194	0.314	1.178	0.505


First, we observe that spatiotemporal GNNs generally achieve significantly better performance than traditional time series prediction techniques and deep sequence models; the main exception is RMSE at the floor and AP levels, where ARIMA outperforms most spatiotemporal GNNs (and, at the floor level, all of them). This implies that our graph model captures crucial information about the spatial dependencies between locations at different granularities, which substantially improves crowd flow prediction. In particular, the best spatiotemporal GNN model, MTGNN, yields improvements of 24.1%, 1.5%, and 10.9% (at the building, floor, and AP levels, respectively) over the best-performing baselines that do not use location graphs. We therefore adopt it in our prototype system.
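The text does not state how these improvement percentages are computed. One reading that reproduces them from Table 3 is: for each metric, take the relative error reduction of MTGNN over the best non-graph baseline on that metric, then average over MAE, RMSE, and WMAPE. This gives about 24.08% (building), 1.58% (floor, reported as 1.5%), and 10.94% (AP). This is our reconstruction, not the authors' stated procedure; note that the floor-level RMSE term is negative because ARIMA beats MTGNN on that metric.

```python
NON_GRAPH = ("ARIMA", "HA", "SVR", "RNN", "GRU", "LSTM")

# (MAE, RMSE, WMAPE) from Table 3.
TABLE3 = {
    "building": {
        "ARIMA": (10.585, 18.173, 0.253), "HA": (13.387, 29.422, 0.162),
        "SVR": (11.149, 28.394, 0.135), "RNN": (16.236, 33.492, 0.197),
        "GRU": (15.286, 31.598, 0.185), "LSTM": (14.941, 31.464, 0.181),
        "MTGNN": (7.722, 15.481, 0.094),
    },
    "floor": {
        "ARIMA": (2.920, 4.909, 0.677), "HA": (3.677, 9.568, 0.319),
        "SVR": (2.569, 8.286, 0.223), "RNN": (4.231, 11.010, 0.368),
        "GRU": (4.012, 10.618, 0.349), "LSTM": (3.858, 10.380, 0.336),
        "MTGNN": (2.228, 5.967, 0.194),
    },
    "ap": {
        "ARIMA": (0.483, 1.135, 1.499), "HA": (0.704, 1.766, 1.130),
        "SVR": (0.385, 1.246, 0.617), "RNN": (0.555, 1.817, 0.894),
        "GRU": (0.556, 1.833, 0.895), "LSTM": (0.556, 1.826, 0.895),
        "MTGNN": (0.314, 1.178, 0.505),
    },
}


def avg_relative_improvement(rows, model="MTGNN"):
    """Mean over metrics of (best non-graph baseline - model) / best non-graph baseline."""
    gains = []
    for m in range(3):  # MAE, RMSE, WMAPE; lower is better for all three
        best = min(rows[b][m] for b in NON_GRAPH)
        gains.append((best - rows[model][m]) / best)
    return sum(gains) / len(gains)


for level, rows in TABLE3.items():
    print(f"{level}: {avg_relative_improvement(rows):.2%}")
```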

Second, comparing crowd flow prediction performance across location granularities, we observe that finer-grained crowd flow usually exhibits weaker mobility patterns. Specifically, for every method with available results, WMAPE increases as the granularity becomes finer, which implies that finer-grained crowd flow is more difficult to model. Note that MAE and RMSE are smaller for finer-grained locations only because the absolute flow values at those locations are smaller; these metrics are thus not appropriate for comparing performance across location granularities.

Discussion

Although CrowdTelescope achieves accurate crowd flow prediction, data collection and preprocessing inevitably introduce biases, causing discrepancies between the mobility observed in Wi-Fi connection records and the actual mobility on campus. We discuss two major data biases below.

The coverage of Wi-Fi connection records over the actual population on campus. The population of Wi-Fi users may not cover the actual population on campus; for example, some users may prefer cellular networks to Wi-Fi. However, according to the official statistics of the University of Macau³,⁴, there were 13,787 staff and students in 2021, which is close to the 13,593 users remaining after the user-trace step in Table 1. We therefore believe that the Wi-Fi connection records represent human mobility on the whole campus well. Note that the raw data contain many more user accounts because the same user has different accounts for the internal Wi-Fi and for eduroam, leading to roughly double the number of actual users. Extracting device traces already reduces the number of users by about half, because students and staff mostly use the internal Wi-Fi rather than eduroam, which mainly serves guests from other educational institutions.
The bias of integrating the mobility traces of devices belonging to the same user. When integrating the mobility traces of the devices of the same user, we may have both false positives and false negatives. For example, if the devices of two classmates are frequently together, they may be treated as belonging to the same user; if a user carries two devices alternately, they may be treated as belonging to different users. However, since CrowdTelescope achieves over 90% accuracy in classifying device pairs, we believe the integrated mobility traces are informative enough to represent overall on-campus mobility.

Conclusion

In this paper, we propose CrowdTelescope, a Wi-Fi-positioning-based multi-grained spatiotemporal crowd flow prediction framework for smart campus. Crowd flow prediction using Wi-Fi connection records faces not only non-trivial noise in the raw connection records, but also the trade-off between location granularities and mobility patterns. To address the first issue, we design a robust human mobility trace extraction method, which first uses a heuristic-based noisy data filter to remove devices that do not reflect human mobility and then learns to integrate the mobility traces of devices carried by the same user using cross-grained features. To address the second issue, we adopt spatiotemporal Graph Neural Networks (GNNs) to model multi-grained crowd flow, formulating the location graphs of different granularities under a unified graph model that considers the three-level location hierarchy (“building-floor-AP”). We also develop a prototype system of CrowdTelescope, providing interactive visualization of crowd flows on campus. We evaluate CrowdTelescope on a Wi-Fi connection dataset collected on the campus of the University of Macau. Results show that CrowdTelescope can not only effectively extract informative human mobility traces from the noisy Wi-Fi connection records (outperforming baselines by 3.3%), but also accurately predict on-campus crowd flow across different location granularities (yielding 1.5%–24.1% improvements over baselines).

In the future, we plan to further investigate unified spatiotemporal GNNs to directly learn from the hierarchical location graphs, jointly modeling crowd flows across different location granularities.

Acknowledgements

This work is funded by the University of Macau (MYRG2022-00048-IOTSC) and the Science and Technology Development Fund, Macau SAR (0038/2021/AGJ and SKL-IOTSC(UM)-2021-2023). This work was performed in part at SICC which is supported by SKL-IOTSC, University of Macau. The authors also appreciate the data support from the Information and Communication Technology Office (ICTO) at the University of Macau.

Biographies

Shiyu Zhang

received her Bachelor of Engineering degree in Software Engineering from Xidian University, China, in 2021. She is currently a master's student with the State Key Laboratory of Internet of Things for Smart City, majoring in Artificial Intelligence Application, University of Macau, Macao, China. Her research interests lie in Urban Computing and Spatiotemporal Data Mining.

Bangchao Deng

Bangchao Deng received his Bachelor of Engineering degree in Computer Science and Technology from Nanjing University of Aeronautics and Astronautics, China, in 2019. He is currently a master's student with the State Key Laboratory of Internet of Things for Smart City and Department of Computer and Information Science, University of Macau, Macao, China. His research interests lie in Spatiotemporal Data Mining and Urban Computing.

Dingqi Yang

is an Associate Professor with the State Key Laboratory of Internet of Things for Smart City and Department of Computer and Information Science, University of Macau. He received his Ph.D. degree in Computer Science from Pierre and Marie Curie University and Institut Mines-TELECOM/TELECOM SudParis in France, where he won both the CNRS SAMOVAR Doctorate Award and the Press Mention in 2015. Before joining the University of Macau, he worked as a senior researcher at the University of Fribourg in Switzerland. His research interests include big data analytics, ubiquitous computing, and smart city.

Declarations

Conflict of interest

On behalf of all authors, the corresponding author states that there is no conflict of interest.

Footnotes

¹ https://eduroam.org/

² https://www.mapbox.com/

³ https://reg.um.edu.mo/qfacts/y2021/staff/

⁴ https://reg.um.edu.mo/about-reg/facts-and-figures/students-figures/

S. Zhang and B. Deng have contributed equally to this work.

Contributor Information

Shiyu Zhang, Email: mc15428@um.edu.mo.

Bangchao Deng, Email: mc14969@um.edu.mo.

Dingqi Yang, Email: dingqiyang@um.edu.mo.

References

Blondel, V.D., Esch, M., Chan, C., Clérot, F., Deville, P., Huens, E., Morlot, F., Smoreda, Z., Ziemlicki, C.: Data for development: the d4d challenge on mobile phone data. arXiv preprint arXiv:1210.0137 (2012)
Boots, B., Sugihara, K., Chiu, S.N., Okabe, A.: Spatial tessellations: concepts and applications of voronoi diagrams (2009)
Chang, S., Pierson, E., Koh, P.W., Gerardin, J., Redbird, B., Grusky, D., Leskovec, J.: Mobility network models of covid-19 explain inequities and inform reopening. Nature 589(7840), 82–87 (2021). 10.1038/s41586-020-2923-3
Chen, L., Zhang, D., Wang, L., Yang, D., Ma, X., Li, S., Wu, Z., Pan, G., Nguyen, T.-M.-T., Jakubowicz, J.: Dynamic cluster-based over-demand prediction in bike sharing systems. In: Proceedings of the 2016 ACM International Joint Conference on Pervasive and Ubiquitous Computing, pp. 841–852 (2016)
Chen, L., Jakubowicz, J., Yang, D., Zhang, D., Pan, G.: Fine-grained urban event detection and characterization based on tensor cofactorization. IEEE Transactions on Human-Machine Systems 47(3), 380–391 (2016). 10.1109/THMS.2016.2596103
Chen, L., Yang, D., Nogueira, M., Wang, C., Zhang, D.: Data-driven c-ran optimization exploiting traffic and mobility dynamics of mobile users. IEEE Trans. Mob. Comput. 20(5), 1773–1788 (2020). 10.1109/TMC.2020.2971470
Cho, K., Van Merriënboer, B., Gulcehre, C., Bahdanau, D., Bougares, F., Schwenk, H., Bengio, Y.: Learning phrase representations using rnn encoder-decoder for statistical machine translation. arXiv:1406.1078 (2014)
del Peral-Rosado, J.A., Raulefs, R., López-Salcedo, J.A., Seco-Granados, G.: Survey of cellular mobile radio localization methods: from 1g to 5g. IEEE Commun. Surv. Tutor. 20(2), 1124–1148 (2017). 10.1109/COMST.2017.2785181
Delaunay, B.: Sur la sphère vide. Izv. Akad. Nauk SSSR, Otdelenie Matematicheskii i Estestvennyka Nauk 7(793-800), 1–2 (1934)
Fan, Z., Song, X., Shibasaki, R., Adachi, R.: Citymomentum: an online approach for crowd behavior prediction at a citywide level. In: Proceedings of the 2015 ACM International Joint Conference on Pervasive and Ubiquitous Computing, pp. 559–569 (2015)
Feng, J., Yang, Z., Xu, F., Yu, H., Wang, M., Li, Y.: Learning to simulate human mobility. In: Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp. 3426–3433 (2020)
Gu, J., Wang, Z., Kuen, J., Ma, L., Shahroudy, A., Shuai, B., Liu, T., Wang, X., Wang, G., Cai, J.: Recent advances in convolutional neural networks. Pattern Recogn. 77, 354–377 (2018). 10.1016/j.patcog.2017.10.013
Guo, K., Hu, Y., Sun, Y., Qian, S., Gao, J., Yin, B.: Hierarchical graph convolution network for traffic forecasting. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 35, pp. 151–159 (2021)
Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Comput. 9(8), 1735–1780 (1997). 10.1162/neco.1997.9.8.1735
Laurila, J.K., Gatica-Perez, D., Aad, I., Blom, J., Bornet, O., Do, T.M.T., Dousse, O., Eberle, J., Miettinen, M.: From big smartphone data to worldwide research: the mobile data challenge. Pervasive Mob. Comput. 9(6), 752–771 (2013). 10.1016/j.pmcj.2013.07.014
Li, Y., Yu, R., Shahabi, C., Liu, Y.: Diffusion convolutional recurrent neural network: Data-driven traffic forecasting. arXiv preprint arXiv:1707.01926 (2017)
Liang, Y., Ouyang, K., Sun, J., Wang, Y., Zhang, J., Zheng, Y., Rosenblum, D., Zimmermann, R.: Fine-grained urban flow prediction. In: Proceedings of the Web Conference 2021, pp. 1833–1845 (2021)
Lin, Z., Feng, J., Lu, Z., Li, Y., Jin, D.: Deepstn+: Context-aware spatial-temporal neural network for crowd flow prediction in metropolis. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 33, pp. 1020–1027 (2019)
Liu, X., Chen, H., Andris, C.: trajgans: Using generative adversarial networks for geo-privacy protection of trajectory data (vision paper). In: Location Privacy and Security Workshop, pp. 1–7 (2018)
Luca, M., Barlacchi, G., Lepri, B., Pappalardo, L.: A survey on deep learning for human mobility. ACM Comput. Surv. (CSUR) 55(1), 1–44 (2021). 10.1145/3485125
Lv, Y., Duan, Y., Kang, W., Li, Z., Wang, F.-Y.: Traffic flow prediction with big data: a deep learning approach. IEEE Trans. Intell. Transp. Syst. 16(2), 865–873 (2014)
Platt, J.: Probabilistic outputs for support vector machines and comparisons to regularized likelihood methods. Advances in Large Margin Classifiers 10(3), 61–74 (1999)
Ravenstein, E.G.: The laws of migration. J. Stat. Soc. Lond. 48(2), 167–235 (1885). 10.2307/2979181
Shin, S., Jeon, H., Cho, C., Yoon, S., Kim, T.: User mobility synthesis based on generative adversarial networks: A survey. In: 2020 22nd International Conference on Advanced Communication Technology (ICACT), pp. 94–103 (2020). IEEE
Shumway, R.H., Stoffer, D.S.: Time Series Analysis and Its Applications, vol. 3. Springer (2000)
Simini, F., Barlacchi, G., Luca, M., Pappalardo, L.: Deep gravity: enhancing mobility flows generation with deep neural networks and geographic information. arXiv preprint arXiv:2012.00489 (2020)
Stopczynski, A., Sekara, V., Sapiezynski, P., Cuttone, A., Madsen, M.M., Larsen, J.E., Lehmann, S.: Measuring large-scale social networks with high resolution. PLoS One 9(4), 95978 (2014). 10.1371/journal.pone.0095978
Swain, V.D., Xie, J., Madan, M., Sargolzaei, S., Cai, J., De Choudhury, M., Abowd, G.D., Steimle, L.N., Prakash, B.A.: WiFi mobility models for COVID-19 enable less burdensome and more localized interventions for university campuses. medRxiv (2021)
Wang, L., Chai, D., Liu, X., Chen, L., Chen, K.: Exploring the generalizability of spatio-temporal traffic prediction: meta-modeling and an analytic framework. IEEE Transactions on Knowledge and Data Engineering (2021)
Wang, J., Jiang, J., Jiang, W., Li, C., Zhao, W.X.: Libcity: An open library for traffic prediction. In: Proceedings of the 29th International Conference on Advances in Geographic Information Systems. SIGSPATIAL ’21, pp. 145–148. Association for Computing Machinery, New York, NY, USA (2021). 10.1145/3474717.3483923
Wang, L., Yu, Z., Guo, B., Yang, D., Ma, L., Liu, Z., Xiong, F.: Data-driven targeted advertising recommendation system for outdoor billboard. ACM Transactions on Intelligent Systems and Technology (TIST) 13(2), 1–23 (2022)
Wu, Z., Pan, S., Long, G., Jiang, J., Chang, X., Zhang, C.: Connecting the dots: Multivariate time series forecasting with graph neural networks. In: Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp. 753–763 (2020)
Wu, Z., Pan, S., Long, G., Jiang, J., Zhang, C.: Graph wavenet for deep spatial-temporal graph modeling. arXiv preprint arXiv:1906.00121 (2019)
Wu, R., Luo, G., Shao, J., Tian, L., Peng, C.: Location prediction on trajectory data: A review. Big Data Mining and Analytics 1(2), 108–127 (2018). 10.26599/BDMA.2018.9020010
Wu, Z., Pan, S., Chen, F., Long, G., Zhang, C., Yu, P.S.: A comprehensive survey on graph neural networks. IEEE Transactions on Neural Networks and Learning Systems 32(1), 4–24 (2020). 10.1109/TNNLS.2020.2978386
Xu, M., Dai, W., Liu, C., Gao, X., Lin, W., Qi, G.-J., Xiong, H.: Spatial-temporal transformer networks for traffic flow forecasting. arXiv preprint arXiv:2001.02908 (2020)
Yang, D., Fankhauser, B., Rosso, P., Cudre-Mauroux, P.: Location prediction over sparse user mobility traces using rnns. In: Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence, pp. 2184–2190 (2020)
Yang, D., Qu, B., Yang, J., Cudré-Mauroux, P.: Lbsn2vec++: Heterogeneous hypergraph embedding for location-based social networks. IEEE Transactions on Knowledge and Data Engineering (2020)
Yang, D., Qu, B., Yang, J., Cudre-Mauroux, P.: Revisiting user mobility and social relationships in lbsns: a hypergraph embedding approach. In: The World Wide Web Conference, pp. 2147–2157 (2019)
Yang, D., Zhang, D., Yu, Z., Wang, Z.: A sentiment-enhanced personalized location recommendation system. In: Proceedings of the 24th ACM Conference on Hypertext and Social Media, pp. 119–128 (2013)
Yang, D., Zhang, D., Chen, L., Qu, B.: Nationtelescope: Monitoring and visualizing large-scale collective behavior in lbsns. J. Netw. Comput. Appl. 55, 170–180 (2015). 10.1016/j.jnca.2015.05.010
Yang, D., Zhang, D., Qu, B.: Participatory cultural mapping based on collective behavior data in location-based social networks. ACM Trans. Intell. Syst. Technol. (TIST) 7(3), 1–23 (2016). 10.1145/2814575
Yang, D., Heaney, T., Tonon, A., Wang, L., Cudré-Mauroux, P.: Crimetelescope: crime hotspot prediction based on urban and social media data fusion. World Wide Web 21(5), 1323–1347 (2018). 10.1007/s11280-017-0515-4
Yu, Z., Ma, H., Guo, B., Yang, Z.: Crowdsensing 2.0. Communications of the ACM 64(11), 76–80 (2021)
Yu, B., Yin, H., Zhu, Z.: Spatio-temporal graph convolutional networks: A deep learning framework for traffic forecasting. arXiv preprint arXiv:1709.04875 (2017)
Yu, Z., Xu, H., Yang, Z., Guo, B.: Personalized travel package with multi-point-of-interest recommendation based on crowdsourced user footprints. IEEE Transactions on Human-Machine Systems 46(1), 151–158 (2015). 10.1109/THMS.2015.2446953
Yu, Z., Zhang, D., Yu, Z., Yang, D.: Participant selection for offline event marketing leveraging location-based social networks. IEEE Transactions on Systems, Man, and Cybernetics: Systems 45(6), 853–864 (2015). 10.1109/TSMC.2014.2383993
Yu, Z., Yi, F., Lv, Q., Guo, B.: Identifying on-site users for social events: Mobility, content, and social relationship. IEEE Trans. Mob. Comput. 17(9), 2055–2068 (2018). 10.1109/TMC.2018.2794981
Yuan, J., Zheng, Y., Zhang, C., Xie, W., Xie, X., Sun, G., Huang, Y.: T-drive: driving directions based on taxi trajectories. In: Proceedings of the 18th SIGSPATIAL International Conference on Advances in Geographic Information Systems, pp. 99–108 (2010)
Zhang, Y., Dai, H., Xu, C., Feng, J., Wang, T., Bian, J., Wang, B., Liu, T.-Y.: Sequential click prediction for sponsored search with recurrent neural networks. In: AAAI (2014)

[CR1] Blondel, V.D., Esch, M., Chan, C., Clérot, F., Deville, P., Huens, E., Morlot, F., Smoreda, Z., Ziemlicki, C.: Data for development: the d4d challenge on mobile phone data. arXiv preprint arXiv:1210.0137 (2012)

[CR2] Boots, B., Sugihara, K., Chiu, S.N., Okabe, A.: Spatial tessellations: concepts and applications of voronoi diagrams (2009)

[CR3] Chang S, Pierson E, Koh PW, Gerardin J, Redbird B, Grusky D, Leskovec J. Mobility network models of covid-19 explain inequities and inform reopening. Nature. 2021;589(7840):82–87. doi: 10.1038/s41586-020-2923-3. [DOI] [PubMed] [Google Scholar]

[CR4] Chen, L., Zhang, D., Wang, L., Yang, D., Ma, X., Li, S., Wu, Z., Pan, G., Nguyen, T.-M.-T., Jakubowicz, J.: Dynamic cluster-based over-demand prediction in bike sharing systems. In: Proceedings of the 2016 ACM International Joint Conference on Pervasive and Ubiquitous Computing, pp. 841–852 (2016)

[CR5] Chen L, Jakubowicz J, Yang D, Zhang D, Pan G. Fine-grained urban event detection and characterization based on tensor cofactorization. IEEE Transactions on Human-Machine Systems. 2016;47(3):380–391. doi: 10.1109/THMS.2016.2596103. [DOI] [Google Scholar]

[CR6] Chen L, Yang D, Nogueira M, Wang C, Zhang D. Data-driven c-ran optimization exploiting traffic and mobility dynamics of mobile users. IEEE Trans. Mob. Comput. 2020;20(5):1773–1788. doi: 10.1109/TMC.2020.2971470. [DOI] [Google Scholar]

[CR7] Cho, K., Van Merriënboer, B., Gulcehre, C., Bahdanau, D., Bougares, F., Schwenk, H., Bengio, Y.: Learning phrase representations using rnn encoder-decoder for statistical machine translation. arXiv:1406.1078 (2014)

[CR8] del Peral-Rosado JA, Raulefs R, López-Salcedo JA, Seco-Granados G. Survey of cellular mobile radio localization methods: from 1g to 5g. IEEE Commun. Surv. Tutor. 2017;20(2):1124–1148. doi: 10.1109/COMST.2017.2785181. [DOI] [Google Scholar]

[CR9] Delaunay, B., : Sur la sphere vide. Izv. Akad. Nauk SSSR, Otdelenie Matematicheskii i Estestvennyka Nauk 7(793-800), 1–2 (1934)

[CR10] Fan, Z., Song, X., Shibasaki, R., Adachi, R.: Citymomentum: an online approach for crowd behavior prediction at a citywide level. In: Proceedings of the 2015 ACM International Joint Conference on Pervasive and Ubiquitous Computing, pp. 559–569 (2015)

[CR11] Feng, J., Yang, Z., Xu, F., Yu, H., Wang, M., Li, Y.: Learning to simulate human mobility. In: Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp. 3426–3433 (2020)

[CR12] Gu J, Wang Z, Kuen J, Ma L, Shahroudy A, Shuai B, Liu T, Wang X, Wang G, Cai J. Recent advances in convolutional neural networks. Pattern Recogn. 2018;77:354–377. doi: 10.1016/j.patcog.2017.10.013. [DOI] [Google Scholar]

[CR13] Guo, K., Hu, Y., Sun, Y., Qian, S., Gao, J., Yin, B.: Hierarchical graph convolution network for traffic forecasting. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 35, pp. 151–159 (2021)

[CR14] Hochreiter S, Schmidhuber J. Long short-term memory. Neural Comput. 1997;9(8):1735–1780. doi: 10.1162/neco.1997.9.8.1735. [DOI] [PubMed] [Google Scholar]

[CR15] Laurila, J.K., Gatica-Perez, D., Aad, I., Blom, J., Bornet, O., Do, T.M.T., Dousse, O., Eberle, J., Miettinen, M.: From big smartphone data to worldwide research: the mobile data challenge. Pervasive Mob. Comput. 9(6), 752–771 (2013). doi: 10.1016/j.pmcj.2013.07.014

[CR16] Li, Y., Yu, R., Shahabi, C., Liu, Y.: Diffusion convolutional recurrent neural network: Data-driven traffic forecasting. arXiv preprint arXiv:1707.01926 (2017)

[CR17] Liang, Y., Ouyang, K., Sun, J., Wang, Y., Zhang, J., Zheng, Y., Rosenblum, D., Zimmermann, R.: Fine-grained urban flow prediction. In: Proceedings of the Web Conference 2021, pp. 1833–1845 (2021)

[CR18] Lin, Z., Feng, J., Lu, Z., Li, Y., Jin, D.: DeepSTN+: Context-aware spatial-temporal neural network for crowd flow prediction in metropolis. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 33, pp. 1020–1027 (2019)

[CR19] Liu, X., Chen, H., Andris, C.: trajGANs: Using generative adversarial networks for geo-privacy protection of trajectory data (vision paper). In: Location Privacy and Security Workshop, pp. 1–7 (2018)

[CR20] Luca, M., Barlacchi, G., Lepri, B., Pappalardo, L.: A survey on deep learning for human mobility. ACM Comput. Surv. (CSUR) 55(1), 1–44 (2021). doi: 10.1145/3485125

[CR21] Lv, Y., Duan, Y., Kang, W., Li, Z., Wang, F.-Y.: Traffic flow prediction with big data: a deep learning approach. IEEE Trans. Intell. Transp. Syst. 16(2), 865–873 (2014)

[CR22] Platt, J.: Probabilistic outputs for support vector machines and comparisons to regularized likelihood methods. Advances in Large Margin Classifiers 10(3), 61–74 (1999)

[CR23] Ravenstein, E.G.: The laws of migration. J. Stat. Soc. Lond. 48(2), 167–235 (1885). doi: 10.2307/2979181

[CR24] Shin, S., Jeon, H., Cho, C., Yoon, S., Kim, T.: User mobility synthesis based on generative adversarial networks: A survey. In: 2020 22nd International Conference on Advanced Communication Technology (ICACT), pp. 94–103 (2020). IEEE

[CR25] Shumway, R.H., Stoffer, D.S.: Time Series Analysis and Its Applications vol. 3. Springer (2000)

[CR26] Simini, F., Barlacchi, G., Luca, M., Pappalardo, L.: Deep gravity: enhancing mobility flows generation with deep neural networks and geographic information. arXiv preprint arXiv:2012.00489 (2020)

[CR27] Stopczynski, A., Sekara, V., Sapiezynski, P., Cuttone, A., Madsen, M.M., Larsen, J.E., Lehmann, S.: Measuring large-scale social networks with high resolution. PLoS One 9(4), 95978 (2014). doi: 10.1371/journal.pone.0095978

[CR28] Swain, V.D., Xie, J., Madan, M., Sargolzaei, S., Cai, J., De Choudhury, M., Abowd, G.D., Steimle, L.N., Prakash, B.A.: WiFi mobility models for COVID-19 enable less burdensome and more localized interventions for university campuses. medRxiv (2021)

[CR29] Wang, L., Chai, D., Liu, X., Chen, L., Chen, K.: Exploring the generalizability of spatio-temporal traffic prediction: meta-modeling and an analytic framework. IEEE Transactions on Knowledge and Data Engineering (2021)

[CR30] Wang, J., Jiang, J., Jiang, W., Li, C., Zhao, W.X.: LibCity: An open library for traffic prediction. In: Proceedings of the 29th International Conference on Advances in Geographic Information Systems. SIGSPATIAL ’21, pp. 145–148. Association for Computing Machinery, New York, NY, USA (2021). doi: 10.1145/3474717.3483923

[CR31] Wang, L., Yu, Z., Guo, B., Yang, D., Ma, L., Liu, Z., Xiong, F.: Data-driven targeted advertising recommendation system for outdoor billboard. ACM Transactions on Intelligent Systems and Technology (TIST) 13(2), 1–23 (2022)

[CR32] Wu, Z., Pan, S., Long, G., Jiang, J., Chang, X., Zhang, C.: Connecting the dots: Multivariate time series forecasting with graph neural networks. In: Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp. 753–763 (2020)

[CR33] Wu, Z., Pan, S., Long, G., Jiang, J., Zhang, C.: Graph WaveNet for deep spatial-temporal graph modeling. arXiv preprint arXiv:1906.00121 (2019)

[CR34] Wu, R., Luo, G., Shao, J., Tian, L., Peng, C.: Location prediction on trajectory data: A review. Big Data Mining and Analytics 1(2), 108–127 (2018). doi: 10.26599/BDMA.2018.9020010

[CR35] Wu, Z., Pan, S., Chen, F., Long, G., Zhang, C., Yu, P.S.: A comprehensive survey on graph neural networks. IEEE Transactions on Neural Networks and Learning Systems 32(1), 4–24 (2020). doi: 10.1109/TNNLS.2020.2978386

[CR36] Xu, M., Dai, W., Liu, C., Gao, X., Lin, W., Qi, G.-J., Xiong, H.: Spatial-temporal transformer networks for traffic flow forecasting. arXiv preprint arXiv:2001.02908 (2020)

[CR37] Yang, D., Fankhauser, B., Rosso, P., Cudré-Mauroux, P.: Location prediction over sparse user mobility traces using RNNs. In: Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence, pp. 2184–2190 (2020)

[CR38] Yang, D., Qu, B., Yang, J., Cudré-Mauroux, P.: LBSN2Vec++: Heterogeneous hypergraph embedding for location-based social networks. IEEE Transactions on Knowledge and Data Engineering (2020)

[CR39] Yang, D., Qu, B., Yang, J., Cudré-Mauroux, P.: Revisiting user mobility and social relationships in LBSNs: a hypergraph embedding approach. In: The World Wide Web Conference, pp. 2147–2157 (2019)

[CR40] Yang, D., Zhang, D., Yu, Z., Wang, Z.: A sentiment-enhanced personalized location recommendation system. In: Proceedings of the 24th ACM Conference on Hypertext and Social Media, pp. 119–128 (2013)

[CR41] Yang, D., Zhang, D., Chen, L., Qu, B.: NationTelescope: Monitoring and visualizing large-scale collective behavior in LBSNs. J. Netw. Comput. Appl. 55, 170–180 (2015). doi: 10.1016/j.jnca.2015.05.010

[CR42] Yang, D., Zhang, D., Qu, B.: Participatory cultural mapping based on collective behavior data in location-based social networks. ACM Trans. Intell. Syst. Technol. (TIST) 7(3), 1–23 (2016). doi: 10.1145/2814575

[CR43] Yang, D., Heaney, T., Tonon, A., Wang, L., Cudré-Mauroux, P.: CrimeTelescope: crime hotspot prediction based on urban and social media data fusion. World Wide Web 21(5), 1323–1347 (2018). doi: 10.1007/s11280-017-0515-4

[CR44] Yu, Z., Ma, H., Guo, B., Yang, Z.: Crowdsensing 2.0. Communications of the ACM 64(11), 76–80 (2021)

[CR45] Yu, B., Yin, H., Zhu, Z.: Spatio-temporal graph convolutional networks: A deep learning framework for traffic forecasting. arXiv preprint arXiv:1709.04875 (2017)

[CR46] Yu, Z., Xu, H., Yang, Z., Guo, B.: Personalized travel package with multi-point-of-interest recommendation based on crowdsourced user footprints. IEEE Transactions on Human-Machine Systems 46(1), 151–158 (2015). doi: 10.1109/THMS.2015.2446953

[CR47] Yu, Z., Zhang, D., Yu, Z., Yang, D.: Participant selection for offline event marketing leveraging location-based social networks. IEEE Transactions on Systems, Man, and Cybernetics: Systems 45(6), 853–864 (2015). doi: 10.1109/TSMC.2014.2383993

[CR48] Yu, Z., Yi, F., Lv, Q., Guo, B.: Identifying on-site users for social events: Mobility, content, and social relationship. IEEE Trans. Mob. Comput. 17(9), 2055–2068 (2018). doi: 10.1109/TMC.2018.2794981

[CR49] Yuan, J., Zheng, Y., Zhang, C., Xie, W., Xie, X., Sun, G., Huang, Y.: T-Drive: driving directions based on taxi trajectories. In: Proceedings of the 18th SIGSPATIAL International Conference on Advances in Geographic Information Systems, pp. 99–108 (2010)

[CR50] Zhang, Y., Dai, H., Xu, C., Feng, J., Wang, T., Bian, J., Wang, B., Liu, T.-Y.: Sequential click prediction for sponsored search with recurrent neural networks. In: AAAI (2014)
