Significance
Individual mobility models are important in a wide range of application areas. Current mainstream urban mobility models require sociodemographic information from costly manual surveys, which have small sample sizes and are updated infrequently. In this study, we propose an individual mobility modeling framework, TimeGeo, that extracts the required features from ubiquitous, passive, and sparse digital traces in the information and communication technology era. The model is able to generate individual trajectories at high spatiotemporal resolution, with interpretable mechanisms and parameters capturing heterogeneous individual travel choices. The modeling framework can flexibly adapt to input data with different resolutions and can be further extended for various modeling purposes.
Keywords: human mobility, urban model, mobile phone data, networks, urban planning
Abstract
Well-established fine-scale urban mobility models today depend on detailed but cumbersome and expensive travel surveys for their calibration. Little is known, however, about the set of mechanisms needed to generate complete mobility profiles using only passive datasets with mostly sparse traces of individuals. In this study, we present a mechanistic modeling framework (TimeGeo) that effectively generates urban mobility patterns with a resolution of 10 min and hundreds of meters. It ties together the inference of home and work activity locations from data with the modeling of flexible activities (e.g., other) in space and time. The temporal choices are captured by only three features: the weekly home-based tour number, the dwell rate, and the burst rate. Combined, these generate for each individual (i) the stay duration of activities, (ii) the number of visited locations per day, and (iii) daily mobility networks. These parameters capture how an individual deviates from the circadian rhythm of the population, and generate the wide spectrum of empirically observed mobility behaviors. The spatial choices of visited locations are modeled by a rank-based exploration and preferential return (r-EPR) mechanism that incorporates space in the EPR model. Finally, we show that a hierarchical multiplicative cascade method can measure the interaction between land use and the generation of trips. In this way, urban structure is directly related to the observed distance of travels. This framework allows us to fully embrace the massive amount of individual data generated by information and communication technologies (ICTs) worldwide to comprehensively model urban mobility without travel surveys.
Our ability to correctly model urban daily activities for traffic control, energy consumption, and urban planning (1, 2) has critical impacts on people’s quality of life and the everyday functioning of our cities. To inform policy making on important projects such as planning a new metro line and managing the traffic demand during big events, or to prepare for emergencies, we need reliable models of urban travel demand. These are models with high resolution that simulate individual mobility for an entire region (3, 4). Traditionally, inputs for such models are based on census and household travel surveys. These surveys collect information about individuals (socioeconomic, demographic, etc.), their household (size, structure, relationships), and their journeys on a given day. Nonetheless, the high costs of gathering the surveys put severe limits on their sample sizes and frequencies. In most cases, they capture only a small fraction of the urban household population once a decade, with information on only one or a few days per individual. The low sampling rate has made it very costly to infer the choices of the entire urban population (3, 5–7).
More recent studies try to learn about human behavior in cities by using data collected from location-aware technologies, instead of manual surveys, to infer the preferences in travel decisions that are needed to calibrate existing choice modeling frameworks (8–10). The problem, however, is that the geotagged data available from communication technologies, in the massive and low-cost form, cannot inform us about the detailed activity choices of their users, making most of the data useless for meaningful urban-scale mobility models. To make the best use of the massive and passive data, a fundamental paradigm shift is needed to model urban mobility and enhance new opportunities emerging through urban computing (11). This is our goal with TimeGeo, a modeling framework that extracts individual features and key mechanisms needed to effectively generate complete urban mobility profiles from the sparse and incomplete information available in telecommunication activities.
Mobile phones are the prevalent communication tools of the 21st century, with worldwide coverage reaching most of the population (12). The call detail records (CDRs), managed by mobile phone service providers for billing purposes, contain information in the form of geolocated traces of users across the globe. Mobile phone data have so far been useful for improving our knowledge of human mobility at an unprecedented scale, informing us about the frequency of visits and the number of visited locations over long-term observations (13–18), daily mobility networks of individuals (15, 19), and the distribution of trip distances (13, 15, 17, 20–22). Due to the sparse nature of mobile phone use, these data sources have sampling biases and do not provide complete journeys in space and time for each individual (9). Nonetheless, it has been possible to extract and characterize from phone data where each individual may stay or pass by, and then infer the types of activities that they engage in at various urban locations depending on the time of their visits (23). By labeling visited location types for individual users as home, work, or other, representative traffic origin–destination (OD) matrices for an average day and by time of day can be generated (24, 25). They are aggregated estimates of person-trips between OD pairs within a few hours, and these results have been successfully validated in various cities against existing travel demand models that required expensive surveys for calibration (24, 25).
A fundamental question still remains on how to perform a spatiotemporal mapping of raw mobile phone data to establish models of travel demand with high spatiotemporal resolution, through which individuals’ disaggregated daily journeys can be generated. In the current literature that analyzes sparse geotagged data, the daily temporal behavior of human mobility is either not modeled or oversimplified (13, 16). For example, previous studies on human dynamics do not explicitly model individual temporal choices, but randomly draw parameters such as waiting time or the number of activities in each active period from aggregated distributions measured from data (14, 15). The model in ref. 19 introduces time dependency in travel and tendency to arrange short out-of-home activities in consecutive sequences (i.e., bursts of activities) (26–30), but the stay duration at flexible (other) locations is fixed. Furthermore, it does not incorporate spatial choices or the heterogeneity of individual behavior.
To realistically model individual mobility in cities at both the micro- and macrolevel, it is necessary to understand the essential features of a population’s distribution in space at different times. Here we show that these features can be extracted from big data sources. Instead of using sociodemographic information to calibrate the set of detailed decisions involved in activity choices, as required by mainstream transportation modeling approaches, the framework consists of directly measurable parameters discovered from passive data. It represents a needed paradigm shift in modeling individual daily trajectories in cities, adapted to ubiquitously available sparse digital traces of individuals. The results are high-resolution travel diaries for a large sample of users based on their information and communication technology (ICT) data in the urban context. The presented set of parameters can be further refined as more information becomes available at the individual level.
Activity Extraction from Mobile Phone Data
To demonstrate the mechanistic modeling framework, we analyze a CDR data set of 1.92 million anonymous mobile phone users for a period of 6 wk in 2010 in the Greater Boston area. As a control experiment, we also examine a donated set of self-collected mobile phone traces of a graduate student in the same region over a course of 14 mo in 2013 and 2014, recorded by a smartphone application. When an individual anchors at a location to conduct an activity, it is defined as a stay. We apply the stay extraction method discussed in the literature (23) to both data sets. We filter out signal jumps as well as pass-by records when mobile phone users were traveling. For each user, based on the start time and frequency of visits to each stay location, we infer the stay location type as home (H), work (W), or other (O).
We are able to identify home locations for 1.44 million users, about 75% of our initial user base. Next, we select users who have more than 50 total stays and at least 10 home stays in the observation period. These are identified as active users and are used to extract the various parameters of TimeGeo (as explained in detail in the next sections). These active users can be labeled as commuters (133,448 individuals), who have journey-to-work trips, and noncommuters (43,606 individuals), who have no journey-to-work trips.
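The active-user filter above can be sketched in a few lines of Python. This is a minimal illustration, not the authors’ pipeline: it assumes each extracted stay already carries an H/W/O label, and it approximates “has journey-to-work trips” by “has at least one work stay”. The `Stay` record and `select_active_users` function are hypothetical names.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Stay:
    """Hypothetical stay record: one extracted stay of one user."""
    user_id: str
    location_type: str  # "H" (home), "W" (work), or "O" (other)


def select_active_users(stays, min_total=50, min_home=10):
    """Return {user_id: "commuter" | "noncommuter"} for active users.

    Active users have more than `min_total` stays and at least `min_home`
    home stays. Treating any work stay as evidence of journey-to-work trips
    is a simplifying assumption of this sketch.
    """
    total = Counter(s.user_id for s in stays)
    home = Counter(s.user_id for s in stays if s.location_type == "H")
    workers = {s.user_id for s in stays if s.location_type == "W"}
    return {
        uid: ("commuter" if uid in workers else "noncommuter")
        for uid, n in total.items()
        if n > min_total and home[uid] >= min_home
    }
```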
Fig. 1 illustrates the pipeline of extracting stays, labeling activity types, and deriving individual mobility features from raw mobile phone data for three example days. Fig. 1 A–C shows the raw cell phone records (in blue for 14 mo, and in purple for each day), and the extracted stay locations of the individual (in red). Fig. 1 D–F shows that for active users the extracted stays in each day define a daily journey (usually starting and ending at home). A trip is made when a user changes stay locations. The time bar shows the start time and duration for each stay, and activity types are color-coded.
Generating Mechanisms of Individual Mobility
The modeling framework of TimeGeo is presented in Fig. 2A. It integrates the temporal and spatial choice mechanisms of human mobility. We assume that for an individual agent, her work activity has a fixed location, start time, and duration; her home activity has a fixed location but a flexible start time and duration; and her other activity is flexible in location, start time, and duration. The presented framework aims to model the flexible spatial and temporal mobility choices, whereas the schedule of the fixed activity (i.e., work) is assumed to be predetermined (see SI Appendix, section 2 for details). We divide each day of a week into 144 discrete intervals of 10 min (i.e., 1,008 time intervals in a week). For each time interval t within a week, an individual first decides whether to stay or move. If she chooses to move, she then decides where to go. We improve on previous human mobility models (14, 19) by generating spatiotemporal patterns while introducing individual-specific mobility parameters, namely a weekly home-based tour number, a dwell rate, and a burst rate (explicitly defined later). These parameters capture the heterogeneity of individual daily mobility observed in the passive digital traces. Nevertheless, due to the limited observation period of the CDR data used in this study, some parameters cannot be extracted at the individual level. These global parameters measure the preferential return and exploration rates, and the rank selection probability. As large-scale data with higher frequency (e.g., GPS traces) and longer observation periods (e.g., many months) become available, these global parameters could be measured at the individual level as well.
Temporal Choices.
To uncover the key generating mechanisms needed to reproduce individual daily trajectories, we propose a time-inhomogeneous Markov chain model with three individual-specific parameters, namely the weekly home-based tour number ($n_w$), the dwell rate ($\beta_1$), and the burst rate ($\beta_2$), to capture individual circadian propensity to travel (16, 19, 31) and the likelihood of arranging short activities in consecutive sequences (26–30). Because work activity is assumed to have a fixed start time and duration, we consider two Markov states: home and other. Home is considered a less-active state, because the average stay duration at home is significantly longer than at other locations, where people are more active (i.e., more likely to travel).
When an individual $l$ is at home, her individual travel circadian rhythm is defined as $n_w P(t)$, representing her likelihood of making a trip originating from home in time interval $t$ of a week. The weekly home-based tour number $n_w$ counts the total number of trips that individual $l$ initiated from home to other places, and $P(t)$ is the global travel circadian rhythm of the population in an average week. We differentiate $P(t)$ for commuters and noncommuters (SI Appendix, section 3.1). For noncommuters, $P(t)$ is measured as the fraction of all user-trips in time interval $t$ of the week for the population (i.e., $\sum_{t=1}^{1008} P(t) = 1$), capturing the expected variation of travel at different times of the week (shown in Fig. 2B). For commuters, because work is modeled as a fixed activity, $P(t)$ does not include trips to or from work. The product of the two, $n_w P(t)$, which is less than 1, defines the individual travel probability in a specific time interval while she is at home.
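As a minimal sketch (our own illustration, assuming trip start times are available as Python `datetime` objects and that slot 0 is Monday 00:00–00:10), the global rhythm $P(t)$ can be estimated as a normalized histogram over the 1,008 weekly 10-min slots, and the home departure probability is then $n_w P(t)$, with $n_w$ the weekly home-based tour number. The function names are hypothetical.

```python
from datetime import datetime

INTERVALS_PER_DAY = 144  # 10-min slots per day
INTERVALS_PER_WEEK = 7 * INTERVALS_PER_DAY  # 1,008 slots per week


def week_interval(ts: datetime) -> int:
    """Index of the 10-min slot of the week containing ts (Monday 00:00 -> 0)."""
    minutes = ts.weekday() * 24 * 60 + ts.hour * 60 + ts.minute
    return minutes // 10


def global_circadian_rhythm(trip_starts):
    """Estimate P(t) as the fraction of all trips starting in each weekly slot."""
    counts = [0] * INTERVALS_PER_WEEK
    for ts in trip_starts:
        counts[week_interval(ts)] += 1
    total = sum(counts)
    if total == 0:
        raise ValueError("at least one trip is needed to estimate P(t)")
    return [c / total for c in counts]


def home_departure_probability(n_w, P, t):
    """Probability n_w * P(t) of leaving home in slot t; the model requires it < 1."""
    p = n_w * P[t]
    if not 0.0 <= p < 1.0:
        raise ValueError("n_w * P(t) must lie in [0, 1)")
    return p
```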
To model an individual’s propensity to travel from an other (out-of-home) state, we introduce a dwell rate $\beta_1$, which measures how much more active (or likely to travel) the person is at an other state compared with home. The probability of traveling when an individual is at an other state is defined as $\beta_1 n_w P(t)$. By capturing individual propensity to move from an other state, $\beta_1$ controls the stay duration for flexible activities. The higher the product $\beta_1 n_w P(t)$, the more likely the person will choose to move, and thus the shorter she will stay at other locations.
Next, if an individual is already out of home and chooses to move at time $t$, we model her decision to either go home or go to an additional other location by introducing a burst rate $\beta_2$. We define the probability that the individual travels from an other location to an additional other location as $P_t$. It is assumed that for an individual who has decided to move, the probability of visiting an additional other location is proportional to $\beta_2 n_w P(t)$. The ratio between the two choices of going to an additional other location or going home can be written as
$$\frac{P_t}{1 - P_t} = \beta_2 n_w P(t). \qquad [1]$$
For a given value of $\beta_2$, when $P(t)$ is high (e.g., in the afternoon), people are more likely to visit additional other locations; when $P(t)$ is low, people are more likely to return home. For a given $P(t)$, the higher the value of $\beta_2$, the higher the probability that the individual will keep visiting flexible (other) locations, and thus the greater the number of daily locations $N$ she will visit.
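The temporal choices can be sketched as a one-step update of the two-state (home/other) chain for a noncommuter. This sketch assumes Eq. 1 is read as the odds $P_t/(1 - P_t) = \beta_2 n_w P(t)$, so that an agent who leaves an other location goes to an additional other location with probability $P_t = x/(1 + x)$, where $x = \beta_2 n_w P(t)$; it also clips $\beta_1 n_w P(t)$ at 1. Here $n_w$ is the weekly home-based tour number, $\beta_1$ the dwell rate, $\beta_2$ the burst rate, and `P` the list of weekly rhythm values $P(t)$. The functions `step` and `simulate_week` are hypothetical and omit work activities and spatial choices.

```python
import random


def step(state, t, n_w, beta1, beta2, P, rng=random):
    """Advance one 10-min slot of the home/other chain (noncommuter sketch).

    state: "H" (home) or "O" (other). Returns (new_state, moved); an
    ("O", True) result from "O" means a trip to an additional other location.
    """
    base = n_w * P[t]  # individual rhythm n_w * P(t)
    if state == "H":
        return ("O", True) if rng.random() < base else ("H", False)
    # At an other location: move with probability beta1 * n_w * P(t), clipped at 1.
    if rng.random() >= min(1.0, beta1 * base):
        return ("O", False)
    # Given a move: odds of another other location vs. home are beta2 * n_w * P(t).
    x = beta2 * base
    p_other = x / (1.0 + x)
    return ("O", True) if rng.random() < p_other else ("H", True)


def simulate_week(n_w, beta1, beta2, P, seed=0):
    """Simulate one week starting at home; returns a list of (slot, from, to) trips."""
    rng = random.Random(seed)
    state, trips = "H", []
    for t in range(len(P)):
        new_state, moved = step(state, t, n_w, beta1, beta2, P, rng)
        if moved:
            trips.append((t, state, new_state))
        state = new_state
    return trips
```

Calling `simulate_week` with an estimated weekly rhythm `P` and one individual’s parameter triple yields one synthetic week of home and other trips; the spatial destination of each trip would then be chosen by a separate spatial mechanism.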
Compared with previous models that randomly draw the stay duration (or waiting time $\Delta t$) or the number of visited locations ($N$) from aggregated empirical distributions (14, 15, 29), we explicitly model the temporal dynamics of individual mobility by introducing three individual-specific parameters: the weekly home-based tour number $n_w$, the dwell rate $\beta_1$, and the burst rate $\beta_2$. The Markov model framework is analytically tractable, which allows us to derive their explicit effects on the resulting stay-duration and daily-location distributions $P(\Delta t)$ and $P(N)$ (SI Appendix, section 6).
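One consequence of this tractability can be sketched under simple assumptions (a repeating weekly rhythm and a departure hazard $\beta_1 n_w P(t)$ clipped at 1): a stay at an other location that begins in slot $t_0$ ends after exactly $m$ slots with probability $\beta_1 n_w P(t_0 + m) \prod_{j=1}^{m-1} [1 - \beta_1 n_w P(t_0 + j)]$. The hypothetical function below evaluates this expression numerically; it is our illustration, not the derivation in SI Appendix, section 6.

```python
def stay_duration_pmf(t0, n_w, beta1, P, max_slots=1008):
    """pmf[m - 1] = probability that a stay at an other location that begins in
    slot t0 ends after exactly m slots, with hazard min(1, beta1 * n_w * P(t)).

    P is treated as a repeating weekly rhythm (indices wrap modulo len(P)).
    """
    pmf, survive = [], 1.0
    for m in range(1, max_slots + 1):
        h = min(1.0, beta1 * n_w * P[(t0 + m) % len(P)])
        pmf.append(survive * h)  # stayed through slots t0+1..t0+m-1, leaves at t0+m
        survive *= 1.0 - h
    return pmf
```

With a constant rhythm the stay duration is geometric: for example, $n_w = 2$, $\beta_1 = 5$, and $P(t) = 0.01$ give a hazard of 0.1 per slot and a mean stay of 10 slots (100 min). A time-varying $P(t)$ is what makes the resulting $P(\Delta t)$ depart from this simple shape.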
Spatial Choices.
To model the spatial choices of individual mobility, we propose a rank-based exploration and preferential return (r-EPR) model by incorporating a rank-based selection of new locations into the original EPR model (14). The EPR model explains well the differences in the frequency of visits to each location (13–18, 32). For each movement, an individual decides either to explore a new location with probability $P_{\text{new}} = \rho S^{-\gamma}$, or to return to a previously visited location with probability $1 - \rho S^{-\gamma}$. The exploration probability captures a decreasing propensity to visit new locations as the number of previously visited locations ($S$) increases with time, and effectively captures individual mobility choices between explorations and returns. If the individual decides to return to previously visited locations, she chooses a specific location $i$ with probability $\Pi_i = f_i$, defined as the visitation frequency of location $i$ (14). Fig. 1 G–I illustrates $f_i$ with different circle sizes, using the volunteered student’s location records as an example. In each subfigure, we label the visitation frequency of each location up to the current day. We highlight locations visited in the current day in the foreground and show the previously visited ones in the background.
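The explore-or-return decision can be sketched as follows. The sketch assumes visit counts are tracked per location, that an agent with no visited locations always explores, and that the choice of a brand-new location is handled by a separate step; `epr_choice` is a hypothetical name, with $\rho$ and $\gamma$ the global exploration parameters.

```python
import random


def epr_choice(visit_counts, rho, gamma, rng=random):
    """One exploration-or-return decision of the EPR mechanism.

    visit_counts maps each previously visited location to its number of visits,
    so S = len(visit_counts). Explores with probability rho * S**(-gamma)
    (always when S == 0); otherwise returns to a location drawn with
    probability proportional to its visitation frequency.
    """
    S = len(visit_counts)
    p_new = 1.0 if S == 0 else min(1.0, rho * S ** (-gamma))
    if rng.random() < p_new:
        return ("explore", None)
    locations = list(visit_counts)
    weights = [visit_counts[loc] for loc in locations]
    return ("return", rng.choices(locations, weights=weights, k=1)[0])
```

After each visit the caller would increment the chosen location’s count, which is what produces preferential return: frequently visited places become ever more likely return destinations.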
If the individual decides to explore a new location, she needs to choose a destination from a large number of possible alternatives. One limitation of the original EPR model proposed in ref. 14 is its lack of a mechanism for new-location selection. To select a new location, the original EPR model randomly draws the exploration jump size ($\Delta r$) from a global empirical distribution. To make the exploration mechanism more sensitive to the urban structure, in this study we incorporate a rank-based selection mechanism for newly explored locations (i.e., the r-EPR model).
Our selection mechanism gives a rank $k$ to each alternative destination based on its distance to the trip origin (33–36). Among all potential new destinations, the one closest to the current location has $k = 1$, the second closest $k = 2$, and so on. The empirical probability of selecting the $k$th location as a destination is quantified as $P(k) \propto k^{-\alpha}$; the same form has been measured in various studies that analyze aggregated trips between locations for both commuting and noncommuting trips (33–36). For an individual to select an exploration destination, we measure $\alpha$ by aggregating all users’ destinations. Fig. 1 J–L illustrates the probabilities of selecting different destinations (with higher ranks in red and lower ranks in blue). Each dot represents a location for an other activity extracted from the CDR data. The height of the dot on the z axis represents the dot density at the location.
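The rank-based selection of a new destination can be sketched as below. This illustration assumes locations are given as projected (x, y) coordinates, that Euclidean distance is used for ranking, and that `candidates` holds the not-yet-visited locations; `rank_based_destination` is a hypothetical name, and `math.dist` requires Python 3.8 or later.

```python
import math
import random


def rank_based_destination(origin, candidates, alpha, rng=random):
    """Choose a new destination by rank: candidates are sorted by Euclidean
    distance to origin (k = 1 is the closest) and rank k is drawn with
    probability proportional to k**(-alpha)."""
    ranked = sorted(candidates, key=lambda c: math.dist(origin, c))
    weights = [k ** (-alpha) for k in range(1, len(ranked) + 1)]
    return rng.choices(ranked, weights=weights, k=1)[0]
```

Because only ranks enter the weights, the same $\alpha$ yields short jumps where candidate locations are dense and longer jumps where they are sparse, which is how the mechanism couples exploration to urban structure.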
Because the observation period of the empirical data in this study is 6 wk, most users have a limited number of exploration trips, making it difficult to estimate the spatial parameters of $P(k)$ at the individual level. Given more abundant data, this distribution could be estimated at the individual level as well.
Role of Land Use on Travel Distance
Different spatial patterns of cities imply different geographical advantages to urban functioning (37). TimeGeo takes the spatial distribution of locations (e.g., observed from the CDR data) as an input. To explain and quantify the influence of land use on travel, we propose a hierarchical multiplicative cascade framework of analysis. It allows scenario tests on how changes in land-use patterns will affect individual travel. It can generate different scenarios of urban structure (i.e., spatial distribution of home and other activities).
Fig. 3 A–D shows the distribution of different types of locations (home and other) extracted from the mobile phone data set at two scales: At a scale with larger grids, home and other locations are mixed spatially, showing high spatial correlations. At a scale with smaller grids, the separation between home and other types of land use becomes clear (35). The intuition behind this phenomenon is that at a scale with smaller grids (e.g., similar to the census block level), land use is often separated—meaning that residential land use is separated from nonresidential one, whereas at a scale with larger grids (e.g., at the district, town, or regional level), residential and nonresidential land uses mix together. A hierarchical multiplicative cascade divides an area of interest into grids with different granularity and quantifies the spatial correlation of each type of land use at different scales.
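The scale dependence described above can be checked with a simple count-based sketch: place home and other locations in the unit square, count them on $2^i \times 2^i$ grids, and compute the Pearson correlation of the two count vectors at each level $i$. This only illustrates the scale-dependent correlation; it is not the hierarchical multiplicative cascade analysis of ref. 38, and the function names are hypothetical.

```python
import math


def tile_counts(points, level):
    """Number of points (x, y in the unit square) in each tile of a 2**level x 2**level grid."""
    n = 2 ** level
    counts = [0] * (n * n)
    for x, y in points:
        i, j = min(int(x * n), n - 1), min(int(y * n), n - 1)
        counts[i * n + j] += 1
    return counts


def pearson(a, b):
    """Pearson correlation of two equal-length count vectors (nan if one is constant)."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb) if va > 0 and vb > 0 else float("nan")


def home_other_correlation_by_level(homes, others, max_level):
    """Correlation between home and other counts per tile, for levels 1..max_level."""
    return {
        level: pearson(tile_counts(homes, level), tile_counts(others, level))
        for level in range(1, max_level + 1)
    }
```

In a toy configuration where homes and other locations share coarse tiles but occupy different fine tiles, the correlation is high at level 1 and turns negative at level 2, mirroring the mixing at large scales and separation at small scales described above.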
The current framework integrates the two features that influence the spatial choices of exploration to other locations. These are (i) the spatial distribution of activity locations, and (ii) the rank-based location-selection mechanism (illustrated in Fig. 1 J–L). By characterizing the spatial distributions of population and facilities at various scales, here we formalize how these two features influence the observed trip–distance distribution.
To quantitatively represent home-to-other (H→O) trip distance, we denote home locations as the demand side $D$, and other locations as the supply side $S$. The entire region of interest is $\Omega$ (taken as a unit square, shown in Fig. 3E). We progressively partition $\Omega$ into $4, 16, \ldots, 4^{i}$ square tiles with side length $2^{-i}$. Each time, a mother tile (at resolution level $i-1$) is partitioned into four daughter tiles (at resolution level $i$). Then, the probability that a trip goes outside its origin tile at resolution level $i$, $P_i$, can be expressed as
$$P_i = \sum_{k=0}^{M} q(k)\, p_i(k), \qquad q(k) = \frac{\sum_{r=k+1}^{M} r^{-\alpha}}{\sum_{r=1}^{M} r^{-\alpha}}, \qquad [2]$$
where $M$ is the total number of supplies in the entire region $\Omega$; $q(k)$ is the probability that the $k$ supplies in the origin tile are not chosen; and $p_i(k)$ is the probability of finding $k$ supplies within the origin tile. The tile exceeding probability $P_i$ at different tile resolutions generates the resulting distribution of trip distances. Eq. 2 ties together the rank-based selection mechanism and the geographic distribution of locations $p_i(k)$, which can be calculated as
$$p_i(k) = \sum_{D=1}^{Q} \pi_i(D)\, P_i(k \mid D), \qquad [3]$$
where $\pi_i(D)$ is the conditional probability that a trip originates in a tile at level $i$ given $D$ demands are in that tile, $P_i(k \mid D)$ is the conditional probability of supply given demand, and $Q$ is the number of demands in the entire study area. In summary, to quantify trip distance through $P_i$, we need not only the distribution of each type (home and other) of location, but also the correlation between them at different scales. The detailed introduction to the cascade method of analysis can be found in ref. 38 and in Materials and Methods; the derivation of the resulting trip distance distribution is presented in SI Appendix, section 5.
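The tile exceeding probability can be evaluated numerically from point data. The sketch below rests on several assumptions of our own: coordinates rescaled to the unit square, one H→O trip per home location (so a tile holding $D$ homes carries weight $D/Q$), and the $k$ supplies inside the origin tile being the $k$ nearest ones, so that the probability that none is chosen under $P(k) \propto k^{-\alpha}$ is the tail sum of $r^{-\alpha}$ beyond rank $k$ divided by the full sum. The function name is hypothetical, and this is not the authors’ implementation.

```python
def exceed_probability(homes, others, level, alpha):
    """Probability that a rank-based H->O trip leaves its origin tile at `level`."""
    n = 2 ** level

    def tile(p):
        return (min(int(p[0] * n), n - 1), min(int(p[1] * n), n - 1))

    demand, supply = {}, {}
    for p in homes:
        demand[tile(p)] = demand.get(tile(p), 0) + 1
    for p in others:
        supply[tile(p)] = supply.get(tile(p), 0) + 1
    Q, M = len(homes), len(others)

    # q[k] = sum_{r>k} r^-alpha / sum_{r>=1} r^-alpha: the k nearest supplies are skipped.
    w = [r ** (-alpha) for r in range(1, M + 1)]
    total = sum(w)
    q = [0.0] * (M + 1)
    for k in range(M - 1, -1, -1):
        q[k] = q[k + 1] + w[k] / total

    # p[k]: probability that a trip starts in a tile holding k supplies (weight D/Q).
    p = {}
    for t, D in demand.items():
        k = supply.get(t, 0)
        p[k] = p.get(k, 0.0) + D / Q
    return sum(pk * q[k] for k, pk in p.items())
```

Evaluating this function for successive levels gives a discrete profile of exceeding probabilities across tile sizes, which can be set against the empirical fraction of trips that leave their origin tile.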
Results
Extracted Mobility Features from Mobile Phone Data.
In this section we show the results for noncommuters. For each individual, the weekly home-based tour number $n_w$ is directly extracted from the data, whereas the $\beta_1$ and $\beta_2$ parameters are calibrated using the temporal Markov model. The remaining parameters are $\alpha$ for the rank selection probability $P(k) \propto k^{-\alpha}$, and $\rho$ and $\gamma$ for the preferential return mechanism $P_{\text{new}} = \rho S^{-\gamma}$. These three parameters are extracted from the aggregated data of the entire population (Fig. 2 D and E).
The individual values of $\beta_1$ and $\beta_2$ are obtained by calibrating the Markov model to minimize the following statistic:
$$E(\beta_1, \beta_2) = \mathcal{D}\left[P_d(\Delta t), P_m(\Delta t)\right] + \lambda \left| N_d - N_m \right|, \qquad [4]$$
where $P_d(\Delta t)$ and $P_m(\Delta t)$ are the distributions of the individual’s empirical and modeled stay durations, respectively, and $\mathcal{D}$ measures the distance between them. Scalar values $N_d$ and $N_m$ are the average daily numbers of visited locations measured from the individual’s empirical data and from the model simulation, respectively. The difference between $N$ and $n_w$ is that $N$ counts all trips whereas $n_w$ only counts trips starting at home. Metaparameter $\lambda$ controls the weight between the two components. Because $E$ is a nonconvex function, it is evaluated over discrete values of $\beta_1$ and $\beta_2$ to find the $(\beta_1, \beta_2)$ pair that minimizes $E$ for each person. The empirical results of $n_w$, $\beta_1$, and $\beta_2$ for all of the individuals are presented in Fig. 2C. The median values of $n_w$, $\beta_1 n_w$, and $\beta_2 n_w$ for noncommuters are 7.4, 34.2, and 355.6, respectively. The median dwell rate is $\beta_1 \approx 4.6$, suggesting that when people are not at home, they are on average 4.6 times more likely to travel.
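The per-person calibration can be sketched as a grid search. This sketch assumes the objective takes the form $E = \mathrm{KS}[P_d(\Delta t), P_m(\Delta t)] + \lambda |N_d - N_m|$; the two-sample Kolmogorov–Smirnov distance and the absolute difference are our choices for illustration. `simulate` is a hypothetical callable that runs the Markov model for one $(\beta_1, \beta_2)$ pair and returns simulated stay durations and the mean daily number of visited locations.

```python
import itertools


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic sup_x |F_a(x) - F_b(x)| (non-empty samples)."""
    a, b = sorted(a), sorted(b)
    i = j = 0
    d = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        while i < len(a) and a[i] <= x:
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        d = max(d, abs(i / len(a) - j / len(b)))
    return d


def calibrate(emp_durations, emp_daily_locations, simulate, beta1_grid, beta2_grid, lam=1.0):
    """Grid search for the (beta1, beta2) pair minimizing
    E = KS(empirical, simulated durations) + lam * |N_d - N_m|.

    simulate(beta1, beta2) must return (simulated stay durations, mean daily locations).
    Returns (E_min, beta1, beta2).
    """
    best = None
    for b1, b2 in itertools.product(beta1_grid, beta2_grid):
        sim_durations, sim_daily_locations = simulate(b1, b2)
        e = (ks_statistic(emp_durations, sim_durations)
             + lam * abs(emp_daily_locations - sim_daily_locations))
        if best is None or e < best[0]:
            best = (e, b1, b2)
    return best
```

An exhaustive grid is used because the objective is nonconvex; in practice each grid point requires a stochastic simulation, so averaging several runs per point would reduce noise in $E$.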
Simulated Mobility Features.
Taking the featured parameters measured directly from active users of the mobile phone data set, TimeGeo can generate realistic individual daily trajectories over a long time period at the urban scale.
We first use the student volunteer’s 14-mo mobile phone records as an example to explain the simulation and interpret the results of TimeGeo. We fix the locations of home and work (in this case, school is identified as work) and apply the proposed modeling framework to simulate the spatiotemporal choices of flexible other activities and the temporal choices of home activities. For the student, we computed his dwell rate $\beta_1$, burst rate $\beta_2$, and weekly home-based tour number $n_w$ from his records. His burst rate is lower than the population average, reflecting a smaller likelihood of conducting consecutive short activities. Fig. 4 A–C shows three simulated days for the student. The days are dominated by home–work trips, with a few trips to other locations. The model is able to capture not only the number of locations visited each day, but also the more detailed configuration of daily trip chains. Fig. 4D shows the distribution of the most frequent daily mobility networks, i.e., daily motifs, of the student. We represent unique locations as nodes and trips between locations as edges, and count the motif distribution for days that start and end at home. The dominant motif is traveling between just two locations in a day. To show the infrequent motifs more clearly, we present the percentages on a log scale.
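Counting daily motifs can be sketched as follows: each day’s sequence of visited locations becomes a directed graph (unique locations as nodes, trips as edges), which is reduced to a canonical form by trying all node relabelings. This brute force is adequate only for the handful of nodes typical of a day. Treating motifs as directed graphs and dropping days without trips are assumptions of this sketch, and the function names are hypothetical.

```python
from collections import Counter
from itertools import permutations


def daily_motif(day):
    """Canonical form (n_nodes, sorted edge tuple) of one day's directed mobility network.

    day: visited location ids in order, e.g. ["H", "W", "H"].
    """
    nodes = list(dict.fromkeys(day))  # unique locations, first-visit order
    edges = {(a, b) for a, b in zip(day, day[1:]) if a != b}
    best = None
    for perm in permutations(range(len(nodes))):
        label = dict(zip(nodes, perm))
        form = tuple(sorted((label[a], label[b]) for a, b in edges))
        if best is None or form < best:
            best = form
    return (len(nodes), best)


def motif_distribution(days):
    """Fraction of days per motif, over days with at least one trip that start and end at home."""
    kept = [d for d in days if len(d) > 1 and d[0] == "H" and d[-1] == "H"]
    counts = Counter(daily_motif(d) for d in kept)
    return {motif: c / len(kept) for motif, c in counts.items()}
```

Under this encoding a home–work–home day and a home–other–home day map to the same two-node motif, which is the sense in which motifs describe the structure of daily trip chains rather than the activity types.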
A key value of TimeGeo is that it uses ICT records to generate individual trajectories from discovered mobility features at the urban scale. In Fig. 4 E–H, we illustrate a user with very sparse data. She had only four distinct locations in 30 d, and we simulate her complete daily trajectories in space and time. We select two different sets of $n_w$, $\beta_1$, and $\beta_2$ from the joint distribution shown in Fig. 2C to generate two synthetic realizations of the user. Fig. 4 F and G shows the two resulting profiles of simulated journeys of the same sparse user, and Fig. 4H shows their distinct motif distributions.
The importance of the individual features extracted from data (Fig. 2C) lies in their ability to capture the diverse travel behaviors observed in the population. Fig. 5 A and B compares mobility patterns for different individual profiles. Individuals 1 and 2 represent two extreme cases: one travels more frequently (shown in squares) and the other travels less frequently (shown in circles). As a comparison, we also present the average case: a simulation using the median values of the parameters $n_w$, $\beta_1$, and $\beta_2$. Fig. 5 A and B shows that these three individuals have distinct $P(\Delta t)$ and $P(N)$ distributions. The less-frequent traveler has a significantly longer stay duration and visits fewer locations per day. To quantify the differences between the empirical distributions and the model simulation, we use the Kolmogorov–Smirnov (KS) test. The KS statistic between the empirical and simulated $P(\Delta t)$ for the two extreme individuals is 0.12 and 0.11, respectively. If we compare their empirical data with the average case, the KS statistic increases to 0.25 and 0.20, respectively. Similarly, for these two individuals, the KS statistic for $P(N)$ is 0.05 and 0.12; when comparing with the average case, it increases to 0.40 and 0.50, respectively. This confirms the importance of including individual-specific parameters to model temporal choices. With data of higher frequency and longer observation periods available in future studies, machine learning methods could be applied to better learn individual-level choices of return trips and thereby improve the proposed modeling framework.
Fig. 5 C–F compares aggregated mobility features extracted from data and simulation for all of the active noncommuters. These results show that to reproduce individual mobility patterns realistically, it is critical to incorporate each of the mechanisms proposed in the current modeling framework, namely the weekly home-based tour number, dwell rate, burst rate, and the rank-based EPR, over the land-use profile of the city under consideration. The results on the aggregated daily mobility motif distribution are presented in SI Appendix, section 4.2. For the dwell rate ($\beta_1$), if $\beta_1 = 1$, i.e., the model does not differentiate the mobility circadian rhythms of home and other activities, the resulting $P(\Delta t)$ distribution underestimates trips with short duration, and the KS statistic increases from 0.04 to 0.27. For the $P(N)$ distribution, the KS statistic for the model with and without the burst rate is 0.03 and 0.22, respectively. The bursts of flexible activities, captured by the dwell and burst rates $\beta_1$ and $\beta_2$, ensure realistic distributions of the stay duration $P(\Delta t)$ and the number of daily visited locations $P(N)$. The improved rank-based EPR mechanism models the selection of locations; it improves the KS statistic of the trip distance distribution from 0.52 to 0.39. The visitation frequency $f_L$ of the $L$th most visited location follows a power law in $L$. In Fig. 3 D, $P_i$ measures the probability that a generic exploration trip goes outside its origin tile at resolution level $i$. At the largest four tile sizes (24, 12, 6, and 3 km), the cascade is a pure log-normal cascade, $P_i$ can be analytically calculated, and the result compares very well with the data. The empirical data, simulation, and analytical calculation all show that 10% of the trips cross the tile with a size of 24 km, and over 60% cross the tile with a size of 3 km.
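The empirical side of this comparison, the fraction of trips that leave their origin tile at a given level, can be sketched as below. The sketch assumes trip endpoints have been rescaled to the unit square; if the region’s side length is $L$, level-$i$ tiles have side $L/2^i$. The function name is hypothetical.

```python
def empirical_exceed_probability(trips, level):
    """Fraction of trips whose destination lies outside the origin's tile on a
    2**level x 2**level grid; trips are ((x0, y0), (x1, y1)) in the unit square."""
    n = 2 ** level

    def tile(p):
        return (min(int(p[0] * n), n - 1), min(int(p[1] * n), n - 1))

    if not trips:
        raise ValueError("no trips")
    return sum(tile(o) != tile(d) for o, d in trips) / len(trips)
```

Because tiles are nested, a trip that crosses a tile boundary at one level also crosses at every finer level, so this fraction is nondecreasing as the tiles shrink, consistent with the 10% versus over 60% contrast reported above.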
Taken together, we now use the features extracted from active mobile phone users with the presented modeling framework to estimate the daily mobility of the entire metropolitan area. To do so, we expand the users (commuters and noncommuters) to the population (aged 16 and over), and generate 1-weekday mobility trajectories using TimeGeo for the population (see SI Appendix, section 4.3 for more details). In Fig. 5 (Bottom), we compare our simulated daily mobility patterns for the population in Metro Boston (3.54 million individuals aged 16 and over) with traditional travel survey data, including the 2010–2011 Massachusetts Travel Survey (MTS) and the 2009 National Household Travel Survey (NHTS). When comparing the simulation results with the MTS and NHTS, respectively, the KS statistic for $P(\Delta t)$ is 0.23 and 0.59 (Fig. 5G). These stay duration distributions differ significantly between the surveys and our simulation, mainly because in the 1-d surveys people rarely report stays longer than 12 h, whereas our simulation is informed by the active mobile phone users’ records. Such long stays can account for a substantial fraction of the data, as seen in the cumulative distribution of Fig. 5A. In contrast, the distribution of daily visited locations $P(N)$ compares well among the simulation and the surveys, as presented in Fig. 5H, with KS statistics of 0.07 and 0.23, respectively. For the trip distance distribution $P(\Delta r)$, comparing the simulation with the MTS gives a KS statistic of 0.24 (Fig. 5I). Here the model, which does not consider trip distances in the selection of return locations, overestimates long-distance trips. We do not compare with travel distances from the national survey (NHTS), because spatial aspects of travel depend directly on the specific extension of the urban form, which varies across the nation (39).
Fig. 5J compares the total number of trips from home to work in our simulation with the estimates of the model developed by the Boston Region Metropolitan Planning Organization (MPO) for 2010 (40). The comparisons of the number of commuting trips are presented both for trips between the 164 cities and towns in the metropolitan area (intertown) and for trips within them (intratown). The results for commuting trips are excellent, with Pearson correlation coefficients of 0.90 and 0.99, respectively. Larger differences are present in the trips from home to other locations and between other types of locations. Finally, Fig. 5K compares the fraction of trips initiated at different times of the day among our simulation, the 2009 NHTS, and the 2010 MTS. Although the total estimates compare well, we estimate more trips between nonhome destinations in the evening than those reported in the surveys (see SI Appendix, section 4.3 for detailed comparisons). Overall, the results show good agreement with existing MPO models, which needed expensive travel surveys for their calibration.
Conclusion
We present a mechanistic modeling framework to generate individual daily mobility at fine resolution on an urban scale. Temporally, we introduce the weekly home-based tour number, the dwell rate, and the burst rate to model bursts of short flexible activities in activity chains. This mechanism reproduces individual distributions of stay duration, of the number of daily visited locations, and of daily mobility motifs. Spatially, we introduce an improved, rank-based EPR model to explain how individuals choose activity locations. Compared with the original EPR model, the ranking mechanism quantifies the likelihood of selecting new destinations based on the distribution of facilities around trip origins. Moreover, the covariance between the distributions of population and facilities in a given region is characterized with a hierarchical multiplicative cascade framework. In this way, we account for the influence of region-specific spatial structure on individual travel distances. This enables scenario tests of how changes in urban land use would affect microlevel individual travel behavior and macrolevel OD flows.
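The spatial choice can be illustrated with a single simulation step. The sketch below is an assumption-laden reading of r-EPR, not the authors' implementation. Following the original EPR model, an individual who has visited S distinct locations explores with probability ρS^(−γ) and otherwise returns to a visited location with probability proportional to past visits. When exploring, candidate facilities are ranked by distance from the trip origin and the k-th ranked one is chosen with weight k^(−α). The function name, the candidate list, and the Euclidean distance are hypothetical choices.

```python
import math
import random

def r_epr_step(origin, visits, candidates, rho, gamma, alpha, rng):
    """One location choice under a rank-based EPR rule (illustrative sketch).

    origin: (x, y) of the trip origin.
    visits: dict mapping visited locations to visit counts (updated in place).
    candidates: list of (x, y) facility locations available for exploration.
    """
    n_visited = len(visits)
    # Exploration probability rho * S^(-gamma), capped at 1; always explore when S = 0
    p_explore = 1.0 if n_visited == 0 else min(1.0, rho * n_visited ** (-gamma))
    # Rank all candidate facilities by distance from the origin (rank 1 = closest)
    ranked = sorted(candidates, key=lambda c: math.dist(origin, c))
    # Rank k gets weight k^(-alpha); already-visited facilities cannot be "explored"
    weights = [0.0 if c in visits else k ** (-alpha) for k, c in enumerate(ranked, start=1)]
    can_explore = any(w > 0 for w in weights)
    if not can_explore and not visits:
        raise ValueError("no location to explore or return to")
    if can_explore and rng.random() < p_explore:
        choice = rng.choices(ranked, weights=weights, k=1)[0]
    else:
        # Preferential return: probability proportional to past visit counts
        places = list(visits)
        choice = rng.choices(places, weights=[visits[p] for p in places], k=1)[0]
    visits[choice] = visits.get(choice, 0) + 1
    return choice
```

With α = 0 every unvisited facility is equally likely; larger α concentrates exploration on facilities near the origin, which is how the local density of facilities enters the choice.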
TimeGeo serves as a general modeling framework for urban trajectories that can be flexibly adapted to different application scenarios using the population density and the distribution of facilities in any city. It can be coupled with sparse location data from ICTs that sample the visitation preferences of actual individuals, and it can complement or, for some applications, replace expensive travel surveys for modeling urban travel. The framework can generate trajectories under various data conditions; the minimum requirement is the population and facility distributions. In the current results, the parameters governing exploration and return (α, ρ, and γ) are assumed to be the same across the population, and the temporal mobility rates of an individual are assumed to be independent of the actual location. In future studies, as data of higher frequency and covering longer periods become available, the individual variations of the proposed parameters could be learned. It would also be interesting to explore the variations of the model parameters across urban areas and across population groups with different demographics and lifestyles.
Materials and Methods
All study procedures were carried out with Institutional Review Board approval from Massachusetts Institute of Technology (MIT) Committee on the Use of Humans as Experimental Subjects (COUHES) (Protocol 1405006399) approved on June 10, 2014. CDR data were collected by AirSage for operational purposes of two mobile phone carriers. The student, who donated his 14-mo self-collected mobile phone traces through a smartphone application (OpenPaths), provided informed consent for the research.
Mobile Phone Data.
We extracted the activity stay locations of 1.92 million cell phone users from their CDRs in the Greater Boston area over an observation period of 6 wk in 2010. A stay means performing an activity at a location. A stay sequence, or activity sequence, represents the consecutive stays a person makes in a period (usually a day). A trip is made between consecutive stay locations, which are therefore also called trip origins and destinations. In the CDR data, a record is created when a user calls, sends a text message, or uses data through the cellular network. Each record has the format (UserID, longitude, latitude, time). The location precision is about 200–300 m in urban areas. For the voluntarily self-collected mobile phone data, a record is created every time the smartphone application detects a significant spatial movement; these data have the same format and a similar spatial resolution as the CDR data. The detailed methods used to extract stay locations and to label location types (home, work, and other) are presented in SI Appendix, section 1. For the CDR data, the records do not directly correspond to a user's stays: a stay cannot be detected if the user did not use his or her cell phone more than once during that stay. Even when more than one cell phone use was recorded, the stay duration can only be approximated for active phone users. Therefore, not all cell phone users have enough records to measure the basic mobility patterns presented in this study. Moreover, we cannot determine whether long stays at one location (over 2 d) reflect an absence of cell phone use or an actual stay at that location for over 2 d; these stays were therefore removed from the analysis and are not captured by the model.
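To make the stay definitions concrete, the sketch below groups one user's time-ordered records into stays and then drops stays longer than 2 d, as described above. It is a deliberately simplified, hypothetical procedure, not the method of SI Appendix, section 1. Consecutive records within an assumed 300-m radius (chosen only to match the stated location precision) of a stay's first record are merged, and the duration is approximated by the time between the stay's first and last records.

```python
import math

MAX_STAY_S = 2 * 24 * 3600  # stays longer than 2 d are discarded

def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in meters between two (lon, lat) points."""
    r_earth = 6_371_000.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * r_earth * math.asin(math.sqrt(h))

def extract_stays(records, radius_m=300.0):
    """Group one user's (UserID, longitude, latitude, time_s) records into stays.

    A new stay starts whenever a record lies farther than radius_m (an assumed
    threshold) from the first record of the current stay. Duration is last minus
    first record time, so a single-record stay has duration 0 (i.e., unknown).
    """
    stays = []
    current = None
    for _, lon, lat, t in sorted(records, key=lambda r: r[3]):
        if current is not None and haversine_m(current["lon"], current["lat"], lon, lat) <= radius_m:
            current["end"] = t  # same stay: extend its observed end time
        else:
            if current is not None:
                stays.append(current)
            current = {"lon": lon, "lat": lat, "start": t, "end": t}
    if current is not None:
        stays.append(current)
    # Drop stays over 2 d: no phone use cannot be told apart from a real long stay
    return [s for s in stays if s["end"] - s["start"] <= MAX_STAY_S]
```

The zero-duration case reflects the limitation noted above: with a single record, a stay is detected but its duration cannot be measured.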
The Hierarchical Multiplicative Cascade Model.
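For any given subregion $\omega \subseteq \Omega$, $D(\omega)$ is the number of trip origins in $\omega$ and $S(\omega)$ is the number of trip destinations in $\omega$. We use bivariate random measures $X(\omega) = [D(\omega), S(\omega)]$ to represent the number of demand and supply locations in $\omega$, where $X$ results from a cascade process in which the fluctuations at different spatial scales combine multiplicatively. The generation of bivariate cascades is illustrated in Fig. 3C. The demand and supply in a generic i-tile $\Omega_{i,j}$ are $D_{i,j} = D(\Omega_{i,j})$ and $S_{i,j} = S(\Omega_{i,j})$, and the associated measure densities are $\varepsilon_{D,i,j} = D_{i,j}/|\Omega_{i,j}|$ and $\varepsilon_{S,i,j} = S_{i,j}/|\Omega_{i,j}|$, where $|\Omega_{i,j}|$ is the tile area. One starts with uniform measure densities $\varepsilon_{D,0}$ and $\varepsilon_{S,0}$ in $\Omega$, then progressively partitions $\Omega$ into finer grids of square tiles, each i-tile being nested in an (i − 1)-tile. The demand and supply densities in the daughter tiles are multiplied by independent realizations of nonnegative random factors $W_{D,i}$ and $W_{S,i}$, each with mean value 1. The random vectors $W_i = [W_{D,i}, W_{S,i}]$ are the generators of the cascade. Although the generators take independent values in different i-tiles, their components $W_{D,i}$ and $W_{S,i}$ in a given i-tile may be dependent. Moreover, the distribution of $W_i$ may vary with the resolution level i. These features provide important modeling flexibility. The measure densities at resolution levels i − 1 and i are related as

$$\varepsilon_{D,i} = \varepsilon_{D,i-1}\, W_{D,i}, \qquad \varepsilon_{S,i} = \varepsilon_{S,i-1}\, W_{S,i}. \tag{5}$$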
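A cascade of this kind can be simulated in a few lines. The sketch below assumes, for illustration only, that each tile splits into 2 × 2 daughter tiles at every level; the generator is passed in as a function (a hypothetical interface) so that any joint distribution of the demand and supply factors can be plugged in. Each daughter tile receives its own draw, so factors are independent across tiles while the two components drawn for one tile may be dependent.

```python
import numpy as np

def bivariate_cascade(levels, generator, rng, eps_d0=1.0, eps_s0=1.0):
    """Simulate demand and supply density grids with a multiplicative cascade.

    generator(shape, level, rng) must return two arrays (w_d, w_s) of the given
    shape with nonnegative, mean-1 entries. Returns arrays of shape
    (2**levels, 2**levels) under the assumed 2 x 2 splitting.
    """
    eps_d = np.full((1, 1), float(eps_d0))  # uniform densities at level 0
    eps_s = np.full((1, 1), float(eps_s0))
    for level in range(1, levels + 1):
        # Each parent tile splits into 2 x 2 daughters that inherit its density
        eps_d = np.kron(eps_d, np.ones((2, 2)))
        eps_s = np.kron(eps_s, np.ones((2, 2)))
        # One generator draw per daughter tile, applied multiplicatively
        w_d, w_s = generator(eps_d.shape, level, rng)
        eps_d = eps_d * w_d
        eps_s = eps_s * w_s
    return eps_d, eps_s

def independent_lognormal(shape, level, rng, sigma=0.5):
    """Example generator: independent log-normal factors with mean 1."""
    mu = -0.5 * sigma**2  # E[exp(N(mu, sigma^2))] = exp(mu + sigma^2 / 2) = 1
    return rng.lognormal(mu, sigma, shape), rng.lognormal(mu, sigma, shape)
```

For example, `bivariate_cascade(4, independent_lognormal, np.random.default_rng(0))` returns two 16 × 16 density grids. Because the generator may depend on `level`, the sketch also accommodates generators whose distribution changes with resolution.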
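According to Fig. 3 A–D, at larger tile sizes almost all tiles are nonempty and supply and demand are positively correlated. Consequently, for small i values (large tile sizes) the generator can be described by joint log-normal variables (38). If the log generators $\ln W_D$ and $\ln W_S$ have a joint normal distribution with variances $\sigma_{\ln D}^2$ and $\sigma_{\ln S}^2$, mean values $\mu_{\ln D}$ and $\mu_{\ln S}$, and correlation coefficient $\rho_{\ln}$, then $W_D$ and $W_S$ have a joint log-normal distribution with mean values $m_D$ and $m_S$, variances $\sigma_D^2$ and $\sigma_S^2$, and correlation coefficient $\rho$ given by

$$m_k = \exp\left(\mu_{\ln k} + \tfrac{1}{2}\sigma_{\ln k}^2\right), \qquad k \in \{D, S\}, \tag{6}$$

$$\sigma_k^2 = m_k^2\left[\exp\left(\sigma_{\ln k}^2\right) - 1\right], \qquad k \in \{D, S\}, \tag{7}$$

$$\rho = \frac{\exp\left(\rho_{\ln}\,\sigma_{\ln D}\,\sigma_{\ln S}\right) - 1}{\sqrt{\left[\exp\left(\sigma_{\ln D}^2\right) - 1\right]\left[\exp\left(\sigma_{\ln S}^2\right) - 1\right]}}. \tag{8}$$

Because the generators have mean 1 ($m_D = m_S = 1$), once we can estimate $\sigma_D$, $\sigma_S$, and $\rho$, the remaining parameters can be calculated from Eqs. 6–8.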
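Going from target statistics to a sampler is mechanical. The sketch below assumes mean-1 generators and uses the standard log-normal identities Var[W] = exp(σ_ln²) − 1 and Cov[W_D, W_S] = exp(Cov[ln W_D, ln W_S]) − 1 (both valid when the means equal 1) to recover the mean vector and covariance matrix of the log generators from σ_D, σ_S, and ρ; it then draws correlated factors. The function names are illustrative.

```python
import numpy as np

def log_generator_params(sigma_d, sigma_s, rho):
    """Mean vector and covariance matrix of (ln W_D, ln W_S) for mean-1
    log-normal generators with std devs sigma_d, sigma_s and correlation rho."""
    var_ln_d = np.log1p(sigma_d**2)  # from Var[W] = exp(var_ln) - 1
    var_ln_s = np.log1p(sigma_s**2)
    arg = 1.0 + rho * sigma_d * sigma_s
    if arg <= 0:
        raise ValueError("rho too negative for these log-normal marginals")
    cov_ln = np.log(arg)  # from Cov[W_D, W_S] = exp(cov_ln) - 1
    if cov_ln**2 > var_ln_d * var_ln_s:
        raise ValueError("requested rho is not attainable with these marginals")
    mean = np.array([-var_ln_d / 2, -var_ln_s / 2])  # enforces E[W] = 1
    cov = np.array([[var_ln_d, cov_ln], [cov_ln, var_ln_s]])
    return mean, cov

def sample_log_normal_generator(shape, mean, cov, rng):
    """Draw (W_D, W_S) arrays of the given shape from the joint log-normal law."""
    z = rng.multivariate_normal(mean, cov, size=shape)  # shape + (2,)
    w = np.exp(z)
    return w[..., 0], w[..., 1]
```

The feasibility check matters: log-normal variables cannot reach every correlation in [−1, 1], so some combinations of σ_D, σ_S, and ρ have no joint log-normal representation.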
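At smaller tile sizes, empty tiles cannot be ignored, and extreme forms of dependence such as mutual exclusion may occur. In this case the generator is better modeled as a β-cascade, in which a tile is either filled or empty. The generators of a bivariate β-cascade have a discrete distribution with probability masses concentrated at four points: mass $P_{00}$ at $(0, 0)$, mass $P_{01}$ at $(0, 1/P_S)$, mass $P_{10}$ at $(1/P_D, 0)$, and mass $P_{11}$ at $(1/P_D, 1/P_S)$, where $P_D = P_{10} + P_{11}$, $P_S = P_{01} + P_{11}$, and $P_{00} + P_{01} + P_{10} + P_{11} = 1$. Thus, a tile is either filled or empty, and both generator components have mean 1. The correlation between supply and demand is $\rho = (P_{11} - P_D P_S)/\sqrt{P_D(1 - P_D)\,P_S(1 - P_S)}$.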
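The four-point generator and its correlation can be checked numerically. In the sketch below, which is illustrative and uses hypothetical function names, each tile draws one of four outcomes with masses P00, P01, P10, and P11 (first index: demand filled, second: supply filled). Filled components are scaled by 1/P_D or 1/P_S so that both factors have mean 1, and setting P11 = 0 gives mutual exclusion.

```python
import numpy as np

def beta_generator(shape, p01, p10, p11, rng):
    """Sample bivariate beta-cascade factors (W_D, W_S) of the given shape."""
    p00 = 1.0 - p01 - p10 - p11
    if -1e-12 < p00 < 0:
        p00 = 0.0  # absorb floating-point round-off
    p_d, p_s = p10 + p11, p01 + p11  # probabilities that demand / supply is filled
    if min(p00, p01, p10, p11) < 0 or p_d <= 0 or p_s <= 0:
        raise ValueError("invalid probability masses")
    # Outcome codes: 0 -> (0, 0), 1 -> (0, S), 2 -> (D, 0), 3 -> (D, S)
    outcome = rng.choice(4, size=shape, p=[p00, p01, p10, p11])
    w_d = np.where(outcome >= 2, 1.0 / p_d, 0.0)
    w_s = np.where((outcome == 1) | (outcome == 3), 1.0 / p_s, 0.0)
    return w_d, w_s

def beta_correlation(p01, p10, p11):
    """Correlation between W_D and W_S implied by the four masses."""
    p_d, p_s = p10 + p11, p01 + p11
    return (p11 - p_d * p_s) / np.sqrt(p_d * (1 - p_d) * p_s * (1 - p_s))
```

For instance, P01 = P10 = 0.5 and P11 = 0 give a correlation of −1: every tile holds either demand or supply, never both.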
Acknowledgments
We thank Chaoming Song for enlightening discussions during the design of this work. The research reported herein was funded in part by the MIT–Ford Alliance, MIT–Philips Alliance, the MIT-Brazil Program, the MIT-Portugal Program, the Samuel Tak Lee Real Estate Entrepreneurship Laboratory at MIT, US Department of Transportation via the program New England University Transportation Center (UTC) Year 25, and the Center for Complex Engineering Systems at King Abdulaziz City for Science and Technology (KACST).
Footnotes
Conflict of interest statement: The authors declare a conflict of interest. The presented work is part of a patent pending: Massachusetts Institute of Technology Case 18887, “UrbanFlows: Improving Urban Traffic Without Surveys,” by M.C.G.
This article is a PNAS Direct Submission.
This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1524261113/-/DCSupplemental.
References
- 1. Goodchild MF. Citizens as sensors: The world of volunteered geography. GeoJournal. 2007;69(4):211–221.
- 2. Batty M. The New Science of Cities. MIT Press; Cambridge, MA: 2013.
- 3. Nagel K, Beckman RJ, Barrett CL. Transims for urban planning. 6th International Conference on Computers in Urban Planning and Urban Management, Venice, Italy. Los Alamos National Laboratory; Los Alamos, NM: 1999. Available at https://www.researchgate.net/publication/243768002. Accessed August 6, 2016.
- 4. Ben-Akiva M, Bierlaire M. Discrete choice methods and their applications to short term travel decisions. Handbook of Transportation Science. Springer; New York: 1999. pp. 5–33.
- 5. Balmer M, et al. Agent-based simulation of travel demand: Structure and computational performance of MATSim-T. 2nd TRB Conference on Innovations in Travel Modeling, Portland, Oregon. Eidgenössische Technische Hochschule Zürich; Zurich: 2008.
- 6. Arentze T, Timmermans H. Albatross: A Learning Based Transportation Oriented Simulation System. EIRASS; Eindhoven, The Netherlands: 2000.
- 7. Bowman JL, Ben-Akiva ME. Activity-based disaggregate travel demand model system with activity schedules. Transp Res Part A Policy Pract. 2001;35(1):1–28.
- 8. Danalet A, Tinguely L, de Lapparent M, Bierlaire M. Location choice with longitudinal WiFi data. J Choice Model. 2016;18:1–17.
- 9. Zilske M, Nagel K. Studying the accuracy of demand generation from mobile phone trajectories with synthetic data. Procedia Comput Sci. 2014;32:802–807.
- 10. Zilske M, Nagel K. A simulation-based approach for constructing all-day travel chains from mobile phone data. Procedia Comput Sci. 2015;52:468–475.
- 11. Zheng Y, Capra L, Wolfson O, Yang H. Urban computing: Concepts, methodologies, and applications. ACM Trans Intell Syst Technol. 2014;5(3):38.
- 12. Blondel VD, Decuyper A, Krings G. A survey of results on mobile phone datasets analysis. arXiv:1502.03406. 2015.
- 13. González MC, Hidalgo CA, Barabási AL. Understanding individual human mobility patterns. Nature. 2008;453(7196):779–782. doi: 10.1038/nature06958.
- 14. Song C, Koren T, Wang P, Barabási AL. Modelling the scaling properties of human mobility. Nat Phys. 2010;6(10):818–823.
- 15. Perkins TA, et al. Theory and data for simulating fine-scale human movement in an urban environment. J R Soc Interface. 2014;11(99):20140642. doi: 10.1098/rsif.2014.0642.
- 16. Song C, Qu Z, Blumm N, Barabási AL. Limits of predictability in human mobility. Science. 2010;327(5968):1018–1021. doi: 10.1126/science.1177170.
- 17. Hasan S, Schneider CM, Ukkusuri SV, González MC. Spatiotemporal patterns of urban human mobility. J Stat Phys. 2013;151(1-2):304–318.
- 18. Toole JL, Herrera-Yagüe C, Schneider CM, González MC. Coupling human mobility and social ties. J R Soc Interface. 2015;12(105):20141128. doi: 10.1098/rsif.2014.1128.
- 19. Schneider CM, Belik V, Couronné T, Smoreda Z, González MC. Unravelling daily human mobility motifs. J R Soc Interface. 2013;10(84):20130246. doi: 10.1098/rsif.2013.0246.
- 20. Kölbl R, Helbing D. Energy laws in human travel behaviour. New J Phys. 2003;5(1):48.
- 21. Balcan D, et al. Multiscale mobility networks and the spatial spreading of infectious diseases. Proc Natl Acad Sci USA. 2009;106(51):21484–21489. doi: 10.1073/pnas.0906910106.
- 22. Viswanathan G, et al. Lévy flight search patterns of wandering albatrosses. Nature. 1996;381(6581):413–415.
- 23. Jiang S, et al. A review of urban computing for mobile phone traces: Current methods, challenges and opportunities. Proceedings of the 2nd ACM SIGKDD International Workshop on Urban Computing, UrbComp ’13. ACM; New York: 2013. pp. 2:1–2:9.
- 24. Toole JL, et al. The path most traveled: Travel demand estimation using big data resources. Transp Res, Part C Emerg Technol. 2015;58(B):162–177.
- 25. Alexander L, Jiang S, Murga M, González MC. Origin–destination trips by purpose and time of day inferred from mobile phone data. Transp Res, Part C Emerg Technol. 2015;58:240–250.
- 26. Vázquez A, et al. Modeling bursts and heavy tails in human dynamics. Phys Rev E Stat Nonlin Soft Matter Phys. 2006;73(3 Pt 2):036127. doi: 10.1103/PhysRevE.73.036127.
- 27. Barabási AL. The origin of bursts and heavy tails in human dynamics. Nature. 2005;435(7039):207–211. doi: 10.1038/nature03459.
- 28. Hidalgo R CA. Conditions for the emergence of scaling in the inter-event time of uncorrelated and seasonal systems. Physica A. 2006;369(2):877–883.
- 29. Malmgren RD, Stouffer DB, Motter AE, Amaral LA. A Poissonian explanation for heavy tails in e-mail communication. Proc Natl Acad Sci USA. 2008;105(47):18153–18158. doi: 10.1073/pnas.0800332105.
- 30. Karsai M, Kaski K, Barabási AL, Kertész J. Universal features of correlated bursty behaviour. Sci Rep. 2012;2:397. doi: 10.1038/srep00397.
- 31. Jo HH, Karsai M, Kertész J, Kaski K. Circadian pattern and burstiness in mobile phone communication. New J Phys. 2012;14(1):013055.
- 32. Pappalardo L, et al. Returners and explorers dichotomy in human mobility. Nat Commun. 2015;6:8166. doi: 10.1038/ncomms9166.
- 33. Simini F, González MC, Maritan A, Barabási AL. A universal model for mobility and migration patterns. Nature. 2012;484(7392):96–100. doi: 10.1038/nature10856.
- 34. Noulas A, Scellato S, Lambiotte R, Pontil M, Mascolo C. A tale of many cities: Universal patterns in human urban mobility. PLoS One. 2012;7(5):e37027. doi: 10.1371/journal.pone.0037027.
- 35. Yang Y, Herrera C, Eagle N, González MC. Limits of predictability in commuting flows in the absence of data for calibration. Sci Rep. 2014;4:5662. doi: 10.1038/srep05662.
- 36. Noulas A, Shaw B, Lambiotte R, Mascolo C. Topological properties and temporal dynamics of place networks in urban environments. Proceedings of the 24th International Conference on World Wide Web Companion. International World Wide Web Conferences Steering Committee; New York: 2015. pp. 431–441.
- 37. Batty M. The size, scale, and shape of cities. Science. 2008;319(5864):769–771. doi: 10.1126/science.1151419.
- 38. Veneziano D, Gonzalez MC. Trip length distribution under multiplicative spatial models of supply and demand: Theory and sensitivity analysis. arXiv:1101.3719. 2010.
- 39. Newman PG, Kenworthy JR. Cities and Automobile Dependence: An International Sourcebook. Gower; Aldershot, UK: 1989. pp. 1–388.
- 40. CTPS. Methodology and assumptions of central transportation planning staff regional travel demand modeling. 2013. Available at www.ctps.org/Drupal/data/pdf/about/mpo/recert_2014/CTPS_GLE_Modeling_Method_20130416.pdf. Accessed March 17, 2016.