Journal of Applied Statistics
2020 Jan 13;48(1):105–123. doi: 10.1080/02664763.2019.1711363

A statistical framework for measuring the temporal stability of human mobility patterns

Zhihang Dong, Yen-Chi Chen, Adrian Dobra
PMCID: PMC9042129  PMID: 35707234

ABSTRACT

Despite the growing popularity of human mobility studies that collect GPS location data, the problem of determining the minimum required length of GPS monitoring has not been addressed in the current statistical literature. In this paper, we tackle this problem by laying out a theoretical framework for assessing the temporal stability of human mobility based on GPS location data. We define several measures of the temporal dynamics of human spatiotemporal trajectories based on the average velocity process, and on activity distributions in a spatial observation window. We demonstrate the use of our methods with data that comprise the GPS locations of 185 individuals over the course of 18 months. Our empirical results suggest that GPS monitoring should be performed over periods of time that are significantly longer than what has been previously suggested. Furthermore, we argue that GPS study designs should take into account demographic groups.

Keywords: Density estimation, global positioning systems (GPS), human mobility, spatiotemporal trajectories, temporal dynamics

2010 Mathematics Subject Classifications: 62G07, 62M10, 91D25, 62H11

1. Introduction

Recent developments in global positioning systems (GPS) for wearable technology such as smartphones have drawn considerable interest from scientists studying the effects of environmental influences on different population groups [2,10,12,13,17,20–23,28,30]. A recent article [18] documents more than 100 studies from 20 disciplines that collect and analyze human time-stamped GPS location data. Such data are key to learning about the places where people routinely spend their time during activities of daily living, and hence to establishing how these places relate to socio-economic outcomes, crime victimization, and physical and mental well-being. There have been extensive studies on the social stratification of mobility, such as health disparities across neighborhoods, mental health, and substance abuse intervention [9,24,28]; on the assessment of human spatial behavior and spatiotemporal contextual exposures [12,17,20]; on the relationship between geographic and contextual attributes of the environment (e.g. the built environment) and human energy balance (e.g. diet, weight, physical activity) [2,30]; on segregation, environmental exposure, and accessibility in social science research [13]; and on the relationship between health-risk behavior in adolescents (e.g. substance abuse) and community disorder [1,27,28].

Notwithstanding a general consensus across disciplines about the tremendous potential of GPS location data for studying human mobility, very little is currently known about how long a GPS study should last. There is an inherent trade-off between collecting location data from people for longer vs. shorter periods of time. Recording more GPS locations yields more information about the locations where an individual spends their time, as well as about the frequency, duration and timing of their visits to these places. However, participating in a GPS study comes with burdens that often become significant when accumulated over longer periods of time: the individual needs to carry the device recording the data (a GPS tracker) everywhere they go, and needs to make sure the device is charged at all times and functions properly. Until recently, most GPS study designs stipulated mandatory regular visits to project coordination sites to download data from the location trackers, to replace batteries, and to replace GPS tracking devices that were lost or malfunctioning. While some of these issues have been addressed by using specialized smartphone apps that collect GPS data and wirelessly transmit them to secure cloud databases, the costs of distributing smartphones to study participants, data plans, software development, and cloud computing remain significant. In addition, recording locations that might be sensitive for study participants over long periods of time raises important privacy concerns. For these reasons, it is desirable to design GPS studies that are as short as possible, reducing project costs and participant burden, while still guaranteeing that sufficient location data are collected to properly address the research aims.

Despite the steady growth over the last 20 years in the number of human mobility studies that collect GPS location data, the question of how long GPS monitoring should last was not asked until recently [29]. In that paper, the authors argue that an effective GPS study should last until a minimum of 14 to 15 days of valid GPS data have been collected. While this finding is relevant for the numerous research groups that have designed GPS studies lasting 7 days (see [29] and the references therein), a two-week minimum falls well short of the duration of other, more recent GPS studies. For example, Refs. [4,19] describe studies that tracked adolescents in the San Francisco Bay area for one month. Another study [8] employs a more complex three-site design: five assessments, six months apart, over two years of follow-up for participants enrolled in Chicago, and three assessments, six months apart, over one year of follow-up for participants enrolled in Jackson and New Orleans. During each assessment, participants wear a GPS tracker for 2 weeks. Thus, this study [8] records GPS locations for a total of 10 weeks and 6 weeks, respectively, but splits the period of observation into several contiguous 2-week periods of GPS monitoring. Such longer observation periods are also supported by [16], which found 17 weeks to be an adequate period of time to monitor human mobility based on geotagged social media data.

In this paper, we lay out a theoretical framework for assessing the temporal stability of human mobility based on GPS location data. Such a framework is missing from the current statistical literature. Previous work [16,29] on the assessment of the duration of GPS observation periods is based on empirical findings and lacks any theoretical underpinnings. We address this gap by introducing several measures of the temporal dynamics of spatiotemporal trajectories of individuals. We illustrate the use of these measures with publicly available data from a study that recorded GPS locations of 185 individuals that live in a city in Switzerland over the course of 18 months.

2. Methods

The spatiotemporal trajectory of an individual in a reference time frame $[t_{\min}, t_{\max}]$ and spatial observation window $W \subset \mathbb{R}_{+}^{2}$ is a curve

$$X_{[t_{\min},t_{\max}]} = \{X(t) = (x_1(t), x_2(t)) : t \in [t_{\min}, t_{\max}]\} \subset W, \tag{1}$$

where $x_1(\cdot)$ and $x_2(\cdot)$ represent the longitude and latitude coordinates, respectively, and $X(t)$ is the location visited by this individual at time $t$. We assume that this curve is smooth: $x_1(\cdot)$ and $x_2(\cdot)$ have continuous derivatives. The length of the curve in Equation (1) is defined as [6]:

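$$L\big(X_{[t_{\min},t_{\max}]}\big) = \int_{t_{\min}}^{t_{\max}} \sqrt{\left(\frac{dx_1(t)}{dt}\right)^2 + \left(\frac{dx_2(t)}{dt}\right)^2}\, dt. \tag{2}$$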
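Equation (2) can be checked numerically. The following minimal Python sketch assumes a hypothetical quarter-circle trajectory $x_1(t) = \cos t$, $x_2(t) = \sin t$ on $[0, \pi/2]$, whose length is $\pi/2$, and sums straight-line distances between increasingly dense sample points. The chord sum approaches the integral in Equation (2) from below, which is the idea behind replacing the unobserved curve with its sampled locations.

```python
import math


def polyline_length(points):
    """Sum of straight-line distances between consecutive 2-D points."""
    return sum(math.dist(p, q) for p, q in zip(points, points[1:]))


# Hypothetical smooth trajectory: a quarter of the unit circle,
# x1(t) = cos t, x2(t) = sin t on [0, pi/2]; Equation (2) gives length pi/2.
t_min, t_max = 0.0, math.pi / 2
for n in (5, 50, 500):
    times = [t_min + (t_max - t_min) * i / (n - 1) for i in range(n)]
    pts = [(math.cos(t), math.sin(t)) for t in times]
    # The chord sum approaches pi/2 from below as the sampling gets denser.
    print(n, polyline_length(pts), math.pi / 2)
```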

The complete trajectory $X_{[t_{\min},t_{\max}]}$ is never observed in the real world. Instead, $n$ observation times $t_1,\ldots,t_n$ are sampled from a distribution on $[t_{\min},t_{\max}]$ with density $\rho(\cdot)$, and the corresponding locations $X(t_1),\ldots,X(t_n)$ on the curve $X_{[t_{\min},t_{\max}]}$ are recorded. These locations are realizations of a random variable $X(T)$, where $T \sim \rho(\cdot)$. Ideally, $T$ would follow a uniform distribution, so that a visited location has the same chance of being recorded anywhere in the reference time frame $[t_{\min},t_{\max}]$. However, the distribution of $T$ can be far from uniform because of technological limitations (e.g. GPS devices running out of power), built environments that prevent GPS devices from obtaining a location (e.g. skyscrapers in downtown areas or buildings without windows and Wi-Fi coverage), or human behavioral factors (e.g. individuals turning off their GPS devices around locations that are sensitive to them).

We assume that GPS positional data from $K$ study participants were recorded. We denote by $X_{k,[t_{\min},t_{\max}]} = \{X_k(t) : t \in [t_{\min},t_{\max}]\}$ the unobserved spatiotemporal trajectory of the $k$th study participant. The observation times in the reference time frame $[t_{\min},t_{\max}]$ can vary between study participants. The GPS data for the $k$th study participant are the time-stamped longitude and latitude locations:

$$\{X_{k,i} = X_k(t_{k,i}) : i = 1,\ldots,n_k\}, \tag{3}$$

where $n_k \ge 1$, each time $t_{k,i}$ was sampled from a distribution with density $\rho_k(\cdot)$ independently of the other observation times, and $t_{\min} \le t_{k,1} \le \cdots \le t_{k,n_k} \le t_{\max}$. Here $t_{k,i}$ is the time when the $i$th location of study participant $k$ was recorded. Our framework also allows different groups of study participants to have different reference time frames.

2.1. Measuring the temporal stability of human mobility patterns

One possible measure of the dynamics of the spatiotemporal trajectory $X_{[t_{\min},t_{\max}]}$ is the average velocity $V(\tau)$ at time $\tau$, defined through the length of the subcurve $X_{[t_{\min},t_{\min}+\tau]}$ of the trajectory in Equation (1):

$$V(\tau) = \frac{1}{\tau}\, L\big(X_{[t_{\min},t_{\min}+\tau]}\big), \tag{4}$$

for $\tau \in (0, t_{\max}-t_{\min}]$, with $V(0) = 0$. A sample estimator of the average velocity for the $k$th study participant is

$$\hat V_k(\tau) = \frac{1}{\tau} \sum_{\{i \,:\, t_{k,i+1} \le t_{\min}+\tau\}} \big\| X_{k,i+1} - X_{k,i} \big\|, \tag{5}$$

where $\|X_{k,i+1} - X_{k,i}\|$ is an estimate of the distance traveled between times $t_{k,i}$ and $t_{k,i+1}$. The average velocity is a straightforward way to quantify the dynamics of an individual's movement, so its stability over time is an intuitive, easy-to-understand measure of temporal stability.

In what follows, we assume that study participants traveled in a straight line, or 'as the crow flies', between two consecutive observed GPS locations. This is the simplest possible assumption, and it makes it easy to calculate Great Circle (WGS84 ellipsoid) distances between two spatial locations [3]. However, this assumption underestimates the actual distances traveled and consequently underestimates the average velocity. More accurate approximations of distances traveled can be based on the shortest paths between two locations on a road network that spans the spatial observation window $W$. Calculating road-network distances is more complex than calculating straight-line distances and involves significant GIS work, since the maximum speed of travel on different road segments needs to be taken into account [7]. Nevertheless, as the time span between two consecutive observed locations becomes shorter, the difference between road-network and straight-line distances decreases.
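A minimal Python sketch of the estimator in Equation (5) under the straight-line assumption follows. As a stated simplification, it uses the haversine formula on a sphere of radius 6371 km rather than WGS84 ellipsoidal distances; the function names and the toy track are hypothetical. Observation times are assumed sorted and expressed in hours, so the result is in km/h.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; spherical approximation, not WGS84


def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance in km between two (lon, lat) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def average_velocity(times, lons, lats, t_min, tau):
    """Estimator (5): straight-line distance covered by segments ending
    no later than t_min + tau, divided by tau (times assumed sorted)."""
    if tau <= 0:
        return 0.0  # V(0) = 0 by definition
    total = 0.0
    for i in range(len(times) - 1):
        if times[i + 1] <= t_min + tau:
            total += haversine_km(lons[i], lats[i], lons[i + 1], lats[i + 1])
    return total / tau


# Hypothetical track: one degree east along the equator, then one degree north.
times = [0.0, 1.0, 2.0]  # hours
lons = [0.0, 1.0, 1.0]
lats = [0.0, 0.0, 1.0]
print(average_velocity(times, lons, lats, t_min=0.0, tau=2.0))  # roughly 111.19 km/h
```

Swapping `haversine_km` for an ellipsoidal or road-network distance would change only the distance function, not the structure of the estimator.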

More generally, consider a stochastic process $Z = \{Z(\tau) : \tau \in [0, t_{\max}-t_{\min}]\}$, where $Z(\tau)$ is a mapping $f(\cdot)$ of the subcurve $X_{[t_{\min},t_{\min}+\tau]}$ into $\mathbb{R}_+$. The mapping $f(\cdot)$ is chosen such that $\lim_{\tau \to t_{\max}-t_{\min}} Z(\tau) = Z(t_{\max}-t_{\min})$. We define the absolute percentage error (APE, henceforth) $\varphi(Z;\tau)$, which measures the error made when approximating $Z(t_{\max}-t_{\min})$ with $Z(\tau)$ for $\tau \in [0, t_{\max}-t_{\min}]$:

$$\varphi(Z;\tau) = \frac{|Z(\tau) - Z(t_{\max}-t_{\min})|}{Z(t_{\max}-t_{\min})}.$$

We quantify the temporal stability of the process $Z$ by introducing a related process called the last crossing time process $\mathrm{LCT}_Z = \{\mathrm{LCT}_Z(\gamma) : \gamma \ge 0\}$, where

$$\mathrm{LCT}_Z(\gamma) = \max\{\tau \in [0, t_{\max}-t_{\min}] : \varphi(Z;\tau) > \gamma\}. \tag{6}$$

In Equation (6), $\mathrm{LCT}_Z(\gamma)$ is the last time at which the APE of approximating $Z(t_{\max}-t_{\min})$ with $Z(\tau)$ is above the threshold $\gamma$. The last crossing time is well defined since $\lim_{\tau \to t_{\max}-t_{\min}} \varphi(Z;\tau) = 0$.

Consider the process $Z_k = \{Z_k(\tau) : \tau \in [0, t_{\max}-t_{\min}]\}$ associated with the $k$th study participant, $Z_k(\tau) = f\big(X_{k,[t_{\min},t_{\min}+\tau]}\big)$, and let $\hat Z_k$ be its sample estimator based on the positional data in Equation (3). The average velocity in Equation (4) and its sample estimator in Equation (5) are examples of processes $Z_k$ and $\hat Z_k$. A sample estimator of the last crossing time $\mathrm{LCT}_{Z_k}(\gamma)$ is

$$\widehat{\mathrm{LCT}}_{Z_k}(\gamma) = \max\{t_{k,i} - t_{\min} : i = 1,\ldots,n_k,\ \varphi(\hat Z_k; t_{k,i} - t_{\min}) > \gamma\}. \tag{7}$$

We note that $\hat Z_k(\tau)$ in the APE $\varphi(\hat Z_k;\tau)$ is determined from the locations recorded for the $k$th study participant up to elapsed time $\tau$: $\{X_{k,i} : t_{\min} \le t_{k,i} \le t_{\min}+\tau\}$. As an illustration, Figure 1 shows estimates of the average velocity of an individual in the MDC data, together with the last crossing time estimate at $\gamma = 0.1$. The threshold $\gamma$ is a precision threshold specified by the user: it reflects the analyst's requirement on how stable the estimator in Equation (7) has to be. Smaller values of $\gamma$ demand a more stable estimator. For example, the choice $\gamma = 0.1$ tolerates a 10% relative error with respect to the value obtained from the full observation period.
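The APE and the last crossing time estimator in Equation (7) can be sketched as follows. This sketch assumes that the values $\hat Z_k(t_{k,i}-t_{\min})$ have already been computed at each observation time (for example with an average velocity function) and uses the final value as the stand-in for $Z(t_{\max}-t_{\min})$. Returning 0 when no APE exceeds $\gamma$ is a convention chosen here, not one stated in the text; the toy values are hypothetical.

```python
def ape(z_tau, z_final):
    """Absolute percentage error of Z(tau) relative to the full-period value (z_final > 0)."""
    return abs(z_tau - z_final) / z_final


def last_crossing_time(times, z_hat, gamma, t_min):
    """Estimator (7).

    times: sorted observation times t_{k,1}, ..., t_{k,n_k}.
    z_hat: z_hat[i] estimates Z at elapsed time times[i] - t_min from the
           observations recorded up to times[i]; z_hat[-1] stands in for
           Z(t_max - t_min).
    Returns the largest elapsed time whose APE exceeds gamma, or 0.0 if
    there is none (a convention assumed here).
    """
    z_final = z_hat[-1]
    crossings = [t - t_min for t, z in zip(times, z_hat) if ape(z, z_final) > gamma]
    return max(crossings, default=0.0)


# Hypothetical weekly estimates of an average velocity.
weeks = [1, 2, 3, 4, 5]
v_hat = [10.0, 5.0, 9.5, 10.5, 10.0]
print(last_crossing_time(weeks, v_hat, gamma=0.1, t_min=0))
```

Here only week 2 has an APE above 0.1 (it is 0.5), so the estimated last crossing time is 2 weeks; lowering $\gamma$ to 0.01 moves it to week 4.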

Figure 1.


Estimate of the average velocity (gray curve) of an individual in the MDC data over $t_{\max} = 21$ weeks. The dashed line indicates the value of $\hat V(t_{\max})$, and the two dotted lines represent the lower bound $(1-\gamma)\hat V(t_{\max})$ and the upper bound $(1+\gamma)\hat V(t_{\max})$ for $\gamma = 0.1$. These bounds correspond with times $\tau$ for which the APE $\varphi(V;\tau) \le \gamma$. The crosses denote the times $\tau$ for which $\varphi(V;\tau) = \gamma$. The last crossing time for $\gamma = 0.1$ is marked with a triangle and occurs at the end of week 10.

The last crossing time of the APE associated with a process that is a function of the spatiotemporal trajectory of a study participant is a measure of the temporal stability of this individual's mobility. Study participants with more irregular mobility patterns (e.g. regular travel to locations at various distances from their residence that change after a few days or weeks) are expected to have larger last crossing times than study participants who travel to the same locations each week. As an extreme example, an individual with a very regular mobility pattern, who travels every day from home to the office and back along the same route and goes nowhere else, will record an APE equal to 0 after one day, which leads to last crossing times of less than one day in Equation (7).

Previous work [29] on the temporal stability of spatiotemporal trajectories has used the mean absolute percentage error (MAPE) which is the average of the APE across study participants:

$$\bar\varphi_K(\tau) = \frac{1}{K}\sum_{k=1}^{K} \varphi(\hat Z_k;\tau). \tag{8}$$

We define two measures of the overall temporal stability of the spatiotemporal trajectories of multiple study participants. The first overall measure is the last crossing time process $\mathrm{LCT}_{\bar\varphi_K} = \{\mathrm{LCT}_{\bar\varphi_K}(\gamma) : \gamma \ge 0\}$ of the MAPE process $\bar\varphi_K = \{\bar\varphi_K(\tau) : \tau \in [0, t_{\max}-t_{\min}]\}$. We refer to this measure as LCT-MAPE($Z$). The second overall measure is defined as the average of the last crossing times of the APE of $\hat Z_k$ for $k = 1,\ldots,K$, i.e. $\overline{\mathrm{LCT}}_K = \{\overline{\mathrm{LCT}}_K(\gamma) : \gamma \ge 0\}$, where

$$\overline{\mathrm{LCT}}_K(\gamma) = \frac{1}{K}\sum_{k=1}^{K} \widehat{\mathrm{LCT}}_{Z_k}(\gamma).$$

We denote this second measure by $\overline{\mathrm{LCT}}$-APE($Z$). The two measures coincide for a single study participant ($K = 1$) but generally differ when $K > 1$. They are useful for comparing the temporal regularity of mobility patterns of groups of study participants (e.g. younger vs. older individuals, men vs. women, high SES vs. low SES).
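To see why LCT-MAPE($Z$) and $\overline{\mathrm{LCT}}$-APE($Z$) generally differ, the sketch below evaluates both on hypothetical APE curves for two participants. It assumes the curves are already available on a common grid of elapsed times (e.g. weekly), which is a simplification made here; the helper names are hypothetical. Averaging the curves first and then taking the last crossing can give a very different answer than averaging the individual last crossings.

```python
def lct_on_grid(taus, errors, gamma):
    """Last tau on a common grid at which an error curve exceeds gamma (0.0 if never)."""
    return max((t for t, e in zip(taus, errors) if e > gamma), default=0.0)


def lct_mape(taus, ape_curves, gamma):
    """LCT-MAPE(Z): last crossing time of the mean APE curve of Equation (8)."""
    k = len(ape_curves)
    mape = [sum(curve[j] for curve in ape_curves) / k for j in range(len(taus))]
    return lct_on_grid(taus, mape, gamma)


def mean_lct_ape(taus, ape_curves, gamma):
    """Average over participants of the individual last crossing times."""
    return sum(lct_on_grid(taus, c, gamma) for c in ape_curves) / len(ape_curves)


taus = [1, 2, 3, 4]  # elapsed weeks
curves = [[0.5, 0.3, 0.05, 0.0],   # hypothetical participant A
          [0.05, 0.05, 0.25, 0.0]]  # hypothetical participant B
print(lct_mape(taus, curves, 0.2), mean_lct_ape(taus, curves, 0.2))
```

Here LCT-MAPE is 1, because the averaged curve drops below $\gamma = 0.2$ after the first grid point, whereas the individual last crossing times are 2 and 3, with mean 2.5.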

2.2. The activity distribution of human mobility patterns

The average velocity associated with the spatiotemporal trajectory of an individual does not provide any information about the spatial configuration of the locations visited. Consider two example individuals who drive without stopping at the same speed for a long period of time. The first example individual drives back and forth between two places $A_1$ and $A_2$. The second example individual drives in a cycle from a place $A_1$ to another place $A_2$, then to places $A_3$ and $A_4$, then back to place $A_1$. Since the spatiotemporal trajectory of the second individual involves two additional places, more sample locations will be needed to understand the mobility pattern of the second individual than that of the first. However, the mobility patterns of these two example individuals will be indistinguishable based on the last crossing time processes associated with their average velocity processes. We address this issue by introducing a distribution of the locations visited by an individual.

We assume that the observation window $W$ is partitioned into a set of grid cells $\mathcal{G} = \{G_1,\ldots,G_N\}$. Each location $X(t)$ on the curve $X_{[t_{\min},t_{\max}]}$ representing the spatiotemporal trajectory of an individual is mapped into a grid cell $G(t) \in \mathcal{G}$. The observed locations for this individual mapped into $\mathcal{G}$ are the sequence of grid cells $g_1 = G(t_1),\ldots,g_n = G(t_n)$, which are realizations of a random variable $G(T)$, where $T$ is a random variable on $[t_{\min},t_{\max}]$ with density $\rho(\cdot)$.

We define the activity distribution $\pi = (\pi_1,\ldots,\pi_N)$ over the grid cells $\mathcal{G}$. Here $\pi_j$ represents the proportion of time in $[t_{\min},t_{\max}]$ spent by an individual in cell $G_j \in \mathcal{G}$. We assume that $T$ follows a uniform distribution on $[t_{\min},t_{\max}]$ and define:

$$\pi_j = P\big(G(T) = G_j\big), \qquad j = 1,\ldots,N. \tag{9}$$

The activity distributions associated with the two example individuals introduced earlier can differentiate between their mobility patterns provided that the grid cells containing $A_3$ and $A_4$ do not coincide with the grid cells containing $A_1$ and $A_2$: in that case, they will show that the first example individual did not spend any time in the grid cells associated with $A_3$ and $A_4$. To employ activity distributions, we need a method for recovering them from the available data.

The simplest estimator $\hat\pi = (\hat\pi_1,\ldots,\hat\pi_N)$ of the activity distribution $\pi$ is based on the relative frequency of visitation of the grid cells $\mathcal{G}$:

$$\hat\pi_j = \frac{1}{n}\sum_{i=1}^{n} \mathbf{1}(g_i = G_j), \qquad j = 1,\ldots,N.$$

However, this estimator of $\pi$ is reasonable only if $T$ follows a uniform distribution, as in Equation (9). When $T$ follows an arbitrary distribution with density $\rho(\cdot)$, a better approach is to use a weighted average estimator $\tilde\pi = (\tilde\pi_1,\ldots,\tilde\pi_N)$, where:

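$$\tilde\pi_j = \frac{\sum_{i=1}^{n} \rho^{-1}(t_i)\,\mathbf{1}(g_i = G_j)}{\sum_{\ell=1}^{n} \rho^{-1}(t_\ell)}, \qquad j = 1,\ldots,N. \tag{10}$$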
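The bias of the relative frequency estimator under non-uniform sampling, and the correction provided by inverse-density weighting, can be illustrated with a toy example. The sketch assumes a hypothetical trajectory that spends half of $[0,1]$ in cell A and half in cell B, and a hypothetical sampling density $\rho(t) = 0.5 + t$ that records later times more often. The observation times are deterministic quantiles of $\rho$ rather than random draws, to keep the output reproducible.

```python
import math


def cell_of(t):
    """Hypothetical trajectory: cell 'A' on [0, 0.5), cell 'B' on [0.5, 1]."""
    return "A" if t < 0.5 else "B"


def rho(t):
    """Hypothetical sampling density on [0, 1]: rho(t) = 0.5 + t."""
    return 0.5 + t


def sample_times(n):
    """Quantiles u = (i - 0.5) / n mapped through the inverse CDF F(t) = 0.5 t + t^2 / 2."""
    return [-0.5 + math.sqrt(0.25 + 2 * (i - 0.5) / n) for i in range(1, n + 1)]


def frequency_estimator(cells, cell):
    """Relative frequency of visits to one cell."""
    return sum(g == cell for g in cells) / len(cells)


def weighted_estimator(times, cells, cell, density):
    """Inverse-density weighted proportion of time in one cell, as in Equation (10)."""
    w = [1.0 / density(t) for t in times]
    return sum(wi for wi, g in zip(w, cells) if g == cell) / sum(w)


times = sample_times(10_000)
cells = [cell_of(t) for t in times]
print(frequency_estimator(cells, "A"))             # about 0.375: biased toward 'B'
print(weighted_estimator(times, cells, "A", rho))  # about 0.5: the true proportion
```

The relative frequency reflects how often each cell was sampled, not how long it was occupied; dividing by the density undoes the oversampling of later times.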

Although the weighted average estimator in Equation (10) can be shown to be statistically consistent, it requires knowledge of the density $\rho(\cdot)$. There are many methods for estimating $\rho(\cdot)$ from the data, such as histograms or kernel density estimators [26]. We suggest an estimation method that approximates the distribution of $T$ by a piecewise uniform distribution. We take $t_0 = t_{\min}$ and $t_{n+1} = t_{\max}$. If $T$ is approximately uniform in $[t_{i-1}, t_{i+1}]$ for $i = 1,\ldots,n$, then $\rho^{-1}(t_i) \propto t_{i+1} - t_{i-1}$. This is a reasonable assumption if the times when locations are collected are roughly equally spaced (e.g. a location is collected every 10 minutes), since the mean of a uniform distribution on $[t_{i-1}, t_{i+1}]$ is the midpoint $(t_{i-1} + t_{i+1})/2$, which is where $t_i$ falls under equal spacing. Thus, an estimator of $\rho(\cdot)$ is

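$$\hat\rho(t_i) = \frac{\omega(t_i)}{\sum_{\ell=1}^{n} \omega(t_\ell)}, \qquad \omega(t_i) = \frac{1}{t_{i+1} - t_{i-1}}, \qquad i = 1,\ldots,n.$$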

The weighted average estimator from Equation (10) becomes

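$$\hat\pi_{o,j} = \frac{\sum_{i=1}^{n} \omega^{-1}(t_i)\,\mathbf{1}(g_i = G_j)}{\sum_{\ell=1}^{n} \omega^{-1}(t_\ell)} = \frac{\sum_{i=1}^{n} (t_{i+1} - t_{i-1})\,\mathbf{1}(g_i = G_j)}{t_{\max} - t_{\min} + t_n - t_1}, \qquad j = 1,\ldots,N. \tag{11}$$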

We call $\hat\pi_o = (\hat\pi_{o,1},\ldots,\hat\pi_{o,N})$ the ordinary proportional time estimator of the activity distribution $\pi$. This estimator relies on the convention that the length of each time interval in which an individual transitions between two grid cells is added to the time spent in both the grid cell they leave and the grid cell they arrive in. More specifically, assume that the consecutive observation times $t_i$ and $t_{i+1}$ are such that $g_i \ne g_{i+1}$. Then $\hat\pi_o$ allocates $t_{i+1} - t_i$ to the total time spent in both $g_i$ and $g_{i+1}$.
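A minimal Python sketch of the ordinary proportional time estimator in Equation (11) follows, assuming sorted observation times and grid cell labels of any hashable type; the function name is hypothetical. Each observation receives the weight $t_{i+1} - t_{i-1}$, with the sequence padded by $t_0 = t_{\min}$ and $t_{n+1} = t_{\max}$.

```python
from collections import defaultdict


def ordinary_proportional_time(times, cells, t_min, t_max):
    """Equation (11): observation i gets weight t_{i+1} - t_{i-1},
    with t_0 = t_min and t_{n+1} = t_max."""
    padded = [t_min] + list(times) + [t_max]
    totals = defaultdict(float)
    for i, g in enumerate(cells):
        # padded[i] is t_{i-1} and padded[i + 2] is t_{i+1} in 1-based notation.
        totals[g] += padded[i + 2] - padded[i]
    denom = (t_max - t_min) + (times[-1] - times[0])  # equals the sum of all weights
    return {g: w / denom for g, w in totals.items()}


print(ordinary_proportional_time([0, 1, 2, 3, 4], ["A", "A", "B", "B", "B"], t_min=0, t_max=4))
```

For times 0 to 4 with cells A, A, B, B, B on $[t_{\min}, t_{\max}] = [0, 4]$, the weights are 1, 2, 2, 2, 1, which sum to $t_{\max} - t_{\min} + t_n - t_1 = 8$, giving proportions 0.375 for A and 0.625 for B.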

We introduce a second estimator $\hat\pi_c = (\hat\pi_{c,1},\ldots,\hat\pi_{c,N})$ of the activity distribution $\pi$:

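$$\hat\pi_{c,j} = \frac{\sum_{i=2}^{n} (t_i - t_{i-1})\,\mathbf{1}(g_i = g_{i-1} = G_j)}{\sum_{i=2}^{n} (t_i - t_{i-1})\,\mathbf{1}(g_i = g_{i-1})}, \qquad j = 1,\ldots,N. \tag{12}$$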

We call $\hat\pi_c$ the conservative proportional time estimator. This estimator is more conservative than the ordinary proportional time estimator $\hat\pi_o$ from Equation (11) in the sense that any time interval defined by consecutive observation times $t_i$ and $t_{i+1}$ such that $g_i \ne g_{i+1}$ is ignored. That is, the time spent in a grid cell is calculated only from time intervals in which an individual is known to have remained in that cell.
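The conservative estimator in Equation (12) only needs the intervals whose two endpoints share a grid cell. In the sketch below the function name is hypothetical, and raising an error when no such interval exists is a choice made here, since Equation (12) is undefined in that case.

```python
from collections import defaultdict


def conservative_proportional_time(times, cells):
    """Equation (12): use only intervals whose two endpoints fall in the same grid cell."""
    totals = defaultdict(float)
    for i in range(1, len(times)):
        if cells[i] == cells[i - 1]:
            totals[cells[i]] += times[i] - times[i - 1]
    denom = sum(totals.values())
    if denom == 0:
        raise ValueError("no pair of consecutive observations shares a grid cell")
    return {g: w / denom for g, w in totals.items()}


print(conservative_proportional_time([0, 1, 2, 3, 4], ["A", "A", "B", "B", "B"]))
```

On this toy sequence, the transition interval $[1, 2]$ is dropped, leaving 1 time unit in A and 2 in B, i.e. proportions 1/3 and 2/3.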

We show two important properties of the ordinary and the conservative proportional time estimators. First, we prove that both estimators are asymptotically equivalent. Second, we prove that both estimators are statistically consistent, that is, they will eventually recover the true activity distribution π if sufficient location data are available. These properties rely on the assumptions (S1), (S2) and (S3) below:

  (S1) The length of the longest time interval between consecutive observation times vanishes: $\max_{i=1,\ldots,n-1} |t_{i+1} - t_i| \to 0$ as the sampling rate $n \to \infty$.

  (S2) The sampling period is such that $t_1 \to t_{\min}$ and $t_n \to t_{\max}$ as $n \to \infty$.

  (S3) The number of transitions between grid cells is finite, i.e. there exists $M < \infty$ such that $\sum_{t \in [t_{\min},t_{\max}]} \mathbf{1}\big(G(t^+) \ne G(t^-)\big) \le M$, where $G(t^-)$ and $G(t^+)$ are the left and right limits of $G(\cdot)$ at $t$.

Assumptions (S1) and (S2) describe the meaning of asymptotics in our context. They imply that the observation times $t_1,\ldots,t_n$ eventually become dense in the reference time frame, i.e. no fixed subinterval of $[t_{\min},t_{\max}]$ remains free of observation times as $n \to \infty$. Assumption (S3) requires the spatiotemporal trajectory $X_{[t_{\min},t_{\max}]}$ to be regular enough that it does not jump between grid cells infinitely often.

Theorem 2.1 Asymptotic Equivalence Rule with Large Sampling Rate —

Under assumptions (S1), (S2) and (S3), the ordinary proportional time estimator $\hat\pi_o$ from Equation (11) and the conservative proportional time estimator $\hat\pi_c$ from Equation (12) are asymptotically the same.

The proof of this result is given in Appendix A.1. We can also show that the same assumptions imply that the two estimators are statistically consistent.

Theorem 2.2 Convergence Rule with Large Sampling Rate —

Under assumptions (S1), (S2) and (S3), the ordinary proportional time estimator $\hat\pi_o$ from Equation (11) and the conservative proportional time estimator $\hat\pi_c$ from Equation (12) converge to the true activity distribution $\pi$ from Equation (9).

The proof of this result is given in Appendix A.2.
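Theorems 2.1 and 2.2 can be illustrated, though not proved, numerically. The sketch below assumes a hypothetical trajectory on $[0,1]$ with true activity distribution A: 0.2, B: 0.7, C: 0.1, and non-uniform observation times taken as deterministic quantiles of the density $\rho(t) = 0.5 + t$, so that (S1), (S2) and (S3) hold as $n$ grows. For increasing $n$, it prints the largest error of each estimator and the largest gap between them.

```python
import math
from collections import defaultdict

TRUE_PI = {"A": 0.2, "B": 0.7, "C": 0.1}


def cell_of(t):
    # Hypothetical trajectory on [0, 1]: A on [0, 0.2), B on [0.2, 0.9), C on [0.9, 1].
    return "A" if t < 0.2 else ("B" if t < 0.9 else "C")


def sample_times(n):
    # Deterministic quantiles of the non-uniform density rho(t) = 0.5 + t on [0, 1].
    return [-0.5 + math.sqrt(0.25 + 2 * (i - 0.5) / n) for i in range(1, n + 1)]


def ordinary(times, cells, t_min, t_max):
    # Equation (11): weight t_{i+1} - t_{i-1}, padded with t_min and t_max.
    padded = [t_min] + times + [t_max]
    tot = defaultdict(float)
    for i, g in enumerate(cells):
        tot[g] += padded[i + 2] - padded[i]
    s = sum(tot.values())
    return {g: w / s for g, w in tot.items()}


def conservative(times, cells):
    # Equation (12): only same-cell intervals count.
    tot = defaultdict(float)
    for i in range(1, len(times)):
        if cells[i] == cells[i - 1]:
            tot[cells[i]] += times[i] - times[i - 1]
    s = sum(tot.values())
    return {g: w / s for g, w in tot.items()}


def max_error(est):
    return max(abs(est.get(g, 0.0) - p) for g, p in TRUE_PI.items())


for n in (20, 200, 2000):
    times = sample_times(n)
    cells = [cell_of(t) for t in times]
    po, pc = ordinary(times, cells, 0.0, 1.0), conservative(times, cells)
    gap = max(abs(po.get(g, 0.0) - pc.get(g, 0.0)) for g in TRUE_PI)
    print(n, round(max_error(po), 4), round(max_error(pc), 4), round(gap, 4))
```

The errors and the gap shrink as $n$ grows, because the discrepancies come mainly from the few intervals that straddle a transition (and from the ends of the sampling period), and those intervals become shorter.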

2.3. Measuring the temporal stability of human activity distributions

We are interested in determining the temporal stability of the activity distribution of an individual. We assume that the reference time frame $[t_{\min},t_{\max}]$ is divided into $D_{\max}$ time periods of equal length (e.g. days or weeks). We denote by $\pi^{(D)}$ the activity distribution from Equation (9) associated with time period $D$, $D = 1,\ldots,D_{\max}$. Then $\pi^{(D)}$ can be viewed as an $N$-dimensional random vector whose distribution reflects the period-to-period variability of the individual's mobility patterns. With this understanding, we are interested in determining the expectation $\bar\pi = E(\pi^{(D)})$. We call $\bar\pi$ the time period activity distribution (e.g. daily or weekly activity distribution). The $j$th component of $\bar\pi$ is interpreted as the average proportion of time spent by the individual in grid cell $G_j$ in a given time period (a day or a week).

A simple estimator of π¯ is

$$\hat{\bar\pi}(D) = \frac{1}{D}\sum_{d=1}^{D} \hat\pi^{(d)}, \qquad D = 1,\ldots,D_{\max}, \tag{13}$$

where $\hat\pi^{(d)}$ is the ordinary proportional time estimator $\hat\pi_o$ from Equation (11) or the conservative proportional time estimator $\hat\pi_c$ from Equation (12).

Because $\hat{\bar\pi}(D)$ is a consistent estimator of $\bar\pi$, the error made when approximating $\bar\pi$ with $\hat{\bar\pi}(D)$ decreases as the spatiotemporal trajectory of the individual is observed for a larger number of time periods $D$. We define the last crossing time of the sequence of estimators $\{\hat{\bar\pi}(D) : D = 1,\ldots,D_{\max}\}$ as follows:

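$$\widehat{\mathrm{LCT}}_{\mathrm{dist}}(\gamma) = \max\left\{D = 1,\ldots,D_{\max} : \big\|\hat{\bar\pi}(D) - \hat{\bar\pi}(D_{\max})\big\|_1 > \gamma\right\}, \tag{14}$$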

where $\|v\|_1 = \sum_i |v_i|$ is the usual $L_1$ norm of a vector $v$. Equation (14) is a relative $L_1$ error: the denominator $\|\hat{\bar\pi}(D_{\max})\|_1$ is omitted because $\|\hat{\bar\pi}(D)\|_1 = 1$ for any $D$.
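Equations (13) and (14) can be sketched with per-period distributions (for instance, outputs of Equation (11) or (12)) stored as plain lists over a fixed ordering of the $N$ grid cells. This storage choice is an assumption made for clarity; a real analysis over many cells would use sparse storage. Returning 0 when the running mean never deviates by more than $\gamma$ is a convention chosen here, and the weekly distributions are hypothetical.

```python
def running_means(period_dists):
    """Equation (13): element-wise running averages of per-period activity distributions."""
    means, acc = [], [0.0] * len(period_dists[0])
    for d, dist in enumerate(period_dists, start=1):
        acc = [a + p for a, p in zip(acc, dist)]
        means.append([a / d for a in acc])
    return means


def lct_dist(period_dists, gamma):
    """Equation (14): last period D whose running mean is more than gamma
    away, in L1 norm, from the running mean over all D_max periods."""
    means = running_means(period_dists)
    final = means[-1]
    crossings = [d for d, m in enumerate(means, start=1)
                 if sum(abs(a - b) for a, b in zip(m, final)) > gamma]
    return max(crossings, default=0)  # 0 if no period exceeds gamma (assumed convention)


# Hypothetical weekly activity distributions over two grid cells.
weeks = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [0.5, 0.5]]
print(lct_dist(weeks, gamma=0.2))
```

Only the first running mean, $[1, 0]$, is more than 0.2 away in $L_1$ from the final mean $[0.5, 0.5]$, so the last crossing time is 1 period.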

The last crossing time in Equation (14) is a measure of the temporal stability of the entire time period activity distribution $\bar\pi$. Individuals who spend approximately the same amount of time in the same places in every time period need to be observed for fewer time periods to attain the same relative error in $\hat{\bar\pi}(D)$ than individuals with heterogeneous mobility patterns, who spend different amounts of time at locations that change substantially across time periods. Therefore, $\widehat{\mathrm{LCT}}_{\mathrm{dist}}(\gamma)$ will be smaller for individuals whose mobility changes less from one time period to the next, and larger for individuals with irregular mobility patterns.

The disadvantage of using the last crossing time in Equation (14) as a measure of temporal stability is that it gives the same weight to the error made when estimating the proportion of time spent in grid cells in which an individual spends a lot of their time and in grid cells the individual rarely visits. The number of grid cells with a large proportion of time spent in them is likely much smaller than the total number of grid cells $N$, because most people tend to spend time at their residence, at their workplace, and perhaps in a few other select locations. For this reason, the errors made when estimating the proportion of time spent in rarely visited grid cells could dominate the overall error of $\hat{\bar\pi}(D)$ and lead to larger values of $\widehat{\mathrm{LCT}}_{\mathrm{dist}}(\gamma)$. To remedy this issue, we define a new measure of temporal stability that focuses on the grid cells in which an individual spends larger proportions of time.

We define the ranking time period activity distribution $\bar r = (\bar r_1,\ldots,\bar r_N)$ associated with $\bar\pi$ by replacing each component of $\bar\pi$ with the sum of those components of $\bar\pi$ that are no larger than that component, as follows [5]:

$$\bar r_j = \sum_{l=1}^{N} \bar\pi_l\, \mathbf{1}(\bar\pi_l \le \bar\pi_j), \qquad j = 1,\ldots,N. \tag{15}$$

The $\alpha$-level set ($\alpha \in [0,1]$) of $\bar r$ is defined to consist of all the grid cells whose corresponding components in $\bar r$ exceed $\alpha$:

$$L_\alpha = \{G_j : \bar r_j > \alpha\}. \tag{16}$$

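Because $\bar r_j$ is nondecreasing in $\bar\pi_j$, the grid cells outside $L_\alpha$ are those with the smallest components of $\bar\pi$, and their total equals the largest of their $\bar r_j$ values, which is at most $\alpha$. Hence the $\alpha$-level set covers grid cells whose components of $\bar\pi$ sum to at least $1-\alpha$: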

$$\sum_{G_j \in L_\alpha} \bar\pi_j \ge 1 - \alpha.$$

Level sets have an easy-to-understand interpretation: for a given level $\alpha$, say $\alpha = 0.7$, all the grid cells with a ranking time period activity distribution above 0.7 jointly cover at least $(1-0.7)\times 100\% = 30\%$ of the time in the time period. Values of $\alpha$ closer to 1 lead to level sets $L_\alpha$ with a smaller coverage that comprise only the grid cells in which the individual spends the largest amounts of time. Values of $\alpha$ close to 0 lead to level sets $L_\alpha$ with a larger coverage that comprise the majority of grid cells in which the individual spent time.
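The ranking distribution in Equation (15) and the level set in Equation (16) can be computed directly from their definitions. The quadratic-time comprehension below is chosen for clarity only (sorting the components would be preferable for many grid cells), and the weekly activity distribution is hypothetical. The printed coverage illustrates the bound $\sum_{G_j \in L_\alpha} \bar\pi_j \ge 1-\alpha$.

```python
def ranking(pi):
    """Equation (15): r_j = sum of pi_l over all l with pi_l <= pi_j."""
    return [sum(p for p in pi if p <= pj) for pj in pi]


def level_set(pi, alpha):
    """Equation (16): indices j of the grid cells with r_j > alpha."""
    return {j for j, r in enumerate(ranking(pi)) if r > alpha}


pi_bar = [0.5, 0.3, 0.15, 0.05]  # hypothetical weekly activity distribution
for alpha in (0.3, 0.7):
    cells = level_set(pi_bar, alpha)
    coverage = sum(pi_bar[j] for j in cells)
    print(alpha, sorted(cells), coverage)  # coverage is at least 1 - alpha
```

With $\alpha = 0.7$ the level set is the single cell holding half of the time, which satisfies the 30% coverage bound; with $\alpha = 0.3$ it adds the second cell, for a coverage of 0.8.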

Let $\hat{\bar r}(D)$ be the ranking distribution of the estimator $\hat{\bar\pi}(D)$ of $\bar\pi$ in Equation (13), and let $L_\alpha(D)$ be the $\alpha$-level set associated with $\hat{\bar r}(D)$ as in Equation (16). Given a level $\alpha \in [0,1]$ and a stability threshold $\gamma > 0$, we define the last crossing time of the sequence of level sets $\{L_\alpha(D) : D = 1,\ldots,D_{\max}\}$ as follows:

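$$\widehat{\mathrm{LCT}}_{\mathrm{level},\alpha}(\gamma) = \max\left\{D = 1,\ldots,D_{\max} : \frac{\big|L_\alpha(D) \,\triangle\, L_\alpha(D_{\max})\big|}{\big|L_\alpha(D_{\max})\big|} > \gamma\right\}, \tag{17}$$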

where $\triangle$ denotes the symmetric difference of two sets, and $|\cdot|$ denotes the number of elements in a set.
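Combining the running means of Equation (13) with the level sets of Equation (16) gives a sketch of Equation (17). The helper names are hypothetical, the toy weeks are made up, and the function raises an error when $L_\alpha(D_{\max})$ is empty, because the ratio in Equation (17) is then undefined; returning 0 when no period crosses is an assumed convention.

```python
def ranking(pi):
    """Equation (15), computed directly from the definition."""
    return [sum(p for p in pi if p <= pj) for pj in pi]


def level_set(pi, alpha):
    """Equation (16): indices of grid cells with ranking value above alpha."""
    return {j for j, r in enumerate(ranking(pi)) if r > alpha}


def lct_level(period_dists, alpha, gamma):
    """Equation (17) applied to the running means of Equation (13)."""
    acc, sets = [0.0] * len(period_dists[0]), []
    for d, dist in enumerate(period_dists, start=1):
        acc = [a + p for a, p in zip(acc, dist)]
        sets.append(level_set([a / d for a in acc], alpha))
    final = sets[-1]
    if not final:
        raise ValueError("empty level set at D_max; Equation (17) is undefined")
    crossings = [d for d, s in enumerate(sets, start=1)
                 if len(s ^ final) / len(final) > gamma]  # ^ is set symmetric difference
    return max(crossings, default=0)


# Hypothetical weekly activity distributions over three grid cells.
weeks = [[0.1, 0.1, 0.8], [0.8, 0.1, 0.1], [0.8, 0.1, 0.1], [0.8, 0.1, 0.1]]
print(lct_level(weeks, alpha=0.5, gamma=0.2))
```

For these made-up weeks, the level sets of the running means at $D = 3$ and $D = 4$ both contain only the first cell, while those at $D = 1$ and $D = 2$ differ from it, so the last crossing time is 2 weeks.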

The LCT of the level sets from Equation (17) is a measure of temporal stability of the time period activity distribution $\bar\pi$ that takes into account only the error made when estimating the time spent in the grid cells in which an individual spent most of their time. For the same value of $\gamma$, $\widehat{\mathrm{LCT}}_{\mathrm{level},\alpha}(\gamma)$ decreases as the level $\alpha$ increases.

3. Application

The data we analyze come from Nokia's Mobile Data Challenge (MDC) [11,14,15]. This was a mobile computing research initiative focused on generating a deeper scientific understanding of social and behavioral patterns related to mobile technologies. The study took place in Switzerland and collected various types of longitudinal information, including time-stamped GPS data from the cell phones of 185 study participants over the course of 18 months. Demographic data such as age and sex are also available. There are approximately 57.5 million GPS location records. The average length of observation for study participants was about 55 weeks. These data are publicly available upon request from the Idiap Research Institute.

Most activities of daily living of the study participants took place in a rectangular area that we partitioned into $4000^2$ square grid cells with sides of length 28 m. Locations outside this spatial observation window were dropped; they typically correspond to longer trips taken by study participants away from their places of residence. Figure 2 displays summaries of the GPS locations that fall in our chosen spatial observation window.

Figure 2.


Summary information of the GPS location data. Left panel: histogram of the total length of observation for each study participant expressed in weeks. Right panel: histogram of the average number of GPS locations per week for each study participant.

For each study participant, we calculated three measures of temporal stability of their mobility patterns: the last crossing time of the average velocity (LCT-velocity) as defined in Equations (5) and (7), the last crossing time of the activity distribution (LCT-distribution) as defined in Equation (14), and the last crossing time of the level sets of the weekly activity distribution (LCT-level set) as defined in Equation (17). In the calculation of LCT-distribution and LCT-level set, we used the ordinary proportional time estimator defined in Equation (11) rather than the conservative proportional time estimator, because the latter disregards pairs of consecutive time points located in different grid cells and would therefore most likely rely on a smaller effective sample size. We used $\alpha = 0.2$ in the determination of level sets, and $\gamma = 0.2$ as the stability threshold for all three measures. The results are summarized in Table 1.

Table 1. Means, medians and sample standard deviations of three measures of temporal stability of mobility patterns.

Mobility measure          Mean    Median   St. dev.
LCT-velocity              30.04   26       17.29
LCT-distribution          37.18   37       16.06
LCT-level set (α = 0.2)   17.69   17        9.50

The unit of time is weeks.

About 30 weeks of observation are needed until the mobility patterns stabilize according to the LCT-velocity measure. A longer period, about 37 weeks, is needed until the weekly activity distribution stabilizes. The longer observation period for this measure is not surprising, since it is based on an estimate of the full weekly activity distribution over $N = 4000^2$ grid cells. About half of this observation time (18 weeks) is needed to obtain stable estimates of the 0.2-level set of the weekly activity distribution, which comprises the grid cells in which the study participants spend at least 80% of their weekly time.

We now illustrate how the $\alpha$-level set $L_\alpha$ from Equation (16) and its corresponding LCT-level set $\widehat{\mathrm{LCT}}_{\mathrm{level},\alpha}(0.2)$ from Equation (17) change for different values of $\alpha \in [0,1]$. To this end, we define an adjacency graph $G_{\mathrm{grid}}$ whose vertices are the $N = 4000^2$ grid cells in the spatial observation window. Two grid cells are connected by an edge in $G_{\mathrm{grid}}$ if they share an edge or a corner in the spatial observation window [3,25]. We denote by $G_{\mathrm{grid}}(L_\alpha)$ the subgraph of $G_{\mathrm{grid}}$ induced by the grid cells in $L_\alpha$. For one study participant, we determined the level set $L_\alpha$, the last crossing time $\widehat{\mathrm{LCT}}_{\mathrm{level},\alpha}(0.2)$, and the number of connected components of $G_{\mathrm{grid}}(L_\alpha)$ for $\alpha \in \{0.1, 0.2, \ldots, 1\}$ (see Figure 3). For smaller values of $\alpha$, $L_\alpha$ contains the grid cells in which the study participant spends the largest proportion of time. When $\alpha \in \{0.1, 0.2, 0.3, 0.4\}$, $G_{\mathrm{grid}}(L_\alpha)$ has one connected component. This implies that the grid cells in $L_\alpha$ are spatially adjacent and define a single area in which the study participant spends larger amounts of time. The corresponding values of $\widehat{\mathrm{LCT}}_{\mathrm{level},\alpha}(0.2)$ are below 20 weeks, which is the length of observation time needed to reliably detect this area. For $\alpha \in \{0.5, 0.6\}$, $G_{\mathrm{grid}}(L_\alpha)$ has two connected components, and for $\alpha \in \{0.7, 0.8\}$ it has three. Thus, this study participant spends their time in grid cells that form two or three spatially contiguous areas. These areas include grid cells in which the study participant spends smaller proportions of their weekly time, so the observation time needed to identify them doubles to about 40 weeks. For $\alpha = 1$, $G_{\mathrm{grid}}(L_\alpha)$ has 72 connected components because $L_\alpha$ includes grid cells in which the study participant spends very little time. Figure 3 shows that approximately 70 weeks of observation time are needed to detect these grid cells.
Plots of the same type constructed for other study participants show similar relationships between $\alpha$, $L_\alpha$, and $\widehat{\mathrm{LCT}}_{\mathrm{level},\alpha}(0.2)$.
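The component counts discussed for Figure 3 depend on the adjacency rule stated above: two grid cells are neighbours if they share an edge or a corner. The sketch below counts the connected components of the subgraph induced by a set of grid cells, using breadth-first search over this 8-cell neighbourhood. It assumes the level set $L_\alpha$ has already been computed and is passed in as a boolean mask over the grid. `count_components` and the example mask are hypothetical and are not code or data from the study.

```python
from collections import deque
from typing import Sequence


def count_components(mask: Sequence[Sequence[bool]]) -> int:
    """Number of connected components among cells marked True, where two cells
    are adjacent if they share an edge or a corner (8-neighbourhood)."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    components = 0
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            components += 1  # unvisited level-set cell: start a new component
            seen[r][c] = True
            queue = deque([(r, c)])
            while queue:  # breadth-first flood fill of this component
                i, j = queue.popleft()
                for di in (-1, 0, 1):
                    for dj in (-1, 0, 1):
                        ni, nj = i + di, j + dj
                        if (0 <= ni < rows and 0 <= nj < cols
                                and mask[ni][nj] and not seen[ni][nj]):
                            seen[ni][nj] = True
                            queue.append((ni, nj))
    return components


# Hypothetical 4 x 5 level-set mask: True marks grid cells that belong to L_alpha.
mask = [
    [True,  True,  False, False, False],
    [False, True,  False, False, True ],
    [False, False, True,  False, False],
    [False, False, False, False, True ],
]
print(count_components(mask))
```

For this mask the function returns 3. Under an edge-only (4-neighbour) rule, the cell in row 2, column 2, which touches the first group only at a corner, would become its own component, giving 4. The choice of adjacency therefore directly affects the reported counts.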

Figure 3.

Values of the LCT-level sets $\widehat{\mathrm{LCT}}_{\mathrm{level},\alpha}(0.2)$ for $\alpha \in \{0.1, 0.2, \ldots, 1\}$ for an MDC study participant. The unit of time is weeks. The number of connected components of $G_{\mathrm{grid}}(L_\alpha)$ defined by the $\alpha$-level sets $L_\alpha$ is shown above the curve.

Next, we determine whether the temporal stability of activity distributions varies with the demographic characteristics of the population. We group the study participants by sex (male, female) and by age group (young: 15–34 years old; middle age: 35–54 years old; old: 55 years old and over). For each of these five demographic groups, we calculated the average of the last crossing times of the activity distribution $\widehat{\mathrm{LCT}}_{\mathrm{level},\alpha}(0.2)$ for every $\alpha \in \{0.1, 0.2, \ldots, 1\}$. The resulting curves are presented in Figure 4. The last crossing times at all levels are similar for men and women (see the top left panel). As such, there do not seem to be any sex-based differences in the temporal stability of the mobility patterns of men and women who live in Switzerland. However, Switzerland is known for very high equality between the sexes, so this finding might not extend to countries with profound inequality between the sexes.
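The group curves in Figure 4 are averages of individual LCT curves. A minimal sketch of that averaging step follows. It assumes each participant is represented by their sex, their age, and a list of $\widehat{\mathrm{LCT}}_{\mathrm{level},\alpha}(0.2)$ values (in weeks) for $\alpha = 0.1, 0.2, \ldots, 1$. The age bands follow the text. The function names and input layout are hypothetical, and the sketch does not reproduce the 90% confidence intervals shown in Figure 4.

```python
from statistics import mean


def age_group(age: int) -> str:
    """Age bands from the text: young 15-34, middle 35-54, old 55 and over."""
    if 15 <= age <= 34:
        return "young"
    if 35 <= age <= 54:
        return "middle"
    if age >= 55:
        return "old"
    raise ValueError(f"age {age} is outside the study's age bands")


def group_mean_curves(participants):
    """participants: iterable of (sex, age, lct_curve), where lct_curve lists the
    LCT values in weeks for alpha = 0.1, 0.2, ..., 1 (same length for everyone).

    Each participant contributes to one sex group and one age group, so the
    result has up to five keys: "male", "female", "young", "middle", "old".
    """
    groups = {}
    for sex, age, curve in participants:
        for key in (sex, age_group(age)):
            groups.setdefault(key, []).append(curve)
    # zip(*curves) walks over alpha levels; average across participants per level.
    return {key: [mean(values) for values in zip(*curves)]
            for key, curves in groups.items()}


# Hypothetical participants with curves truncated to two alpha levels.
people = [("male", 20, [10, 20]), ("female", 40, [6, 30]), ("male", 60, [4, 40])]
print(group_mean_curves(people))
```

The sex and age groupings overlap by design, matching the five curves in Figure 4. They are not a cross-classification into six sex-by-age cells.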

Figure 4.

Mean values and 90% confidence intervals of the LCT-level sets $\widehat{\mathrm{LCT}}_{\mathrm{level},\alpha}(0.2)$ for $\alpha \in \{0.1, 0.2, \ldots, 1\}$ calculated for five demographic groups: sex (male, female) and age (young, middle, old).

In the top right and bottom panels of Figure 4, we find evidence that the average last crossing times decrease with age, especially for levels below 0.5. This means that the mobility patterns of older study participants are more regular, and consequently more temporally stable, than those of younger study participants. For levels above 0.5, the average last crossing times are larger than for levels below 0.5 and become very similar across demographic groups. Thus, study participants in all five demographic groups occasionally visit locations outside the areas where they usually spend their time, and longer observation periods are needed to detect these locations. Nevertheless, to identify the areas in which study participants spend most of their time, Figure 4 suggests that 10 weeks of GPS observation should suffice for individuals older than 55. Middle-aged individuals require about 15 weeks of observation time, and young individuals require about 20 weeks.

4. Discussion

The contribution of this paper is twofold. On the theoretical side, we proposed using last crossing time processes associated with the spatiotemporal trajectories of individuals to assess the temporal stability of their mobility patterns. We defined several measures of the temporal dynamics of spatiotemporal trajectories based on the average velocity process and on human activity distributions in a spatial observation window. We defined the ordinary and the conservative proportional time estimators of human activity distributions and proved that they are consistent and asymptotically equivalent. We introduced the time period and the ranking time period activity distributions, which capture how human activity distributions change across time periods, and presented the corresponding estimators based on GPS location data.

On the empirical side, we analyzed GPS location data collected over a period of 18 months. The previous empirical study [29] on the required duration of GPS studies was based on data collected over 30 days. Using our new statistical methods and GPS data collected over a much longer period of time, we determined that GPS monitoring needs to last at least 15 weeks (105 days). This minimum study duration is more than seven times longer than the 14-day minimum recommended in [29]. We also put forward the idea that the duration of GPS studies should be assessed separately for demographic groups. We determined that younger population groups should be monitored for longer periods of time than middle-aged population groups because of their more irregular mobility patterns. On the other hand, shorter monitoring periods might suffice for older population groups, whose mobility patterns are temporally more stable. We also suggest using our methods to assess whether men and women require different durations of GPS monitoring in countries with a known history of inequality between the sexes. To the best of our knowledge, differential periods of GPS data collection based on demographic groups have not been discussed before. Our work suggests that GPS study designs should take demographic groups into account.

Acknowledgments

Portions of the research in this paper used the MDC Database made available by Idiap Research Institute, Switzerland and owned by Nokia.

Appendix. Proofs of theoretical results.

A.1. Proof of Theorem 2.1

Proof.

We note that the ordinary proportional time estimator in Equation (11) can be written as

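$$\hat{\pi}_{o,j} = \frac{\frac{1}{2}\sum_{i=2}^{n-1}(t_{i+1} - t_{i-1})\,\mathbf{1}(g_i = G_j)}{\frac{1}{2}(T + t_n - t_1)}, \qquad \text{(A1)}$$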

where $T = t_{\max} - t_{\min}$. We will first show that the denominators of $\hat{\pi}_{o,j}$ and $\hat{\pi}_{c,j}$ are asymptotically the same. Assumption (S2) implies that $\frac{1}{2}(T + t_n - t_1) \to T$, which gives the asymptotic behavior of the denominator of $\hat{\pi}_{o,j}$. For $\hat{\pi}_{c,j}$, we have

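$$\begin{aligned}
\sum_{i=2}^{n}(t_i - t_{i-1})\,\mathbf{1}(g_i = g_{i-1}) &= \sum_{i=2}^{n}(t_i - t_{i-1}) - \sum_{i=2}^{n}(t_i - t_{i-1})\,\mathbf{1}(g_i \neq g_{i-1}) \\
&= (t_n - t_1) - \sum_{i=2}^{n}(t_i - t_{i-1})\,\mathbf{1}(g_i \neq g_{i-1}) \\
&\geq (t_n - t_1) - M\max_i |t_{i+1} - t_i| \\
&\to T,
\end{aligned}$$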

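where $M$ is the constant from assumption (S3). The limit in the above equation follows from assumptions (S1) and (S2). The left-hand side is also bounded above by $t_n - t_1$, which tends to $T$, so it converges to $T$. Thus, the denominators of $\hat{\pi}_{o,j}$ and $\hat{\pi}_{c,j}$ are asymptotically the same. Next, we focus on the numerators of the two estimators.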

The numerator of πˆc,j can be written as

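$$\sum_{i=2}^{n}(t_i - t_{i-1})\,\mathbf{1}(g_i = g_{i-1} = G_j) = \sum_{i=1}^{n-1}(t_{i+1} - t_i)\,\mathbf{1}(g_{i+1} = g_i = G_j) = \sum_{i=1}^{n-1} A_i,$$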

where $A_i = (t_{i+1} - t_i)\,\mathbf{1}(g_{i+1} = g_i = G_j)$. Let $B_i = \frac{1}{2}(t_{i+1} - t_{i-1})\,\mathbf{1}(g_i = G_j)$. Using Equation (A1), the numerator of $\hat{\pi}_{o,j}$ can be written as

$$\frac{1}{2}\sum_{i=2}^{n-1}(t_{i+1} - t_{i-1})\,\mathbf{1}(g_i = G_j) = \sum_{i=2}^{n-1} B_i.$$

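When $g_{i-1} = g_i = g_{i+1} = G_j$, we have $2B_i = (t_{i+1} - t_i) + (t_i - t_{i-1}) = A_i + A_{i-1}$. By assumption (S3), there are at most $2M$ time points $t_i$ with $g_i = G_j$ for which the equality $g_{i-1} = g_i = g_{i+1} = G_j$ does not hold. Since $0 \le B_i \le \max_i |t_{i+1} - t_i|$, it follows that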

$$\sum_{i=2}^{n-1} B_i - 2M\max_i |t_{i+1} - t_i| \le \sum_{i=2}^{n-1} B_i\,\mathbf{1}(g_{i-1} = g_i = g_{i+1} = G_j) \le \sum_{i=2}^{n-1} B_i,$$

which implies that

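$$\hat{\pi}_{o,j} \approx \frac{1}{T}\sum_{i=2}^{n-1} B_i\,\mathbf{1}(g_{i-1} = g_i = g_{i+1} = G_j) = \frac{1}{T}\sum_{i=2}^{n-1} \frac{A_i + A_{i-1}}{2}\,\mathbf{1}(g_{i-1} = g_i = g_{i+1} = G_j). \qquad \text{(A2)}$$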
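Two facts carry this step: the identity $2B_i = A_i + A_{i-1}$ whenever $g_{i-1} = g_i = g_{i+1} = G_j$, and the bound on the number of indices with $g_i = G_j$ at which that triple equality fails. Both can be checked on a small example. The sketch assumes that $M$ in (S3) bounds the number of changes of grid cell along the observed sequence $g_1, \ldots, g_n$. The trajectory and the function name are hypothetical. Python indices are 0-based, so `a[k]` and `b[k]` correspond to $A_{k+1}$ and $B_{k+1}$.

```python
import math


def check_identity_and_count(t, g, cell):
    """Check 2*B_i == A_i + A_{i-1} wherever g_{i-1} = g_i = g_{i+1} = cell, and
    count interior indices with g_i == cell where that triple equality fails.

    Returns (number of such failures, number of cell changes in g).
    """
    n = len(t)
    # a[k] is A_{k+1}: the gap (t_{k}, t_{k+1}] counted only if both ends lie in `cell`.
    a = [(t[k + 1] - t[k]) * (g[k + 1] == g[k] == cell) for k in range(n - 1)]
    # b[k] is B_{k+1}: half the window (t_{k-1}, t_{k+1}) around an interior point in `cell`.
    b = {k: 0.5 * (t[k + 1] - t[k - 1]) * (g[k] == cell) for k in range(1, n - 1)}
    failures = 0
    for k in range(1, n - 1):
        if g[k - 1] == g[k] == g[k + 1] == cell:
            assert math.isclose(2 * b[k], a[k] + a[k - 1])
        elif g[k] == cell:
            failures += 1
    changes = sum(g[k] != g[k - 1] for k in range(1, n))
    return failures, changes


# Hypothetical irregularly spaced observations with two cell changes.
t = [0.0, 0.5, 1.7, 2.0, 3.1, 3.3, 4.9]
g = ["x", "x", "x", "y", "y", "x", "x"]
failures, changes = check_identity_and_count(t, g, "x")
print(failures, changes, failures <= 2 * changes)
```

Each cell change can spoil the triple equality for at most two neighbouring indices in `cell`, one on each side of the change. This is where the factor 2 in $2M$ comes from.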

Again, using the fact that there are at most $2M$ time points $t_i$ with $g_i = G_j$ for which the equality $g_{i-1} = g_i = g_{i+1} = G_j$ does not hold, and allowing for one additional boundary term, we obtain

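$$\begin{aligned}
\sum_{i=1}^{n-1} A_i - (2M+1)\max_i |t_{i+1} - t_i| &\le \sum_{i=2}^{n-1} A_i\,\mathbf{1}(g_{i-1} = g_i = g_{i+1} = G_j) \le \sum_{i=1}^{n-1} A_i, \\
\sum_{i=1}^{n-1} A_i - (2M+1)\max_i |t_{i+1} - t_i| &\le \sum_{i=2}^{n-1} A_{i-1}\,\mathbf{1}(g_{i-1} = g_i = g_{i+1} = G_j) \le \sum_{i=1}^{n-1} A_i.
\end{aligned}$$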

It follows that

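$$\hat{\pi}_{c,j} = \frac{\sum_{i=2}^{n}(t_i - t_{i-1})\,\mathbf{1}(g_i = g_{i-1} = G_j)}{\sum_{i=2}^{n}(t_i - t_{i-1})\,\mathbf{1}(g_i = g_{i-1})} \approx \frac{1}{T}\sum_{i=1}^{n-1} A_i \approx \frac{1}{T}\sum_{i=2}^{n-1}\frac{A_i + A_{i-1}}{2}\,\mathbf{1}(g_{i-1} = g_i = g_{i+1} = G_j),$$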

which is the same limit as the one in Equation (A2) obtained for $\hat{\pi}_{o,j}$. Therefore, the numerators of $\hat{\pi}_{o,j}$ and $\hat{\pi}_{c,j}$ are asymptotically the same, which proves that $\hat{\pi}_{o,j}$ and $\hat{\pi}_{c,j}$ are asymptotically equal.
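To see Theorem 2.1 at work, the sketch below implements the ordinary estimator in Equation (A1) and the conservative estimator in the ratio form used above. It evaluates both on a hypothetical trajectory sampled on increasingly fine regular grids. The trajectory spends $[0, 3)$ and $[7, 10]$ in cell A and $[3, 7)$ in cell B, so the true share of cell A is 0.6. The trajectory and all function names are illustrative assumptions, not data or code from the study.

```python
def ordinary_estimator(t, g, cell, t_min, t_max):
    """Equation (A1): each interior observation in `cell` gets half the width of
    (t_{i-1}, t_{i+1}); the total is normalised by (T + t_n - t_1) / 2."""
    T = t_max - t_min
    num = 0.5 * sum(t[i + 1] - t[i - 1] for i in range(1, len(t) - 1) if g[i] == cell)
    return num / (0.5 * (T + t[-1] - t[0]))


def conservative_estimator(t, g, cell):
    """Only gaps (t_{i-1}, t_i] whose two endpoints share a grid cell are counted."""
    num = sum(t[i] - t[i - 1] for i in range(1, len(t)) if g[i] == g[i - 1] == cell)
    den = sum(t[i] - t[i - 1] for i in range(1, len(t)) if g[i] == g[i - 1])
    return num / den


def cell_at(time):
    """Hypothetical trajectory: cell "A" on [0, 3) and [7, 10], cell "B" on [3, 7)."""
    return "B" if 3 <= time < 7 else "A"


for n in (11, 101, 1001, 10001):
    t = [10 * k / (n - 1) for k in range(n)]  # regular sampling of [0, 10]
    g = [cell_at(x) for x in t]
    o = ordinary_estimator(t, g, "A", 0.0, 10.0)
    c = conservative_estimator(t, g, "A")
    print(f"n={n:>5}  ordinary={o:.4f}  conservative={c:.4f}  diff={abs(o - c):.4f}")
```

At $n = 11$ the two estimates are 0.5 and 0.625. As the sampling becomes denser, both approach 0.6 and each other, as the theorem predicts.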

A.2. Proof of Theorem 2.2

Proof.

Theorem 2.1 proves that the two estimators are asymptotically equivalent. Thus, we only need to derive the convergence of one of the two estimators to the true activity distribution $\pi = (\pi_1, \ldots, \pi_N)$ from Equation (9). In what follows, we focus on the conservative proportional time estimator.

Without loss of generality, we assume that there exist $K \ge 1$ disjoint time intervals in which the individual is inside grid cell $G_j$. That is, there are intervals $[a_1, b_1], \ldots, [a_K, b_K]$ such that $a_k < b_k < a_{k+1}$ for $k = 1, \ldots, K-1$, $t_{\min} \le a_1$, $b_K \le t_{\max}$, and

$$\{t : G(t) \in G_j\} = [a_1, b_1] \cup \cdots \cup [a_K, b_K].$$

In the definition of the true activity distribution $\pi$, $T$ follows a uniform distribution on the reference time frame $[t_{\min}, t_{\max}]$. We can therefore express $\pi_j$ as

$$\pi_j = P(G(T) \in G_j) = \sum_{k=1}^{K} P(T \in [a_k, b_k]) = \frac{1}{T}\sum_{k=1}^{K}(b_k - a_k).$$

As before, $T = t_{\max} - t_{\min}$.

For the interval $[a_k, b_k]$, let $t_{i_*}$ be the first observation time after $a_k$, and let $t_{i^*}$ be the last observation time before $b_k$:

$$t_{i_*} \ge a_k, \quad t_{i_*-1} < a_k, \quad t_{i^*+1} > b_k, \quad t_{i^*} \le b_k.$$

Because $G(t) \in G_j$ for all $t \in [a_k, b_k]$, we have $g_i = G_j$ for all $i \in \{i_*, i_*+1, \ldots, i^*\}$. The conservative proportional time estimator estimates the length of the interval $[a_k, b_k]$ by the length of the interval $[t_{i_*}, t_{i^*}]$. The corresponding error is

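$$\begin{aligned}
\left|(b_k - a_k) - (t_{i^*} - t_{i_*})\right| &\le (t_{i_*} - a_k) + (b_k - t_{i^*}) \\
&\le (t_{i_*} - t_{i_*-1}) + (t_{i^*+1} - t_{i^*}) \\
&\le 2\max_{i=1,\ldots,n-1} |t_{i+1} - t_i| \to 0,
\end{aligned}$$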

due to assumption (S1).
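The error bound above can be illustrated directly. The sketch below uses binary search to locate the first observation time at or after $a_k$ and the last one at or before $b_k$. It then compares the error in the estimated interval length with the local bound (the two gaps straddling the endpoints) and with twice the largest gap. It assumes sorted observation times and an interval lying strictly inside $(t_1, t_n)$, so that the neighbouring observations used in the local bound exist. The observation times and the function name are hypothetical.

```python
import bisect


def interval_error_and_bounds(t, a, b):
    """For sorted times t with t[0] < a < b < t[-1], return
    (|(b - a) - (t[hi] - t[lo])|, local bound, 2 * largest gap)."""
    if not (t[0] < a < b < t[-1]):
        raise ValueError("[a, b] must lie strictly inside (t[0], t[-1])")
    lo = bisect.bisect_left(t, a)       # first index with t[lo] >= a
    hi = bisect.bisect_right(t, b) - 1  # last index with t[hi] <= b
    if lo > hi:
        raise ValueError("no observation time falls inside [a, b]")
    error = abs((b - a) - (t[hi] - t[lo]))
    local_bound = (t[lo] - t[lo - 1]) + (t[hi + 1] - t[hi])
    max_gap = max(t[i + 1] - t[i] for i in range(len(t) - 1))
    return error, local_bound, 2 * max_gap


t = [0.0, 0.7, 1.1, 2.0, 2.4, 3.5, 4.0]
error, local_bound, global_bound = interval_error_and_bounds(t, a=1.0, b=3.0)
print(error, local_bound, global_bound)
```

Here the true length is 2.0 and the estimate is $2.4 - 1.1 = 1.3$, an error of about 0.7. This is well within the local bound of about 1.5 and the global bound of about 2.2. Only the global bound depends on assumption (S1) through the largest gap.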

By applying the above argument to each interval $[a_k, b_k]$, $k = 1, \ldots, K$, we conclude that

$$\sum_{i=2}^{n}(t_i - t_{i-1})\,\mathbf{1}(g_i = G_j) \to \sum_{k=1}^{K}(b_k - a_k).$$

Because

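$$0 \le \sum_{i=2}^{n}(t_i - t_{i-1})\,\mathbf{1}(g_i = G_j) - \sum_{i=2}^{n}(t_i - t_{i-1})\,\mathbf{1}(g_i = g_{i-1} = G_j) \le M\max_{i=1,\ldots,n-1}|t_{i+1} - t_i|,$$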

we further conclude that

$$\sum_{i=2}^{n}(t_i - t_{i-1})\,\mathbf{1}(g_i = g_{i-1} = G_j) \to \sum_{k=1}^{K}(b_k - a_k).$$

This proves the convergence of the conservative proportional time estimator to the true activity distribution:

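$$\hat{\pi}_{c,j} \approx \frac{\sum_{i=2}^{n}(t_i - t_{i-1})\,\mathbf{1}(g_i = g_{i-1} = G_j)}{T} \to \frac{\sum_{k=1}^{K}(b_k - a_k)}{T} = \pi_j.$$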
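The convergence in Theorem 2.2 can also be checked by simulation. The sketch below draws increasingly many uniformly distributed observation times on a hypothetical trajectory whose true activity share for cell A is $\pi_A = (3 + 3)/10 = 0.6$. It reports the absolute error of the conservative proportional time estimator. Uniform random sampling, the trajectory, and all names are illustrative assumptions. The simulated observation times include $t_{\min}$ and $t_{\max}$ so that the whole reference window is covered.

```python
import random

T_MIN, T_MAX = 0.0, 10.0
A_INTERVALS = [(0.0, 3.0), (7.0, 10.0)]  # hypothetical intervals [a_k, b_k] spent in cell A
TRUE_PI_A = sum(b - a for a, b in A_INTERVALS) / (T_MAX - T_MIN)  # (3 + 3) / 10


def cell_at(time):
    """Cell "A" on the closed intervals in A_INTERVALS, cell "B" elsewhere."""
    return "A" if any(a <= time <= b for a, b in A_INTERVALS) else "B"


def conservative_estimator(t, g, cell):
    """Only gaps (t_{i-1}, t_i] whose two endpoints share a grid cell are counted."""
    num = sum(t[i] - t[i - 1] for i in range(1, len(t)) if g[i] == g[i - 1] == cell)
    den = sum(t[i] - t[i - 1] for i in range(1, len(t)) if g[i] == g[i - 1])
    return num / den


def simulate_error(n, seed=0):
    """Absolute error of the estimator for n uniform observation times plus the endpoints."""
    rng = random.Random(seed)
    t = sorted([T_MIN, T_MAX] + [rng.uniform(T_MIN, T_MAX) for _ in range(n)])
    g = [cell_at(x) for x in t]
    return abs(conservative_estimator(t, g, "A") - TRUE_PI_A)


for n in (50, 500, 5000, 50000):
    print(n, round(simulate_error(n), 4))
```

As in the proof, the error is controlled by the largest gap between consecutive observation times. That gap shrinks as more observations are drawn, which is the role assumption (S1) plays in the argument.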

Funding Statement

The work of Z.D. and A.D. was partially supported by the National Science Foundation [grant number DMS/MPS-1737746] to University of Washington. Y.C. received partial support from the National Science Foundation [grant number DMS-1810960] and National Institutes of Health [grant number U01-AG016976]. The funders had no role in the study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Disclosure statement

No potential conflict of interest was reported by the authors.

References

1. Basta L.A., Richmond T.S., and Wiebe D.J., Neighborhoods, daily activities, and measuring health risks experienced in urban environments, Soc. Sci. Med. 71 (2010), pp. 1943–1950. doi: 10.1016/j.socscimed.2010.09.008
2. Berrigan D., Hipp J.A., Hurvitz P.M., James P., Jankowska M.M., Kerr J., Laden F., Leonard T., McKinnon R.A., Powell-Wiley T.M., Tarlov E., Zenk S.N., and the TREC Spatial and Contextual Measures and Modeling Work Group, Geospatial and contextual approaches to energy balance and health, Ann. GIS 21 (2015), pp. 157–168. doi: 10.1080/19475683.2015.1019925
3. Bivand R.S., Pebesma E., and Gómez-Rubio V., Applied Spatial Data Analysis with R, Springer, New York, 2013.
4. Byrnes H., Miller B.A., Morrison C.N., Wiebe D.J., Woychik M., and Wiehe S.E., Association of environmental indicators with teen alcohol use and problem behavior: Teens' observations vs. objectively-measured indicators, Health and Place 43 (2017), pp. 151–157. doi: 10.1016/j.healthplace.2016.12.004
5. Chen Y.C., Generalized cluster trees and singular measures, Ann. Stat. 47 (2019), pp. 2174–2203. doi: 10.1214/18-AOS1744
6. Courant R. and John F., Introduction to Calculus and Analysis, Vol. I, Springer, New York, 1991.
7. Dobra A. and Williams N.E., Spatiotemporal detection of unusual human population behavior using mobile phone data, PLoS ONE 10 (2015), p. e0120449. doi: 10.1371/journal.pone.0120449
8. Duncan D.T., Hickson D.A., Goedel W.C., Callander D., Brooks B., Chen Y.T., Hanson H., Eavou R., Khanna A.S., Chaix B., Regan S., Wheeler D.P., Mayer K.H., Safren S.A., Carr M.S., Draper C., Magee-Jackson V., Brewer R., and Schneider J.A., The social context of HIV prevention and care among black men who have sex with men in three U.S. cities: The neighborhoods and networks (N2) cohort study, Int. J. Environ. Res. Public Health 16 (2019), p. 1922. doi: 10.3390/ijerph16111922
9. Elgethun K., Yost M.G., Fitzpatrick C.T., Nyerges T.L., and Fenske R.A., Comparison of Global Positioning System (GPS) tracking and parent-report diaries to characterize children's time–location patterns, J. Expo. Sci. Env. Epid. 17 (2007), pp. 196–206. doi: 10.1038/sj.jes.7500496
10. Entwisle B., Putting people into place, Demography 44 (2007), pp. 687–703. doi: 10.1353/dem.2007.0045
11. Kiukkonen N., Blom J., Dousse O., Gatica-Perez D., and Laurila J., Towards rich mobile phone datasets: Lausanne data collection campaign, Proc. ACM Int. Conf. on Pervasive Services (ICPS), Berlin, July 2010.
12. Kwan M.P., The uncertain geographic context problem, Ann. Assoc. Am. Geogr. 102 (2012), pp. 958–968. doi: 10.1080/00045608.2012.687349
13. Kwan M.P., Beyond space (as we knew it): Toward temporally integrated geographies of segregation, health, and accessibility, Ann. Assoc. Am. Geogr. 103 (2013), pp. 1078–1086. doi: 10.1080/00045608.2013.792177
14. Laurila J.K., Gatica-Perez D., Aad I., Blom J., Bornet O., Do T., Dousse O., Eberle J., and Miettinen M., The mobile data challenge: Big data for mobile computing research, Proc. Mobile Data Challenge Workshop (MDC) in Conjunction with Int. Conf. on Pervasive Computing, Newcastle, June 2012.
15. Laurila J.K., Gatica-Perez D., Aad I., Blom J., Bornet O., Do T.M.T., Dousse O., Eberle J., and Miettinen M., From big smartphone data to worldwide research: The mobile data challenge, Pervasive Mob. Comput. 9 (2013), pp. 752–771. doi: 10.1016/j.pmcj.2013.07.014
16. Lee J.H., Davis A.W., Yoon S.Y., and Goulias K.G., Activity space estimation with longitudinal observations of social media data, Transportation 43 (2016), pp. 955–977. doi: 10.1007/s11116-016-9719-1
17. Matthews S.A. and Yang T.C., Spatial polygamy and contextual exposures (SPACEs): Promoting activity space approaches in research on place and health, Am. Behav. Sci. 57 (2013), pp. 1057–1081. doi: 10.1177/0002764213487345
18. Mazimpaka J.D. and Timpf S., Trajectory data mining: A review of methods and applications, J. Spatial Inform. Sci. 13 (2016), pp. 61–99.
19. Morrison C.N., Byrnes H.F., Miller B.A., Kaner E., Wiehe S.E., Ponicki W.R., and Wiebe D., Assessing individuals' exposure to environmental conditions using residence-based measures, activity location-based measures, and activity path-based measures, Epidemiology 30 (2019), pp. 166–176. doi: 10.1097/EDE.0000000000000940
20. Perchoux C., Chaix B., Cummins S., and Kestens Y., Conceptualization and measurement of environmental exposure in epidemiology: Accounting for activity space related to daily mobility, Health and Place 21 (2013), pp. 86–93. doi: 10.1016/j.healthplace.2013.01.005
21. Richardson D.B., Volkow N.D., Kwan M.P., Kaplan R.M., Goodchild M.F., and Croyle R.T., Spatial turn in health research, Science 339 (2013), pp. 1390–1392. doi: 10.1126/science.1232257
22. Šimon M., Vašát P., Daňková H., Gibas P., and Poláková M., Mobilities and commons unseen: Spatial mobility in homeless people explored through the analysis of GPS tracking data, GeoJournal (2019). doi: 10.1007/s10708-019-10030-4
23. Šimon M., Vašát P., Poláková M., Gibas P., and Daňková H., Activity spaces of homeless men and women measured by GPS tracking data: A comparative analysis of Prague and Pilsen, Cities 86 (2019), pp. 145–153. doi: 10.1016/j.cities.2018.09.011
24. Vazquez-Prokopec G.M., Bisanzio D., Stoddard S.T., Paz-Soldan V., Morrison A.C., Elder J.P., Ramirez-Paredes J., Halsey E.S., Kochel T.J., Scott T.W., and Kitron U., Using GPS technology to quantify human mobility, dynamic contacts and infectious disease dynamics in a resource-poor urban environment, PLoS ONE 8 (2013), p. e58802. doi: 10.1371/journal.pone.0058802
25. Waller L.A. and Gotway C.A., Applied Spatial Statistics for Public Health Data, John Wiley & Sons, Hoboken, NJ, 2004.
26. Wasserman L., All of Nonparametric Statistics, Springer Texts in Statistics, Springer, New York, 2007.
27. Wiehe S.E., Carroll A.E., Liu G.C., Haberkorn K.L., Hoch S.C., Wilson J.S., and Fortenberry J.D., Using GPS-enabled cell phones to track the travel patterns of adolescents, Int. J. Health Geogr. 7 (2008), p. 22. doi: 10.1186/1476-072X-7-22
28. Wiehe S.E., Kwan M.P., Wilson J., and Fortenberry J.D., Adolescent health-risk behavior and community disorder, PLoS ONE 8 (2013), p. e77667. doi: 10.1371/journal.pone.0077667
29. Zenk S.N., Matthews S.A., Kraft A.N., and Jones K.K., How many days of Global Positioning System (GPS) monitoring do you need to measure activity space environments in health research?, Health and Place 51 (2018), pp. 52–60. doi: 10.1016/j.healthplace.2018.02.004
30. Zenk S.N., Schulz A.J., Matthews S.A., Odoms-Young A., Wilbur J., Wegrzyn L., Gibbs K., Braunschweig C., and Stokes C., Activity space environment and dietary and physical activity behaviors: A pilot study, Health and Place 17 (2011), pp. 1150–1161. doi: 10.1016/j.healthplace.2011.05.001

Articles from Journal of Applied Statistics are provided here courtesy of Taylor & Francis
