. 2021 Mar 17;5(3):321–331. doi: 10.1109/TETCI.2021.3059007

Quantifying Mobility and Mixing Propensity in the Spatiotemporal Context of a Pandemic Spread

Satyaki Roy 1, Preetom Biswas 2, Preetam Ghosh 3
PMCID: PMC8545005  PMID: 36694698

Abstract

COVID-19 is the most acute global public health crisis of this century. Current trends in global infection and death numbers suggest that human mobility leading to high social mixing is a key driver of infection spread, making it imperative to incorporate the spatiotemporal and mobility contexts into future prediction models. In this work, we present a generalized spatiotemporal model that quantifies the role of human social mixing propensity and mobility in pandemic spread through a composite latent factor. The proposed model calculates the exposed population count by utilizing a nonlinear least-squares optimization that exploits the intrinsic linearity in SEIR (Susceptible, Exposed, Infectious, or Recovered). We also present the inverse coefficient of variation of the daily exposed curve as a measure of infection duration and spread. We carry out experiments on the mobility and COVID-19 infected and death curves of New York City to show that boroughs with high inter-zone mobility indeed exhibit synchronicity in the peaks of the daily exposed curve as well as similar social mixing patterns. Furthermore, we demonstrate that several nations with a high inverse coefficient of variation in daily exposed numbers are amongst the places worst affected by COVID-19. Our insights on the effects of lockdown on human mobility motivate future research in the identification of hotspots and the design of intelligent mobility strategies and quarantine procedures to curb infection spread.

Keywords: Human mobility policies, lockdown, optimization, social mixing, spatiotemporal model

I. Introduction

The scourge of epidemics and pandemics has been a part of human history since time immemorial. Considering the past millennium alone, from as early as 1317, innumerable outbreaks such as plague, flu and Ebola have claimed millions of lives globally [1]. The latest addition to the list of outbreaks is COVID-19, which, since its emergence in China in December 2019, has brought the world to a veritable standstill. COVID-19 has followed a course similar to that of the plague, flu and Ebola and claimed over 2 million lives globally as of January 2021, while its severity continues to grow in the US, the UK, Brazil and parts of Asia [2], with a sizable number still projected to die in the subsequent waves of this pandemic.

Most countries reacted poorly to the looming dangers of COVID-19. In the absence of a credible vaccination treatment [3], social distancing and the ensuing lockdowns are threatening to bring the global economy to a halt. There has been a drop in industrial productivity and stock exchange performance, an increase in the price of goods [4], and a potential contraction in US GDP [5]. The world is on the brink of a COVID-19-induced recession (with as many as 709 000 seeking unemployment aid in the US alone [6]). Nations are now willing to relax lockdowns to mitigate economic losses [7]. Research on the clinical, epidemiological or socioeconomic implications of COVID-19 is stymied by the absence of prior knowledge [3], [8]. The reliability of the data is challenged by high variability in testing and in surveillance- or contact tracing-based detection of the prospective infected population. Finally, logistical factors such as the dearth or inaccuracy of testing, reluctance in reporting death and recovery [9] and dubious information in print and social media [10] further misguide precautionary and mitigation measures. It is known that human mobility across neighboring geographic regions (such as the arrival of international travellers [11]) leading to high social contact is the primary mode of spread, yet there exists no model that quantifies the joint effect of human mobility and social mixing in the spatiotemporal dynamics of pandemic spread.

Epidemiological models such as SEIR, SIR, SEIRD and SEIRS (susceptible, exposed, infected (or infectious), recovered, or death) and their variants have been employed to study the spread of infection [12], [13]. As per the susceptible exposed infected recovered (SEIR) model, the susceptible (S) class comprises individuals who are not exposed to the infection. Once exposed to an infected individual, the susceptible may transfer to the exposed (E) category. The E class represents asymptomatic or untested individuals, who transition to the (tested) infected (I) class. Individuals in I transition to the recovered (R) category [14]. There is also the non-epidemiological modeling analysis proposed by the Institute for Health Metrics and Evaluation (IHME) [15], based squarely on mortality rates. Both of these techniques have their shortcomings. With regard to stochastic epidemiological models, different combinations of state transition parameters can show a good fit with the training data but yield disparate model predictions, a problem known as parameter identifiability [16]. On the other hand, increasing efforts to trace the passage of infection from mobility statistics [17] militate against the efficacy of the IHME model, which assumes that the mortality rate follows a normal distribution and does not factor in the effects of transmission dynamics or contact patterns on spread.

Given the lack of prior knowledge on dealing with a public health crisis of this scale, policymakers are ill-equipped to design mitigation strategies. To bridge the gap, the research community of epidemiologists, clinicians and computer scientists is applying its expertise to seek out factors and their effects on infection spread as well as the impending economic crash [3]. First, machine learning (ML) is helping build prediction models on epidemiological and clinical data. Given existing clinical data, prediction models [18] and therapeutic approaches can help identify vulnerable groups [19], [20]. Epidemiologists are trying to identify the spread dynamics of COVID-19. Holmdahl et al. [21] analyze the pros and cons of forecasting models that make predictions through curve fitting or mechanistic models, while supervised and unsupervised ML is helping trace the trends in infection dynamics [22]. Khan et al. performed regression analysis, cluster analysis and principal component analysis on Worldometer infection count data to gauge the variability and effect of testing in the prediction of confirmed cases [4]. Roy et al. used regression analysis to pinpoint pre-lockdown factors that affect post-lockdown pandemic numbers [23].

Modeling COVID-19 using SEIR: There have been efforts to employ SEIR to study the effects of demography, immunity and social distancing on infection spread. He et al. employed particle swarm optimization on the COVID-19 data of Hubei province, China to calculate the parameters of the SEIR model and discussed how these parameters can vary with demography [24]. Pandey et al. utilized the SEIR model with regression on the COVID-19 data of India collected by Johns Hopkins University between January 30 and March 30, 2020 to show the reproduction number to be approximately 2 [25]. Yang et al. combined artificial intelligence with the SEIR model on the COVID-19 data of Hubei, China to estimate the date when the infection peaks. They also predicted how quarantine would affect the dynamics of contagion [12]. Annas et al. calculated the parameters of the SEIR model by incorporating the factors of vaccination and isolation. They applied the model to the COVID-19 data of Indonesia to study the long-term effects of vaccination and isolation on curbing spread [26]. Radulescu et al. adapted SEIR to study spread dynamics in an age-heterogeneous scenario. As a case study, they simulated a small community in New York and assessed the effects of control measures such as restricted mobility, social distancing and lockdown [27]. Iwata performed simulations using the SEIR model to predict the effect of a secondary outbreak in a community outside China, demonstrating that the timing of hospital visits may affect the outbreak [28]. Mwalili et al. applied a modified SEIR model to study the effect of pathogens and intervention measures on disease spread. They discussed the ill effects of flouting social distancing and basic hygiene measures on COVID-19 [29]. Tang et al. adapted the SEIR model to incorporate the assumption that an infected person may act as a vector of infection during the incubation period. They used the model to make recommendations and predictions about the disease spread [30]. Lopez et al. utilized the SEIR epidemic model to study the consequences of quarantine on the populations of Spain and Italy. They showed that isolation can help achieve a tenfold decline in disease spread. This has been corroborated by studying contagion before and after the COVID-19 interventions in Italy [31].

Contributions: In this paper, we make three major contributions. First, we introduce a generalized spatiotemporal framework, the first of its kind, that quantifies the components affecting infection spread through a latent factor. Specifically, this latent factor is a metric quantifying the joint influence of human mobility and social mixing on the exposure to an infection (see Fig. 1). We demonstrate its efficacy by applying this spatiotemporal model to New York City mobility traces and COVID-19 data trends. Second, we argue that the extent and spread of infection can be gauged in terms of the projected exposed numbers (i.e., asymptomatic individuals), instead of the infected and mortality counts (the latter has been deemed a reliable measure of the extent of infection spread in a geographical region [15]). Third, we adapt a well-studied measure of dispersion in a statistical distribution, called the coefficient of variation, to quantify the potential for infection spread and duration, and demonstrate that nations with a high inverse of the coefficient of variation in daily exposed numbers are amongst the most affected by COVID-19. Finally, we discuss how the proposed spatiotemporal model can identify pandemic hotspots as well as the ideal time and extent of lockdowns to minimize contact during a pandemic. The exposed population of a region is an input to the spatiotemporal model that estimates latent factors. The proposed approach employs a nonlinear least-squares optimization to infer the daily exposed numbers. It incorporates the exposed-to-infected transition step of the complete SEIR (i.e., S → E → I → R). It is important to mention here that the stated optimization is just one approach to gauge the exposed numbers and that the spatiotemporal model will work equally well with exposed estimates obtained using other approaches.

Fig. 1.

Contributions of this work. First, we present an optimization that employs the daily infected (I) to infer the daily exposed (E) numbers of a region. Second, we utilize E, in combination with the mobility pattern (obtained from real human mobility traces), to calculate the latent factors for infection spread that quantify mobility and social mixing.

This paper is organized as follows. Section II introduces the major contributions of this work, namely, the optimization to estimate the exposed population, the spatiotemporal model and the inverse coefficient of variation to quantify spread. Section III presents the experimental results on traffic and COVID-19 data from New York City and the world. Finally, Section IV draws the conclusions.

II. Approach

Susceptible Exposed Infected Recovered Death model: In the SEIR model [14], the susceptible (S) class comprises individuals who are not exposed to infection. Once exposed to infected individuals, they may transfer to the exposed (E) category. The E class comprises asymptomatic or untested individuals, who transition to the (tested) infected (I) class. The individuals in I transition to either the recovered (R) or the dead category (Fig. 2).

Fig. 2.

State transitions in the SEIR model are shown in black arrows. The optimization to calculate the daily exposed from infected, by calculating the fraction and duration of transition from E to I (Inline graphic, respectively) is highlighted in red.

Estimation of daily exposed: We discuss the optimization that utilizes the daily infected numbers to estimate the daily exposed numbers (see Fig. 2). This is based on the SEIR model that states that a fraction of susceptible individuals transition to exposed on contact with infected state, while a fraction (say, Inline graphic) over time (say, Inline graphic days) transfer to infected. We estimate Inline graphic by assuming that a mean fraction (Inline graphic) of Inline graphic transition to the infected category in mean duration Inline graphic time. We minimize average squared error between the fraction of the predicted daily exposed at time Inline graphic (i.e., Inline graphic) and infected population Inline graphic at time Inline graphic (i.e., Inline graphic).

II.

Ex. 1 ensures that the daily exposed curve scaled by a factor Inline graphic and shifted by Inline graphic days on the time axis is nearly identical (i.e., having low mean squared error) to the daily infected curve. Constraint 2 causes the incubation period Inline graphic and infection rate Inline graphic to be in range Inline graphic and Inline graphic, respectively. Finally, constraint 3 ensures that, given a place Inline graphic with population Inline graphic, the optimizer considers the upper bound for daily exposed Inline graphic to be a fraction, say Inline graphic, of Inline graphic. We illustrate an example in Fig. 3, where Inline graphic days and Inline graphic. Given a daily infected curve (shown in blue) that peaks on day 50, the optimizer should infer a daily exposed curve (shown in green) that shows a higher curve peaking at day Inline graphic.

Fig. 3.

Daily exposed curve (colored green) and daily infected curve (colored blue) for Inline graphic days and Inline graphic.
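
The shift-and-scale relation illustrated in Fig. 3 can be sketched numerically. The snippet below is a minimal illustration rather than the authors' implementation. It assumes a delay `tau` of 10 days and a transition fraction `sigma` of 0.5 (both hypothetical values chosen for this sketch), a synthetic bell-shaped infected curve peaking on day 50, and the relation that `sigma` times the exposed count on day t equals the infected count on day t + `tau`.

```python
import numpy as np

# Hypothetical values for illustration only; not taken from the paper.
tau = 10      # assumed delay (days) between exposure and infection
sigma = 0.5   # assumed fraction of exposed individuals who become infected

days = np.arange(120)
# Synthetic daily infected curve peaking on day 50 (assumed Gaussian shape).
infected = 1000.0 * np.exp(-0.5 * ((days - 50) / 8.0) ** 2)

# Exposed curve: the infected curve shifted tau days earlier and scaled by
# 1/sigma, so that sigma * exposed[t] == infected[t + tau] where observed.
exposed = np.full(days.shape, np.nan)
exposed[: len(days) - tau] = infected[tau:] / sigma

peak_infected = int(np.argmax(infected))
peak_exposed = int(np.nanargmax(exposed))
print(peak_infected, peak_exposed)
```

Under these assumptions the exposed curve is taller than the infected curve and peaks `tau` days before it.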

A. Inverse Coefficient of Variation

Coefficient of variation (CV) is a statistical measure of the variability of a distribution with respect to its mean. It was conceived to compare data varying in units, say the height of a child and an adult [32]. We posit that the inverse of CV (i.e., ICV) can be an effective measure of the potential threat posed by a pandemic in a geographical region. It is measured as Inline graphic, where Inline graphic and Inline graphic are the mean and standard deviation of an exposed curve, respectively. ICV was termed the reward-to-variability ratio by American economist and Nobel laureate William Sharpe and used to gauge the performance of mutual funds as a ratio between return on investment and market variability [33]. In the context of a pandemic, the ICV of the daily exposed curve quantifies the ratio of the potential for infection spread over time to its variability, suggesting that it can be an effective measure of the potential extent and duration of pandemic spread in any geographical region.
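
As a minimal sketch of the measure described above, the ICV of a curve is its mean divided by its standard deviation. The function name `inverse_cv` and the two toy curves are hypothetical, and the choice of the population standard deviation is an assumption, since the text does not state which variant is used.

```python
import statistics

def inverse_cv(curve):
    """Return mean / standard deviation of a daily exposed curve."""
    mu = statistics.fmean(curve)
    sd = statistics.pstdev(curve)  # population SD (assumed variant)
    if sd == 0:
        raise ValueError("ICV is undefined for a constant curve")
    return mu / sd

steady = [90, 100, 110, 100, 95, 105]  # high, stable daily exposure
spiky = [0, 0, 5, 300, 5, 0]           # brief, isolated outbreak
print(inverse_cv(steady) > inverse_cv(spiky))
```

A curve that stays high and steady scores a larger ICV than one with a short spike, which matches the interpretation of ICV as a joint indicator of extent and duration.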

B. Spatiotemporal Modeling

We present a spatiotemporal model that helps quantify the daily exposed numbers in terms of a latent factor combining social mixing and human mobility (see Fig. 1). We discuss the preliminaries on matrix normalization as well as the frequency and transition matrices before formalizing the model.

1). Column Normalization of a Matrix

Given any two-dimensional matrix Inline graphic, we define a left stochastic matrix (i.e., a matrix whose columns each sum to 1) as follows:

1).
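
Column normalization can be sketched in a few lines of NumPy. The helper name `column_normalize` and the example matrix are hypothetical; the only property relied upon is the one stated above, namely that every column of the result sums to 1.

```python
import numpy as np

def column_normalize(matrix):
    """Divide each column by its sum so every column sums to 1."""
    m = np.asarray(matrix, dtype=float)
    col_sums = m.sum(axis=0, keepdims=True)
    if np.any(col_sums == 0):
        raise ValueError("a column with zero total cannot be normalized")
    return m / col_sums

counts = np.array([[8, 1], [2, 3]])
stochastic = column_normalize(counts)
print(stochastic)
```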

2). Frequency and Transition Matrix

Given a geographical region with a set of geographical sub-regions (or zones) Inline graphic, the frequency matrix Inline graphic is created from the human mobility traces, where Inline graphic denotes the number of trips made from zone Inline graphic to Inline graphic. We generate a transition matrix Inline graphic by performing column normalization of Inline graphic (as defined in Section II-B1). Each element of this matrix Inline graphic is the probability of making a trip from Inline graphic to Inline graphic. The frequency or transition matrix captures the overall mobility trends within and across zones of any given geographical region.
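
The construction of the frequency and transition matrices from trip records can be sketched as follows. The zone labels and trip list are hypothetical. The orientation is also an assumption made for this sketch: rows index the destination and columns the origin, so that each normalized column describes where trips leaving one zone end up.

```python
import numpy as np

zones = ["A", "B", "C"]  # hypothetical zone labels
index = {z: k for k, z in enumerate(zones)}

# Hypothetical trip records as (origin, destination) pairs.
trips = [("A", "A"), ("A", "B"), ("B", "A"), ("B", "B"),
         ("B", "B"), ("C", "A"), ("C", "C")]

# Frequency matrix: freq[destination, origin] counts trips (assumed layout).
freq = np.zeros((len(zones), len(zones)))
for origin, dest in trips:
    freq[index[dest], index[origin]] += 1

# Transition matrix: each column is divided by its total trip count.
transition = freq / freq.sum(axis=0, keepdims=True)
print(transition)
```

In this toy example, half of the trips leaving zone A stay in A and half go to B, so the first column reads 0.5, 0.5, 0.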

a) Quantifying trip count: In keeping with Markov chain theory, we calculate the c-th power of Inline graphic, which represents the probability of transitioning from one zone to another in exactly Inline graphic trips [34]. Given that Inline graphic, the Inline graphic-th entry in Inline graphic raised to the power 2 can be written as:

2).

If Inline graphic, Inline graphic and Inline graphic in (5), we obtain the likelihood for a trip from Inline graphic to Inline graphic in 2 hops (with Inline graphic (Inline graphic) as an intermediate stop). In (5), the term Inline graphic is the probability of traveling from Inline graphic to Inline graphic in the order Inline graphic; analogously, Inline graphic is the probability of commute in the following order Inline graphic, and so on.

Let us assume that an individual makes Inline graphic trips. We calculate the stochastic matrix corresponding to the inter- and intra-zone transition for less than or equal to Inline graphic trips (Inline graphic), where Inline graphic is defined as:

2).

We assume that each trip length is independent of another. In other words, an individual can independently choose to make Inline graphic, Inline graphic or more trips across different zones in a region.
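
The role of matrix powers can be checked on a small example. The 3-zone matrix `P` below is hypothetical, and entry `[i, j]` is read as the probability of a trip from zone j to zone i (an assumed orientation). The helper `up_to_c_trips` uses the column-normalized sum of the first c powers, which is one plausible reading of the matrix for at most c trips and should be treated as an assumption of this sketch.

```python
import numpy as np

# Hypothetical left stochastic matrix (each column sums to 1).
P = np.array([[0.7, 0.2, 0.1],
              [0.2, 0.6, 0.3],
              [0.1, 0.2, 0.6]])

# Exactly two trips: (P^2)[i, j] = sum over k of P[i, k] * P[k, j],
# i.e. every intermediate stop k contributes one term.
P2 = np.linalg.matrix_power(P, 2)
manual = sum(P[0, k] * P[k, 1] for k in range(3))

def up_to_c_trips(P, c):
    """Assumed form: column-normalized sum of P^1 ... P^c."""
    total = sum(np.linalg.matrix_power(P, n) for n in range(1, c + 1))
    return total / total.sum(axis=0, keepdims=True)

Pc = up_to_c_trips(P, 3)
print(P2[0, 1], manual)
```

Powers of a left stochastic matrix remain left stochastic, so the normalization in `up_to_c_trips` only rescales the sum of c such matrices back to unit column totals.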

3). Formal Definition

Let Inline graphic be the number of daily exposed individuals at zone Inline graphic at time Inline graphic and Inline graphic (i.e., Inline graphic for some Inline graphic, defined in Section II-B2) be the transition matrix. We define the relationship Inline graphic, written as:

3).
3).

Explanation: Recall from the discussion of the SEIR model in Section I that the susceptible (S) population contracts the infection via contact with infected (I) individuals. Inline graphic is the composite combination of mobility and mixing among the S and I populations over time, and Inline graphic controls the extent of contact among S and I due to intra- and inter-zone mobility, resulting in the generation of the final matrix of exposed individuals over time Inline graphic. The other features are summarized below.

  • Latent factor Inline graphic is a unified metric for mobility and social mixing and an element Inline graphic can be calculated as:
    graphic file with name M90.gif
  • Recall that the frequency of trips from Inline graphic to Inline graphic may be inferred from element Inline graphic in frequency matrix Inline graphic (defined in Section II-B2). For each Inline graphic, we calculate the trip frequency factor (Inline graphic) as the total number of trips made from Inline graphic to all boroughs including itself (i.e., Inline graphic).

  • We posit that an element of the latent factor Inline graphic is a combination of the three factors of a borough Inline graphic: (a) frequency of trips made by Inline graphic (Inline graphic), (b) fraction of infected individuals in Inline graphic (Inline graphic), and (c) intra- and inter-borough mixing of Inline graphic. Thus, Inline graphic can be written as:
    graphic file with name M107.gif
    Here, Inline graphic is the long-term mean trip count made within and across boroughs and Inline graphic is the population of Inline graphic. The first term Inline graphic is a measure of expected number of trips starting at Inline graphic at time Inline graphic; the second term Inline graphic is the ratio between the number of infected people at time Inline graphic, Inline graphic, and the number of people in borough Inline graphic (barring cumulative recovered Inline graphic and dead Inline graphic). The third term Inline graphic is the mixing factors that account for several region-specific parameters, such as susceptible count, testing frequency, strain of infection, immunity acquired against infection, etc.

4). Modeling Lockdown

Lockdown is modeled as restricted mobility achieved by scaling down the frequency of trips made by borough Inline graphic (Inline graphic). Given a lockdown rate Inline graphic, we achieve trip minimization by simply scaling each element of the latent factor matrix Inline graphic by Inline graphic, as shown in the equation below.

4).

In the above equation, Inline graphic is the scaled down frequency of trips. Note that the drop in exposed numbers is commensurate with the decrease in Inline graphic, since Inline graphic in (7) can be written as Inline graphic. However, the knowledge of the transition matrix Inline graphic and latent factor Inline graphic can allow us to devise more intelligent lockdown strategies. In the experimental results (Section III-C3) we consider a scenario where, instead of a uniform lockdown rate Inline graphic, lockdown levels can vary over time (i.e., Inline graphic). Since the magnitude of elements in Inline graphic and Inline graphic vary across boroughs and time and lockdown entails economic losses, it is possible to utilize the latent factor matrix to balance joint goals of minimizing exposure and economic losses.
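
The effect of uniform and time-varying lockdown levels can be sketched under two explicit assumptions made only for this illustration: that the exposed matrix is the product of the transition matrix and the latent factor matrix, and that a lockdown rate r scales the latent factors by (1 - r). All matrices and rates below are hypothetical.

```python
import numpy as np

P = np.array([[0.8, 0.3],
              [0.2, 0.7]])          # hypothetical transition matrix (2 zones)
L = np.array([[50.0, 80.0, 60.0],
              [20.0, 40.0, 30.0]])  # hypothetical latent factors (zones x days)

baseline = P @ L                    # assumed exposed matrix without lockdown

# Uniform lockdown: every latent factor shrinks by the same rate.
r = 0.4
uniform = P @ ((1.0 - r) * L)

# Time-varying lockdown: one rate per day, broadcast across zones.
r_t = np.array([0.0, 0.5, 0.8])
varying = P @ (L * (1.0 - r_t))

print(uniform / baseline)
```

Because the assumed model is linear, a uniform rate reduces every exposed entry by the same proportion, whereas a per-day schedule reduces each day's column by that day's rate, which leaves room to trade exposure against economic cost day by day.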

5). Determination of Latent Factors

Given transition and exposed matrices Inline graphic and Inline graphic, we solve for the latent factor matrix Inline graphic, while constraining Inline graphic to be positive real numbers.

5).
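
One way to solve for non-negative latent factors is non-negative least squares, applied one day at a time. The sketch below is not the authors' solver. It assumes the exposed matrix equals the transition matrix times the latent factor matrix, uses hypothetical matrices, and relies on `scipy.optimize.nnls`, which enforces non-negativity rather than strict positivity.

```python
import numpy as np
from scipy.optimize import nnls

P = np.array([[0.7, 0.2, 0.1],
              [0.2, 0.6, 0.3],
              [0.1, 0.2, 0.6]])    # hypothetical transition matrix
L_true = np.array([[30.0, 45.0],
                   [10.0, 25.0],
                   [5.0, 15.0]])   # hypothetical latent factors (zones x days)
E = P @ L_true                     # synthetic exposed matrix

# Solve min ||P x - E[:, t]|| subject to x >= 0 for each day (column) t.
L_hat = np.column_stack([nnls(P, E[:, t])[0] for t in range(E.shape[1])])
print(np.round(L_hat, 3))
```

With noise-free synthetic data and an invertible transition matrix, the recovered factors coincide with the ones used to generate the data; with real data the residual returned by `nnls` indicates how well the assumed relation holds.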

C. Mean-Centered Cosine Similarity

We estimate the similarity between two vectors Inline graphic and Inline graphic using the cosine index of mean-centered vectors Inline graphic that measures the cosine of angle between vectors Inline graphic and Inline graphic as Inline graphic. Mean-centering is a standard practice in statistical models and data-driven recommendation systems [35], [36] that allows comparison of data with varying orders of magnitude.
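
A minimal sketch of the mean-centered cosine index follows; the function name and test vectors are hypothetical. For two vectors of equal length, this quantity coincides with the Pearson correlation coefficient.

```python
import numpy as np

def mean_centered_cosine(x, y):
    """Cosine of the angle between x - mean(x) and y - mean(y)."""
    xc = np.asarray(x, dtype=float) - np.mean(x)
    yc = np.asarray(y, dtype=float) - np.mean(y)
    denom = np.linalg.norm(xc) * np.linalg.norm(yc)
    if denom == 0:
        raise ValueError("similarity is undefined for a constant vector")
    return float(xc @ yc / denom)

a = [1, 2, 3, 4]
b = [100, 200, 300, 400]  # same trend, different order of magnitude
c = [4, 3, 2, 1]          # reversed trend
print(mean_centered_cosine(a, b), mean_centered_cosine(a, c))
```

Vectors `a` and `b` differ by two orders of magnitude yet are judged maximally similar, which is the property that makes the index suitable for comparing zones of very different sizes.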

III. Experimental Results

The results are classified into four subsections: (A) parameter identification and quantification of pandemic spread, (B) mobility patterns within and across zones in a region, (C) influence of mobility and spatiotemporal mixing on pandemic spread and (D) exploratory analysis. Simulation parameters are summarized in Table I. We consider the incubation period Inline graphic. Although symptoms show up about 5 days after contact, they have also been reported to appear as early as 2 days after exposure [37]. For 10% of the population, the incubation period was longer than 2 weeks and, in a few cases, more than 20 days [38]. This period can potentially be extended due to delays and inaccuracies in testing.

TABLE I. List of Parameters and Their Values.

Parameter | Notation | Value
Upper bound for daily exposed (1) | s | 0.3
Lower bound of incubation period | Inline graphic | 2
Upper bound of incubation period | Inline graphic | 30
Savitzky-Golay window size | - | 31
Savitzky-Golay polynomial order | - | 3
Average number of trips per day | Inline graphic | Inline graphic
Transition matrix threshold | Inline graphic | Inline graphic
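
The Savitzky-Golay settings in Table I (window size 31, order 3) can be applied with `scipy.signal.savgol_filter`, as sketched below. The noisy curve is synthetic and hypothetical, and whether the authors used this particular function is an assumption.

```python
import numpy as np
from scipy.signal import savgol_filter

rng = np.random.default_rng(0)
days = np.arange(300)
# Synthetic noisy daily curve (hypothetical data).
clean = 500.0 * np.exp(-0.5 * ((days - 120) / 30.0) ** 2)
noisy = clean + rng.normal(0.0, 25.0, size=days.size)

# Window of 31 samples and a cubic polynomial, matching Table I.
smoothed = savgol_filter(noisy, window_length=31, polyorder=3)

print(np.abs(noisy - clean).mean(), np.abs(smoothed - clean).mean())
```

The filter fits a cubic to each 31-day window, which suppresses day-to-day reporting noise while preserving the position and height of broad peaks.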

Data collection: We discuss the NYC map and mobility traces and the (NYC and global) infected and death numbers.

1) Map Generation and Location Identification: The list of NYC boroughs and districts is extracted from Wikipedia [39], and the latitude and longitude of the 5 boroughs and 59 districts are taken from the Python library for geocoding services, called GeoPy [40]. The distance between any pair of points (i.e., boroughs or districts) on the NYC map is calculated using the geodesic distance function of GeoPy.
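
The distance computation can be approximated without external dependencies. The sketch below uses the haversine great-circle formula as a simpler spherical stand-in for GeoPy's geodesic function, which works on an ellipsoidal Earth model; the two differ slightly. The coordinates are approximate borough centroids chosen for illustration, not values from the paper.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius

def haversine_miles(p, q):
    """Great-circle distance between two (latitude, longitude) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*p, *q))
    dlat, dlon = lat2 - lat1, lon2 - lon1
    a = (math.sin(dlat / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))

# Approximate borough centroids, for illustration only.
manhattan = (40.7831, -73.9712)
staten_island = (40.5795, -74.1502)
print(round(haversine_miles(manhattan, staten_island), 1))
```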

2) NYC Mobility Data: We source the mobility data of NYC traffic from NYCOpenData [41], a data repository for fields ranging from city government, education, environment and health to public safety, recreation, social services and transportation. The stated data (spanning 2014 to 2019), collected by the Department of Transportation of the New York Metropolitan Transportation Council (NYMTC), has the following fields: ID, road name, source and destination intersecting street name, compass direction, date and time. We use this data to calculate the transition matrix (see Section II-B) that captures the probability of travelling within and across boroughs.

3) Cumulative Daily Infected and Death for NYC: We collect COVID-19 daily infected numbers from the NYC Department of Health and Mental Hygiene repository [42], which contains the data on Coronavirus Disease 2019 (COVID-19) in New York City (NYC). The data spans a period from March 2020 (the month of the first documented laboratory-confirmed cases) to November 2020.

4) Global Cumulative Daily Infected and Death: The time-series data of the world daily infected and death numbers is sourced from the World Health Organization, over a period spanning January 03, 2020 - October 23, 2020 [43].

A. Parameter Identification and Spread Quantification

We estimate the zone-specific parameters of infection spread (i.e., Inline graphic and Inline graphic) for countries and quantify the duration and spread of infection using the inverse coefficient of variation.

1). Effect of Inline graphic and Inline graphic

For a fixed infected curve (black curve), we study the variation in exposed curve with varying rate parameters Inline graphic controlling the fraction of population transitioning from exposed to infected and delay parameter (in days) Inline graphic (Fig. 4). For Inline graphic and Inline graphic, the smallest and largest fraction of exposed individuals (shown in green) transition to infected, while Inline graphic and Inline graphic (red curve) cause the lowest and highest delay in exposed to infected transition respectively.

Fig. 4.

Predicted exposed for varying Inline graphic and Inline graphic values.

2). Spatial Context in Global Infection Spread

We utilize the global COVID-19 infected and death numbers (discussed in Section III-4) to estimate Inline graphic, Inline graphic values as well as the daily exposed and recovered numbers (as per the optimization discussed in Section II). Parameters for the 20 selected countries are listed in Table II. Fig. 5 depicts each country in a different color and the day in the observed 300-day period when its projected daily exposed numbers peak. There are considerable variations in daily infected (and consequently exposed) numbers, as illustrated by the exposed curves of China and the USA in Fig. 6(a). It is worth noting that several countries in close proximity, such as (Group 1) Iran, Iraq, UAE and India (shown in the red dotted circle) and (Group 2) Italy, Belgium, Germany, Austria and Romania (shown in the blue dotted circle), peak at nearly the same time (see Fig. 5), suggesting that mobility across neighboring zones oftentimes plays a role in pandemic spread and affects the timing of exposed (and infected) peaks.

TABLE II. Optimization Parameters Inline graphic and Inline graphic Corresponding to the Different Countries, Along With Goodness of Fit Inline graphic Score.

Country | Inline graphic | Inline graphic
Algeria | Inline graphic | 0.99
Argentina | Inline graphic | 1.0
Austria | Inline graphic | 0.99
Belgium | Inline graphic | 1.0
Chile | 14, 0.69 | 0.99
China | Inline graphic | 0.99
Ecuador | Inline graphic | 0.76
Germany | Inline graphic | 1.0
India | Inline graphic | 0.99
Iran | Inline graphic | 0.99
Iraq | Inline graphic | 1.0
Italy | Inline graphic | 0.99
Japan | Inline graphic | 1.0
New Zealand | Inline graphic | 0.99
Romania | Inline graphic | 0.99
Russia | Inline graphic | 0.99
Spain | Inline graphic | 0.99
Turkey | Inline graphic | 0.99
UAE | Inline graphic | 1.0
USA | Inline graphic | 1.0

Fig. 5.

Days for the exposed curve to peak for 20 countries, where each country is shown in a different color and annotated by the day in the observed 300-day period when its projected daily exposed numbers peak. There are two groups of countries (marked in red and blue dotted circles, respectively) in close proximity where the exposed numbers peak at the same time.

Fig. 6.

Quantifying infection. (a) daily exposed of the two countries with the highest ICV (USA and Iran) and lowest ICV (China and New Zealand) smoothed using Savitzky-Golay filter, (b) inverse coefficient of variation for 20 countries.

3). Quantification of Infection Spread

In addition to high population density, variations in Inline graphic and Inline graphic affect the extent and rate of transition from the exposed to the infected state. We attempt to quantify this dynamic of spread using the inverse coefficient of variation (ICV) (defined in Section II-A) of the exposed curve. This is because a high ICV of the daily exposed curve for any given region implies a high Inline graphic (i.e., high exposed numbers), a low Inline graphic (i.e., steady exposed numbers), or both. For instance, in Fig. 6(a), the high mean exposed counts of the USA contribute to its high ICV, while Iran, despite having Inline graphic of the population of the USA, has a steady daily exposed curve (i.e., one with a low standard deviation). In Table II, we summarize Inline graphic and Inline graphic of the 20 countries, along with the goodness of fit Inline graphic for the least-squares optimization (see Expression 1). It is noteworthy that the ICV is particularly useful when the available time-series data covers a considerable duration, allowing the curve to reach its first major peak within the data collection period. If the exposed curve peaks towards the end, we see near-exponential growth, resulting in a high Inline graphic and a low ICV. In Fig. 6(b), we plot the ICV for the 20 nations, where China and New Zealand have the lowest ICV and the USA and Iran the highest. Reports corroborate these numbers, suggesting that ICV is indeed a reliable measure of infection duration. Though the earliest cases of COVID-19 were reported in China, the nation prides itself on curbing spread by enforcing the strictest lockdown measures [44]. New Zealand has a similar story of becoming the “emblematic champion of proper prevention” due to smart and early intervention measures [45]. On the other hand, the USA continues to register record new cases, which are projected to grow in the months to come [46]. Iran too reported unprecedented growth in new cases in October 2020 [47].

B. Spatial Context to Human Mobility Patterns

We carry out a case study on the mobility pattern of NYC and its implications on any pandemic spread. Fig. 7(a) shows the 5 boroughs of NYC. We process the human mobility data of NYC (discussed in Section III-2) to generate the frequency matrix (Inline graphic) and represent the mobility within and across boroughs in a directed graph in Fig. 7(b). Each borough and district is placed according to its latitude-longitude coordinates and the size of the borough nodes and the opaqueness of a directed edge Inline graphic are proportional to the fraction of total trips originating at borough Inline graphic that have a destination borough Inline graphic.

Fig. 7.

Mobility pattern: (a) borough map of NYC, and (b) directed graph representation of the boroughs and mobility pattern of NYC; large circles are boroughs marked by the respective colors. The size of a borough node is proportional to the frequency of intra-borough trips, and the opaqueness of the directed edge Inline graphic is proportional to the propensity of trips made from borough Inline graphic to borough Inline graphic.

Fig. 7(b) shows that Staten Island to Brooklyn, followed by Brooklyn to Queens, exhibit the highest inter-borough mobility. Fig. 8(a) shows the transition matrix (Inline graphic) from column borough to row borough as a heatmap labeled with the corresponding transition probabilities (discussed in Section II-B2), showing that intra-borough trips outnumber inter-borough trips for all boroughs. Fig. 8(b) is a frequency plot of NYC trips against the distance (in miles) between the source and destination zones, showing that short trips are preferred over long trips.

Fig. 8.


Spatial context in mobility of NYC. (a) Heatmap showing the transition matrix Inline graphic, where Inline graphic is the probability of moving from borough Inline graphic to Inline graphic (written in blue), (b) histogram of bin-size 5 showing the relationship between frequency of trips made and corresponding distances in miles.

a) Factors affecting human mobility: Human mobility is a combination of several deterministic and non-deterministic factors such as intent, convenience, environmental constraints, and so on. There are pedestrian-based mobility models, such as Least Action Trip Planning [48], that suggest that a person chooses a destination (called a waypoint) close to their current position, while another mobility framework called ORBIT [49] suggests that individuals cyclically move from one predetermined hub to another (as illustrated in Figs. 7(b), 8(a) and 8(b)). Social network-based mobility models, such as Social Network Theoretical (SNT) [50], suggest that people preferentially select next stops based on social affinity, such as work, social ties or friendships. Note that factors besides distance, such as intent (which can be a function of occupation, social affinity, etc.), also determine inter- and intra-zone trips. Thus, despite the large distances involved, a high number of trips are made from Staten Island to Brooklyn and from Brooklyn to Queens. In either case, mobility (based on intent or proximity) across neighboring zones affects social mixing.

C. Spatiotemporal Model for Pandemic Spread

Based on the infected data (see Section III-3), we solve the optimization problem (Expression 1) to estimate the daily exposed population count. We use the Python SciPy differential evolution solver [51], which stochastically searches large areas of the candidate space for the global minimum. Fig. 9 compares the predicted daily exposed curve (dotted) Inline graphic, scaled down by the infection rate Inline graphic, against the daily infected curve (solid) Inline graphic; the lags between the corresponding peaks of the Inline graphic and Inline graphic curves capture the incubation period Inline graphic for a borough.
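The following sketch shows how SciPy's `differential_evolution` can be used to fit a curve by least squares. It does not reproduce Expression 1: the bell-shaped exposed curve, the fixed infection rate, the fixed lag, and the synthetic infected series are all assumptions made for illustration.

```python
import numpy as np
from scipy.optimize import differential_evolution

days = np.arange(60)

def exposed_curve(params, t):
    """Toy bell-shaped daily exposed curve (a hypothetical parameterization)."""
    height, peak_day, width = params
    return height * np.exp(-0.5 * ((t - peak_day) / width) ** 2)

# Assumed constants for this illustration only.
INFECTION_RATE = 0.2   # fraction of exposed that show up as infected
LAG_DAYS = 5           # delay between exposure and infection

# Synthetic "observed" infected counts generated from known parameters.
true_params = (1000.0, 25.0, 6.0)
infected = INFECTION_RATE * exposed_curve(true_params, days - LAG_DAYS)

def squared_error(params):
    """Sum of squared residuals between predicted and observed infected."""
    predicted = INFECTION_RATE * exposed_curve(params, days - LAG_DAYS)
    return float(np.sum((predicted - infected) ** 2))

# Search box for (height, peak_day, width); the seed fixes the random search.
bounds = [(100.0, 5000.0), (0.0, 60.0), (1.0, 20.0)]
result = differential_evolution(squared_error, bounds, seed=0)
fitted = result.x
print(fitted)
```

Because the solver only needs bounds and an objective, it suits problems where a good starting guess is hard to supply, at the cost of more function evaluations than a gradient-based method.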

Fig. 9.


Comparison of predicted daily exposed Inline graphic and Inline graphic for Manhattan, Bronx, Brooklyn, Queens and Staten Island.

Observe that Brooklyn and Queens, the boroughs with high intra- and inter-zone mobility, record the highest exposed counts. Since there are few trips with Staten Island as the destination, it has a low exposed count. As per the COVID-19 Tracking Project and the Center for Systems Science and Engineering at Johns Hopkins University, Queens and Brooklyn are indeed the worst affected boroughs as of November 2020 [52].

1). Peaking of the Exposed Curve and the Effect of Lockdown

We plot the variation in daily exposed numbers in each borough (Fig. 10). Lockdown was formally initiated in the state of New York on March 20, 2020 [53], which is shown as a solid blue line. Note that the exposed numbers briefly continued to rise for a week after the imposition of lockdown. However, the exposed curve has shown new peaks since October 2020. Finally, the exposed curves corresponding to Brooklyn and Queens peak at nearly the same time due to the high mobility between the two boroughs, as depicted in Fig. 7(b).

Fig. 10.


Daily exposed curve for each borough; the starting and ending dates of lockdown are shown as blue vertical lines.

2). Latent Factor (Inline graphic) Analysis

We discuss in Section I that infection spread is not merely a function of human mobility, but a joint effect of mobility and social mixing; e.g., Bronx, despite its low inter-zone mobility, has relatively high daily exposed numbers. We quantify the combination of mobility and mixing as a latent factor (Inline graphic) (Section II-B). When we rank the boroughs in non-increasing order of the ICV of the exposed curve, we see the following order: Manhattan (0.84), Brooklyn (0.80), Queens (0.70), Bronx (0.69) and Staten Island (0.63). In Fig. 11(a), we plot the latent factor Inline graphic for each borough. Observe that Queens and Brooklyn once again exhibit the highest Inline graphic values. Since the latent factor is a combination of trip frequency, infected fraction and social mixing (see II-B3), we calculate the mixing factor (Inline graphic) from sampled Inline graphic (using II-B3) for each borough. We apply mean-centered cosine similarity (Section II-C) to show (with the heatmap in Fig. 11(b)) that regions with high inter-zone mobility also show similar mixing, reinforcing infection spread.
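For readers unfamiliar with the measure, here is a minimal sketch of mean-centered cosine similarity between two vectors, which subtracts each vector's mean before taking the cosine of the angle between them. This is the standard textbook form and is assumed, not confirmed, to match the definition in Section II-C; the sample vectors are invented.

```python
import numpy as np

def mean_centered_cosine(u, v):
    """Cosine similarity of two vectors after subtracting their own means."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    u_centered = u - u.mean()
    v_centered = v - v.mean()
    denom = np.linalg.norm(u_centered) * np.linalg.norm(v_centered)
    if denom == 0:
        raise ValueError("similarity is undefined for a constant vector")
    return float(u_centered @ v_centered / denom)

# Hypothetical mixing-factor samples for three zones (illustrative only).
zone_a = [1.0, 2.0, 3.0, 4.0]
zone_b = [2.0, 4.0, 6.0, 8.0]   # same trend as zone_a, different scale
zone_c = [4.0, 3.0, 2.0, 1.0]   # opposite trend

sim_ab = mean_centered_cosine(zone_a, zone_b)
sim_ac = mean_centered_cosine(zone_a, zone_c)
print(sim_ab, sim_ac)
```

Centering removes differences in overall level, so two zones are scored as similar when their mixing factors rise and fall together, regardless of magnitude.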

Fig. 11.


Effect of Latent factor on infection spread and its variation during lockdown: (a) latent factor for each borough, (b) cosine similarity of mixing factors of NYC boroughs.

3). Lockdown Policymaking

In Section II-B3, we discuss that the latent factor Inline graphic can be scaled down by a fractional lockdown rate Inline graphic, where Inline graphic and 0 correspond to no lockdown and complete lockdown, respectively. Using the new latent factor matrix Inline graphic, we obtain a resultant exposed count Inline graphic, where Inline graphic (Inline graphic). It is worth mentioning that knowledge of the latent factor for each borough Inline graphic at time Inline graphic Inline graphic allows us to determine the ideal time and extent for Inline graphic in order to minimize contagion. To illustrate this, we introduce a vector of time-varying eta values, one for each timepoint Inline graphic, Inline graphic, and calculate Inline graphic by scaling the Inline graphic-th column of X (denoted by Inline graphic) by Inline graphic.

Given Inline graphic, let Inline graphic and Inline graphic (each of length Inline graphic) be two sets of timepoints with the highest and lowest sum of Inline graphic, respectively. We consider the following scenarios: Inline graphic =

  • Case 0: Inline graphic

  • Case 0.5: Inline graphic

  • Case 0.5+: Same as Case 0.5, except overwrite Inline graphic with 0.75 and 0.25 if Inline graphic and Inline graphic, respectively.

  • Case 0.5-: Same as Case 0.5, except overwrite Inline graphic with 0.25 and 0.75 if Inline graphic and Inline graphic, respectively.

We plot the total exposed count in NYC in the pre-lockdown period for the four scenarios. Fig. 12 shows that we get the highest exposed count for no lockdown (i.e., Case 0) and exactly half of it for Inline graphic lockdown (Case 0.5). In Case 0.5+, we assign a weaker lockdown (Inline graphic) to timepoints with a higher latent factor sum, and vice versa, and end up with a higher exposed sum than in Case 0.5. Finally, in Case 0.5-, we achieve a considerably lower exposed count by enforcing a stronger lockdown (Inline graphic) at timepoints with a higher latent factor sum. This suggests that the latent factors can help identify the ideal timepoints for imposing lockdowns to curb spread.
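The ordering of the four scenarios can be reproduced on toy data with a few lines of code. The sketch below makes several simplifying assumptions that are not from the paper: the latent factor matrix is random, the scaling value at each timepoint is read as the fraction of activity retained (1 for no lockdown, 0 for complete lockdown), and the total exposed count is taken to be proportional to the sum of the scaled latent factors.

```python
import numpy as np

rng = np.random.default_rng(0)
n_zones, n_days, k = 5, 20, 5

# Hypothetical latent factor matrix: X[z, t] for zone z at timepoint t.
X = rng.uniform(0.0, 1.0, size=(n_zones, n_days))

# Timepoints with the k highest and k lowest column sums of X.
column_sums = X.sum(axis=0)
order = np.argsort(column_sums)
T_low, T_high = order[:k], order[-k:]

def total_exposed(eta):
    """Sum of all latent factors after scaling column t by eta[t]."""
    return float((X * eta).sum())

eta_none = np.ones(n_days)          # Case 0: no lockdown
eta_half = np.full(n_days, 0.5)     # Case 0.5: uniform half lockdown

eta_plus = eta_half.copy()          # Case 0.5+: weaker lockdown on high days
eta_plus[T_high] = 0.75
eta_plus[T_low] = 0.25

eta_minus = eta_half.copy()         # Case 0.5-: stronger lockdown on high days
eta_minus[T_high] = 0.25
eta_minus[T_low] = 0.75

totals = {
    "0": total_exposed(eta_none),
    "0.5": total_exposed(eta_half),
    "0.5+": total_exposed(eta_plus),
    "0.5-": total_exposed(eta_minus),
}
print(totals)
```

Cases 0.5, 0.5+ and 0.5- share the same mean scaling value, so any difference in their totals comes only from when the restrictions are applied.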

Fig. 12.


The exposed numbers corresponding to four lockdown scenarios.

We apply principal component analysis (PCA) to visualize how the ensuing lockdown is reflected in the latent factor. The latent factor is a two-dimensional vector Inline graphic comprising data-points Inline graphic. Each point on the PCA plot (Fig. 13(a)) corresponds to a timepoint. We identify four clusters of timepoints, namely pre-lockdown (March 3 - 11), early-lockdown (March 22 - April 6), later-lockdown (April 21 - July 20) and post-lockdown (July 21 - October 15). In Fig. 13(b), we compare the daily infected numbers for these timelines to show how the infection peaked from the pre-lockdown to the early-lockdown phase and subsided thereafter.
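A two-component PCA projection of per-timepoint vectors can be sketched with a singular value decomposition, as below. The data are synthetic: two groups of timepoints with different average latent factor levels stand in for periods before and after a mobility drop, and the dimensions and noise level are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical data: each row is a timepoint, each column one entry of the
# flattened latent factor (e.g., a 5 x 5 zone-by-zone matrix gives 25 columns).
before = rng.normal(loc=1.0, scale=0.05, size=(10, 25))
after = rng.normal(loc=0.4, scale=0.05, size=(10, 25))
samples = np.vstack([before, after])

# Center each column, then decompose. Rows of Vt are the principal
# directions, ordered by the variance they explain.
centered = samples - samples.mean(axis=0)
U, S, Vt = np.linalg.svd(centered, full_matrices=False)

projected = centered @ Vt[:2].T          # coordinates on the first two components
explained = S ** 2 / np.sum(S ** 2)      # fraction of variance per component
print(projected.shape, explained[:2])
```

In this toy setting the first component separates the two groups of timepoints, which is the kind of clustering a scatter plot of the projected points would reveal.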

Fig. 13.


Effect of lockdown. (a) principal component analysis (with two components) of the latent factor showing four clusters of time intervals, (b) comparison of the projected infected numbers for the four lockdown timelines.

D. Exploratory Analysis

The results presented so far show that the proposed spatiotemporal model is a generalized approach that can make informed spread predictions. It is worth highlighting how this model differs from existing efforts to study the evolution of contagion and the effects of public health intervention measures. Sun et al. adapt the susceptible, exposed, infected, confirmed and removed (comprising recovered and death) model to study COVID-19 transmission in Wuhan, China. This model uses additional parameters to quantify infection coefficients under lockdown as well as emigration and immigration rates [54]. Similarly, Tian et al. performed curve-fitting on the time series of cases reported in Hubei province to learn the SEIR model parameters for COVID-19 and reported the immediate effect of lockdown on curbing the rate of contagion [55]. These models are instances of top-down approaches that rely on curve-fitting to learn the SEIR model parameters. The proposed approach, however, is inherently different, as it uses a simplified version of the SEIR model based on the daily infection counts alone and, in doing so, identifies a zone-specific measure of spread. Its benefit lies in the fact that it (1) reduces the number of parameters to be used in the SEIR model for fitting, (2) incorporates the variations in spread dynamics due to the exact inter- and intra-zone mobility patterns, and (3) lends itself to a more generalized time-varying analysis that a traditional fitting-based approach may fail to capture.

Let us discuss how this model inspires several research directions for designing mobility policies to combat future outbreaks. First, this model can infer the effect of trip lengths on the exposed numbers. As discussed in Section II-B, we decompose the latent factor Inline graphic into trip frequency, infected ratio and mixing factor. For instance, the exposed population count resulting from trips of length Inline graphic or more can be expressed as Inline graphic; here Inline graphic (see (6)) are the stochastic matrices estimating the probability of trips of length Inline graphic or more. One may approximate Inline graphic to Inline graphic, where Inline graphic is less than some threshold Inline graphic. Given that the average inter- and intra-borough distance in NYC is 14.4 miles, for Inline graphic, we observe that Inline graphic accounts for a little over Inline graphic of the total exposed numbers. It is worth noting that since Inline graphic of trips in NYC are shorter than 2 miles, the pandemic spread can be contained effectively by restricting shorter trips, as longer trips have little bearing on the exposed numbers. Second, the exposed numbers often lend great insight into the time it would take to flatten the curve. Nonlinear curve-fitting on the daily exposed numbers for NYC boroughs explains why (in the current state of lockdown) the new exposed numbers started dropping by the end of May and the daily infected curve (which lagged by roughly a week) started stabilizing by early June.

IV. Conclusion

COVID-19 has insidiously affected every facet of human existence over the last 11 months. Global infected and death numbers for COVID-19 suggest that geography, time, and mobility patterns are key factors affecting spread, making it imperative to factor in the spatiotemporal and mobility context in future prediction models. In this work, we present a spatiotemporal model for pandemic spread that unifies mobility and social mixing into a latent factor. We apply the model to the NYC data to show that boroughs of high inter-zone mobility (namely, Brooklyn and Queens) exhibit similar trends in exposed numbers as well as mixing. We carry out principal component analysis to depict the temporal variation of the latent factor in pre- and post-lockdown epochs. Next, we argue that the inverse coefficient of variation (ICV) of the daily exposed curve can explain the duration as well as the spread of infection in a zone. We show that the ICV of nations (such as USA, Iran, China, New Zealand, etc.) correlates with the true extent and period of COVID-19 spread. We show the validity of the proposed method by estimating the exposed numbers for different countries. Furthermore, we report that the peaks in daily exposed numbers of neighboring nations are reached at nearly the same time, underpinning the role of proximity in spread.

The discussion on lockdown policies (Section III-C3) and exploratory analysis (Section III-D) shows that the proposed spatiotemporal model motivates new research directions for studying the mixing propensity of individuals at both inter- and intra-borough levels. The drop in the latent factor (Inline graphic) post-lockdown was achieved by a reduced number of trips (although we will consider real post-lockdown mobility traces in our future experiments); however, lockdown or social distancing measures are expected to impact the mixing propensity of individuals in different contexts, such as mixing at grocery stores, restaurants, workplaces or at home. Ideally, the mixing factor can be considered as a vector of latent items, each of which may be impacted differently by constraints on individual mobility. These observations can feed into future SEIR models and quarantine procedures to limit the infection spread. Since regions with high inter-zone mobility patterns exhibit similar trends in exposed numbers, it is advisable to consider clusters of such zones in future infection models. Otherwise, an SEIR model executing on a single zone needs to consider the high inflow/outflow rates of susceptible/infected individuals into and out of the zone, which again increases model complexity and can make the parameters non-identifiable and hence hard to justify [16]. A clustered view of zones with high mobility and mixing keeps the models closer to the general SEIR format and helps generate more realistic predictions. Similarly, quarantine procedures on such entire clusters can help better contain the infection spread while additionally alleviating the economic loss that results from limiting the two-hop trips.

Biographies


Satyaki Roy received the Ph.D. degree in computer science from the Missouri University of Science and Technology, Rolla, MO, USA, in 2019. He is currently a Postdoctoral Research Associate with the Department of Genetics, University of North Carolina, Chapel Hill, NC, USA. His research interests include computational biology, network science and optimization, wireless sensor networks, epidemiology, machine learning, and parallel computing.


Preetom Biswas is currently an undergraduate student with the Department of Computer Science and Engineering, Bangladesh University of Engineering and Technology, Dhaka, Bangladesh. He is currently a Math and Programming Enthusiast. His research interests include the application of machine learning and graph theoretic approaches to solving real-world problems.


Preetam Ghosh received the B.S. degree in computer science from Jadavpur University, Kolkata, India, and the M.S. and Ph.D. degrees in computer science and engineering from the University of Texas at Arlington, Arlington, TX, USA. He is currently a Professor with the Department of Computer Science and directs the Biological Networks Lab, Virginia Commonwealth University, Richmond, VA, USA. His research interests include algorithms, stochastic modeling and simulation, network science and machine learning related approaches in systems biology and computational epidemiology, and mobile computing related issues in pervasive grids, which have resulted in more than 170 conference and journal articles and several federally funded research projects from the NSF, the NIH, the DoD and the US-VHA. He is currently the Secretary or the Treasurer of the ACM SIGBio.

Funding Statement

This work was supported by NSF under Grant CBET-1802588.

Contributor Information

Satyaki Roy, Email: satyakir@unc.edu.

Preetom Biswas, Email: preetomicc@gmail.com.

Preetam Ghosh, Email: pghosh@vcu.edu.

References


Articles from IEEE Transactions on Emerging Topics in Computational Intelligence are provided here courtesy of Institute of Electrical and Electronics Engineers
