Abstract
Motivated by the need for robust models of the Covid-19 epidemic that adequately reflect the extreme heterogeneity of humans and society, this paper presents a novel framework that treats a population of N individuals as an inhomogeneous random social network (IRSN). The nodes of the network represent individuals of different types and the edges represent significant social relationships. An epidemic is pictured as a contagion process that develops day by day, triggered by a seed infection introduced into the population on day 0. Individuals’ social behaviour and health status are assumed to vary randomly within each type, with probability distributions that vary with their type. A formulation and analysis are given for an SEIR (susceptible-exposed-infective-removed) network contagion model, considered as an agent based model, which focusses on the number of people of each type in each compartment each day. The main result is an analytical formula, valid in the large N limit, for the stochastic state of the system on day t in terms of the initial conditions. The formula involves only one-dimensional integration. The model can be implemented numerically for any number of types by a deterministic algorithm that efficiently incorporates the discrete Fourier transform. While the paper focusses on fundamental properties rather than far-ranging applications, a concluding discussion addresses a number of domains, notably public awareness, infectious disease research and public health policy, where the IRSN framework may provide unique insights.
Keywords: Social network, Infectious disease model, Complex systems, Agent based model, Cascade model, Poisson random graphs
MSC: 05C80, 91B74, 91G40, 91G50
1. Introduction
Heterogeneity proliferates in human society at every level, and new types of mathematical modelling are needed to understand how these many layers of heterogeneity interweave and influence people’s lives. The COVID-19 pandemic is a singularly far-reaching and catastrophic event, and it will continue to negatively impact humanity for a long time to come. Layers of heterogeneity are especially relevant to a deep understanding of an infectious disease like COVID. Viral transmission may be through aerosols, droplets and fomites; the viral load may be absorbed by, and damage, a variety of tissues within the body; and people’s immune systems function in diverse ways. The virus itself may evolve into different forms. People are tremendously varied in their habits, their friendships, their living arrangements and their range of movements. All these factors are important to consider, and some will prove decisive in determining where the disease does its gravest damage and which actions best ameliorate that damage.
Network science has emerged in recent decades as a particularly helpful conceptual framework for handling potentially overwhelming complexity. Networks can provide the architecture and structure for agent based modelling of contagion, leading to massive computer simulations, such as those of Ferguson and Ghani (2020), that have been used to develop a comprehensive picture of how such a disease may progress. As we shall show in this paper, network science can also provide shortcuts that dramatically speed up such computations, allowing us to quickly explore a vast array of alternative scenarios for the disease.
This paper provides a novel network framework for society, called inhomogeneous random social networks (IRSNs), and then models the propagation of an infectious disease like COVID-19 in such a society. It can be interpreted as an agent based contagion model, with the useful feature that an analytical shortcut is available for large-scale simulations of the disease dynamics. The framework starts with a so-called inhomogeneous random graph (IRG), henceforth called the social graph, whose nodes represent people classified into a finite number of types, interconnected by edges representing their random social contacts. The people in this social network are provided with random immunity buffers that measure their resistance to the disease, and social contact links are labelled by random weights called exposures that quantify the viral load transmitted by infected individuals to their social contacts. Then, when a seed infection is introduced randomly into the population of susceptible individuals, a sequence of contagion shocks develops, which is modelled as the iteration of a cascade mapping or cascade mechanism.
The main contributions of this paper are:
1. Introduction of the inhomogeneous random social network (IRSN) framework, which provides a flexible and scalable architecture for describing a heterogeneous society of size N with complex community structure. Individuals are classified by arbitrary types with random characteristics within each type.
2. Development of infection cascade models for such networks based on a threshold mechanism for transmission. This transmission mechanism can incorporate arbitrary dose-response functions, replacing the oversimplified transmission assumptions typically used in epidemic models.
3. Development of large N asymptotics for SEIR infection cascades in IRSN models, leading to Theorem 3, which provides explicit and efficiently computable recursive probabilistic formulas for the daily update of the state of the disease within the population.
4. A demonstration of how the contagion analytics can be used for large scale investigations of potential policy interventions that might be invoked to mitigate or suppress the progress of the contagion.
5. Overall, a purely analytical toolkit for networks with potentially thousands of different types of individuals that can run on a laptop. The network framework can deliver results much faster than the large-scale agent based epidemic models sometimes used to inform health policy, with a similar degree of accuracy.
Studying the spread of infectious diseases using the tools of network science has a substantial literature, reviewed for example in Keeling and Eames (2005) and Danon et al. (2011). The book by Newman (2010) provides a broad overview of networks in all areas of science, including applications to epidemic modelling, while Pellis et al. (2015) explores current challenges in network epidemic models. Of particular interest is the review of epidemic processes in complex networks by Pastor-Satorras et al. (2015): Many ingredients of the framework developed here can be traced to references described there. In particular, we see there that our model has its roots in the network cascade model of Watts (2002), generalized to allow for random edge weights as in Hurd and Gleeson (2013).
The IRSN model is presented here in a form exactly equivalent to a simple agent based model. This equivalence provides important motivation and justification of the underlying assumptions, and gives a vivid intuitive picture for interpreting the IRSN model. An important example of the intuition gained is a form of selection bias inherent in agent based models, and real epidemics, that we call susceptibility bias. Susceptibility bias, akin to Darwinian evolution, is the effect that less resistant individuals tend to be infected earlier, leaving remaining susceptibles who tend to be more resistant than the original population. We will find that accounting for susceptibility bias presents a mathematical difficulty that our framework can partially, but not fully, solve. In general, the IRSN framework lies between agent based models of the type developed by Ferguson and Ghani (2020) and the literature on compartment ordinary differential equation models (ODEs) stemming from the pioneering work of Kermack et al. (1927). Full exploration of the conceptual links between these three distinct modelling frameworks is a promising avenue to deeper understanding of real world epidemics.
Section 2 of the paper introduces the essential structure of IRSNs and defines the SEIR infection cascade mechanism that characterizes the daily propagation of the disease on such networks. This section also explores the equivalent agent based model, and its heuristic properties. Section 3 explores the large N analytical properties of the IRSN model, leading to Theorem 3 that characterizes the infection cascade mapping on the first day. This result is extended to successive days by an additional mixing assumption, providing a recursive characterization of the daily infection cascade mapping in the N = ∞ limit. Section 4 provides the ingredients for a numerical implementation of the SEIR cascade mapping that uses the discrete Fourier transform. It is shown that the flop count for computing a daily update is O(M² × Ndft), where M is the number of types and Ndft is the number of lattice points in each one-dimensional integration. This is efficient enough that complex specifications of IRSN models can be explored quickly on a laptop. Section 5 addresses the issue of calibrating IRSN models to real health and social data. In Section 6, we explore a simple illustration showing how the method can be used to understand potential policy interventions to protect the residents of a seniors’ residential centre while a pandemic rages in the community outside. Finally, a concluding discussion addresses how this novel modelling framework can lead to improved understanding of epidemics by practitioners in several different domains.
Notation:
1. For a positive integer N, [N] denotes the set {1, 2, …, N}.
2. For a random variable X, its cumulative distribution function (CDF), probability density function (PDF) and characteristic function (CF) will be denoted $F_X$, $\rho_X = F_X'$ and $\hat\rho_X$ respectively. Note that $\hat\rho_X(k) = E[e^{ikX}]$ is the Fourier transform of $\rho_X$, where $\hat f(k) := \int_{\mathbb{R}} e^{ikx} f(x)\,dx$ denotes the Fourier transform of a function f. We also make use of the function $\hat F_X$, the Fourier transform of the CDF $F_X$.
3. For any event A, 1(A) denotes the indicator random variable, taking values in {0, 1}.
4. Landau’s “big O” notation $f(N) = O(N^{\alpha})$ for some $\alpha \in \mathbb{R}$ is used for a sequence f(N), N = 1, 2, …, to mean that $f(N)N^{-\alpha}$ is bounded as N → ∞.
5. The L² Hermitian inner product of two complex valued functions f(x), g(x) on a domain D is defined to be $\langle f, g \rangle_D := \int_D f(x)\,\overline{g(x)}\,dx$. The L² norm of a function f(x) on a domain D is defined to be $\|f\|_D := \langle f, f \rangle_D^{1/2}$.
2. SEIR model on IRSNs
This section provides the core modelling assumptions of the network epidemic framework, in the classic susceptible-exposed-infected-recovered setting (see Anderson and May (1992)) in which individuals progress through the stages of the disease, moving from compartment to compartment:
S → E → I → R.
The social network describes a population of individuals as nodes of a graph, whose undirected edges represent the existence of a significant social connection. Our network setting for the spread of an infectious disease has the following structure.
1. The population is classified into a finite disjoint collection of “types” that represent people’s important attributes, such as age, gender, living arrangement, profession, country and location.
2. Individuals within a type have random attributes drawn from type-dependent probability distributions.
3. The network of social contacts, initially random, is taken to be constant during the epidemic. This implies in particular that demographic effects such as births and deaths are ignored.
4. The outbreak is monitored in discrete time, with a period Δt assumed for convenience to be one day. At the start of the outbreak on day 0, most of the population is susceptible (S), but a small number of individuals are exposed (E) or infective (I).
5. Each day, infective individuals pass on a random viral dose to their infective contacts, a random subset of their social contacts.
6. A susceptible individual’s state of health at the beginning of each day is represented by a random immunity buffer. During the day, they accumulate random viral doses through their infective contacts, and if the total viral load exceeds their buffer they become exposed, meaning infected but not yet infectious, and are moved into compartment E.
7. Each day certain individuals move from E to I, meaning they become infectious. Others move from I to removed (R), meaning that they either die or recover and are no longer infectious. Removed individuals are assumed to have permanent immunity.
This framework is a kind of agent based model (ABM) that focusses solely on the occurrence of infective contacts, simpler than ABMs that also simulate the movements of individuals. Note that for each agent, their actual infection event is modelled as a threshold event that occurs if the total viral load received in a period of time Δt = 1 day through multiple contacts exceeds their natural immunity.
The population with its social structure will be represented at any moment as an inhomogeneous random social network, or IRSN. An IRSN is the specification of a multidimensional random variable that captures two levels of structure. The primary level, called the social graph, is an undirected random graph with N nodes labelled by a type classification, where each undirected edge represents the existence of a significant social connection, such as a family, collegial or friend relationship. The secondary layer specifies the infective contacts, mutual exposures and health of people. Inhomogeneity in the IRSN model arises through classifying people by a finite number of types that can account for a wide range of attributes.
It is important to note that the IRSN will be assumed to change over time in a prescribed fashion: the primary level remains constant, while the secondary layer varies stochastically each daily time step. The primary level is fixed because the calibration of the social graph is assumed to be based on studies such as Mossong et al. (2008) and Prem et al. (2017) that studied contact data gathered over many years prior to the outbreak. On the other hand, the secondary layer changes to reflect the stochastic nature of the pandemic on a daily scale.
2.1. Social graph
The social graph is modelled as an undirected inhomogeneous random graph (IRG), generalizing Erdős–Rényi random graphs, in which edges are drawn independently between unordered pairs of nodes, not with equal likelihood but with likelihood that depends on their types. This class of random graph has its origins in Chung and Lu (2002) and has been studied in generality in Bollobás et al. (2007) and the textbook by van der Hofstad (2016).
Assumption 1
[Social Graph] The primary layer of an IRSN, namely the social graph, is an inhomogeneous random graph with N nodes labelled by v ∈ [N]. It can be defined by two collections of random variables: Tv for v ∈ [N] and Avw for (v, w) ∈ [N] × [N].
- 1.
Each node v ∈ [N], representing a person, has type Tv drawn independently with probability $P_T := P(T_v = T)$ from a finite list of types [M] of cardinality M ≥ 1. Note that $\sum_{T \in [M]} P_T = 1$.
- 2.
Each undirected edge (v, w) ∈ [N] × [N] corresponds to a non-zero entry of the symmetric random adjacency matrix A. For each pair (v, w), Avw = Awv is the indicator for w to be (significantly) socially connected to v. Conditioned on the collection of all types {Tv}, the collection of edge indicators {Avw : v < w} is an independent family of Bernoulli random variables with probabilities
$$P(A_{vw} = 1 \mid T_v = T, T_w = T') = \frac{\kappa(T, T')}{N - 1}, \qquad v \ne w. \tag{1}$$
It is an important observation that the social graphs with the same type probabilities $P_T$ and kernel κ, but varying size N, have uniform probabilistic characteristics that converge to a well-defined limit as N → ∞. In particular, the probability mapping kernel κ, the symmetric M × M matrix that determines the likelihood that two people v, w of the given types have a social connection, is divided by N − 1 to ensure this uniformity and the sparseness of the graph for large N. For consistency we require that $N - 1 \ge \max_{T,T'}\kappa(T, T')$, so that every probability in (1) is at most 1.
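As an illustration of Assumption 1, the following minimal Python sketch samples types and edges for a small network. The function name `sample_social_graph`, the 0-based type indices and the nested-list representation of κ are illustrative assumptions, not part of the framework. Each unordered pair is visited once and linked with probability κ(Tv, Tw)/(N − 1), so the O(N²) loop is only meant for small N.

```python
import random


def sample_social_graph(n, p_type, kappa, seed=0):
    """Sample types T_v and edges A_vw of the social graph (Assumption 1).

    p_type[T] is P_T and kappa[T][T2] is the symmetric kernel; types are 0-based.
    Returns (types, neighbours), where neighbours[v] lists the nodes linked to v.
    """
    # Consistency condition N - 1 >= max kappa keeps every edge probability <= 1.
    if n - 1 < max(max(row) for row in kappa):
        raise ValueError("need N - 1 >= max kappa(T, T')")
    rng = random.Random(seed)
    types = rng.choices(range(len(p_type)), weights=p_type, k=n)
    neighbours = [[] for _ in range(n)]
    for v in range(n):
        for w in range(v + 1, n):  # each unordered pair exactly once
            if rng.random() < kappa[types[v]][types[w]] / (n - 1):
                neighbours[v].append(w)
                neighbours[w].append(v)
    return types, neighbours
```

For moderately large N, the average degree of type T nodes should then be close to $\sum_{T'} P_{T'}\,\kappa(T', T)$.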
2.2. Infective contacts, Viral Exposures and Immunity Buffers
The relevant health attributes of all people are summarized by an independent collection of multivariate random variables, conditioned on the social graph.
Definition 1
- 1.
The infective contact indicator pair between w and v is a pair of Bernoulli variables (ζvw, ζwv). ζvwAvw = 1 means that the social relationship between v and w leads to a close infective contact on a given day, such that when v is infectious a viral dose will be transmitted to w.
- 2.
The potential viral exposure pair between w and v is a pair (Ωvw, Ωwv) of positive values: Ωvw represents the viral load transmitted from v to w if v is infective and v, w have a single infective contact.
- 3.
The immunity buffer Δv of node v is a non-negative value that represents the resistance of that person to the virus.
Assumption 2
[Infective Contacts, Viral Exposures and Immunity Buffers] The secondary layer of an IRSN, the collection of infective contacts, potential exposures and immunity buffers ζvw, Ωvw, Δv are non-negative random variables that are chosen to be independent of {Avw}, conditioned on {Tv}.
- 1.
For each edge (v, w), (ζvw, ζwv) is a bivariate Bernoulli random variable. Conditioned on Tv = T, Tw = T′, ζvw = 1 with probability z(T, T′).
- 2.
For each edge (v, w), (Ωvw, Ωwv) is a bivariate random variable. Conditioned on Tv = T, Tw = T′, Ωvw has a continuous marginal density ρΩ(x∣T, T′) supported on ℝ₊ = [0, ∞) and associated distribution function $F_\Omega(x \mid T, T') = \int_0^x \rho_\Omega(y \mid T, T')\,dy$.
- 3.
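For each individual v, Δv conditioned on Tv = T ∈ [M] has a continuous density ρΔ(x|T) = F′Δ(x|T) supported on ℝ₊. Thus the cumulative distribution function (CDF) is
$$F_\Delta(x \mid T) = \int_0^x \rho_\Delta(y \mid T)\,dy. \tag{2}$$
We also record the characteristic function (CF) $\hat\rho_\Delta(k \mid T) = E\left[e^{ik\Delta_v} \mid T_v = T\right]$ and the Fourier transform $\hat F_\Delta(k \mid T)$ of the CDF.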
In summary, an IRSN of finite size N representing the population of N individuals amounts to a collection of random variables {T, A, ζ, Ω, Δ} satisfying Assumptions 1 and 2.
2.3. Infection transmission and the epidemic trigger
Infection transmission is a stochastic process that we idealize here as proceeding in discrete time with a period taken for convenience to be one day. This time scale can be thought to correspond to the length of time a transmitted viral load remains active within the body. The most important factors in determining the probability that a susceptible individual becomes infected in a day are the total viral load they accumulate during that day and their immunity buffer. We adopt a threshold infection assumption, as described for example in Pastor-Satorras et al. (2015)[Ch X].
We assume that the random social graph determined by {T, A} is chosen at time t = 0 and remains fixed for the duration of the contagion process. On the other hand, {ζ, Ω, Δ} form a conditionally IID sequence of multivariate random variables that are drawn daily. Thus, only the secondary layer of the IRSN changes over time.
Consider a typical day starting at time t ∈ {0, 1, 2, …}, at which time the compartments S, E, I, R are assumed to be a union over T ∈ [M] of disjoint random subsets S(t|T), E(t|T), I(t|T), R(t|T) of the node set [N]. The initial compartments and the possible compartment changes each day are determined by the following rules:
Assumption 3
[Initial Trigger and Transmission]
- 1.
The epidemic trigger at the beginning of day t = 0 randomly assigns each type T individual to one of the compartments S, E, I, R independently with probabilities s(0|T) = 1 − e(0|T) − i(0|T), e(0|T), i(0|T), r(0|T) = 0. This determines the initial compartments S(0) = [N] ∖ (E(0) ∪ I(0)), E(0), I(0), R(0) = ∅; these compartments are partitioned by types: S(0) = ∪TS(0|T), etc.
- 2.
Each day t ≥ 0, a new collection {ζ(t), Ω(t), Δ(t)} of random variables are sampled satisfying Assumption 2.
- 3.
For each successive day t ≥ 0, the transmissions from S to E, E to I and I to R are determined by the following SEIR transmission assumptions:
- (a)
Each v ∈ S(t|T) will be exposed and moved to E(t + 1|T) if
$$\sum_{w \in I(t)} A_{wv}\,\zeta_{wv}(t)\,\Omega_{wv}(t) \ge \Delta_v(t). \tag{3}$$
- (b)
Each v ∈ E(t|T) becomes infectious and moves to I(t + 1|T) independently with probability β(T) ∈ [0, 1].
- (c)
Each v ∈ I(t|T) is removed to R(t + 1|T) independently with probability γ(T) = γd(T) + γr(T) ∈ [0, 1], where γd, γr are the probabilities of death and recovery respectively.
Note that from the above assumptions, z(T′, T) represents the conditional probability on a given day that w and v have an infective contact, given that they have a social contact and w ∈ I(t|T′), v ∈ S(t|T); these are entries of an M × M possibly non-symmetric matrix.
As will be discussed in detail in Section 3.4, the threshold infection assumption captured in (3) can be directly interpreted as a dose-response model as reviewed in Haas (2015).
2.4. IRSN agent based simulation
Assumptions 1, 2 and 3 for an infection contagion cascade of Tmax days duration on a finite social network of N people can be realized by the following algorithm for the IRSN-ABM, a simple agent based model.
Step 0: Initialize the primary level random variables Tv, Avw according to Assumption 1. Set t = 0 and assign each node v ∈ [N] independently to one of the compartments S(0), E(0), I(0), R(0) according to the initial probabilities s(0|Tv), e(0|Tv), i(0|Tv), r(0|Tv) as in Assumption 3.
Step 1: While t < Tmax:
(a) Update the secondary random variables: for each w ∈ I(t) and v ∈ S(t), generate ζwv(t), Ωwv(t) and Δv(t) according to Assumption 2.
(b) Exposure: for v ∈ S(t), if $\sum_{w \in I(t)} A_{wv}\,\zeta_{wv}(t)\,\Omega_{wv}(t) \ge \Delta_v(t)$, move v to E(t + 1); otherwise keep v ∈ S(t + 1).
(c) For v ∈ E(t), independently move v to I(t + 1) with probability β(Tv); otherwise keep v ∈ E(t + 1).
(d) For v ∈ I(t), independently move v to R(t + 1) with probability γ(Tv); otherwise keep v ∈ I(t + 1).
(e) Increment t = t + 1 and repeat Step 1.
Each simulation of the model leads to the collection of random compartments S(t|T), E(t|T), I(t|T), R(t|T), whose fractional sizes (the fractions of type T individuals lying in each compartment) are recorded for days t = 0, 1, …, Tmax and types T ∈ [M].
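The steps of the IRSN-ABM can be sketched in Python as follows. This is a hypothetical implementation for small networks: the social graph is passed in as an adjacency list, and, purely for illustration, viral doses Ωwv and buffers Δv are drawn from exponential distributions with type-dependent means `omega_mean` and `delta_mean`; any densities satisfying Assumption 2 could be substituted. Fresh ζ, Ω and Δ values are drawn every day while the graph stays fixed, as in Step 1(a).

```python
import random


def compartment_fractions(state, types, m):
    """Fraction of type-T nodes in each compartment S, E, I, R, for T = 0..m-1."""
    counts = [dict.fromkeys("SEIR", 0) for _ in range(m)]
    for label, T in zip(state, types):
        counts[T][label] += 1
    fractions = []
    for T in range(m):
        n_T = sum(counts[T].values())
        fractions.append({c: (counts[T][c] / n_T if n_T else 0.0) for c in "SEIR"})
    return fractions


def simulate_irsn_abm(types, neighbours, s0, e0, i0, z, omega_mean, delta_mean,
                      beta, gamma, t_max, seed=0):
    """Sketch of the IRSN-ABM. Hypothetical choices: doses Omega_wv ~ Exp with mean
    omega_mean[T_w][T_v]; buffers Delta_v ~ Exp with mean delta_mean[T_v]."""
    rng = random.Random(seed)
    m = len(s0)
    # Step 0: initial trigger; r(0|T) = 0, so only S, E, I are possible.
    state = [rng.choices("SEI", weights=(s0[T], e0[T], i0[T]))[0] for T in types]
    history = [compartment_fractions(state, types, m)]
    for _ in range(t_max):  # Step 1: while t < t_max
        new_state = list(state)
        for v, T in enumerate(types):
            if state[v] == "S":
                # Steps 1(a)-(b): fresh contacts, doses and buffer drawn each day.
                load = 0.0
                for w in neighbours[v]:
                    Tw = types[w]
                    if state[w] == "I" and rng.random() < z[Tw][T]:
                        load += rng.expovariate(1.0 / omega_mean[Tw][T])
                if load > 0.0 and load >= rng.expovariate(1.0 / delta_mean[T]):
                    new_state[v] = "E"
            elif state[v] == "E" and rng.random() < beta[T]:  # Step 1(c)
                new_state[v] = "I"
            elif state[v] == "I" and rng.random() < gamma[T]:  # Step 1(d)
                new_state[v] = "R"
        state = new_state  # Step 1(e): advance to day t + 1
        history.append(compartment_fractions(state, types, m))
    return history
```

All updates on a given day read the day-t labels and write into a copy, so the order in which nodes are visited does not matter.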
Remark 1
The above specification for the IRSN-ABM is one of many natural possibilities. In particular, choosing to freeze the social graph to remain constant, while making the secondary layer change unpredictably every day is a strong immunological assumption that is open to debate. For example, one might propose an alternative assumption that the secondary layer exhibits serial correlation, or more strongly, remains constant day by day.
Susceptibility bias refers to a type of selection bias, akin to Darwinian evolution, whereby in a heterogeneous population where individuals have slowly varying innate characteristics, the less resistant individuals tend to succumb to the disease earlier than more resistant individuals, so that the susceptible population becomes more resilient over time. Random variables that remain constant over time, such as the social graph, lead to susceptibility bias, while making random characteristics serially independent reduces it. The IRSN-ABM will have some susceptibility bias arising from the constancy of the social graph, because highly connected individuals of a given type tend to receive more infectious shocks than less connected individuals of the same type.
3. Analytical asymptotics of the IRSN model
The IRSN framework just introduced specifies the joint distributions of the random variables {T, A, ζ, Ω, Δ} and the random compartments S(0), E(0), I(0), R(0), thereby providing a compact stochastic representation of the state of a network of N individuals at the moment an outbreak is triggered. The same distributional data defines a sequence of random networks with varying N.
The main objective is to study the dependence on t of the size of the random compartments S(t|T), E(t|T), I(t|T), R(t|T). It is important that we consider relationships between finite N networks and the asymptotic limit N → ∞. To this end, for each N we define the fractional expected sizes to be
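$$s^{(N)}(t \mid T) := \frac{E\left[|S(t \mid T)|\right]}{N P_T}, \tag{4}$$
$$e^{(N)}(t \mid T) := \frac{E\left[|E(t \mid T)|\right]}{N P_T}, \tag{5}$$
$$i^{(N)}(t \mid T) := \frac{E\left[|I(t \mid T)|\right]}{N P_T}, \tag{6}$$
$$r^{(N)}(t \mid T) := \frac{E\left[|R(t \mid T)|\right]}{N P_T}. \tag{7}$$
By permutation symmetry, $s^{(N)}(t \mid T) = P(v \in S(t) \mid T_v = T)$ for any v ∈ [N], and similarly for the other compartments. Throughout the remainder of the paper, the quantities s(t|T), e(t|T), i(t|T), r(t|T) without superscript (N) denote the large N limiting values.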
The most important result of the paper will be N → ∞ asymptotic recursion formulas mapping the quantities s(t|T), e(t|T), i(t|T), r(t|T) from day t to t + 1 for t ≥ 0, subject to specified initial conditions for t = 0. This system of equations is a discrete dynamical system on a product of M simplices, defined by the relations s(t|T) + e(t|T) + i(t|T) + r(t|T) = 1 for each T ∈ [M], lying within the hypercube [0, 1]^{4M}. The mapping generating this dynamics will be called the infection cascade mapping.
3.1. Degree distribution of the social graph
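The distribution of the number of social contacts of nodes in IRGs, in other words their social degree distribution, has a natural Poisson mixture structure in the large N limit. By permutation symmetry, one only needs to consider individual 1 with arbitrary type T1 = T, whose social degree is defined as $d_1 := \sum_{w=2}^{N} A_{1w}$, a sum of conditionally IID Bernoulli random variables. Since $P(A_{1w} = 1 \mid T_1 = T) = \frac{1}{N-1}\sum_{T'} P_{T'}\,\kappa(T', T)$, each term has the identical conditional characteristic function (CF)
$$E\left[e^{ikA_{1w}} \mid T_1 = T\right] = 1 + \frac{1}{N-1}\sum_{T'} P_{T'}\,\kappa(T', T)\left(e^{ik} - 1\right). \tag{8}$$
The conditional CF of d1 is the N − 1 power of this function, and can be written
$$E\left[e^{ikd_1} \mid T_1 = T\right] = \exp\left[(N-1)\log\left(1 + \frac{1}{N-1}\sum_{T'} P_{T'}\,\kappa(T', T)\left(e^{ik} - 1\right)\right)\right] \tag{9}$$
to display its asymptotic structure as N → ∞.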
Proposition 1
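The characteristic function of the social degree dv of an individual v, conditioned on its type T ∈ [M], is 2π-periodic on ℝ and has the N → ∞ limiting behaviour:
$$\hat\rho_d(k \mid T) := E\left[e^{ikd_v} \mid T_v = T\right] = \exp\left[\lambda(T)\left(e^{ik} - 1\right) + O(N^{-1})\right], \tag{10}$$
$$\lim_{N \to \infty} \hat\rho_d(k \mid T) = \exp\left[\lambda(T)\left(e^{ik} - 1\right)\right], \tag{11}$$
where λ(T) = ∑T′λ(T′, T) with $\lambda(T', T) := P_{T'}\,\kappa(T', T)$. Here, the $O(N^{-1})$ error in the exponent of (10) is measured in the L²([0, 2π]) norm.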
Proof of Proposition 1. The proof is immediate by applying the following Lemma 2 to the logarithm of (9), with y = 1/(N − 1) and $g(k, y) = \sum_{T'} \lambda(T', T)\left(e^{ik} - 1\right)$, which does not depend on y.▪
Lemma 2
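Let I be any hyperinterval in ℝⁿ and y₀ > 0. Suppose $g: I \times [0, y_0] \to \mathbb{C}$ is a bivariate function such that $g$, $\partial_y g$ and $\partial_y^2 g$ are pointwise bounded and in L²(I) for each value y ∈ [0, y₀]. Then
$$\left\|\log\left(1 + y\,g(\cdot, y)\right) - y\,g(\cdot, 0)\right\|_I = O(y^2) \quad \text{as } y \to 0.$$
Proof of Lemma 2. Under the assumptions, one can show directly that f(x, y) ≔ log(1 + yg(x, y)) − yg(x, 0) satisfies $\lim_{y \to 0} f(x, y) = \lim_{y \to 0} \partial_y f(x, y) = 0$ and hence, by Taylor’s remainder theorem,
$$f(x, y) = \int_0^y (y - s)\,\partial_s^2 f(x, s)\,ds.$$
One can also show that $\partial_y^2 f(\cdot, y)$ is in L²(I) for each value of y, with $\|\partial_y^2 f(\cdot, y)\|_I \le M$ for some constant M, provided y is small enough. Then, by the Cauchy–Schwarz inequality and Fubini’s Theorem, for such y
$$\|f(\cdot, y)\|_I^2 \le \int_I \left(\int_0^y (y - s)^2\,ds\right)\left(\int_0^y \left|\partial_s^2 f(x, s)\right|^2 ds\right) dx = \frac{y^3}{3}\int_0^y \left\|\partial_s^2 f(\cdot, s)\right\|_I^2\,ds \le \frac{M^2 y^4}{3},$$
from which the result follows.▪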
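A quick numerical check of Proposition 1 and Lemma 2, in the single-sum case g(k) = λ(e^{ik} − 1) with y = 1/(N − 1), is sketched below. On a grid of k values in [0, 2π) it evaluates the exponent error (N − 1) log(1 + y g) − g and the gap between the binomial CF (1 + y g)^{N−1} and the Poisson CF exp(g). The helper name `prop1_errors` and the grid size are illustrative choices.

```python
import cmath
import math


def prop1_errors(lam, n, n_grid=64):
    """Max over k = 2*pi*idx/n_grid of the two errors behind Proposition 1, for
    g(k) = lam * (e^{ik} - 1) and y = 1/(n - 1):
      log_err = |(n - 1) * log(1 + y*g) - g|   (exponent error, cf. Lemma 2)
      cf_err  = |(1 + y*g)**(n - 1) - exp(g)|  (binomial CF vs Poisson CF)
    """
    y = 1.0 / (n - 1)
    log_err = cf_err = 0.0
    for idx in range(n_grid):
        g = lam * (cmath.exp(2j * math.pi * idx / n_grid) - 1.0)
        log_err = max(log_err, abs(cmath.log(1.0 + y * g) / y - g))
        cf_err = max(cf_err, abs((1.0 + y * g) ** (n - 1) - cmath.exp(g)))
    return log_err, cf_err
```

The exponent error multiplied by N − 1 should stay bounded (its leading term is |g|²/2 ≤ 2λ²), consistent with the O(N⁻¹) rate in (10).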
Proposition 1 tells us that for each T, the conditional social degree converges in distribution to a Poisson random variable with mean parameter λ(T) = ∑T′λ(T′, T). Now, recall that a finite mixture of a collection of probability distribution functions is the probability distribution formed by a convex combination. Thus the asymptotic unconditional social degree distribution of any individual is a finite mixture with characteristic function:
$$\hat\rho_d(k) = \sum_{T \in [M]} P_T \exp\left[\lambda(T)\left(e^{ik} - 1\right)\right]. \tag{12}$$
Each mixture component has a Poisson distribution with parameter λ(T), and the mixing variable is the individual’s type T with mixing weight $P_T$.
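The Poisson mixture (12) is easy to evaluate directly. The sketch below, with hypothetical names and 0-based types, computes λ(T) = ∑T′ P_{T′} κ(T′, T) and the mixture PMF truncated at a maximum degree `d_max`.

```python
import math


def degree_pmf(p_type, kappa, d_max):
    """Asymptotic social degree distribution as the finite Poisson mixture (12).

    Returns (lam, pmf) with lam[T] = sum_T' P_T' * kappa(T', T) and
    pmf[d] = sum_T P_T * Poisson(d; lam[T]) for d = 0, ..., d_max.
    """
    m = len(p_type)
    lam = [sum(p_type[s] * kappa[s][t] for s in range(m)) for t in range(m)]
    pmf = [sum(p_type[t] * math.exp(-lam[t]) * lam[t] ** d / math.factorial(d)
               for t in range(m))
           for d in range(d_max + 1)]
    return lam, pmf
```

For P = (0.5, 0.5) and κ = [[4, 2], [2, 6]] this gives λ = (3, 4) and a mean degree of 3.5.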
3.2. The first infection cascade step
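The most important quantity on day 1 is the exposure probability EP(1|T) for a type T individual v that is susceptible on day 0 to become exposed on day 1. For a finite network of size N, by permutation symmetry, we can take v = 1 and the required conditional probability can be expressed as $EP^{(N)}(1 \mid T) := P(1 \in E(1) \mid 1 \in S(0), T_1 = T)$. By our assumptions, in particular (3), this is
$$EP^{(N)}(1 \mid T) = P\left(\Delta_1 \le \bar\Omega_1 \mid T_1 = T\right), \tag{13}$$
where $\bar\Omega_1$, the total viral load received by 1, is the sum of viral shocks from w ≠ 1 to 1:
$$\bar\Omega_1 = \sum_{w \ne 1} \mathbf{1}(w \in I(0))\,A_{w1}\,\zeta_{w1}\,\Omega_{w1}. \tag{14}$$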
As studied in Hurd and Gleeson (2013), threshold probabilities such as (13) are efficiently computable via characteristic functions. Assuming that X, Y are independent non-negative random variables such that X has a density ρX(x) and the CDF FY of Y is continuous, and letting these functions have Fourier transforms $\hat\rho_X$, $\hat F_Y$, then by the Parseval identity
$$P(Y \le X) = \int_0^\infty \rho_X(x)\,F_Y(x)\,dx = \frac{1}{2\pi}\int_{\mathbb{R}} \hat\rho_X(k)\,\overline{\hat F_Y(k)}\,dk. \tag{15}$$
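On a finite integer grid {0, …, n − 1}, (15) has an exact discrete counterpart: with the DFT convention $\hat f(k) = \sum_x e^{2\pi i kx/n} f(x)$, one has $\sum_x \rho_X(x)F_Y(x) = n^{-1}\sum_k \hat\rho_X(k)\overline{\hat F_Y(k)}$. The following pure-Python sketch checks this; the naive O(n²) DFT and the function names are illustrative choices that keep the example dependency-free.

```python
import cmath
import math


def dft(f):
    """Naive O(n^2) DFT with the CF sign convention: f_hat[k] = sum_x exp(2*pi*i*k*x/n) f[x]."""
    n = len(f)
    return [sum(f[x] * cmath.exp(2j * math.pi * k * x / n) for x in range(n))
            for k in range(n)]


def prob_y_le_x(rho_x, F_y):
    """P(Y <= X) for independent X, Y on the grid {0, ..., n-1}, via discrete Parseval:
    sum_x rho_X(x) F_Y(x) = (1/n) sum_k rho_X_hat(k) * conj(F_Y_hat(k))."""
    n = len(rho_x)
    a, b = dft(rho_x), dft(F_y)
    return (sum(ak * bk.conjugate() for ak, bk in zip(a, b)) / n).real
```

For example, with X uniform on {0, …, 9} and Y uniform on {0, …, 7}, both the direct sum and the Fourier-side sum give P(Y ≤ X) = 0.65.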
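Note that conditioned on T1 = T, the shocks $X_w := \mathbf{1}(w \in I(0))\,A_{w1}\,\zeta_{w1}\,\Omega_{w1}$ for all w ≠ 1 are independent and identically distributed (IID). Since $\mathbf{1}(w \in I(0))\,A_{w1}\,\zeta_{w1}$ is a Bernoulli random variable that is independent of $\Omega_{w1}$, with
$$P\left(\mathbf{1}(w \in I(0))\,A_{w1}\,\zeta_{w1} = 1 \mid T_1 = T, T_w = T'\right) = \frac{i(0 \mid T')\,\kappa(T', T)\,z(T', T)}{N - 1},$$
it follows that for any w ≠ 1 the characteristic function of the shock conditioned on the type T1 = T is given for finite N by
$$E\left[e^{ikX_w} \mid T_1 = T\right] = 1 + \frac{1}{N-1}\sum_{T'} \lambda(T', T)\,z(T', T)\,i(0 \mid T')\left(\hat\rho_\Omega(k \mid T', T) - 1\right). \tag{16}$$
By the independence of viral shocks, the total viral load has CF
$$\hat\rho^{(N)}_{\bar\Omega}(k \mid T) := E\left[e^{ik\bar\Omega_1} \mid T_1 = T\right] = \left(1 + \frac{1}{N-1}\sum_{T'} \lambda(T', T)\,z(T', T)\,i(0 \mid T')\left(\hat\rho_\Omega(k \mid T', T) - 1\right)\right)^{N-1}. \tag{17}$$
The desired large N approximation is uniform in k ∈ ℝ, and follows by the argument proving Proposition 1:
$$\hat\rho^{(N)}_{\bar\Omega}(k \mid T) = \exp\left[\sum_{T'} \lambda(T', T)\,z(T', T)\,i(0 \mid T')\left(\hat\rho_\Omega(k \mid T', T) - 1\right) + O(N^{-1})\right] \tag{18}$$
$$\longrightarrow \hat\rho_{\bar\Omega}(k \mid T) := \exp\left[\sum_{T'} \lambda(T', T)\,z(T', T)\,i(0 \mid T')\left(\hat\rho_\Omega(k \mid T', T) - 1\right)\right]. \tag{19}$$
The expected fractional number of type T newly exposed individuals will be $EP^{(N)}(1 \mid T)\,s(0 \mid T)$. By combining (13) and (15) with X = $\bar\Omega_1$ and Y = Δ1, and applying the dominated convergence theorem, we have
$$\lim_{N \to \infty} EP^{(N)}(1 \mid T) = \frac{1}{2\pi}\int_{\mathbb{R}} \hat\rho_{\bar\Omega}(k \mid T)\,\overline{\hat F_\Delta(k \mid T)}\,dk.$$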
The expected fractional number of type T exposed individuals that become infective will be β(T)e(0|T) and expected fractional number of type T infective individuals that are removed will be γ(T)i(0|T). Putting these pieces together, one obtains the main result.
Theorem 3
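Consider the sequence of IRSNs for all N, satisfying all the assumptions in Section 2. Then for each N, the fractional expected compartment sizes on day 1 are
$$s^{(N)}(1 \mid T) = s(0 \mid T)\left(1 - EP^{(N)}(1 \mid T)\right), \tag{20}$$
$$e^{(N)}(1 \mid T) = e(0 \mid T)\left(1 - \beta(T)\right) + s(0 \mid T)\,EP^{(N)}(1 \mid T), \tag{21}$$
$$i^{(N)}(1 \mid T) = i(0 \mid T)\left(1 - \gamma(T)\right) + \beta(T)\,e(0 \mid T), \tag{22}$$
$$r^{(N)}(1 \mid T) = r(0 \mid T) + \gamma(T)\,i(0 \mid T). \tag{23}$$
The type T exposure probability on day 1 converges as N → ∞:
$$\lim_{N \to \infty} EP^{(N)}(1 \mid T) = EP(1 \mid T), \tag{24}$$
$$EP(1 \mid T) = \frac{1}{2\pi}\int_{\mathbb{R}} \hat\rho_{\bar\Omega}(k \mid T)\,\overline{\hat F_\Delta(k \mid T)}\,dk, \tag{25}$$
where $\hat F_\Delta(k \mid T)$ is the Fourier transform of the CDF given by (2) and $\hat\rho_{\bar\Omega}(k \mid T)$ is given by
$$\hat\rho_{\bar\Omega}(k \mid T) = \exp\left[\sum_{T'} \lambda(T', T)\,z(T', T)\,i(0 \mid T')\left(\hat\rho_\Omega(k \mid T', T) - 1\right)\right]. \tag{26}$$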
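The day-1 exposure probability in (25) and (26) can be sketched on a finite grid by replacing the Fourier integral with a DFT sum, in the spirit of Section 4. In this hypothetical sketch, types are 0-based, `lam[Tp][T]` stands for λ(T′, T), `rho_omega[Tp][T]` is the PMF of a single dose from a type T′ infective to a type T susceptible on the grid, and `F_delta` is the CDF of Δ on the same grid; the grid must be long enough that wrap-around (aliasing) of the total viral load is negligible.

```python
import cmath
import math


def dft(f):
    """Naive O(n^2) DFT with the CF sign convention: f_hat[k] = sum_x exp(2*pi*i*k*x/n) f[x]."""
    n = len(f)
    return [sum(f[x] * cmath.exp(2j * math.pi * k * x / n) for x in range(n))
            for k in range(n)]


def exposure_probability(T, lam, z, i_prev, rho_omega, F_delta):
    """Grid version of (25)-(26) for a type-T susceptible.

    lam[Tp][T]       : lambda(T', T)
    z[Tp][T]         : z(T', T)
    i_prev[Tp]       : i(0|T') (or i(t-1|T') on later days)
    rho_omega[Tp][T] : PMF of a single dose from type T' to type T, length n
    F_delta          : CDF of the buffer Delta for type T, length n
    """
    n = len(F_delta)
    F_hat = dft(F_delta)
    # Log of the compound-Poisson CF (26), evaluated at the DFT frequencies.
    log_cf = [0j] * n
    for Tp in range(len(i_prev)):
        rate = lam[Tp][T] * z[Tp][T] * i_prev[Tp]
        if rate == 0.0:
            continue
        omega_hat = dft(rho_omega[Tp][T])
        for k in range(n):
            log_cf[k] += rate * (omega_hat[k] - 1.0)
    # Discrete Parseval: sum_x rho_bar(x) F_delta(x) = (1/n) sum_k cf(k) conj(F_hat(k)).
    total = sum(cmath.exp(lc) * Fh.conjugate() for lc, Fh in zip(log_cf, F_hat))
    return (total / n).real
```

With every dose and every buffer equal to one unit, the total load is Poisson with mean ∑T′ λ(T′, T) z(T′, T) i(0|T′), so EP(1|T) reduces to 1 − exp(−∑T′ λ(T′, T) z(T′, T) i(0|T′)), which gives a convenient sanity check.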
3.3. The mixed infection cascade mapping
As discussed in Section 2.4, the IRSN model exhibits susceptibility bias. Due to the constancy over time of the social graph, highly connected individuals tend to be infected earlier than less connected individuals, and hence the average connectivity of susceptibles decreases over time. This implies that the assumptions underlying Theorem 3 do not hold for the infection cascade mapping on subsequent days t > 0. Indeed, we have not been able to generalize Theorem 3 to cope with susceptibility bias for t > 0.
Instead, we propose to depart from the IRSN model of Section 2.3 by introducing an additional randomization called mixing that eliminates the susceptibility bias and ensures that Theorem 3 holds for t > 0. The required form of conditional independence is achieved by introducing for each t > 0 a random reassignment of the labels S, E, I, R for each type T, consistent with the total fractions of nodes in each subcompartment. Specifically, for each t ≥ 0 we replace Step 1(e) of the IRSN agent based simulation of Section 2.4 by the following:
Step 1’(e): Reassign each node v ∈ [N] independently to one of the compartments S(t + 1), E(t + 1), I(t + 1), R(t + 1) according to the probabilities s(t + 1|Tv), e(t + 1|Tv), i(t + 1|Tv), r(t + 1|Tv). Increment t = t + 1 and repeat Step 1.
Since the proposed mixing is inconsistent with any true agent based model, we call this a pseudo-agent based model, the IRSN P-ABM. Under model specifications where susceptibility bias of the ABM is small, the large N limit of the P-ABM will closely mimic the corresponding ABM. Under the IRSN P-ABM, the form given by Theorem 3 continues to apply for subsequent time steps, and we are justified in proposing the following mapping as a consistent network model for infectious disease spread.
Mixed Infection Cascade Mapping: Consider the limit N = ∞ of the sequence of IRSN P-ABMs for all N, satisfying all the assumptions in Section 2.4, with the modified Step 1’(e) just discussed. Then on day t ≥ 1,
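1. The type T exposure probability is
$$EP(t \mid T) = \frac{1}{2\pi}\int_{\mathbb{R}} \hat\rho_{\bar\Omega}(k \mid t, T)\,\overline{\hat F_\Delta(k \mid T)}\,dk. \tag{27}$$
2. The total viral load received by a type T susceptible has PDF $\rho_{\bar\Omega}(x \mid t, T)$ with CF
$$\hat\rho_{\bar\Omega}(k \mid t, T) = \exp\left[\sum_{T'} \lambda(T', T)\,z(T', T)\,i(t-1 \mid T')\left(\hat\rho_\Omega(k \mid T', T) - 1\right)\right]. \tag{28}$$
3. The fractional expected compartment sizes are
$$s(t \mid T) = s(t-1 \mid T)\left(1 - EP(t \mid T)\right), \tag{29}$$
$$e(t \mid T) = e(t-1 \mid T)\left(1 - \beta(T)\right) + s(t-1 \mid T)\,EP(t \mid T), \tag{30}$$
$$i(t \mid T) = i(t-1 \mid T)\left(1 - \gamma(T)\right) + \beta(T)\,e(t-1 \mid T), \tag{31}$$
$$r(t \mid T) = r(t-1 \mid T) + \gamma(T)\,i(t-1 \mid T). \tag{32}$$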
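The recursion (29) to (32) can be iterated directly once a routine for EP(t|T) is available. In the sketch below, `ep_fn(T, i_prev)` is a hypothetical callable that should return EP(t|T) given the vector i(t − 1|·), for example by evaluating (27) and (28); everything else follows the update equations literally.

```python
def run_mixed_cascade(s0, e0, i0, beta, gamma, ep_fn, t_max):
    """Iterate the mixed infection cascade mapping (29)-(32) for t = 1, ..., t_max.

    ep_fn(T, i_prev) is a hypothetical callable returning EP(t|T) given the list
    i_prev = [i(t-1|T') for all T']. Returns a list of (s, e, i, r) per day.
    """
    m = len(s0)
    s, e, i, r = list(s0), list(e0), list(i0), [0.0] * m  # r(0|T) = 0
    path = [(s, e, i, r)]
    for _ in range(t_max):
        ep = [ep_fn(T, i) for T in range(m)]
        # All right-hand sides use day t-1 values, so build new lists first.
        s_new = [s[T] * (1.0 - ep[T]) for T in range(m)]
        e_new = [e[T] * (1.0 - beta[T]) + s[T] * ep[T] for T in range(m)]
        i_new = [i[T] * (1.0 - gamma[T]) + beta[T] * e[T] for T in range(m)]
        r_new = [r[T] + gamma[T] * i[T] for T in range(m)]
        s, e, i, r = s_new, e_new, i_new, r_new
        path.append((s, e, i, r))
    return path
```

Because the four updates only move mass between compartments, s + e + i + r = 1 is preserved for every type, which is a useful check on any implementation.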
3.4. Dose-response model of transmission
To obtain a more specific threshold model of transmission, leading to a better understanding of the immune buffers and exposures, this section develops the idea of dose-response, as discussed in e.g. Haas (2015), as a model of viral transmission. In a simple dose-response model for airborne disease transmission, each viral dose transmitted from an infective to a susceptible host is assumed to be carried by a very large number Ω of airborne particles, thought of as either aerosol or droplet. These particles settle on tissues within the host, where each is assumed to have an independent identical small chance α to cause the host to become exposed. The probability that exposure occurs is therefore
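$$P(\text{exposure} \mid \Omega) = 1 - \mathrm{Bin}(\Omega, \alpha, 0) = 1 - (1 - \alpha)^{\Omega} \approx 1 - e^{-\alpha\Omega}, \tag{33}$$
where Bin(Ω, α, ⋅) denotes the probability mass function of the binomial distribution with Ω trials and success probability α, and the approximation, valid for small α and large Ω, is the Poisson limit theorem (the law of rare events).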
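The quality of the approximation in (33) is easy to check numerically. The sketch compares the exact binomial form 1 − (1 − α)^Ω with 1 − e^{−αΩ}; the function names and parameter values are arbitrary illustrative choices.

```python
import math


def exposure_binomial(alpha, omega):
    """Exact form in (33): 1 - Bin(omega, alpha, 0) = 1 - (1 - alpha)**omega."""
    return 1.0 - (1.0 - alpha) ** omega


def exposure_exponential(alpha, omega):
    """Poisson-limit approximation in (33): 1 - exp(-alpha * omega)."""
    return 1.0 - math.exp(-alpha * omega)
```

For α = 10⁻⁶ and Ω = 5 × 10⁵ the two agree to better than 10⁻⁶, whereas for α = 0.1 and Ω = 5 they differ by about 0.016.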
There are many reasons why (33) is oversimple, and it is common to replace it by a more general dose-response relationship
$$P(\text{exposure} \mid \Omega) = F(\Omega) \tag{34}$$
for an increasing function F: ℝ₊ → [0, 1] with F(0) = 0, F(∞) = 1.
We can view this general dose-response as a threshold model. The parameter α = αT, or the specific function F = F(⋅|T) can be assumed to depend strongly on the host’s type T. We should also assume that Ω is random, depending on all of the infecting individuals and the host’s type. If we assume a type T susceptible’s buffer Δv is distributed with CDF F(⋅|T) and is independent of the total exposure Ω which is given as a random sum of infectious exposures then, conditioned on the exposures , the probability of v being exposed will be
$$\mathbb{P}[\Delta_v \le \Omega \mid \Omega] = F(\Omega \mid T), \qquad (35)$$
consistent with (13). Taking an expectation over Ω gives the probability that v becomes exposed on day t, conditioned on v ∈ S(t − 1|T). This is the type T exposure probability EP(t|T), where
(36)
(37)
For the simplest dose-response (33), these expectations factorize, and with the help of Lemma 2 one finds
(38)
(39)
(40)
Here is the probability that v becomes exposed from a single viral dose from a type T′ infective.
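The factorization can be checked numerically in a special case. The sketch below assumes, for illustration only and not as the paper's specification, that the number of infective contacts of a susceptible is Poisson with mean m, that their doses Ω_j are i.i.d. gamma with shape k and scale θ, and that F(ω) = 1 − e^{−αω} as in (33). The compound Poisson identity then gives E[F(Ω_1 + … + Ω_K)] = 1 − exp(−m p₁), where p₁ = E[F(Ω₁)] = 1 − (1 + αθ)^{−k} is the single-dose exposure probability, and a Monte Carlo estimate should agree to within sampling error. All parameter values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
alpha = 0.05          # hypothetical rate in F(w) = 1 - exp(-alpha * w)
m = 2.5               # hypothetical mean number of infective contacts (Poisson)
k, theta = 3.0, 2.0   # hypothetical gamma shape and scale of a single dose

# Single-dose exposure probability p1 = E[F(Omega_1)], from the gamma Laplace transform.
p1 = 1.0 - (1.0 + alpha * theta) ** (-k)

# Factorized closed form: E[F(Omega_1 + ... + Omega_K)] = 1 - exp(-m * p1).
closed_form = 1.0 - np.exp(-m * p1)

# Monte Carlo: K ~ Poisson(m); a sum of K i.i.d. Gamma(k, theta) doses is Gamma(K * k, theta).
n = 200_000
K = rng.poisson(m, size=n)
total = np.where(K > 0, rng.gamma(np.maximum(K, 1) * k, theta), 0.0)
monte_carlo = np.mean(1.0 - np.exp(-alpha * total))
print(closed_form, monte_carlo)
```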
When (29) and (40) are combined, we obtain
(41)
This can be recognized as the solution of the vector-valued ordinary differential equation at the heart of multi-type compartment epidemic models:
(42)
with the continuous functions i(t|T′) replaced by the piecewise constant approximations i(⌊t⌋|T′), and the transmission parameter given by α(T, T′) = δt⁻¹ τ(T′, T) κ(T′, T) z(T′, T). Thus, combining (40) with the steps leading to Theorem 2 provides a direct derivation of the classic SEIR ODE model from a particular specification of a more fundamental agent based contagion model.
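The statement that the combined update is the solution of an ODE with piecewise constant infective fractions can be checked directly. The sketch below assumes a day length of one and that the susceptible equation of the ODE has the standard form ds(t|T)/dt = −s(t|T) Σ_{T′} α(T, T′) i(⌊t⌋|T′). It compares the exponential update s(t|T) = s(t − 1|T) exp(−Σ_{T′} α(T, T′) i(t − 1|T′)) with a fine explicit Euler integration over one day. The two-type matrix α and the initial fractions are hypothetical.

```python
import numpy as np

# Hypothetical per-day transmission parameters alpha[T, T'] for two types.
alpha = np.array([[0.30, 0.10],
                  [0.05, 0.20]])
i_prev = np.array([0.02, 0.05])   # i(t-1 | T'), held constant on [t-1, t)
s_prev = np.array([0.95, 0.90])   # s(t-1 | T)

# Exponential one-day update: s(t|T) = s(t-1|T) * exp(-sum_T' alpha[T, T'] * i(t-1|T')).
force = alpha @ i_prev
s_map = s_prev * np.exp(-force)

# Explicit Euler integration of ds/dt = -s * (alpha @ i(t-1)) over one day.
n_steps = 20_000
dt = 1.0 / n_steps
s = s_prev.copy()
for _ in range(n_steps):
    s = s - dt * s * force
print(s_map, s)
```

Because the infective fractions are frozen over the day, the ODE is linear in s and its exact solution is the exponential map; the Euler result differs only by a discretization error of order force²/n_steps.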
4. Discrete Fourier Transform implementation
The core of the numerical implementation of the mixed infection cascade mapping is to approximate the integral (27) for EP(t|T) using the Discrete Fourier Transform (DFT). The DFT works most effectively on a grid of nonnegative integers, denoted [Ndft] ≔ {0, 1, 2, …, Ndft − 1}, whose size Ndft is a power of 2 chosen to balance precision against computational efficiency. All immunity buffers are taken to have integer values in [Ndft] that represent multiples of a unit of viral dose. The exposures take values on a smaller grid {0, 1, 2, …, omegamax − 1}. To avoid the aliasing problem familiar in applications of the DFT, we assume Ndft is sufficiently large compared to omegamax so that is a negligible probability when node 1 has any possible type T1.
Thus we assume that the PDF and CDF ρX, FX of any continuous random variable X can be replaced by Ndft-dimensional vectors with components ρX(x), FX(x), x ∈ [Ndft]. The characteristic function is then replaced by the DFT of ρX, defined for each k ∈ [Ndft] by
The DFT is an invertible linear operator on ℂ^{Ndft} (up to a constant normalization factor, an isometry under the Euclidean metric); the inverse DFT is given by
Given two independent nonnegative random variables X, Y with values in [Ndft], one then has the identities
where .
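The convolution identity for sums of independent variables can be illustrated with NumPy's FFT routines numpy.fft.fft and numpy.fft.ifft, which use the e^{−2πikx/n} sign convention and place the 1/n factor in the inverse; the paper's convention may differ, which does not affect the identity. The sketch builds probability vectors on a hypothetical grid of size 64, checks that the zero-frequency component equals 1 (the analogue of φ_X(0) = 1), recovers the PDF of X + Y as the inverse DFT of the product of transforms, and shows the wrap-around (aliasing) that occurs when the grid is too small for the support of X + Y.

```python
import numpy as np

N_DFT = 64  # hypothetical grid size, grid = {0, ..., N_DFT - 1}

def pmf_on_grid(weights, n=N_DFT):
    """Pad a finite pmf with zeros to a normalized length-n probability vector."""
    p = np.zeros(n)
    p[:len(weights)] = weights
    return p / p.sum()

rho_x = pmf_on_grid([0.2, 0.5, 0.3])          # X in {0, 1, 2}
rho_y = pmf_on_grid([0.1, 0.0, 0.6, 0.3])     # Y in {0, 1, 2, 3}

hat_x, hat_y = np.fft.fft(rho_x), np.fft.fft(rho_y)
assert np.isclose(hat_x[0], 1.0)              # zero frequency: total probability

# PDF of X + Y: multiply the transforms, then invert.
rho_sum_dft = np.fft.ifft(hat_x * hat_y).real
rho_sum_direct = np.convolve(rho_x[:3], rho_y[:4])    # exact pmf on {0, ..., 5}
print(np.allclose(rho_sum_dft[:6], rho_sum_direct))   # no aliasing on this grid

# Aliasing: on a grid of size 4, the mass of X + Y at 4 and 5 wraps around to 0 and 1.
small_x = pmf_on_grid([0.2, 0.5, 0.3], 4)
small_y = pmf_on_grid([0.1, 0.0, 0.6, 0.3], 4)
wrapped = np.fft.ifft(np.fft.fft(small_x) * np.fft.fft(small_y)).real
print(np.allclose(wrapped, rho_sum_direct[:4]))       # wrap-around corrupts the result
```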
Based on these identities, with the grid [Ndft] set in this way, we can implement the mixed infection cascade mapping given by equations (29), (30), (31), (32) of Section 3.3 with equations (27), (28) replaced by
(43)
(44)
(45)
For a single day t, the computational complexity of the algorithm is dominated by (44), which amounts to O(Ndft × M²) flops for the complex matrix-vector multiplication, followed by Ndft × M complex exponentiations. Memory usage is dominated by storing the constant matrix R with Ndft × M² components. With the typical value Ndft = 2¹⁰, the general model with a few hundred types can be computed on an ordinary laptop; with several thousand types, storing the roughly 10¹⁰ complex components of R becomes the main constraint.
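These cost estimates can be made concrete with a little arithmetic. The sketch below is a rough count that follows the orders of magnitude stated above, with the added assumption (not from the text) that R is stored in double-precision complex format, 16 bytes per component; constants hidden in the O(·) notation are ignored, and the function name is hypothetical.

```python
def irsn_dft_cost(n_dft, m_types, bytes_per_complex=16):
    """Rough per-day cost of the dominant step, following the counts in the text.

    Returns (complex multiply-adds, complex exponentials, GB needed to store R),
    assuming R is held as double-precision complex numbers (16 bytes each).
    """
    mult_adds = n_dft * m_types ** 2
    exponentials = n_dft * m_types
    memory_gb = n_dft * m_types ** 2 * bytes_per_complex / 1e9
    return mult_adds, exponentials, memory_gb

for m in (3, 100, 1000, 3000):
    mult_adds, exponentials, memory_gb = irsn_dft_cost(2 ** 10, m)
    print(f"M = {m:5d}: {mult_adds:.2e} multiply-adds, "
          f"{exponentials:.2e} exponentials, R needs {memory_gb:.3f} GB")
```

For Ndft = 2¹⁰ this gives about 0.16 GB for M = 100 but about 147 GB for M = 3000, so memory rather than arithmetic is what limits the number of types.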
5. Calibrating IRSNs
This section addresses the implementation of the infection cascade model on IRSNs, and its generalizations, for a real-world network of individuals. The central issue is to construct a sequence of IRSNs of size N increasing to infinity that is statistically consistent with the real-world network when N equals the size of the real population. Then the statistical model for N = ∞ can be subjected to epidemic triggers with any initial infection probabilities, and the resulting infection cascade analytics developed in Section 3 will yield the chronology of the epidemic and measures of the resilience of the real-world network.
The type of network data available to policy makers varies widely from one health jurisdiction to another. Here we imagine a minimal dataset for individuals classified into M types labelled by T ∈ [M], where denotes the number of individuals of type T. Individual types, and the population sizes , will be assumed not to change over the data sampling period. As a first estimation step, we choose the empirical type distribution:
Typically, this vector is determined by census data.
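The empirical type distribution is simply the vector of type counts divided by the population size. The sketch below computes it from a hypothetical list of per-individual type labels (numbered from 0 in the code), using the counts of the example in Section 6, where the town of 10000 includes the 100 residents and 50 workers, leaving 9850 outsiders. The result reproduces the population fractions 0.01, 0.005 and 0.985 listed in Table 1; the function name is hypothetical.

```python
import numpy as np

def empirical_type_distribution(type_labels, n_types):
    """Fraction of the population in each type, for labels 0, ..., n_types - 1."""
    counts = np.bincount(type_labels, minlength=n_types)
    return counts / counts.sum()

# Hypothetical census labels for the town of Section 6:
# 0 = resident (100), 1 = worker (50), 2 = outsider (9850).
labels = np.repeat([0, 1, 2], [100, 50, 9850])
p = empirical_type_distribution(labels, 3)
print(p)
```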
5.1. Social contact matrix
Next suppose for illustration that the interconnectivity, exposures and health statistics of the real network have been observed for an extended period prior to the epidemic. In particular, social connectivity has been observed, and edges having the meaning of a “significant social contact” are drawn between any ordered pair (v, w) of individuals if the average daily contact time of individual w to individual v exceeds a specified threshold.
Let the social contact matrix represent the expected total daily number of significant T to T′ social contacts in the given population. Such matrices have been studied in great depth for many countries and communities, and are available in public databases such as Prem et al. (2017). Following the discussion of Section 3.1, the average number of T′ contacts per type T individual is matched to the conditional mean to identify the empirical connection kernel
(46)
Theoretically, social contact matrices can be constructed as a very large sum over settings that represent the different places people meet, see Mistry et al. (2020). Each setting s is assumed to involve a finite number of people, with an equal likelihood z(s) ∈ [0, 1] of a contact between any pair. Let denote the column vector counting the number of individuals of each type in the setting s. The construction then amounts to representing the matrix by the weighted sum
(47)
whose T, T′ component is
where δTT′ denotes the Kronecker delta. Note that ∗ in (47) denotes the outer product of vectors, which in general yields a rectangular matrix. This sum over settings can be disaggregated by type of setting, such as school, hospital and workplace, leading to contact matrices within subcommunities.
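The component formula for (47) is not reproduced here. One natural reading, assumed in the sketch below, is that a setting s containing n_s(T) people of type T contributes z(s) n_s(T)(n_s(T′) − δTT′) expected T to T′ contacts, the Kronecker delta removing self-pairs; in matrix form this is z(s)(n_s ∗ n_s − diag(n_s)). The function name and the two settings are hypothetical.

```python
import numpy as np

def contact_matrix_from_settings(settings):
    """Expected daily T -> T' contact counts summed over settings.

    Each setting is a pair (n_s, z_s): n_s[T] counts type-T people in the setting,
    and every ordered pair of distinct people is in contact with probability z_s.
    The number of ordered pairs (v, w) with v of type T, w of type T' and v != w
    is n_s[T] * (n_s[T'] - delta_{TT'}), which is where the Kronecker delta enters.
    """
    m = len(settings[0][0])
    C = np.zeros((m, m))
    for n_s, z_s in settings:
        n_s = np.asarray(n_s, dtype=float)
        C += z_s * (np.outer(n_s, n_s) - np.diag(n_s))
    return C

# Hypothetical settings for three types (resident, worker, outsider).
settings = [
    (np.array([10, 2, 0]), 0.5),   # e.g. a dining room in the residence
    (np.array([0, 5, 20]), 0.1),   # e.g. a shop visited by workers and outsiders
]
C = contact_matrix_from_settings(settings)
print(C)
```

Under this reading the matrix is symmetric, because each contact between a type T and a type T′ person is counted once from each side.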
5.2. Buffers and exposures
Recall from Section 4 that exposures are assumed to take values on the integer grid {0, 1, 2, …, omegamax − 1} for some moderately large integer omegamax, where 1 represents a chosen unit dose. Suppose that Ωe is observed for a random sample of directed T → T′ edges e. It is then reasonable to infer empirical distributions ρΩ(⋅, T, T′) from a parametric family of discrete distributions on this grid that match the sample means and variances. In a similar way, the distribution of the buffer Δ for type T nodes can be estimated from a random sample of its observed values.
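As an illustration of the moment-matching step, the sketch below fits a gamma law by the method of moments (k = mean²/variance, θ = variance/mean) to a hypothetical sample of observed doses and discretizes it onto an exposure grid with omegamax = 60. The discretization, which weights each integer by the gamma density and renormalizes, is a crude choice made for brevity; differencing the gamma CDF over unit cells would be a more careful alternative. Function names and data are hypothetical.

```python
import math
import numpy as np

def fit_gamma_moments(mean, var):
    """Method-of-moments gamma fit: mean = k * theta and var = k * theta**2."""
    k = mean ** 2 / var
    theta = var / mean
    return k, theta

def discretized_gamma_pmf(k, theta, omega_max):
    """Crude discretization of a gamma(k, theta) law (k > 1) onto {0, ..., omega_max - 1}:
    weight each integer x >= 1 by the gamma density at x, give x = 0 zero weight
    (the density vanishes there when k > 1), then renormalize."""
    x = np.arange(1, omega_max, dtype=float)
    log_pdf = (k - 1) * np.log(x) - x / theta - math.lgamma(k) - k * math.log(theta)
    pmf = np.concatenate(([0.0], np.exp(log_pdf)))
    return pmf / pmf.sum()

# Hypothetical sample of doses observed on randomly sampled T -> T' edges.
sample = np.array([3, 5, 8, 4, 6, 9, 2, 7, 5, 6], dtype=float)
k, theta = fit_gamma_moments(sample.mean(), sample.var(ddof=1))
rho_omega = discretized_gamma_pmf(k, theta, omega_max=60)
grid = np.arange(60)
print(k, theta, (grid * rho_omega).sum())  # fitted shape, scale, mean of the discretized law
```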
Gamma distributions on (0, ∞), parametrized by the shape parameter k > 0 and scale parameter θ > 0, form a particularly convenient family for theoretical studies of the IRSN framework. Of particular interest are the exponential distributions with k = 1, because of their “memoryless” property. When Δ is exponential, the dose-response curve implies the assumed serial independence of infection from successive viral doses. As shown in Section 3.4, this specification leads to an approximate solution of an SEIR compartment ODE model, so when Δ is exponential the IRSN model should closely mirror the properties of the ODE model. Heuristics suggest, however, that the true dose-response function, i.e. the CDF of Δ, is better taken to be “S”-shaped with k > 1. There is little literature on the statistics of transmitted viral doses from which to infer properties of the exposures Ω, although COVID-19 pharyngeal swab studies such as Jones et al. (2020) provide some insight. Interestingly, that study suggests that the viral loads of individuals who tested positive for COVID-19 may be very fat-tailed. For our present exploratory purposes, gamma distributions may reasonably be used for both Δ and Ω.
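The role of the memoryless property can be seen by comparing the probability of escaping a combined dose Ω₁ + Ω₂, P[Δ > Ω₁ + Ω₂], with the product P[Δ > Ω₁] P[Δ > Ω₂] of the probabilities of escaping each dose separately. These coincide for every pair of doses exactly when Δ is exponential. The sketch below, with a hypothetical scale θ = 10 and doses 6 and 9, checks this for the exponential case and shows that for a gamma buffer with k = 3, whose survival function has the closed Erlang form e^{−u}(1 + u + u²/2) with u = x/θ, the combined dose is more dangerous than two independent doses.

```python
import math

def survival_exponential(x, theta):
    """P[Delta > x] for an exponential buffer with mean theta."""
    return math.exp(-x / theta)

def survival_gamma3(x, theta):
    """P[Delta > x] for a gamma buffer with shape k = 3 and scale theta (Erlang form)."""
    u = x / theta
    return math.exp(-u) * (1.0 + u + u * u / 2.0)

theta, dose1, dose2 = 10.0, 6.0, 9.0   # hypothetical buffer scale and two successive doses
for name, surv in [("exponential", survival_exponential), ("gamma k=3", survival_gamma3)]:
    combined = surv(dose1 + dose2, theta)                   # escape the accumulated dose
    independent = surv(dose1, theta) * surv(dose2, theta)   # escape each dose separately
    print(f"{name:12s} {combined:.4f} {independent:.4f}")
```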
5.3. Intercompartmental parameters
Under Assumption 3, the latency period of a type T individual (the period between exposure and infectiousness) is a geometrically distributed random time with expected value (β(T))−1. Similarly, the duration of the infectious period is geometrically distributed with expected value (γ(T))−1. Properties of these random variables, in particular the expected durations, are typically well-studied for different diseases. Straightforward extensions of the IRSN model involving multiple copies of compartments E, I, exactly as implemented in compartment ODE models, can accommodate more realistic random durations.
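The effect of using several copies of a compartment can be quantified in the discrete-time setting. Assuming each stage is left with daily probability p, so that its duration is geometric on {1, 2, …} with mean 1/p and variance (1 − p)/p², a latency split into n copies of E with p = nβ keeps the mean 1/β while reducing the variance to (1 − nβ)/(nβ²). The sketch below evaluates this for the benchmark value β = 0.3 of Table 1; the function name is hypothetical.

```python
def chained_geometric_moments(beta, n_stages):
    """Mean and variance (in days) of a latency split into n_stages copies of E,
    each left with daily probability n_stages * beta (requires n_stages * beta <= 1).
    Each stage is geometric on {1, 2, ...}: mean 1/p, variance (1 - p)/p**2."""
    p = n_stages * beta
    if p > 1:
        raise ValueError("n_stages * beta must not exceed 1")
    mean = n_stages / p
    var = n_stages * (1 - p) / p ** 2
    return mean, var

for n in (1, 2, 3):
    print(n, chained_geometric_moments(0.3, n))
```

The mean latency stays at 1/β ≈ 3.3 days while the spread shrinks as stages are added, which is how chained compartments make durations more realistic.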
5.4. Infective contact parameters
Finally, one needs to identify the fractions z(⋅, ⋅) of social contacts that are close infective contacts. First note that these fractions are direct targets of health policy, and are therefore likely to change dramatically during the pandemic. Since the fraction z(T, T′) applies when the type T individual is infectious and the type T′ individual is susceptible, lowering this parameter by restricting type T behaviour directly reduces the exposure of type T′ individuals.
Since there is little fundamental theory to inform the choice of buffer and exposure distributions, and because the z parameters are the direct target of policy interventions, a practical approach is to select benchmark values of z as the last step of calibration, so as to match closely the contagion dynamics in the earliest stages of the pandemic. For example, the observed reproduction number R0 should be used to calibrate the benchmark values of z. For later stages of an epidemic, the z parameters can be adjusted to account for proposed and actual changes in policy.
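One concrete way to carry out the R0 calibration, under assumptions not made in the text, is to work in the ODE approximation of Section 3.4: if a single common z multiplies every transmission parameter α(T, T′), the next-generation matrix scales as K(z) = z K₁, and taking R0 to be its spectral radius gives z = R0/ρ(K₁). The matrix K₁ and the function name below are hypothetical; in an application K₁ would have to be derived from the calibrated model.

```python
import numpy as np

def calibrate_z(r0_target, k_unit):
    """Benchmark z, assuming (i) one common z scales every transmission rate, so the
    next-generation matrix is K(z) = z * k_unit, and (ii) R0 is the spectral radius of K(z)."""
    spectral_radius = np.max(np.abs(np.linalg.eigvals(k_unit)))
    return r0_target / spectral_radius

# Hypothetical next-generation matrix at z = 1 for two types.
k_unit = np.array([[8.0, 2.0],
                   [2.0, 8.0]])
z_star = calibrate_z(2.5, k_unit)
print(z_star)
```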
6. Illustrative example: seniors’ residential centre
The purpose of this example is to provide an easy-to-visualize context for the IRSN framework, namely a seniors’ residence with 100 residents (type T = 1) and 50 trained staff workers (type T = 2), within a town of total population N0 = 10000. We also consider the same IRSN specification scaled up by an integer multiplier, N = jN0.
In anticipation of an oncoming contagion, the workers have been trained to high standards of hygiene and care, and the residents (who are elderly but healthy) have been instructed in social distancing and hygiene. The townspeople (“outsiders”, with type T = 3), on the other hand, have only an average ability to social distance, so the contagion hits the town before the centre. The goal of this example is to investigate the vulnerability of the residential centre to a contagion that starts in the outside town.
The benchmark network parameters are given in Table 1, together with the numerical implementation parameters omegamax = 60 and Ndft = 256. The buffers Δ and exposures Ω are all taken to be gamma-distributed with shape parameter k = 3 and type-dependent means μ; since k is fixed, the standard deviations μ/√3 also depend on type.
Table 1.
Benchmark Parameters: Note that is the expected daily number of social contacts of a type T′ individual to type T individuals.
Parameter | Resident T = 1 | Worker T = 2 | Outsider T = 3 |
---|---|---|---|
γ(T) | 0.09 | 0.09 | 0.09 |
β(T) | 0.3 | 0.3 | 0.3 |
z(T) | 0.20 | 0.20 | 0.20 |
Population fraction | 0.01 | 0.005 | 0.985 |
λ(1, T) | 4 | 5 | 0 |
λ(2, T) | 10 | 5 | 4 |
λ(3, T) | 0 | 0.0203 | 20 |
μΩ(1, T) | 7 | 7 | 7 |
μΩ(2, T) | 4 | 4 | 4 |
μΩ(3, T) | 6 | 6 | 6 |
μΔ(T) | 20 | 30 | 30 |
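A quick consistency check on Table 1: read its three unlabeled rows of contact rates as λ(1, T), λ(2, T) and λ(3, T), which agrees with the values λ(1, 1) = 4, λ(2, 2) = 5 and λ(2, 3) = 4 quoted in the strategy descriptions, and take the first index to be the individual whose contacts are counted. If contacts are reciprocal, the total daily number of contacts between two types should then be the same from either side. The sketch assumes the type counts 100, 50 and 9850 implied by the population fractions; this reading of the table is an interpretation, not a statement from the text.

```python
import numpy as np

# Contact rates read from Table 1; lam[T, T'] is interpreted as the expected daily
# number of contacts of one type-T individual with type-T' individuals.
lam = np.array([[4.0, 5.0, 0.0],
                [10.0, 5.0, 4.0],
                [0.0, 0.0203, 20.0]])
counts = np.array([100, 50, 9850])   # residents, workers, outsiders

# Total daily T -> T' contacts; reciprocity requires this matrix to be symmetric.
totals = counts[:, None] * lam
print(totals)
```

The resident–worker totals match exactly (500 each way), and the worker–outsider totals agree to within rounding of the value 0.0203 (200 versus about 199.96).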
The upper left plot of Fig. 1 shows the daily exposed, infective and removed fractions for the three types in the benchmark SEIR model without further policy interventions, plotted from the day on which the number of exposed outsiders exceeds 1% of the population. We see that the contagion starts in the outside community but rapidly invades the centre, resulting in similar infection rates after a time delay. One can interpret the result as overlapping sub-epidemics: the first hits the outside community, while a second and third hit the residence workers and residents about 16 and 22 days later, respectively. The benchmark strategy failed for two reasons: first, the contagion was allowed to gain a foothold in the centre and infect a resident; second, hygiene within the centre was not adequate to contain the resulting seed infection.
Fig. 1.
Fractional contagion size by type and compartment in the Seniors’ Residential Centre Model of Section 6. Top left: benchmark strategy; top right: Strategy A; bottom left: Strategy B; bottom right: Strategies A and B combined.
What further policy improvements implemented by the management might lead to a better result? The remaining plots in Fig. 1 show the results for several combinations of policy interventions. Strategy A improves internal hygiene by quarantining all residents and dramatically reducing contacts between workers: λ(1, 1) changes from 4 to 0.5 and λ(2, 2) changes from 5 to 1. Strategy B dramatically reduces the connectivity between the centre and the outside: λ(2, 3) changes from 4 to 0.5. Strategy A reduces the contagion to about 10% of the residents, but allows continual reintroduction of infection from outside. Strategy B fails outright: reducing the connections to outsiders merely delays the onset of contagion within the centre by about 30 days. However, combining Strategies A and B keeps 97% of the residents healthy.
These policy interventions target the social connectivity of the network through social distancing and quarantine. Another important channel would be to reduce the mean viral exposures entering the exposure PDFs, through measures such as improved cleanliness and the use of masks. Yet another channel is to strengthen individual immunity buffers by vaccination or other health improvements.
Large N networks typically exhibit “resilient” states that are intrinsically resistant to contagion and “susceptible” states that amplify any introduced infection. Moreover, they can be made to transition discontinuously from a resilient state to a susceptible state by varying a key parameter, such as the infective contact parameter z that measures the degree of social distancing in the network. Fig. 2 shows the long-time values of the removed fractions as functions of z. One sees a sharp transition from resilient to susceptible at a critical value z∗ ≈ 0.106. This single graph clearly illustrates the general principle that, in this model, a contagion can be prevented at the outset by sufficiently strong restrictions on social interactions.
Fig. 2.
Final removed fractions as a function of z for the benchmark Seniors’ Residential Centre Model of Section 6.
7. Discussion
The primary intention of this paper is to set out the fundamental assumptions, and their consequences, of a novel network approach to epidemic modelling in very heterogeneous settings. To keep the focus on this aim, we have not explored many potential examples and avenues of inquiry here. Instead, let us end this paper by briefly discussing how the novel features of the IRSN framework can be used in different fields to improve our understanding of COVID-19.
1. To inform health policy: A wide variety of scenarios, such as the spread of disease between communities, can be explored within this framework. Once the IRSN model has been fully specified and calibrated to a real-world setting, the analytical algorithm is straightforward to run. Since the IRSN starts from very different assumptions than standard tools such as compartment ODE models, the exercise of implementing it forces policy makers to think about epidemics in a different way. This kind of modelling exercise can lead to more robust and reliable decisions that depend less on specific underlying assumptions.
2. To inform health research: The IRSN framework can be extended to encompass a broad set of characteristics that describe the immunology of COVID, the behaviour of human society and the effect of public health policy. Many details of the disease, particularly those connected with the threshold picture of viral transmission, are still inadequately understood. Researchers can use the IRSN to study which gaps in data and knowledge may be causing the greatest uncertainties in projections, which can suggest where scarce research funding would best be deployed.
3. To inform network science: The large N analytical shortcut used in this paper is well known in network science, but has not yet been used in disease modelling. The meaning, accuracy and limitations of this shortcut will be of interest to other network modellers. A particularly interesting subtlety worthy of further research is the selection bias inherent in cascade dynamics on stochastic networks. As well, the IRSN setting, being closely analogous to the Inhomogeneous Random Financial Networks introduced in Hurd (2019), should be deployable in many other network applications. Modellers will observe that the computational complexity is determined by the parameters Ndft and M, and it is of interest to explore the tradeoffs when allocating computational resources to a complex modelling problem.
4. As a teaching tool: Being easy to run in MATLAB or Python, the IRSN framework can be used in higher education as a learning and visualization tool that focusses on mathematical modelling assumptions for epidemics and their consequences. More broadly, these tools may help foster public awareness of the most important societal issues that successful COVID health policy must address, notably the effectiveness of targeted social distancing.
Author statement
This paper is the sole work of the named author.
Acknowledgements
This project was funded by the Natural Sciences and Engineering Research Council of Canada and the McMaster University COVID-19 Research Fund. The author is grateful to Hassan Chehaitli, Vladimir Nosov, Weijie Pang and Irshaad Oozeer for extensive discussions during the writing of this paper.
Handling editor: Dr. J Wu
Footnotes
Peer review under responsibility of KeAi Communications Co., Ltd.
References
- Anderson Roy M., May Robert M. Infectious diseases of humans: Dynamics and control. Oxford University Press; Oxford: 1992.
- Bollobás Béla, Janson Svante, Riordan Oliver. The phase transition in inhomogeneous random graphs. Random Structures and Algorithms. 2007;31(1):3–122.
- Chung Fan, Lu Linyuan. Connected components in random graphs with given expected degree sequences. Annals of Combinatorics. 2002;6(2):125–145.
- Danon Leon, Ford Ashley P., House Thomas, Jewell Chris P., Keeling Matt J., Roberts Gareth O., Ross Joshua V., Vernon Matthew C. Networks and the epidemiology of infectious disease. Interdisciplinary Perspectives on Infectious Diseases. 2011;2011:284909. doi: 10.1155/2011/284909.
- Ferguson Neil, Ghani Azra C. The global impact of COVID-19 and strategies for mitigation and suppression. Technical Report 12. Imperial College COVID-19 Response Team; March 2020.
- Haas Charles N. Microbial dose response modeling: Past, present, and future. Environmental Science & Technology. 2015;49(3):1245–1259. doi: 10.1021/es504422q.
- van der Hofstad R. Random graphs and complex networks: Volumes I and II. Book, to be published; 2016. http://www.win.tue.nl/rhofstad/NotesRGCN.html
- Hurd T.R. Systemic cascades on inhomogeneous random financial networks: Analytics. Technical report. McMaster University; 2019.
- Hurd T.R., Gleeson James P. On Watts cascade model with random link weights. Journal of Complex Networks. 2013;1(1):25–43.
- Jones Terry C., Mühlemann Barbara, Veith Talitha, Biele Guido, Zuchowski Marta, Hoffmann Jörg, Stein Angela, Edelmann Anke, Corman Victor Max, Drosten Christian. An analysis of SARS-CoV-2 viral load by patient age. medRxiv. 2020. https://www.medrxiv.org/content/early/2020/06/09/2020.06.08.20125484
- Keeling Matt J., Eames Ken T.D. Networks and epidemic models. Journal of The Royal Society Interface. 2005;2(4):295–307. doi: 10.1098/rsif.2005.0051.
- Kermack William Ogilvy, McKendrick A.G., Walker Gilbert Thomas. A contribution to the mathematical theory of epidemics. Proceedings of the Royal Society of London, Series A: Containing Papers of a Mathematical and Physical Character. 1927;115(772):700–721. doi: 10.1098/rspa.1927.0118.
- Mistry Dina, Litvinova Maria, Pastore y Piontti Ana, Chinazzi Matteo, Fumanelli Laura, Gomes Marcelo F.C., Haque Syed A., Liu Quan-Hui, Mu Kunpeng, Xiong Xinyue, Halloran M. Elizabeth, Longini Ira M., Merler Stefano, Ajelli Marco, Vespignani Alessandro. Inferring high-resolution human mixing patterns for disease modeling. 2020.
- Mossong Joël, Hens Niel, Jit Mark, Beutels Philippe, Auranen Kari, Mikolajczyk Rafael, Massari Marco, Salmaso Stefania, Tomba Gianpaolo Scalia, Wallinga Jacco, Heijne Janneke, Sadkowska-Todys Malgorzata, Rosinska Magdalena, Edmunds W. John. Social contacts and mixing patterns relevant to the spread of infectious diseases. PLoS Medicine. 2008;5(3):e74. doi: 10.1371/journal.pmed.0050074.
- Newman M. Networks: An introduction. Oxford University Press; Oxford/New York: 2010.
- Pastor-Satorras Romualdo, Castellano Claudio, Van Mieghem Piet, Vespignani Alessandro. Epidemic processes in complex networks. Reviews of Modern Physics. 2015;87:925–979. doi: 10.1103/RevModPhys.87.925.
- Pellis Lorenzo, Ball Frank, Bansal Shweta, Eames Ken, House Thomas, Isham Valerie, Trapman Pieter. Eight challenges for network epidemic models. Epidemics. 2015;10:58–62. doi: 10.1016/j.epidem.2014.07.003.
- Prem Kiesha, Cook Alex R., Jit Mark. Projecting social contact matrices in 152 countries using contact surveys and demographic data. PLoS Computational Biology. 2017;13(9):1–21. doi: 10.1371/journal.pcbi.1005697.
- Watts Duncan J. A simple model of global cascades on random networks. Proceedings of the National Academy of Sciences. 2002;99(9):5766–5771. doi: 10.1073/pnas.082090499.