Second look at the spread of epidemics on networks

Eben Kenah; James M Robins

doi:10.1103/PhysRevE.76.036113

Author manuscript; available in PMC: 2008 Jan 28.

Published in final edited form as: Phys Rev E Stat Nonlin Soft Matter Phys. 2007 Sep 25;76(3 Pt 2):036113. doi: 10.1103/PhysRevE.76.036113


PMCID: PMC2215389 NIHMSID: NIHMS36857 PMID: 17930312

Abstract

In an important paper, M.E.J. Newman claimed that a general network-based stochastic Susceptible-Infectious-Removed (SIR) epidemic model is isomorphic to a bond percolation model, where the bonds are the edges of the contact network and the bond occupation probability is equal to the marginal probability of transmission from an infected node to a susceptible neighbor. In this paper, we show that this isomorphism is incorrect and define a semi-directed random network we call the epidemic percolation network that is exactly isomorphic to the SIR epidemic model in any finite population. In the limit of a large population, (i) the distribution of (self-limited) outbreak sizes is identical to the size distribution of (small) out-components, (ii) the epidemic threshold corresponds to the phase transition where a giant strongly-connected component appears, (iii) the probability of a large epidemic is equal to the probability that an initial infection occurs in the giant in-component, and (iv) the relative final size of an epidemic is equal to the proportion of the network contained in the giant out-component. For the SIR model considered by Newman, we show that the epidemic percolation network predicts the same mean outbreak size below the epidemic threshold, the same epidemic threshold, and the same final size of an epidemic as the bond percolation model. However, the bond percolation model fails to predict the correct outbreak size distribution and probability of an epidemic when there is a nondegenerate infectious period distribution. We confirm our findings by comparing predictions from percolation networks and bond percolation models to the results of simulations. In an appendix, we show that an isomorphism to an epidemic percolation network can be defined for any time-homogeneous stochastic SIR model.

1 Introduction

In an important paper, M. E. J. Newman studied a network-based Susceptible-Infectious-Removed (SIR) epidemic model in which infection is transmitted through a network of contacts between individuals [1]. The contact network itself is a random undirected network with an arbitrary degree distribution of the form studied by Newman, Strogatz, and Watts [2]. Given the degree distribution, these networks are maximally random, so they have no small loops and no degree correlations in the limit of a large population [2-4].

In the stochastic SIR model considered by Newman, the probability that an infected node i makes infectious contact with a neighbor j is given by T_ij = 1 − exp(−β_ij τ_i), where β_ij is the rate of infectious contact from i to j and τ_i is the time that i remains infectious. (We use infectious contact to mean a contact that results in infection if and only if the recipient is susceptible.) The infectious period τ_i is a random variable with the cumulative distribution function (cdf) F(τ), and the infectious contact rate β_ij is a random variable with the cdf F(β). The infectious periods for all individuals are independent and identically distributed (iid), and the infectious contact rates for all ordered pairs of individuals are iid.

Under these assumptions, Newman claimed that the spread of disease on the contact network is exactly isomorphic to a bond percolation model on the contact network with bond occupation probability equal to the a priori probability of disease transmission between any two connected nodes in the contact network [1]. This probability is called the transmissibility and denoted by T:

T = 〈 T_{i j} 〉 = \int_{0}^{\infty} \int_{0}^{\infty} (1 - e^{- β_{i j} τ_{i}}) d F (β_{i j}) d F (τ_{i}) .

(1)
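Equation (1) can be evaluated numerically once F(β) and F(τ) are specified. The sketch below is illustrative only: it assumes a fixed contact rate β and an exponentially distributed infectious period with rate γ (neither choice comes from the model above), in which case E[e^{−βτ}] = γ/(γ + β) and the closed form T = β/(β + γ) provides a check. The function name `transmissibility` is hypothetical.

```python
import math

def transmissibility(beta, gamma, n=20_000, upper_scale=50.0):
    """Evaluate T = E[1 - exp(-beta * tau)] by composite Simpson's rule.

    Assumptions (illustrative only): the contact rate is fixed at beta and
    tau ~ Exponential(rate=gamma). Equation (1) allows arbitrary F(beta), F(tau).
    """
    upper = upper_scale / gamma      # tail mass beyond this point is exp(-50)
    h = upper / n                    # n must be even for Simpson's rule

    def integrand(t):
        # (1 - e^{-beta t}) times the exponential density of tau at t
        return (1.0 - math.exp(-beta * t)) * gamma * math.exp(-gamma * t)

    total = integrand(0.0) + integrand(upper)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * integrand(k * h)
    return total * h / 3.0

beta, gamma = 0.5, 1.0
T_numeric = transmissibility(beta, gamma)
T_closed = beta / (beta + gamma)
print(T_numeric, T_closed)
```

Simpson's rule is used only because it needs nothing beyond the standard library; any quadrature routine would serve.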

Newman used this bond percolation model to derive the distribution of finite outbreak sizes, the critical transmissibility T_c that defines the epidemic (i.e., percolation) threshold, and the probability and relative final size of an epidemic (i.e., an outbreak that never goes extinct).

As a counterexample, consider a contact network where each subject has exactly two contacts. Assume that (i) τ_i = τ₀ > 0 with probability p and τ_i = 0 with probability 1 − p and (ii) β_ij = β₀ > 0 with probability one for all ij. Under the SIR model, the probability that the infection of a randomly chosen node results in an outbreak of size one is $p_{1} = 1 - p + p e^{- 2 β_{0} τ_{0}}$, which is the sum of the probability 1 − p that τ = 0 and the probability $p e^{- 2 β_{0} τ_{0}}$ that τ = τ₀ and disease is not transmitted to either contact. Under the bond percolation model, each bond is occupied with probability T = p(1 − e^{−β₀τ₀}), so the probability of a cluster of size one is $p_{1}^{bond} = {(1 - p + p e^{- β_{0} τ_{0}})}^{2}$, the probability that neither of the bonds incident to the node is occupied. Since

p_{1} - p_{1}^{bond} = p (1 - p) {(1 - e^{- β_{0} τ_{0}})}^{2},

the bond percolation model correctly predicts the probability of an outbreak of size one only if p = 0 or p = 1. When the infectious period is not constant, it underestimates this probability. The supremum of the error is 0.25, which occurs when p = 0.5 and τ₀ → ∞. In this limit, the SIR model corresponds to a site percolation model rather than a bond percolation model.
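The counterexample can be checked directly. This sketch (the function name is hypothetical) computes both size-one probabilities and compares their difference with the closed form p(1 − p)(1 − e^{−β₀τ₀})²; the values β₀ = 1 and τ₀ = 2 are arbitrary illustrative choices.

```python
import math

def size_one_probabilities(p, beta0, tau0):
    """P(outbreak of size one) for a node with exactly two contacts.

    Infectious period: tau = tau0 with probability p, tau = 0 otherwise;
    contact rate beta0 on every edge. Returns (SIR model, bond percolation).
    """
    a = math.exp(-beta0 * tau0)      # P(no transmission across one edge | tau = tau0)
    p1_sir = 1 - p + p * a ** 2      # both edges see the same infectious period
    T = p * (1 - a)                  # transmissibility from equation (1)
    p1_bond = (1 - T) ** 2           # two independently occupied bonds
    return p1_sir, p1_bond

for p in (0.0, 0.25, 0.5, 1.0):
    p1, p1_bond = size_one_probabilities(p, beta0=1.0, tau0=2.0)
    predicted = p * (1 - p) * (1 - math.exp(-2.0)) ** 2
    print(f"p={p:.2f}  SIR={p1:.4f}  bond={p1_bond:.4f}  "
          f"gap={p1 - p1_bond:.4f}  formula={predicted:.4f}")
```

Taking τ₀ large with p = 0.5 drives the gap toward its supremum of 0.25.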

When the distribution of infectious periods is nondegenerate, there is no bond occupation probability that will make the bond percolation model isomorphic to the SIR model. To see why, suppose node i has infectious period τ_i and degree n_i in the contact network. In the epidemic model, the conditional probability that i transmits infection to a neighbor j in the contact network given τ_i is

T_{τ_{i}} = \int_{0}^{\infty} (1 - e^{- β_{i j} τ_{i}}) d F (β_{i j}) .

(2)

Since the contact rate pairs for all n_i edges incident to i are iid, the transmission events across these edges are (conditionally) independent Bernoulli(T_{τ_i}) random variables; but the transmission probabilities are strictly increasing in τ_i, so the transmission events are (marginally) dependent unless τ_i = τ₀ with probability one for some fixed τ₀. In contrast, the bond percolation model treats the infections generated by node i as n_i (marginally) independent Bernoulli(T) random variables regardless of the distribution of τ_i. Neither counterexample assumes anything about the global properties of the contact network, so Newman’s claim cannot be justified as an approximation in the limit of a large network with no small loops.
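The marginal dependence can be made concrete. Given τ_i, the transmission indicators X₁ and X₂ on two edges leaving node i are independent Bernoulli(T_{τ_i}) variables, so their marginal covariance is E[T_{τ_i}²] − T² = Var(T_{τ_i}), which vanishes only when T_{τ_i} is almost surely constant. The sketch below assumes a discrete infectious-period distribution and a fixed contact rate (illustrative choices) and compares the exact covariance with a Monte Carlo estimate; the function names are hypothetical.

```python
import math
import random

def exact_covariance(taus, weights, beta):
    """Cov(X1, X2) = E[T_tau^2] - E[T_tau]^2 = Var(T_tau) for two edges of one node."""
    t = [1 - math.exp(-beta * tau) for tau in taus]      # T_tau, equation (2)
    mean = sum(w * x for w, x in zip(weights, t))         # T, equation (1)
    second = sum(w * x * x for w, x in zip(weights, t))
    return second - mean ** 2

def simulated_covariance(taus, weights, beta, n=100_000, seed=1):
    """Monte Carlo estimate of the same covariance."""
    rng = random.Random(seed)
    s1 = s2 = s12 = 0
    for _ in range(n):
        tau = rng.choices(taus, weights)[0]   # one infectious period per node
        t = 1 - math.exp(-beta * tau)
        x1 = rng.random() < t                 # transmission across edge 1
        x2 = rng.random() < t                 # transmission across edge 2
        s1 += x1
        s2 += x2
        s12 += int(x1 and x2)
    return s12 / n - (s1 / n) * (s2 / n)

taus, weights = [0.0, 2.0], [0.5, 0.5]
print(exact_covariance(taus, weights, 1.0), simulated_covariance(taus, weights, 1.0))
```

For this two-point distribution the covariance equals p(1 − p)(1 − e^{−β₀τ₀})², the same quantity that appears in the size-one counterexample.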

In Section 2, we define a semi-directed random network called the epidemic percolation network and show how it can be used to predict the outbreak size distribution, the epidemic threshold, and the probability and final size of an epidemic in the limit of a large population for any time-homogeneous SIR model. In Section 3, we show that the network-based stochastic SIR model from [1] can be analyzed correctly using a semi-directed random network of the type studied by Boguñá and Serrano [3]. In Section 4, we show that it predicts the same epidemic threshold, mean outbreak size below the epidemic threshold, and relative final size of an epidemic as the bond percolation model. In Section 5, we show that the bond percolation model fails to predict the distribution of outbreak sizes and the probability of an epidemic when the distribution of infectious periods is nondegenerate. In Section 6, we compare predictions made by epidemic percolation networks and bond percolation models to the results of simulations. In an appendix, we define epidemic percolation networks for a very general time-homogeneous stochastic SIR model and show that the distribution of their out-components is isomorphic to the distribution of possible outcomes of the SIR model for any given set of imported infections.

2 Epidemic percolation networks

Consider a node i with degree n_i in the contact network and infectious period τ_i. In the SIR model defined above, the number of people who will transmit infection to i if they become infectious has a binomial(n_i, T) distribution regardless of τ_i. If i is infected along one of the n_i edges, then the number of people to whom i will transmit infection has a binomial(n_i − 1, T_{τ_i}) distribution. In order to produce the correct joint distribution of the number of people who will transmit infection to i and the number of people to whom i will transmit infection, we represent the former by directed edges that terminate at i and the latter by directed edges that originate at i. Since there can be at most one transmission of infection between any two persons, we replace pairs of directed edges between two nodes with a single undirected edge.

Starting from the contact network, a single realization of the epidemic percolation network can be generated as follows:

1. Choose an infectious period τ_i for every node i in the network and choose a contact rate β_ij for every ordered pair of connected nodes i and j in the contact network.
2. For each pair of connected nodes i and j in the contact network, convert the undirected edge between them to a directed edge from i to j with probability $(1 - e^{- β_{i j} τ_{i}}) e^{- β_{j i} τ_{j}}$, to a directed edge from j to i with probability $e^{- β_{i j} τ_{i}} (1 - e^{- β_{j i} τ_{j}})$, and erase the edge completely with probability $e^{- β_{i j} τ_{i} - β_{j i} τ_{j}}$. The edge remains undirected with probability $(1 - e^{- β_{i j} τ_{i}}) (1 - e^{- β_{j i} τ_{j}})$.
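The two steps can be sketched in a few lines. This is a minimal illustration, not code from the paper: the function name and the dictionary inputs holding the step-1 draws are hypothetical. Drawing the infectious contacts i → j and j → i as independent Bernoulli variables reproduces the four edge probabilities of step 2, because the two events are independent given (β_ij, β_ji, τ_i, τ_j).

```python
import math
import random

def epidemic_percolation_network(edges, tau, beta, rng):
    """Sample one realization of the epidemic percolation network (step 2).

    edges : list of (i, j) pairs forming the undirected contact network
    tau   : dict node -> infectious period tau_i (drawn in step 1)
    beta  : dict (i, j) -> contact rate beta_ij for each ordered pair (step 1)
    Returns (directed, undirected); erased edges are simply omitted.
    """
    directed, undirected = [], []
    for i, j in edges:
        # infectious contact i -> j w.p. 1 - exp(-beta_ij * tau_i), and j -> i
        # independently w.p. 1 - exp(-beta_ji * tau_j)
        i_to_j = rng.random() < 1 - math.exp(-beta[(i, j)] * tau[i])
        j_to_i = rng.random() < 1 - math.exp(-beta[(j, i)] * tau[j])
        if i_to_j and j_to_i:
            undirected.append((i, j))
        elif i_to_j:
            directed.append((i, j))
        elif j_to_i:
            directed.append((j, i))
    return directed, undirected

rng = random.Random(42)
nodes = range(4)
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]          # every node has two contacts
tau = {i: rng.choice([0.0, 2.0]) for i in nodes}  # step 1: infectious periods
beta = {}
for i, j in edges:                                 # step 1: both ordered pairs
    beta[(i, j)] = beta[(j, i)] = 1.0
directed, undirected = epidemic_percolation_network(edges, tau, beta, rng)
print(tau, directed, undirected)
```

A node with τ_i = 0 can never be the source of a directed edge or an endpoint of an undirected edge, matching the degenerate case in the counterexample from the Introduction.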

The epidemic percolation network is a semi-directed random network that represents a single realization of the infectious contact process for each connected pair of nodes, so 4^m possible percolation networks exist for a contact network with m edges. The probability of each possible network is determined by the underlying SIR model. The epidemic percolation network is very similar to the locally dependent random graph defined by Kuulasmaa [5] for an epidemic on a d-dimensional lattice. There are two important differences: First, the underlying structure of the contact network is not assumed to be a lattice. Second, we replace pairs of (occupied) directed edges between two nodes with a single undirected edge so that its component structure can be analyzed using a generating function formalism.

In the Appendix, we prove that the size distribution of outbreaks starting from any node in a time-homogeneous stochastic SIR model is identical to the distribution of its out-component sizes in the corresponding probability space of percolation networks. Since this result applies to any time-homogeneous SIR model, it can be used to analyze network-based models, fully-mixed models (see [6]), and models with multiple levels of mixing.

2.1 Components of semi-directed networks

In this section, we briefly review the structure of directed and semi-directed networks as discussed in [3, 4, 7, 8]. In the next section, we relate this to the possible outcomes of an SIR model.

The indegree and outdegree of node i are the number of incoming and outgoing directed edges incident to i. Since each directed edge is an outgoing edge for one node and an incoming edge for another node, the mean indegree and outdegree are equal. The undirected degree of node i is the number of undirected edges incident to i. The size of a component is the number of nodes it contains and its relative size is its size divided by the total size of the network.

The out-component of node i includes i and all nodes that can be reached from i by following a series of edges in the proper direction (undirected edges are bidirectional). The in-component of node i includes i and all nodes from which i can be reached by following a series of edges in the proper direction. By definition, node i is in the in-component of node j if and only if j is in the out-component of i. Therefore, the mean in- and out-component sizes in any (semi-)directed network are equal.
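The in- and out-component definitions translate directly into breadth-first search. In this sketch (function names hypothetical), in-components are computed as out-components of the network with every directed edge reversed, which is exactly the duality used above; on a small hand-made network, the two mean sizes agree.

```python
from collections import defaultdict, deque

def out_components(nodes, directed, undirected):
    """Map each node to its out-component in a semi-directed network."""
    succ = defaultdict(set)
    for a, b in directed:            # directed edge a -> b
        succ[a].add(b)
    for a, b in undirected:          # undirected edges are bidirectional
        succ[a].add(b)
        succ[b].add(a)
    comps = {}
    for s in nodes:                  # breadth-first search from every node
        seen, queue = {s}, deque([s])
        while queue:
            v = queue.popleft()
            for w in succ[v]:
                if w not in seen:
                    seen.add(w)
                    queue.append(w)
        comps[s] = seen
    return comps

def in_components(nodes, directed, undirected):
    """In-components are out-components after reversing every directed edge."""
    return out_components(nodes, [(b, a) for a, b in directed], undirected)

nodes = range(5)
directed = [(0, 1), (1, 2), (3, 2)]
undirected = [(2, 4)]
out_c = out_components(nodes, directed, undirected)
in_c = in_components(nodes, directed, undirected)
mean_out = sum(len(c) for c in out_c.values()) / len(nodes)
mean_in = sum(len(c) for c in in_c.values()) / len(nodes)
print(mean_out, mean_in)   # both 2.8
```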

The strongly-connected component of a node i is the intersection of its in- and out-components; it is the set of all nodes that can be reached from node i and from which node i can be reached. All nodes in a strongly-connected component have the same in-component and the same out-component. The weakly-connected component of node i is the set of nodes that are connected to i when the direction of the edges is ignored.

For giant components, we use the definitions given in [8, 9]. Giant components have asymptotically positive relative size in the limit of a large population. All other components are “small” in the sense that they have asymptotically zero relative size. There are two phase transitions in a semi-directed network: one where a unique giant weakly-connected component (GWCC) emerges and another where unique giant in-, out-, and strongly-connected components (GIN, GOUT, and GSCC) emerge. The GWCC contains the other three giant components. The GSCC is the intersection of the GIN and the GOUT, which are the common in- and out-components of nodes in the GSCC. Tendrils are components in the GWCC that are outside the GIN and the GOUT. Tubes are directed paths from the GIN to the GOUT that do not intersect the GSCC. All tendrils and tubes are small components. A schematic representation of these components is shown in Figure 1.

Figure 1. Schematic diagram of the giant components, tendrils, and tubes of a supercritical semi-directed network. Adapted from Broder et al. [7] and Dorogovtsev et al. [8].

2.2 Epidemic percolation networks and epidemics

An outbreak begins when one or more nodes are infected from outside the population. These are called imported infections. The final size of an outbreak is the number of nodes that are infected before the end of transmission, and its relative final size is its final size divided by the total size of the network. In the epidemic percolation network, the nodes infected in the outbreak can be identified with the nodes in the out-components of the imported infections. This identification is made mathematically precise in the Appendix.

Informally, we define a self-limited outbreak to be an outbreak whose relative final size approaches zero in the limit of a large population and an epidemic to be an outbreak whose relative final size is positive in the limit of a large population. There is a critical transmissibility T_c that defines the epidemic threshold: The probability of an epidemic is zero when T ≤ T_c, and the probability and final size of an epidemic are positive when T > T_c [1, 10-12].

If all out-components in the epidemic percolation network are small, then only self-limited outbreaks are possible. If the percolation network contains a GSCC, then any infection in the GIN will lead to the infection of the entire GOUT. Therefore, the epidemic threshold corresponds to the emergence of the GSCC in the percolation network. For any finite set of imported infections, the probability of an epidemic is equal to the probability that at least one imported infection occurs in the GIN. The relative final size of an epidemic is equal to the proportion of the network contained in the GOUT. Although some nodes outside the GOUT may be infected (e.g. nodes in tendrils and tubes), they constitute a finite number of small components whose total relative size is asymptotically zero.

3 Analysis of the SIR model

To analyze the SIR model from [1], we first calculate the probability generating function (pgf) of the degree distribution of the corresponding epidemic percolation network. Then we use methods developed by Boguñá and Serrano [3] and Meyers et al. [4] to calculate the in- and out-component size distributions and the relative sizes of the GIN, GOUT, and GSCC.

3.1 Degree distribution

If p_n is the probability that a node has degree n in the contact network, then

G (z) = \sum_{n = 0}^{\infty} p_{n} z^{n}

is the probability generating function (pgf) for the degree distribution of the contact network. If p_jkm is the probability that a node in the epidemic percolation network has j incoming edges, k outgoing edges, and m undirected edges, then

G (x, y, u) = \sum_{j = 0}^{\infty} \sum_{k = 0}^{\infty} \sum_{m = 0}^{\infty} p_{j k m} x^{j} y^{k} u^{m}

is the pgf for the degree distribution of the percolation network. Suppose nodes i and j are connected in the contact network with contact rates (β_ij, β_ji) and infectious periods τ_i and τ_j. Let $g (x, y, u ∣ β_{i j}, β_{j i}, τ_{i}, τ_{j})$ be the conditional pgf for the number of incoming, outgoing, and undirected edges incident to i that appear between i and j in the percolation network. Then

\begin{array}{l} g (x, y, u ∣ β_{i j}, β_{j i}, τ_{i}, τ_{j}) = e^{- β_{i j} τ_{i} - β_{j i} τ_{j}} + e^{- β_{i j} τ_{i}} (1 - e^{- β_{j i} τ_{j}}) x \\ + (1 - e^{- β_{i j} τ_{i}}) e^{- β_{j i} τ_{j}} y + (1 - e^{- β_{i j} τ_{i}}) (1 - e^{- β_{j i} τ_{j}}) u . \end{array}

Given τ_i, the conditional pgf for the number of incoming, outgoing, and undirected edges incident to i that appear in the percolation network between i and any neighbor of i in the contact network is

\begin{array}{l} g (x, y, u ∣ τ_{i}) = \int_{0}^{\infty} \int_{0}^{\infty} \int_{0}^{\infty} g (x, y, u ∣ β_{i j}, β_{j i}, τ_{i}, τ_{j}) d F (β_{i j}) d F (β_{j i}) d F (τ_{j}) \\ = (1 - T_{τ_{i}}) (1 - T) + (1 - T_{τ_{i}}) T x + T_{τ_{i}} (1 - T) y + T_{τ_{i}} T u . \end{array}

(3)

The pgf for the degree distribution of a node with infectious period τ_i is

G (x, y, u ∣ τ_{i}) = \sum_{n = 0}^{\infty} p_{n} {(g (x, y, u ∣ τ_{i}))}^{n} = G (g (x, y, u ∣ τ_{i})) .

(4)

Finally, the pgf for the degree distribution of the epidemic percolation network is

G (x, y, u) = \int_{0}^{\infty} G (x, y, u ∣ τ_{i}) d F (τ_{i}) .

(5)

If a, b, and c are nonnegative integers, let G^(a,b,c)(x,y,u) be the derivative obtained after differentiating a times with respect to x, b times with respect to y, and c times with respect to u. Then the mean indegree and outdegree of the percolation network are

〈 k_{d} 〉 = G^{(1, 0, 0)} (1, 1, 1) = G^{(0, 1, 0)} (1, 1, 1) = T (1 - T) G^{'} (1),

and the mean undirected degree is

〈 k_{u} 〉 = G^{(0, 0, 1)} (1, 1, 1) = T^{2} G^{'} (1)
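Equations (3)-(5) and the mean degrees can be checked numerically. The sketch below assumes, purely for illustration, a fixed contact rate β, a two-point infectious-period distribution, and a Poisson(λ) contact-network degree distribution with G(z) = exp(λ(z − 1)), so that G′(1) = λ. Central differences of G(x, y, u) at (1, 1, 1) should then reproduce 〈k_d〉 = T(1 − T)λ and 〈k_u〉 = T²λ. The helper `make_epn_pgf` is hypothetical.

```python
import math

def make_epn_pgf(taus, weights, beta, lam):
    """Return the pgf G(x, y, u) of equations (3)-(5) and the transmissibility T.

    Hypothetical example model: fixed contact rate beta, infectious period
    drawn from taus with probabilities weights, and a Poisson(lam)
    contact-network degree distribution, G(z) = exp(lam * (z - 1)).
    """
    t_tau = [1 - math.exp(-beta * tau) for tau in taus]     # equation (2)
    T = sum(w * t for w, t in zip(weights, t_tau))          # equation (1)

    def G(x, y, u):
        total = 0.0
        for w, t in zip(weights, t_tau):
            g = ((1 - t) * (1 - T) + (1 - t) * T * x
                 + t * (1 - T) * y + t * T * u)             # equation (3)
            total += w * math.exp(lam * (g - 1))            # equations (4) and (5)
        return total

    return G, T

lam = 3.0
G, T = make_epn_pgf([0.0, 2.0], [0.5, 0.5], beta=1.0, lam=lam)
h = 1e-6
k_in = (G(1 + h, 1, 1) - G(1 - h, 1, 1)) / (2 * h)    # G^(1,0,0)(1,1,1)
k_out = (G(1, 1 + h, 1) - G(1, 1 - h, 1)) / (2 * h)   # G^(0,1,0)(1,1,1)
k_und = (G(1, 1, 1 + h) - G(1, 1, 1 - h)) / (2 * h)  # G^(0,0,1)(1,1,1)
print(k_in, k_out, T * (1 - T) * lam)
print(k_und, T * T * lam)
```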

3.2 Generating functions

When the contact network underlying an SIR epidemic model is an undirected random network with an arbitrary degree distribution, the pgf of its degree distribution can be used to calculate the distribution of small component sizes, the percolation threshold, and the relative sizes of the GIN, GOUT, and GSCC using methods developed by Boguñá and Serrano [3] and Meyers et al. [4]. These methods generalize earlier methods for undirected and purely directed networks [1, 2, 13-16]. In this section, we review these results and introduce notation that will be used in the rest of the paper. We discuss the case of networks with no two-point degree correlations, which is sufficient to analyze the SIR model from [1].

Let G_f (x, y, u) be the pgf for the degree distribution of a node reached by going forward along a directed edge, excluding the edge used to reach the node. Since the probability of reaching any node by following a directed edge is proportional to its indegree,

G_{f} (x, y, u) = \frac{1}{〈 k_{d} 〉} \sum_{j, k, m} j p_{j k m} x^{j - 1} y^{k} u^{m} = \frac{1}{〈 k_{d} 〉} G^{(1, 0, 0)} (x, y, u) .

(6)

Similarly, the pgf for the degree distribution of a node reached by going in reverse along a directed edge (excluding the edge used to reach the node) is

G_{r} (x, y, u) = \frac{1}{〈 k_{d} 〉} G^{(0, 1, 0)} (x, y, u),

(7)

and the pgf for the degree distribution of a node reached by going to the end of an undirected edge (excluding the edge used to reach the node) is

G_{u} (x, y, u) = \frac{1}{〈 k_{u} 〉} G^{(0, 0, 1)} (x, y, u) .

(8)

3.2.1 Out-components

Let $H_{f}^{out} (z)$ be the pgf for the size of the out-component at the end of a directed edge and $H_{u}^{out} (z)$ be the pgf for the size of the out-component at the “end” of an undirected edge. Then, in the limit of a large population,

H_{f}^{out} (z) = z G_{f} (1, H_{f}^{out} (z), H_{u}^{out} (z)),

(9a)

H_{u}^{out} (z) = z G_{u} (1, H_{f}^{out} (z), H_{u}^{out} (z)) .

(9b)

The pgf for the out-component size of a randomly chosen node is

H^{out} (z) = z G (1, H_{f}^{out} (z), H_{u}^{out} (z)) .

(10)

The probability that a node has a finite out-component in the limit of a large population is H^out(1), so the probability that a randomly chosen node is in the GIN is 1 − H^out(1).

The coefficients on z¹ in $H_{f}^{out} (z)$ and $H_{u}^{out} (z)$ are G_f(1,0,0) and G_u(1,0,0), respectively. Therefore, power series for $H_{f}^{out} (z)$ and $H_{u}^{out} (z)$ can be computed to any desired order by iterating equations (9a) and (9b). A power series for $H^{out} (z)$ can then be obtained using equation (10). For any z ∈ [0, 1], $H_{f}^{out} (z)$ and $H_{u}^{out} (z)$ can be calculated with arbitrary precision by iterating equations (9a) and (9b) starting from initial values y₀, u₀ ∈ [0, 1). Estimates of $H_{f}^{out} (z)$ and $H_{u}^{out} (z)$ can be used to estimate $H^{out} (z)$ with arbitrary precision.
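The fixed-point iteration at z = 1 can be sketched as follows. The example model is an assumption made for illustration: fixed β, a discrete infectious-period distribution with probabilities w_τ, and Poisson(λ) contact degrees. Because G′(z) = λG(z) for a Poisson pgf, equations (6) and (8) reduce to G_f = Σ_τ w_τ (1 − T_τ)/(1 − T) · exp(λ(g_τ − 1)) and G_u = Σ_τ w_τ (T_τ/T) · exp(λ(g_τ − 1)), where g_τ is equation (3). Starting both iterates at 0 selects the smallest fixed point. All names are hypothetical.

```python
import math

def poisson_epn(taus, weights, beta, lam):
    """Hypothetical example model: fixed contact rate beta, infectious period
    drawn from taus with probabilities weights, Poisson(lam) contact degrees.
    Returns G, G_f, G_u (equations (5), (6), (8)) and the transmissibility T."""
    t_tau = [1 - math.exp(-beta * tau) for tau in taus]      # equation (2)
    T = sum(w * t for w, t in zip(weights, t_tau))           # equation (1)

    def mix(weight_fn):
        # sum over tau of w * weight_fn(T_tau) * exp(lam * (g(x, y, u | tau) - 1))
        def F(x, y, u):
            return sum(
                w * weight_fn(t) * math.exp(lam * (
                    (1 - t) * (1 - T) + (1 - t) * T * x
                    + t * (1 - T) * y + t * T * u - 1))
                for w, t in zip(weights, t_tau))
        return F

    G = mix(lambda t: 1.0)
    G_f = mix(lambda t: (1 - t) / (1 - T))
    G_u = mix(lambda t: t / T)
    return G, G_f, G_u, T

def prob_in_gin(taus, weights, beta, lam, tol=1e-14, max_iter=100_000):
    """Probability that a random node lies in the GIN, i.e. 1 - H_out(1)."""
    G, G_f, G_u, _ = poisson_epn(taus, weights, beta, lam)
    y = u = 0.0                                     # initial values y0, u0 in [0, 1)
    for _ in range(max_iter):
        y_new, u_new = G_f(1, y, u), G_u(1, y, u)   # equations (9a), (9b) at z = 1
        done = abs(y_new - y) + abs(u_new - u) < tol
        y, u = y_new, u_new
        if done:
            break
    return 1 - G(1, y, u)                           # equation (10) at z = 1

print(prob_in_gin([1.0], [1.0], beta=1.0, lam=3.0))
print(prob_in_gin([0.0, 2.0], [0.5, 0.5], beta=1.0, lam=4.0))
```

In this example, a constant infectious period gives the bond percolation final size s = 1 − exp(−λTs), while the two-point distribution gives a value below 0.5, since nodes with τ = 0 cannot be in the GIN.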

The expected size of the out-component of a randomly chosen node below the epidemic threshold is $H^{out'} (1)$ . Taking derivatives in (10) yields

H^{out'} (1) = 1 + 〈 k_{d} 〉 H_{f}^{out'} (1) + 〈 k_{u} 〉 H_{u}^{out'} (1) .

(11)

Taking derivatives in equations (9a) and (9b) and using the fact that $H_{f}^{out} (1) = H_{u}^{out} (1) = 1$ below the epidemic threshold yields a set of linear equations for $H_{f}^{out'} (1)$ and $H_{u}^{out'} (1)$ . These can be solved to yield

H_{f}^{out'} (1) = \frac{1 + G_{f}^{(0, 0, 1)} - G_{u}^{(0, 0, 1)}}{(1 - G_{f}^{(0, 1, 0)}) (1 - G_{u}^{(0, 0, 1)}) - G_{f}^{(0, 0, 1)} G_{u}^{(0, 1, 0)}}

(12)

and

H_{u}^{out'} (1) = \frac{1 - G_{f}^{(0, 1, 0)} + G_{u}^{(0, 1, 0)}}{(1 - G_{f}^{(0, 1, 0)}) (1 - G_{u}^{(0, 0, 1)}) - G_{f}^{(0, 0, 1)} G_{u}^{(0, 1, 0)}},

(13)

where the argument of all derivatives is (1, 1, 1).
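Equations (11)-(13) can be evaluated numerically once the first derivatives of G_f and G_u at (1, 1, 1) are known. This sketch uses an illustrative Poisson(λ) example (fixed β, discrete infectious periods; both are assumptions, and the function name is hypothetical) and takes central differences. Below the threshold (λT < 1), the result should agree with 1/(1 − λT), the mean component size of the bond percolation model on a Poisson network, as Section 4 predicts.

```python
import math

def mean_outbreak_size(taus, weights, beta, lam, h=1e-6):
    """Mean out-component size below the threshold, equations (11)-(13).

    Hypothetical example model: fixed contact rate beta, infectious period
    drawn from taus with probabilities weights, Poisson(lam) contact degrees.
    x is held at 1 because equations (9a) and (9b) evaluate G_f, G_u at x = 1.
    """
    t_tau = [1 - math.exp(-beta * tau) for tau in taus]
    T = sum(w * t for w, t in zip(weights, t_tau))

    def mix(weight_fn):
        def F(y, u):  # g(1, y, u | tau) - 1 = t(1 - T)y + tTu - t
            return sum(w * weight_fn(t) * math.exp(lam * (t * (1 - T) * y + t * T * u - t))
                       for w, t in zip(weights, t_tau))
        return F

    G_f = mix(lambda t: (1 - t) / (1 - T))
    G_u = mix(lambda t: t / T)

    def d_y(F):  # central difference in y at (1, 1)
        return (F(1 + h, 1) - F(1 - h, 1)) / (2 * h)

    def d_u(F):  # central difference in u at (1, 1)
        return (F(1, 1 + h) - F(1, 1 - h)) / (2 * h)

    a, b = d_y(G_f), d_u(G_f)                # G_f^(0,1,0), G_f^(0,0,1)
    c, d = d_y(G_u), d_u(G_u)                # G_u^(0,1,0), G_u^(0,0,1)
    denom = (1 - a) * (1 - d) - b * c
    hf = (1 + b - d) / denom                 # equation (12)
    hu = (1 - a + c) / denom                 # equation (13)
    k_d, k_u = lam * T * (1 - T), lam * T * T
    return 1 + k_d * hf + k_u * hu, T        # equation (11)

m, T = mean_outbreak_size([0.0, 2.0], [0.5, 0.5], beta=1.0, lam=1.5)
print(m, 1 / (1 - 1.5 * T))
```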

3.2.2 In-components

The in-component size distribution of a semi-directed network can be derived using the same logic used to find the out-component size distribution, except that we consider going backwards along directed edges. Let $H_{r}^{in} (z)$ be the pgf for the size of the in-component at the beginning of a directed edge, $H_{u}^{in} (z)$ be the pgf for the size of the in-component at the “beginning” of an undirected edge, and $H^{in} (z)$ be the pgf for the in-component size of a randomly chosen node. Then, in the limit of a large population,

H_{r}^{in} (z) = z G_{r} (H_{r}^{in} (z), 1, H_{u}^{in} (z)),

(14a)

H_{u}^{in} (z) = z G_{u} (H_{r}^{in} (z), 1, H_{u}^{in} (z)),

(14b)

H^{in} (z) = z G (H_{r}^{in} (z), 1, H_{u}^{in} (z)) .

(14c)

The probability that a node has a finite in-component is Hⁱⁿ(1), so the probability that a randomly chosen node is in the GOUT is 1 − Hⁱⁿ(1). Power series and numerical estimates for $H_{r}^{in} (z)$, $H_{u}^{in} (z)$ and Hⁱⁿ(z) can be obtained by iterating these equations.

The expected size of the in-component of a randomly chosen node below the epidemic threshold is H^in′(1). Taking derivatives in equation (14c) yields

H^{in'} (1) = 1 + 〈 k_{d} 〉 H_{r}^{in'} (1) + 〈 k_{u} 〉 H_{u}^{in'} (1) .

(15)

Taking derivatives in equations (14a) and (14b) and using the fact that $H_{r}^{in} (1) = H_{u}^{in} (1) = 1$ in a subcritical network yields

H_{r}^{in'} (1) = \frac{1 + G_{r}^{(0, 0, 1)} - G_{u}^{(0, 0, 1)}}{(1 - G_{r}^{(1, 0, 0)}) (1 - G_{u}^{(0, 0, 1)}) - G_{r}^{(0, 0, 1)} G_{u}^{(1, 0, 0)}}

(16)

and

H_{u}^{in'} (1) = \frac{1 - G_{r}^{(1, 0, 0)} + G_{u}^{(1, 0, 0)}}{(1 - G_{r}^{(1, 0, 0)}) (1 - G_{u}^{(0, 0, 1)}) - G_{r}^{(0, 0, 1)} G_{u}^{(1, 0, 0)}},

(17)

where the argument of all derivatives is (1, 1, 1).

3.2.3 Epidemic threshold

The epidemic threshold occurs when the expected size of the in- and out-components in the network becomes infinite. This occurs when the denominators in equations (12) and (13) and equations (16) and (17) approach zero.

From the definitions of G_f(x, y, u), G_r(x, y, u) and G_u(x, y, u), both conditions are equivalent to

(1 - \frac{1}{〈 k_{d} 〉} G^{(1, 1, 0)}) (1 - \frac{1}{〈 k_{u} 〉} G^{(0, 0, 2)}) - \frac{1}{〈 k_{d} 〉 〈 k_{u} 〉} G^{(1, 0, 1)} G^{(0, 1, 1)} = 0 .

Therefore, there is a single epidemic threshold where the GSCC, the GIN, and the GOUT appear simultaneously in both purely directed networks [1,2,13-16] and semi-directed networks [3,4].
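For a concrete model the threshold condition can be evaluated in closed form. Under the same kind of illustrative assumptions (fixed β, discrete infectious periods, Poisson(λ) contact degrees), each mixed second partial of G(x, y, u) at (1, 1, 1) equals λ² times the expected product of the corresponding coefficients of x, y, u in equation (3), because g is linear in (x, y, u). Working through the algebra gives a left-hand side of exactly 1 − λT, so the threshold in this example is T_c = 1/λ for any infectious-period distribution. The sketch checks this numerically; the function name is hypothetical.

```python
import math

def threshold_determinant(taus, weights, beta, lam):
    """Left-hand side of the threshold condition for a hypothetical model:
    fixed beta, discrete infectious periods, Poisson(lam) contact degrees."""
    t_tau = [1 - math.exp(-beta * tau) for tau in taus]
    T = sum(w * t for w, t in zip(weights, t_tau))
    # coefficients (c_x, c_y, c_u) of x, y, u in equation (3), for each tau
    coeffs = [((1 - t) * T, t * (1 - T), t * T) for t in t_tau]

    def second(a, b):
        # mixed second partial of G at (1, 1, 1) for G(z) = exp(lam * (z - 1))
        return lam ** 2 * sum(w * c[a] * c[b] for w, c in zip(weights, coeffs))

    k_d = lam * T * (1 - T)          # mean in/outdegree
    k_u = lam * T * T                # mean undirected degree
    X, Y, U = 0, 1, 2
    lhs = ((1 - second(X, Y) / k_d) * (1 - second(U, U) / k_u)
           - second(X, U) * second(Y, U) / (k_d * k_u))
    return lhs, T

lhs, T = threshold_determinant([0.0, 2.0], [0.5, 0.5], beta=1.0, lam=3.0)
print(lhs, 1 - 3.0 * T)
```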

3.2.4 Giant strongly-connected component

A node is in the GSCC if its in- and out-components are both infinite. A randomly chosen node has a finite in-component with probability $G (H_{r}^{in} (1), 1, H_{u}^{in} (1))$ and a finite out-component with probability $G (1, H_{f}^{out} (1), H_{u}^{out} (1))$ . The probability that a node reached by following an undirected edge has finite in- and out-components is the solution to the equation

υ = G_{u} (H_{r}^{in} (1), H_{f}^{out} (1), υ),

and the probability that a randomly chosen node has finite in- and out-components is $G (H_{r}^{in} (1), H_{f}^{out} (1), υ)$ [3]. Thus, the relative size of the GSCC is

1 - G (H_{r}^{in} (1), 1, H_{u}^{in} (1)) - G (1, H_{f}^{out} (1), H_{u}^{out} (1)) + G (H_{r}^{in} (1), H_{f}^{out} (1), υ) .
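The GSCC formula combines three fixed points. This is a minimal sketch under illustrative Poisson(λ) assumptions (fixed β, discrete infectious periods); for this model G_r and G_u coincide, and every iteration starts at 0 so that it converges to the smallest root. The function name and the fixed iteration count are assumptions, not part of the method.

```python
import math

def giant_components(taus, weights, beta, lam, n_iter=10_000):
    """Relative sizes of the GIN, GOUT and GSCC (section 3.2.4).

    Hypothetical example model: fixed contact rate beta, infectious period
    drawn from taus with probabilities weights, Poisson(lam) contact degrees.
    With G'(z) = lam * G(z), equations (6)-(8) become reweighted sums over tau.
    """
    t_tau = [1 - math.exp(-beta * tau) for tau in taus]
    T = sum(w * t for w, t in zip(weights, t_tau))

    def mix(weight_fn):
        def F(x, y, u):
            return sum(
                w * weight_fn(t) * math.exp(lam * (
                    (1 - t) * (1 - T) + (1 - t) * T * x
                    + t * (1 - T) * y + t * T * u - 1))
                for w, t in zip(weights, t_tau))
        return F

    G = mix(lambda t: 1.0)
    G_f = mix(lambda t: (1 - t) / (1 - T))
    G_r = mix(lambda t: t / T)          # identical to G_u for this model
    G_u = mix(lambda t: t / T)

    y = u_out = r = u_in = 0.0
    for _ in range(n_iter):
        y, u_out = G_f(1, y, u_out), G_u(1, y, u_out)   # H_f^out(1), H_u^out(1)
        r, u_in = G_r(r, 1, u_in), G_u(r, 1, u_in)      # H_r^in(1), H_u^in(1)
    v = 0.0
    for _ in range(n_iter):
        v = G_u(r, y, v)        # finite in- and out-components via an undirected edge
    p_in_finite = G(r, 1, u_in)
    p_out_finite = G(1, y, u_out)
    p_both_finite = G(r, y, v)
    gin = 1 - p_out_finite
    gout = 1 - p_in_finite
    gscc = 1 - p_in_finite - p_out_finite + p_both_finite
    return gin, gout, gscc

print(giant_components([1.0], [1.0], beta=1.0, lam=3.0))
print(giant_components([0.0, 2.0], [0.5, 0.5], beta=1.0, lam=4.0))
```

For the two-point infectious period in this example, the GIN is smaller than the GOUT, so the probability of an epidemic is smaller than its final size, while the GOUT agrees with the bond percolation final size.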

4 In-components

In this section, we prove that the in-component size distribution of the epidemic percolation network for the SIR model from [1] is identical to the component size distribution of the bond percolation model with bond occupation probability T. The probability generating function for the total number of incoming and undirected edges incident to any node i is

G (x, 1, x ∣ τ_{i}) = G (g (x, 1, x ∣ τ_{i})) = G (1 - T + T x),

which is independent of τ_i. If node i has degree n_i in the contact network, then the number of nodes we can reach by going in reverse along a directed edge or an undirected edge has a binomial(n_i, T) distribution regardless of τ_i. If we reach node i by going backwards along edges, the number of nodes we can reach from i by continuing to go backwards (excluding the node from which we arrived) has a binomial(n_i − 1, T) distribution. Therefore, the in-component of any node in the percolation network is exactly like a component of a bond percolation model with occupation probability T. This argument was used to justify the mapping from an epidemic model to a bond percolation model in [1], but it does not apply to the out-components of the epidemic percolation network.

Methods of calculating the component size distribution of an undirected random network with an arbitrary degree distribution using the pgf of its degree distribution were developed by Newman et al. [2, 13-16]. These methods were used to analyze the bond percolation model of disease transmission [1], obtaining results similar to those obtained by Andersson [17] for the epidemic threshold and the final size of an epidemic. In this paragraph, we review these results and introduce notation that will be used in this section. Let $G (u)$ be the pgf for the degree distribution of the contact network. Then the pgf for the degree of a node reached by following an edge (excluding the edge used to reach that node) is $G_{1} (u) = {〈 n 〉}^{- 1} G^{'} (u)$, where $〈 n 〉 = G^{'} (1)$ is the mean degree of the contact network. With bond occupation probability T, the number of occupied edges adjacent to a randomly chosen node has the pgf $G (1 - T + T u)$ and the number of occupied edges from which infection can leave a node that has been infected along an edge has the pgf $G_{1} (1 - T + T u)$. The pgf for the size of the component at the end of an edge is

H_{1} (z) = z G_{1} (1 - T + T H_{1} (z))

(18)

and the pgf for the size of the component of a randomly chosen node is

H_{0} (z) = z G (1 - T + T H_{1} (z)) .

(19)

The proportion of the network contained in the giant component is 1 − H₀(1), and the mean size of components below the percolation threshold is $H_{0}^{'} (1)$. H₀(z) and H₁(z) can be expanded as power series to any desired degree by iterating equations (18) and (19), and their values for any fixed z ∈ [0, 1] can be found by iterating equation (18) from an initial value of H₁ in [0, 1) and then evaluating equation (19).
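The power-series expansion of equations (18) and (19) can be sketched with truncated polynomial arithmetic: each pass through equation (18) fixes one more coefficient of H₁(z). The helper names are hypothetical. For the two-contact network of the Introduction (p₂ = 1), the series has the closed form H₀(z) = (1 − T)² z/(1 − Tz)², so its z¹ coefficient (1 − T)² is the bond percolation probability p₁^bond.

```python
def poly_mul(a, b, n):
    """Product of two power series (coefficient lists), truncated at degree n."""
    out = [0.0] * (n + 1)
    for i, ai in enumerate(a):
        if ai == 0.0:
            continue
        for j, bj in enumerate(b[: n + 1 - i]):
            out[i + j] += ai * bj
    return out

def compose(p, s, n):
    """p(s(z)) truncated at degree n, by Horner's rule over the coefficients of p."""
    out = [0.0] * (n + 1)
    for c in reversed(p):
        out = poly_mul(out, s, n)
        out[0] += c
    return out

def bond_component_series(p, T, n):
    """Coefficients of H0(z) up to z^n (equations (18) and (19)).

    p[k] is the probability that a node has degree k in the contact network.
    """
    mean = sum(k * pk for k, pk in enumerate(p))
    q = [(k + 1) * p[k + 1] / mean for k in range(len(p) - 1)]   # G1 = G' / <n>
    h1 = [0.0] * (n + 1)
    for _ in range(n + 1):                                  # one coefficient per pass
        s = [1 - T + T * h1[0]] + [T * c for c in h1[1:]]   # 1 - T + T * H1(z)
        h1 = [0.0] + compose(q, s, n)[:n]                   # equation (18)
    s = [1 - T + T * h1[0]] + [T * c for c in h1[1:]]
    return [0.0] + compose(p, s, n)[:n]                     # equation (19)

T = 0.3
h0 = bond_component_series([0.0, 0.0, 1.0], T, 10)
print(h0[1], (1 - T) ** 2)
```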

We can now prove that the distribution of component sizes in the bond percolation model is identical to the distribution of in-component sizes in the epidemic percolation network.

Lemma 1 G_r(x, y, u) = G_u(x, y, u) for all x, y, u.

Proof. From equation (7),

\begin{array}{l} G_{r} (x, y, u) = \frac{1}{T (1 - T) G^{'} (1)} G^{(0, 1, 0)} (x, y, u) \\ = \frac{1}{T G^{'} (1)} \int_{0}^{\infty} G^{'} (g (x, y, u ∣ τ_{i})) T_{τ_{i}} d F (τ_{i}) . \end{array}

From equation (8),

\begin{array}{l} G_{u} (x, y, u) = \frac{1}{T^{2} G^{'} (1)} G^{(0, 0, 1)} (x, y, u) \\ = \frac{1}{T G^{'} (1)} \int_{0}^{\infty} G^{'} (g (x, y, u ∣ τ_{i})) T_{τ_{i}} d F (τ_{i}) . \end{array}

Thus, the degree distribution of a node reached by going backwards along an edge is independent of whether it was a directed or undirected edge. ■

Lemma 2 $H_{r}^{in} (z) = H_{u}^{in} (z) = H_{1} (z)$ for all z.

Proof. From equations (14a) and (14b),

\begin{array}{l} H_{r}^{in} (z) = z G_{r} (H_{r}^{in} (z), 1, H_{u}^{in} (z)) \\ = z G_{u} (H_{r}^{in} (z), 1, H_{u}^{in} (z)) = H_{u}^{in} (z) . \end{array}

Let $H_{*}^{in} (z) = H_{u}^{in} (z) = H_{r}^{in} (z)$ . Since $g (x, 1, x ∣ τ_{i}) = 1 - T + T x$ for all τ_i,

\begin{array}{l} H_{*}^{in} (z) = \frac{z}{T G^{'} (1)} \int_{0}^{\infty} G^{'} (1 - T + T H_{*}^{in} (z)) T_{τ_{i}} d F (τ_{i}) \\ = \frac{z}{G^{'} (1)} G^{'} (1 - T + T H_{*}^{in} (z)) . \end{array}

From equation (18), we have

H_{1} (z) = \frac{z}{G^{'} (1)} G^{'} (1 - T + T H_{1} (z)) .

Since there is a unique pgf that solves this equation, $H_{*}^{in} (z) = H_{1} (z)$ . Thus, the in-component size distribution at the beginning of an edge is the same for directed and undirected edges, and it is identical to the distribution of component sizes at the end of an occupied edge in the bond percolation model. ■

Theorem 3 $H^{in} (z) = H_{0} (z)$ .

Proof. Let $H_{*}^{in} (z) = H_{r}^{in} (z) = H_{u}^{in} (z)$. From equation (14c), the probability generating function for the distribution of in-component sizes in the percolation network is

\begin{array}{l} H^{in} (z) = z G (H_{*}^{in} (z), 1, H_{*}^{in} (z)) \\ = z \int_{0}^{\infty} G (g (H_{*}^{in} (z), 1, H_{*}^{in} (z) ∣ τ_{i})) d F (τ_{i}) \\ = z G (1 - T + T H_{*}^{in} (z)) . \end{array}

When H₁(z) is substituted for $H_{*}^{in} (z)$ (which is justified by the previous Lemma), this is identical to equation (19) for H₀(z) in the bond percolation model. Since there is a unique pgf solution to this equation, Hⁱⁿ(z) = H₀(z), so the distribution of in-components in the percolation network is identical to the distribution of component sizes in the bond percolation model. ■

Since the mean size of out-components is equal to the mean size of in-components in any semi-directed network, the bond percolation model correctly predicts the mean size of outbreaks below the epidemic threshold. Since the mean sizes of in- and out-components diverge simultaneously, the bond percolation model also correctly predicts the critical transmissibility T_c. Since the probability of having a finite in-component in the epidemic percolation network is equal to the probability of being in a finite component of the bond percolation model, the relative size of the GOUT, 1 − Hⁱⁿ(1), equals 1 − H₀(1), so the bond percolation model also correctly predicts the final size of an epidemic.

5 Out-components

In this section, we prove that the distribution of out-component sizes in the epidemic percolation network for the SIR model from [1] is always different from the distribution of in-component sizes when there is a nondegenerate distribution of infectious periods. As a corollary, we find that the probability of an epidemic in the SIR model from the Introduction is always less than or equal to its final size, with equality only when epidemics have probability zero or the infectious period is constant. This is similar to a result obtained by Kuulasmaa and Zachary [18], who found that an SIR model defined on the d-dimensional integer lattice reduced to a bond percolation process if and only if the infectious period is constant.

The probability generating function for the total number of outgoing and undirected edges incident to a node i with infectious period τ_i is

G (1, y, y ∣ τ_{i}) = G (g (1, y, y ∣ τ_{i})) = G (1 - T_{τ_{i}} + T_{τ_{i}} y),

where T_{τ_i} is the conditional probability of transmission across each edge given τ_i, as defined in equation (2). If node i has degree k_i, the number of neighbors we can reach by going forward along edges from i has a binomial(k_i, T_{τ_i}) distribution. If we reach a node j by following an edge, then the number of nodes we can reach from j by continuing forward (excluding the node from which we arrived) has a binomial(k_j − 1, T_{τ_j}) distribution. Because all edges leaving the same node share its infectious period, transmissions along them are correlated. Unless τ_i is constant, the out-components of the epidemic percolation network therefore do not behave like the components of a bond percolation model, in which each edge is occupied independently with probability T.
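The effect of this correlation can be seen by comparing, for a node of degree k, the distribution of the number of edges along which it transmits (a mixture of binomials over $T_{τ_i}$) with the Binomial(k, T) count implied by bond percolation. The sketch below assumes, for illustration only, that $T_{τ_i}$ takes finitely many values `t_values` with probabilities `weights`; these names and the numbers used are hypothetical.

```python
from math import comb


def mixture_pmf(k, t_values, weights):
    """P(m of k edges transmit) when T_tau = t with probability w (binomial mixture)."""
    return [sum(w * comb(k, m) * t**m * (1 - t)**(k - m)
                for t, w in zip(t_values, weights))
            for m in range(k + 1)]


def binomial_pmf(k, T):
    """Bond percolation: each of the k edges transmits independently with probability T."""
    return [comb(k, m) * T**m * (1 - T)**(k - m) for m in range(k + 1)]


k, t_values, weights = 4, [0.1, 0.9], [0.5, 0.5]
T = sum(t * w for t, w in zip(t_values, weights))    # marginal T = 0.5
mix, bond = mixture_pmf(k, t_values, weights), binomial_pmf(k, T)
# Both have mean k*T = 2, but the mixture puts far more mass on zero transmissions
print(round(mix[0], 4), round(bond[0], 4))           # 0.3281 0.0625
```

The mean number of transmissions is the same in both models, which is why mean outbreak sizes agree, while the probability that a node transmits to nobody differs.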

Suppose i and j are connected in the contact network. The conditional transmission probability from j to i given τ_i is always T. Thus, an edge across which we leave any node is directed (i.e., outgoing) with probability 1 − T and undirected with probability T. This allows us to calculate the pgfs of the out-component distributions without differentiating between outgoing and undirected edges: Let

\begin{array}{l} G_{o} (x, y, u) = (1 - T) G_{f} (x, y, u) + T G_{u} (x, y, u) \\ = \frac{1}{G^{'} (1)} \int_{0}^{\infty} G^{'} (g (x, y, u ∣ τ_{i})) d F (τ_{i}) \end{array}

be the probability generating function for the degree distribution of a node that we reach by going forward along an outgoing or undirected edge (excluding the edge along which we arrived). Let

H_{*}^{out} (z) = (1 - T) H_{f}^{out} (z) + T H_{u}^{out} (z)

be the probability generating function for the size of the out-component at the end of an outgoing or undirected edge.

Lemma 4 For the SIR model from [1],

\begin{array}{l} H_{f}^{out} (z) = z G_{f} (1, H_{*}^{out} (z), H_{*}^{out} (z)), \\ H_{u}^{out} (z) = z G_{u} (1, H_{*}^{out} (z), H_{*}^{out} (z)), \\ H^{out} (z) = z G (1, H_{*}^{out} (z), H_{*}^{out} (z)), \end{array}

and we have the following self-similarity equation:

H_{*}^{out} (z) = z G_{o} (1, H_{*}^{out} (z), H_{*}^{out} (z)) .

Proof. From equation (3), we have

\begin{array}{l} g (1, (1 - T) y + T u, (1 - T) y + T u ∣ τ_{i}) = 1 - T_{τ_{i}} + T_{τ_{i}} [(1 - T) y + T u] \\ = g (1, y, u ∣ τ_{i}) \end{array}

for all y, u, and τ_i. This allows us to rewrite equation (9a):

\begin{array}{l} H_{f}^{out} (z) = z G_{f} (1, H_{f}^{out} (z), H_{u}^{out} (z)) \\ = \frac{z}{(1 - T) G^{'} (1)} \int_{0}^{\infty} G^{'} (g (1, H_{f}^{out} (z), H_{u}^{out} (z) ∣ τ_{i})) (1 - T_{τ_{i}}) d F (τ_{i}) \\ = \frac{z}{(1 - T) G^{'} (1)} \int_{0}^{\infty} G^{'} (g (1, H_{*}^{out} (z), H_{*}^{out} (z) ∣ τ_{i})) (1 - T_{τ_{i}}) d F (τ_{i}) \\ = z G_{f} (1, H_{*}^{out} (z), H_{*}^{out} (z)) . \end{array}

Similarly, we can rewrite equation (9b):

\begin{array}{l} H_{u}^{out} (z) = z G_{u} (1, H_{f}^{out} (z), H_{u}^{out} (z)) \\ = \frac{z}{T G^{'} (1)} \int_{0}^{\infty} G^{'} (g (1, H_{f}^{out} (z), H_{u}^{out} (z) ∣ τ_{i})) T_{τ_{i}} d F (τ_{i}) \\ = \frac{z}{T G^{'} (1)} \int_{0}^{\infty} G^{'} (g (1, H_{*}^{out} (z), H_{*}^{out} (z) ∣ τ_{i})) T_{τ_{i}} d F (τ_{i}) \\ = z G_{u} (1, H_{*}^{out} (z), H_{*}^{out} (z)) . \end{array}

Finally, we can rewrite equation (10):

\begin{array}{l} H^{out} (z) = z G (1, H_{f}^{out} (z), H_{u}^{out} (z)) \\ = z \int_{0}^{\infty} G (g (1, H_{f}^{out} (z), H_{u}^{out} (z) ∣ τ_{i})) d F (τ_{i}) \\ = z \int_{0}^{\infty} G (g (1, H_{*}^{out} (z), H_{*}^{out} (z) ∣ τ_{i})) d F (τ_{i}) \\ = z G (1, H_{*}^{out} (z), H_{*}^{out} (z)); \end{array}

but then

\begin{array}{l} H_{*}^{out} (z) = (1 - T) H_{f}^{out} (z) + T H_{u}^{out} (z) \\ = z [(1 - T) G_{f} (1, H_{*}^{out} (z), H_{*}^{out} (z)) + T G_{u} (1, H_{*}^{out} (z), H_{*}^{out} (z))] \\ = z G_{o} (1, H_{*}^{out} (z), H_{*}^{out} (z)) . \end{array}

As a corollary, we find that the analysis in Ref. [1] can be corrected if we let G₀(x) = G(1, x, x) and G₁(x) = G_o(1, x, x) (see equations (13) and (14) in [1]). ■
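In the notation of [1], the correction amounts to swapping in different generating functions. Since $g(1, x, x ∣ τ_i) = 1 - T_{τ_i} + T_{τ_i} x$, we have $G(1, x, x) = E[G(1 - T_{τ_i} + T_{τ_i} x)]$ and $G_o(1, x, x) = E[G'(1 - T_{τ_i} + T_{τ_i} x)]/G'(1)$. The sketch below builds both the corrected and the bond-percolation versions for a finite degree distribution `p` and a discrete distribution of $T_{τ_i}$ (`t_values` with probabilities `weights`; all names hypothetical), then solves $H_1(z) = z G_1(H_1(z))$ and returns $z G_0(H_1(z))$.

```python
def G(p, x):
    """Degree pgf G(x) = sum_k p[k] x**k."""
    return sum(pk * x**k for k, pk in enumerate(p))


def G_prime(p, x):
    """G'(x) = sum_k k p[k] x**(k - 1)."""
    return sum(k * pk * x**(k - 1) for k, pk in enumerate(p) if k > 0)


def make_pgfs(p, t_values, weights, corrected=True):
    """Return (G0, G1) as in [1].

    corrected=True:  G0(x) = G(1, x, x), G1(x) = G_o(1, x, x) (average over T_tau)
    corrected=False: G0(x) = G(1 - T + T x), G1(x) = G'(1 - T + T x) / G'(1)
    """
    pairs = list(zip(t_values, weights))
    T = sum(t * w for t, w in pairs)          # marginal transmissibility
    g1 = G_prime(p, 1.0)
    if corrected:
        def G0(x):
            return sum(w * G(p, 1 - t + t * x) for t, w in pairs)

        def G1(x):
            return sum(w * G_prime(p, 1 - t + t * x) for t, w in pairs) / g1
    else:
        def G0(x):
            return G(p, 1 - T + T * x)

        def G1(x):
            return G_prime(p, 1 - T + T * x) / g1
    return G0, G1


def size_pgf(G0, G1, z, tol=1e-13, max_iter=100_000):
    """Solve H1 = z G1(H1) by iteration from 0 (smallest root); return z G0(H1)."""
    h = 0.0
    for _ in range(max_iter):
        h_next = z * G1(h)
        if abs(h_next - h) < tol:
            h = h_next
            break
        h = h_next
    return z * G0(h)


p, t_values, weights = [0, 0, 0, 1], [0.6, 1.0], [0.5, 0.5]   # marginal T = 0.8
for z in (0.25, 0.5, 0.75, 1.0):
    h_out = size_pgf(*make_pgfs(p, t_values, weights, corrected=True), z)
    h_bond = size_pgf(*make_pgfs(p, t_values, weights, corrected=False), z)
    print(z, round(h_out, 4), round(h_bond, 4))
# last line: 1.0 0.0529 0.0156
```

The corrected functions give H^out(z), while the uncorrected ones reproduce the bond-percolation H₀(z) = Hⁱⁿ(z).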

Lemma 5 $H_{*}^{in} (z) \leq H_{*}^{out} (z)$ for all z ∈ [0, 1].

Proof. Since $G^{'}$ is convex,

\begin{array}{l} H_{*}^{out} (z) = z G_{o} (1, H_{*}^{out} (z), H_{*}^{out} (z)) \\ = \frac{z}{G^{'} (1)} \int_{0}^{\infty} G^{'} (1 - T_{τ_{i}} + T_{τ_{i}} H_{*}^{out} (z)) d F (τ_{i}) \\ \geq \frac{z}{G^{'} (1)} G^{'} (1 - T + T H_{*}^{out} (z)) \end{array}

by Jensen’s inequality. Equality holds only if z = 0, $H_{*}^{out}(z) = 1$, $G^{'}$ is constant, or τ_i is constant. Since $H_{*}^{in}(z)$ is the solution to

H_{*}^{in} (z) = \frac{z}{G^{'} (1)} G^{'} (1 - T + T H_{*}^{in} (z)),

we must have $H_{*}^{out}(z) \geq H_{*}^{in}(z)$. To see this, fix z and consider the graphs of $y = z G_{o}(1, x, x)$ and $y = \frac{z}{G^{'}(1)} G^{'}(1 - T + T x)$. $H_{*}^{out}(z)$ is the smallest x ∈ [0, 1] at which the first graph intersects the line y = x, and $H_{*}^{in}(z)$ is the smallest x ∈ [0, 1] at which the second graph does. Both graphs lie on or above y = x at x = 0, so the second graph lies strictly above y = x for all $x < H_{*}^{in}(z)$. Since $z G_{o}(1, x, x) \geq \frac{z}{G^{'}(1)} G^{'}(1 - T + T x)$, the first graph also lies strictly above y = x there and cannot meet it before $H_{*}^{in}(z)$. Therefore $H_{*}^{out}(z) \geq H_{*}^{in}(z)$. ■

Theorem 6 $H^{in} (z) \leq H^{out} (z)$ for all z ∈ [0, 1]. Equality holds only when z = 0, z = 1 and the percolation network is subcritical, or the infectious period is constant.

Proof. From equation (14c),

\begin{array}{l} H^{in} (z) = z G (H_{*}^{in} (z), 1, H_{*}^{in} (z)) \\ = z G (1 - T + T H_{*}^{in} (z)) . \end{array}

From equation (10),

\begin{array}{l} H^{out} (z) = z G (1, H_{*}^{out} (z), H_{*}^{out} (z)) \\ = z \int_{0}^{\infty} G (1 - T_{τ_{i}} + T_{τ_{i}} H_{*}^{out} (z)) d F (τ_{i}) \\ \geq z G (1 - T + T H_{*}^{out} (z)) \\ \geq z G (1 - T + T H_{*}^{in} (z)) . \end{array}

The first inequality follows from the convexity of $G$ and Jensen’s inequality. The second follows from the fact that $G$ is nondecreasing and $H_{*}^{out} (z) \geq H_{*}^{in} (z)$ . Equality holds in both inequalities only if z = 0, $G$ is constant, $H_{*}^{in} (z) = 1$ , or τ_i is constant. ■

Since the probability of an epidemic is 1 − H^out(1) and the final size of an epidemic is 1 − Hⁱⁿ(1), it follows that the probability of an epidemic is always less than or equal to its final size in the SIR model from [1]. When the infectious period is constant, H^out(z) = Hⁱⁿ(z) for all z ∈ [0, 1], so the in- and out-component size distributions are identical and the probability and final size of an epidemic are equal. When the infectious period has a nondegenerate distribution and the percolation network is subcritical, H^out(z) > Hⁱⁿ(z) for all z ∈ (0, 1) (so the in- and out-components have dissimilar size distributions) but H^out(1) = Hⁱⁿ(1) = 1 (so the probability and final size of an epidemic are both zero). If the network is supercritical and the infectious period is nonconstant, H^out(z) > Hⁱⁿ(z) for all z ∈ (0, 1], so in- and out-components have dissimilar size distributions and the probability of an epidemic is strictly less than its final size.
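The gap between the two quantities can be illustrated with a mean-preserving spread of $T_{τ_i}$. Holding the marginal T = 0.8 fixed on a hypothetical 3-regular contact network (supercritical, since T_c = 0.5), the sketch below lets $T_{τ_i}$ equal 0.8 − d or 0.8 + d with equal probability and reports $1 - H^{out}(1)$ and $1 - H^{in}(1)$. The spread parameter `d` and all function names are illustrative assumptions.

```python
def G(p, x):
    """Degree pgf G(x) = sum_k p[k] x**k."""
    return sum(pk * x**k for k, pk in enumerate(p))


def G_prime(p, x):
    """G'(x) = sum_k k p[k] x**(k - 1)."""
    return sum(k * pk * x**(k - 1) for k, pk in enumerate(p) if k > 0)


def smallest_fixed_point(f, tol=1e-13, max_iter=100_000):
    """Iterate x <- f(x) from 0; converges upward to the smallest root in [0, 1]."""
    x = 0.0
    for _ in range(max_iter):
        x_next = f(x)
        if abs(x_next - x) < tol:
            return x_next
        x = x_next
    return x


def epidemic_probability_and_final_size(p, t_values, weights):
    """Return (1 - H^out(1), 1 - H^in(1)) when T_tau = t_values[i] w.p. weights[i]."""
    pairs = list(zip(t_values, weights))
    T = sum(t * w for t, w in pairs)
    g1 = G_prime(p, 1.0)
    h_in = smallest_fixed_point(lambda h: G_prime(p, 1 - T + T * h) / g1)
    h_out = smallest_fixed_point(
        lambda h: sum(w * G_prime(p, 1 - t + t * h) for t, w in pairs) / g1)
    final_size = 1 - G(p, 1 - T + T * h_in)
    p_epidemic = 1 - sum(w * G(p, 1 - t + t * h_out) for t, w in pairs)
    return p_epidemic, final_size


p = [0, 0, 0, 1]
for d in (0.0, 0.1, 0.2):
    prob, size = epidemic_probability_and_final_size(p, [0.8 - d, 0.8 + d], [0.5, 0.5])
    print(d, round(prob, 4), round(size, 4))
# 0.0 0.9844 0.9844
# 0.1 0.9754 0.9844
# 0.2 0.9471 0.9844
```

The final size depends only on the marginal T, whereas the probability of an epidemic falls as the spread in $T_{τ_i}$ grows.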

Since the bond percolation model predicts the distribution of in-component sizes, it cannot predict the distribution of out-component sizes or the probability of an epidemic for any SIR model with a nonconstant infectious period. However, it does establish an upper bound for the probability of an epidemic in an SIR model. We have recently become aware of independent work [19] that shows similar results for more general sources of variation in infectiousness and susceptibility in a model where these are independent and uses Jensen’s inequality to establish a lower bound for the probability and final size of an epidemic. The lower bound corresponds to a site percolation model with site occupation probability T, which is the model that maximized the probability of no transmission in the Introduction.

6 Simulations

In a series of simulations, the bond percolation model correctly predicted the mean outbreak size (below the epidemic threshold), the epidemic threshold, and the final size of an epidemic [1]. In Section 4, we showed that the epidemic percolation network generates the same predictions for these quantities.

In Newman’s simulations, the contact network had a power-law degree distribution with an exponential cutoff around degree κ, so the probability that a node has degree k is proportional to $k^{-α} e^{-k/κ}$ for all k ≥ 1. This distribution was chosen to reflect degree distributions observed in real-world networks [1, 13-15]. The probability generating function for this degree distribution is

G (z) = \frac{{Li}_{α} (z e^{- 1 / κ})}{{Li}_{α} (e^{- 1 / κ})},

where Li_α(z) is the α-polylogarithm of z. In [1], Newman used α = 2.
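This pgf can be evaluated directly from the defining series $\mathrm{Li}_α(x) = \sum_{k \geq 1} x^k / k^α$, which converges here because $e^{-1/κ} < 1$. The mean degree follows from $x \frac{d}{dx} \mathrm{Li}_α(x) = \mathrm{Li}_{α-1}(x)$, giving $G'(1) = \mathrm{Li}_{α-1}(e^{-1/κ}) / \mathrm{Li}_α(e^{-1/κ})$. A minimal sketch; the function names and the summation tolerance are hypothetical choices.

```python
import math


def polylog(alpha, x, rel_tol=1e-15, max_terms=1_000_000):
    """Li_alpha(x) = sum_{k>=1} x**k / k**alpha by direct summation (0 <= x < 1)."""
    total = 0.0
    for k in range(1, max_terms + 1):
        term = x**k / k**alpha
        total += term
        if term <= rel_tol * total:     # remaining terms are negligible
            break
    return total


def degree_pgf(z, alpha=2.0, kappa=10.0):
    """G(z) = Li_alpha(z e^{-1/kappa}) / Li_alpha(e^{-1/kappa})."""
    q = math.exp(-1.0 / kappa)
    return polylog(alpha, z * q) / polylog(alpha, q)


def mean_degree(alpha=2.0, kappa=10.0):
    """G'(1) = Li_{alpha-1}(e^{-1/kappa}) / Li_alpha(e^{-1/kappa})."""
    q = math.exp(-1.0 / kappa)
    return polylog(alpha - 1, q) / polylog(alpha, q)


print(degree_pgf(1.0), degree_pgf(0.0))   # 1.0 0.0
print(round(mean_degree(), 2))            # 1.79 (alpha = 2, kappa = 10)
```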

In our simulations, we retained the same contact network but used a contact model adapted from the counterexample in the Introduction. We fixed β_ij = β₀ = 0.1 for all pairs ij and let τ_i = 1 with probability 0.5 and τ_i = τ_max > 1 with probability 0.5 for all i. The predicted probability of an outbreak of size one is G(1, 0, 0) in the epidemic percolation network and G(0, 1, 0) in the bond percolation model. The predicted probability of an epidemic is 1 − H^out(1) in the epidemic percolation network and 1 − Hⁱⁿ(1) in the bond percolation model. In all simulations, an epidemic was declared when at least 100 persons were infected (this low cutoff produces a slight overestimate of the probability of an epidemic in the simulations, favoring the bond percolation model). Figures 2 and 3 show that epidemic percolation networks accurately predicted the probability of an outbreak of size one for all (n, κ, τ_max) combinations, whereas the bond percolation model consistently underestimated these probabilities. Figures 4 and 5 show that the bond percolation model significantly overestimated the probability of an epidemic for all (n, κ, τ_max) combinations, whereas the epidemic percolation network predictions were far closer to the observed values.
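The model predictions (not the simulations) can be sketched numerically. This is an illustration under stated assumptions rather than the code used for the figures: it assumes the conditional transmissibility has the form $T_{τ} = 1 - e^{-β_0 τ}$ (our reading of Newman's continuous-time model, not quoted from equation (2)), truncates the degree distribution at a hypothetical `k_max = 1000`, and uses α = 2 as in [1]. It evaluates $G(1, 0, 0) = E[G(1 - T_{τ_i})]$ against the bond-percolation value $G(1 - T)$, and $1 - H^{out}(1)$ against $1 - H^{in}(1)$.

```python
import math


def truncated_power_law(alpha, kappa, k_max=1000):
    """p[k] proportional to k**(-alpha) * exp(-k / kappa) for 1 <= k <= k_max; p[0] = 0."""
    w = [0.0] + [k ** -alpha * math.exp(-k / kappa) for k in range(1, k_max + 1)]
    total = sum(w)
    return [wk / total for wk in w]


def G(p, x):
    return sum(pk * x ** k for k, pk in enumerate(p))


def G_prime(p, x):
    return sum(k * pk * x ** (k - 1) for k, pk in enumerate(p) if k > 0)


def smallest_fixed_point(f, tol=1e-12, max_iter=100_000):
    """Iterate x <- f(x) from 0; converges upward to the smallest root in [0, 1]."""
    x = 0.0
    for _ in range(max_iter):
        x_next = f(x)
        if abs(x_next - x) < tol:
            return x_next
        x = x_next
    return x


def predictions(alpha, kappa, tau_max, beta0=0.1):
    p = truncated_power_law(alpha, kappa)
    # ASSUMPTION (hypothetical form): T_tau = 1 - exp(-beta0 * tau), tau in {1, tau_max} w.p. 1/2
    ts = [1 - math.exp(-beta0 * tau) for tau in (1.0, tau_max)]
    T = sum(ts) / 2                                      # marginal transmissibility
    g1 = G_prime(p, 1.0)
    h_in = smallest_fixed_point(lambda h: G_prime(p, 1 - T + T * h) / g1)
    h_out = smallest_fixed_point(
        lambda h: sum(G_prime(p, 1 - t + t * h) for t in ts) / (2 * g1))
    return {
        "size_one_epn": sum(G(p, 1 - t) for t in ts) / 2,                    # G(1, 0, 0)
        "size_one_bond": G(p, 1 - T),                                        # G(1 - T)
        "p_epidemic_epn": 1 - sum(G(p, 1 - t + t * h_out) for t in ts) / 2,  # 1 - H^out(1)
        "p_epidemic_bond": 1 - G(p, 1 - T + T * h_in),                       # 1 - H^in(1)
    }


for tau_max in (10, 20, 50):
    r = predictions(alpha=2, kappa=10, tau_max=tau_max)
    print(tau_max, {key: round(val, 3) for key, val in r.items()})
```

By Jensen’s inequality the epidemic percolation network always predicts a larger probability of an outbreak of size one and a smaller probability of an epidemic than the bond percolation model, matching the direction of the discrepancies reported above.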

Figure 2. The predicted and observed probabilities of an outbreak of size one on a contact network with κ = 10 as a function of τ_max. Models were run for τ_max = 10, 12, 14, 16, 18, 20, 25, 30, 35, 40, 50, 60, 70, 80, 90, and 100. Each observed value is based on 10,000 simulations in a population of size n. For n = 10,000, 1,000 simulations were conducted on each of 10 contact networks. For n = 1,000, 100 simulations were conducted on each of 100 contact networks.

Figure 3. The predicted and observed probabilities of an outbreak of size one on a contact network with κ = 20 as a function of τ_max. Models were run for τ_max = 5, 6, 7, 8, 9, 10, 12, 14, 16, 18, 20, 25, 30, 40, 50, 60, 70, 80, 90, and 100. Each observed value is based on 10,000 simulations in a population of size n. For n = 10,000, 1,000 simulations were conducted on each of 10 contact networks. For n = 1,000, 100 simulations were conducted on each of 100 contact networks.

Figure 4. The predicted and observed probabilities of an epidemic on a contact network with κ = 10 as a function of τ_max. Models were run for τ_max = 10, 12, 14, 16, 18, 20, 25, 30, 35, 40, 50, 60, 70, 80, 90, and 100. Each observed value is based on 10,000 simulations in a population of size n. For n = 10,000, 1,000 simulations were conducted on each of 10 contact networks. For n = 1,000, 100 simulations were conducted on each of 100 contact networks.

Figure 5. The predicted and observed probabilities of an epidemic on a contact network with κ = 20 as a function of τ_max. Models were run for τ_max = 5, 6, 7, 8, 9, 10, 12, 14, 16, 18, 20, 25, 30, 40, 50, 60, 70, 80, 90, and 100. Each observed value is based on 10,000 simulations in a population of size n. For n = 10,000, 1,000 simulations were conducted on each of 10 contact networks. For n = 1,000, 100 simulations were conducted on each of 100 contact networks.

7 Discussion

For any time-homogeneous SIR epidemic model, the problem of analyzing its final outcomes can be reduced to the problem of analyzing the components of an epidemic percolation network. The distribution of outbreak sizes starting from a node i is identical to the distribution of its out-component sizes in the probability space of percolation networks. Calculating this distribution may be extremely difficult for a finite population, but it simplifies enormously in the limit of a large population for many SIR models. For a single randomly chosen imported infection in the limit of a large population, the distribution of self-limited outbreak sizes is equal to the distribution of small out-component sizes and the probability of an epidemic is equal to the relative size of the GIN. For any finite set of imported infections, the relative final size of an epidemic is equal to the relative size of the GOUT.

In this paper, we used epidemic percolation networks to reanalyze the SIR epidemic model studied in [1]. The mapping to a bond percolation model correctly predicts the distribution of in-component sizes, the critical transmissibility, and the final size of an epidemic. However, it fails to predict the correct distribution of outbreak sizes and overestimates the probability of an epidemic when the infectious period is nonconstant. Since all known infectious diseases have nonconstant infectious periods and heterogeneity in infectiousness has important consequences in real epidemics [20-22], it is important to be able to analyze such models correctly.

The exact finite-population isomorphism between a time-homogeneous SIR model and our semi-directed epidemic percolation network is useful not only because it provides a rigorous foundation for the application of percolation methods to a large class of SIR epidemic models (including fully-mixed models as well as network-based models), but also because it provides further insight into the epidemic model. For example, we used the mapping to an epidemic percolation network to show that the distribution of in- and out-component sizes in the SIR model from [1] could be calculated by treating the incoming and outgoing infectious contact processes as separate directed percolation processes, as in [19]. However, in contrast with [19], the semi-directed epidemic percolation network isolates the fundamental role of the GSCC in the emergence of epidemics. The design of interventions to reduce the probability and final size of an epidemic is a central concern of infectious disease epidemiology. In a forthcoming paper, we analyze both fully-mixed and network-based SIR models in which vaccinating those nodes most likely to be in the GSCC is shown to be the most effective strategy for reducing both the probability and final size of an epidemic. If the incoming and outgoing contact processes are treated separately, the notion of the GSCC is lost.

Acknowledgments

This work was supported by the US National Institutes of Health cooperative agreement 5U01GM076497 “Models of Infectious Disease Agent Study” (E.K.) and Ruth L. Kirschstein National Research Service Award 5T32AI007535 “Epidemiology of Infectious Diseases and Biodefense” (E.K.), as well as a research grant from the Institute for Quantitative Social Sciences at Harvard University (E.K.). Joel C. Miller’s comments on the proofs in Sections 3 and 4 were extremely valuable, and we are also grateful for the comments of Marc Lipsitch, James H. Maguire, and the anonymous referees of PRE. E.K. would also like to thank Charles Larson and Stephen P. Luby of the Health Systems and Infectious Diseases Division at ICDDR,B (Dhaka, Bangladesh).

A Epidemic percolation networks

It is possible to define epidemic percolation networks for a much larger class of stochastic SIR epidemic models than the one from [1]. First, we specify an SIR model using probability distributions for recovery periods in individuals and times from infection to infectious contact in ordered pairs of individuals. Second, we outline time-homogeneity assumptions under which the epidemic percolation network is well-defined. Finally, we define infection networks and use them to show that the final outcome of the SIR model depends only on the set of imported infections and the epidemic percolation network.

A.1 Model specification

Suppose there is a closed population in which every susceptible person is assigned an index i ∈ {1, …, n}. A susceptible person is infected upon infectious contact, and infection leads to recovery with immunity or death. Each person i is infected at his or her infection time t_i, with t_i = ∞ if i is never infected. Person i is removed (i.e., recovers from infectiousness or dies) at time t_i + r_i, where the recovery period r_i is a random sample from a probability distribution f_i(r). The recovery period r_i may be the sum of a latent period, when i is infected but not yet infectious, and an infectious period, when i can transmit infection. We assume that all infected persons have a finite recovery period. Let S(t) = {i : t_i > t} be the set of susceptible individuals at time t. Let t_(1) ≤ t_(2) ≤ … ≤ t_(n) be the order statistics of t₁, …, t_n, and let i_(k) be the index of the kth person infected.

When person i is infected, he or she makes infectious contact with person j ≠ i after an infectious contact interval τ_ij. Each τ_ij is a random sample from a conditional probability density f_ij(τ∣r_i). Let τ_ij = ∞ if person i never makes infectious contact with person j, so f_ij(τ∣r_i) has a probability mass concentrated at infinity. Person i cannot transmit disease before being infected or after recovering, so f_ij(τ∣r_i) = 0 for all τ < 0 and all τ ∈ [r_i, ∞). The infectious contact time t_ij = t_i + τ_ij is the time at which person i makes infectious contact with person j. If person j is susceptible at time t_ij, then i infects j and t_j = t_ij. If t_ij < ∞, then t_j ≤ t_ij because person j avoids infection at t_ij only if he or she has already been infected.

For each person i, let his or her importation time t_0i be the first time at which he or she experiences infectious contact from outside the population, with t_0i = ∞ if this never occurs. Let F₀(t₀) be the cumulative distribution function of the importation time vector t₀ = (t₀₁, t₀₂, …, t_0n).

A.2 Epidemic algorithm

First, an importation time vector t₀ is chosen. The epidemic begins with the introduction of infection at time $t_{(1)} = \min_i t_{0i}$. Person $i_{(1)}$ is assigned a recovery period $r_{i_{(1)}}$. Every person $j \in S(t_{(1)})$ is assigned an infectious contact time $t_{i_{(1)} j} = t_{(1)} + τ_{i_{(1)} j}$. We assume that there are no ties among finite infectious contact times. The second infection occurs at $t_{(2)} = \min_{j \in S(t_{(1)})} \min(t_{0j}, t_{i_{(1)} j})$, which is the time of the first infectious contact after person $i_{(1)}$ is infected. Person $i_{(2)}$ is assigned a recovery period $r_{i_{(2)}}$. After the second infection, each of the remaining susceptibles is assigned an infectious contact time $t_{i_{(2)} j} = t_{(2)} + τ_{i_{(2)} j}$. The third infection occurs at $t_{(3)} = \min_{j \in S(t_{(2)})} \min(t_{0j}, t_{i_{(1)} j}, t_{i_{(2)} j})$, and so on. After k infections, the next infection occurs at $t_{(k+1)} = \min_{j \in S(t_{(k)})} \min(t_{0j}, t_{i_{(1)} j}, …, t_{i_{(k)} j})$. The epidemic stops after m infections if and only if $t_{(m+1)} = ∞$.
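The algorithm above can be sketched as an event-driven simulation. The sketch assumes that all contact intervals have been drawn in advance (which the time-homogeneity assumptions of Section A.3 permit) and takes a matrix `tau` with `math.inf` for contacts that never occur. Recovery periods affect infection times only through the requirement that finite τ_ij be less than r_i, so they are not needed here. A binary heap replaces the explicit minimum over susceptibles; when finite contact times are untied, it processes infections in the same order. Function and variable names are hypothetical, and nodes are 0-indexed.

```python
import heapq
import math


def run_epidemic(t0, tau):
    """Return (t, v): infection times t[j] (math.inf if never infected) and infectors v[j].

    v[j] = -1 marks an imported case (v_j = 0 in the text, since nodes are
    0-indexed here) and v[j] = None marks a node that is never infected.
    """
    n = len(t0)
    t = [math.inf] * n
    v = [None] * n
    # Candidate infection events (time, target, source); source -1 = importation
    heap = [(t0[j], j, -1) for j in range(n) if t0[j] < math.inf]
    heapq.heapify(heap)
    while heap:
        time, j, source = heapq.heappop(heap)
        if t[j] < math.inf:          # j was already infected at an earlier time
            continue
        t[j], v[j] = time, source
        for k in range(n):           # j's infectious contact times with current susceptibles
            if t[k] == math.inf and tau[j][k] < math.inf:
                heapq.heappush(heap, (time + tau[j][k], k, j))
    return t, v


INF = math.inf
tau = [[INF, 1.0, 5.0],
       [INF, INF, 1.0],
       [INF, INF, INF]]
print(run_epidemic([0.0, INF, INF], tau))   # ([0.0, 1.0, 2.0], [-1, 0, 1])
```

Node 2 is infected by node 1 at time 2 because that contact precedes node 0's contact at time 5.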

A.3 Time homogeneity assumptions

In principle, the above epidemic algorithm could allow the infectious period and outgoing infectious contact intervals for individual i to depend on all information about the epidemic available up to time t_i. In order to generate an epidemic percolation network, we must ensure that the joint distributions of recovery periods and infectious contact intervals are defined a priori. The following restrictions are sufficient:

We assume that the distribution of the recovery period vector r = (r₁, r₂, …, r_n) does not depend on the importation time vector t₀, the contact interval matrix τ = [τ_ij], or the history of the epidemic.
We assume that the distribution of the infectious contact interval matrix τ does not depend on t₀ or the history of the epidemic.

With these time-homogeneity assumptions, the cumulative distribution functions F(r) of recovery periods and F(τ∣r) of infectious contact intervals are completely specified a priori. Given r and τ, the epidemic percolation network is a semi-directed network in which there is a directed edge from i to j iff τ_ij < ∞ and τ_ji = ∞, a directed edge from j to i iff τ_ij = ∞ and τ_ji < ∞, and an undirected edge between i and j iff τ_ij < ∞ and τ_ji < ∞. The entire time course of the epidemic is determined by r, τ, and t₀. However, its final size depends only on the set {i : t_0i < ∞} of possible imported infections and the epidemic percolation network corresponding to τ. In order to prove this, we first define the infection network, which records the chain of infection from a single realization of the epidemic model.
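Given a realized contact-interval matrix τ, the edge rule above is mechanical. A minimal sketch (function name hypothetical, nodes 0-indexed, `math.inf` for contacts that never occur) that returns directed edges as ordered pairs and undirected edges as unordered pairs (i, j) with i < j:

```python
import math


def percolation_network(tau):
    """Classify each pair {i, j} of the epidemic percolation network.

    Directed (i, j): tau[i][j] finite and tau[j][i] infinite.
    Undirected (i, j), i < j: both tau[i][j] and tau[j][i] finite.
    """
    n = len(tau)
    directed, undirected = set(), set()
    for i in range(n):
        for j in range(i + 1, n):
            fwd, back = tau[i][j] < math.inf, tau[j][i] < math.inf
            if fwd and back:
                undirected.add((i, j))
            elif fwd:
                directed.add((i, j))
            elif back:
                directed.add((j, i))
    return directed, undirected


INF = math.inf
tau = [[INF, 1.0, INF],
       [2.0, INF, INF],
       [4.0, INF, INF]]
print(percolation_network(tau))   # ({(2, 0)}, {(0, 1)})
```

Only the finiteness of each τ_ij matters; the actual interval lengths, which determine timing, are discarded.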

A.4 Infection networks

Let υ_i be the index of the person who infected person i, with υ_i = 0 for imported infections and υ_i = ∞ for uninfected nodes. If tied finite infectious contact times are possible, then choose υ_i from all j such that t_ji = t_i. The infection network has the edge set $\{υ_i i : 0 < υ_i < ∞\}$. It is a purely directed subgraph of the epidemic percolation network corresponding to τ because $τ_{υ_i i} < ∞$ for every edge $υ_i i$. Since each node has at most one incoming edge, all components of the infection network are trees or isolated nodes. Every imported case is either the root node of a tree or an isolated node. Every person infected through transmission within the population is a nonroot node in a tree. Uninfected persons are isolated nodes.

The infection network can be represented by a vector v = (υ₁, …, υ_n), as in Ref. [23]. If υ_j = 0, then t_j = t_0j. If 0 < υ_j < ∞, then j is in a component of the infection network with a root node imp_j and its infection time is

t_{j} = t_{{imp}_{j}} + \sum_{k = 1}^{m} τ_{i_{k} j_{k}},

where the edges i₁j₁, …, i_mj_m form a directed path from imp_j to j. This path is unique because all nontrivial components of the infection network are trees. If υ_j = ∞, then t_j = ∞. The removal time of each node i is t_i + r_i. If there is more than one possible infection network, they must all be consistent with (t₁, …, t_n) by definition of υ_i. Therefore, the entire time course of the epidemic is determined by the importation time vector t₀, the recovery period vector r, and the infectious contact interval matrix τ.
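The path formula can be checked with a short sketch that walks each node back up its tree to the root. It assumes 0-indexed nodes, so an imported case is marked `v[j] = -1` (playing the role of υ_j = 0) and an uninfected node `v[j] = None` (υ_j = ∞); the function name and the small example are hypothetical.

```python
import math


def infection_times_from_tree(t0, tau, v):
    """Recompute t_j = t_{imp_j} + (sum of tau along the path imp_j -> j).

    v[j] = -1 marks an imported case and v[j] = None a node never infected.
    """
    t = []
    for j in range(len(v)):
        if v[j] is None:
            t.append(math.inf)
            continue
        total, node = 0.0, j
        while v[node] != -1:            # climb the tree until reaching the root imp_j
            total += tau[v[node]][node]
            node = v[node]
        t.append(t0[node] + total)      # the root is imported, so t_root = t0[root]
    return t


INF = math.inf
tau = [[INF, 1.0, 5.0, INF],
       [INF, INF, 1.0, INF],
       [INF, INF, INF, INF],
       [INF, INF, INF, INF]]
t0 = [0.0, INF, INF, INF]
v = [-1, 0, 1, None]    # 0 imported; 0 infects 1; 1 infects 2; 3 is never infected
print(infection_times_from_tree(t0, tau, v))   # [0.0, 1.0, 2.0, inf]
```

The contact from node 0 to node 2 (τ = 5) plays no role, because node 2 is already infected via node 1 at time 2.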

A.5 Final outcomes and epidemic percolation networks

Theorem 7 In an epidemic with infectious contact interval matrix τ, a node is infected if and only if it is in the out-component of a node i with t_0i < ∞ in the percolation network. (Equivalently, a node is infected if and only if its in-component includes a node i with t_0i < ∞.) Therefore, the final outcome of the SIR model depends only on the set of imported infections and the epidemic percolation network corresponding to τ.

Proof. Suppose that person j is in the out-component of a node i with t_0i < ∞ in the epidemic percolation network corresponding to τ. Then there is a path i₁j₁, …, i_mj_m such that i₁ = i, j_m = j, $j_k = i_{k+1}$ for 1 ≤ k < m, and $τ_{i_k j_k} < ∞$ for 1 ≤ k ≤ m, so

t_{j} \leq t_{0 i} + \sum_{k = 1}^{m} τ_{i_{k} j_{k}} < \infty,

and j must be infected during the epidemic. Now suppose that t_j < ∞. Then there exists an imported case i and a path i₁j₁, …, i_mj_m such that i₁ = i, j_m = j, $j_k = i_{k+1}$ for 1 ≤ k < m, and

t_{j} = t_{i} + \sum_{k = 1}^{m} τ_{i_{k} j_{k}} .

Since t_j < ∞, it follows that τ_{i_kj_k} < ∞ for all k. But then the epidemic percolation network corresponding to τ has an edge with the proper direction or an undirected edge between i_k and j_k for all k, so j is in the out-component of i. ■
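Theorem 7 can be spot-checked numerically: simulate the epidemic algorithm on randomly generated contact-interval matrices and compare the infected set with the out-component of the imported cases. The random model below (each τ_ij finite with probability `p_contact` and, if finite, exponentially distributed, with a few random importations) is a hypothetical choice made only for the check; the theorem itself does not depend on it.

```python
import heapq
import math
import random
from collections import deque


def infection_times(t0, tau):
    """Event-driven epidemic algorithm (infection times only; math.inf = never)."""
    n = len(t0)
    t = [math.inf] * n
    heap = [(t0[j], j) for j in range(n) if t0[j] < math.inf]
    heapq.heapify(heap)
    while heap:
        time, j = heapq.heappop(heap)
        if t[j] < math.inf:
            continue
        t[j] = time
        for k in range(n):
            if t[k] == math.inf and tau[j][k] < math.inf:
                heapq.heappush(heap, (time + tau[j][k], k))
    return t


def out_component(sources, tau):
    """Nodes reachable from sources along finite tau[i][j] (directed edges
    forward only, undirected edges in either direction)."""
    seen, queue = set(sources), deque(sources)
    while queue:
        i = queue.popleft()
        for j, interval in enumerate(tau[i]):
            if interval < math.inf and j not in seen:
                seen.add(j)
                queue.append(j)
    return seen


def check_theorem(n=30, p_contact=0.1, trials=200, seed=0):
    rng = random.Random(seed)
    for _ in range(trials):
        tau = [[rng.expovariate(1.0) if i != j and rng.random() < p_contact else math.inf
                for j in range(n)] for i in range(n)]
        t0 = [rng.uniform(0, 5) if rng.random() < 0.05 else math.inf for _ in range(n)]
        infected = {j for j, tj in enumerate(infection_times(t0, tau)) if tj < math.inf}
        imported = [i for i in range(n) if t0[i] < math.inf]
        assert infected == out_component(imported, tau)
    return True


print(check_theorem())   # True
```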

By the law of iterated expectation (conditioning on τ), this result implies that the distribution of outbreak sizes caused by the introduction of infection to node i is identical to the distribution of his or her out-component sizes in the probability space of epidemic percolation networks. Furthermore, the probability that person i gets infected in an epidemic is equal to the probability that his or her in-component contains at least one imported infection. This isomorphism holds in any finite population. In the limit of a large population, the probability that node i is infected in an epidemic is equal to the probability that he or she is in the GOUT and the probability that an epidemic results from the infection of node i is equal to the probability that he or she is in the GIN. This logic can be extended to predict the mean size of self-limited outbreaks and the probability and final size of an epidemic for outbreaks started by any given set of imported infections.

References

1. Newman MEJ. Spread of epidemic disease on networks. Physical Review E. 2002;66:016128. doi: 10.1103/PhysRevE.66.016128.
2. Newman MEJ, Strogatz SH, Watts DJ. Random graphs with arbitrary degree distributions and their applications. Physical Review E. 2001;64:026118. doi: 10.1103/PhysRevE.64.026118.
3. Boguñá M, Serrano MA. Generalized percolation in random directed networks. Physical Review E. 2005;72:016106. doi: 10.1103/PhysRevE.72.016106.
4. Meyers LA, Newman MEJ, Pourbohloul B. Predicting epidemics on directed contact networks. Journal of Theoretical Biology. 2006;240(3):400–418. doi: 10.1016/j.jtbi.2005.10.004.
5. Kuulasmaa K. The spatial general epidemic and locally dependent random graphs. Journal of Applied Probability. 1982;19(4):745–758.
6. Kenah E, Robins J. Network-based analysis of stochastic SIR epidemic models with random and proportionate mixing. 2007. doi: 10.1016/j.jtbi.2007.09.011. arXiv:q-bio.QM/0702027.
7. Broder A, Kumar R, Maghoul F, Raghavan P, Rajagopalan S, Stata R, Tomkins A, Weiner J. Graph structure in the Web. Computer Networks. 2000;33:309–320.
8. Dorogovtsev SN, Mendes JFF, Sakhunin AN. Giant strongly connected component of directed networks. Physical Review E. 2001;64:025101(R). doi: 10.1103/PhysRevE.64.025101.
9. Schwartz N, Cohen R, ben-Avraham D, Barabási A-L, Havlin S. Percolation in directed scale-free networks. Physical Review E. 2002;66:015104(R). doi: 10.1103/PhysRevE.66.015104.
10. Andersson H, Britton T. Stochastic Epidemic Models and Their Statistical Analysis (Lecture Notes in Statistics v.151). New York: Springer-Verlag; 2000.
11. Diekmann O, Heesterbeek JAP. Mathematical Epidemiology of Infectious Diseases: Model Building, Analysis and Interpretation. Chichester (UK): John Wiley & Sons; 2000.
12. Sander LM, Warren CP, Sokolov IM, Simon C, Koopman J. Percolation on heterogeneous networks as a model for epidemics. Mathematical Biosciences. 2002;180:293–305. doi: 10.1016/s0025-5564(02)00117-7.
13. Albert R, Barabási A-L. Statistical mechanics of complex networks. Reviews of Modern Physics. 2002;74:47–97.
14. Newman MEJ. The structure and function of complex networks. SIAM Review. 2003;45(2):167–256.
15. Newman MEJ. Random graphs as models of networks. In: Bornholdt S, Schuster HG, editors. Handbook of Graphs and Networks. Berlin: Wiley-VCH; 2003. pp. 35–68.
16. Newman MEJ, Barabási A-L, Watts DJ. The Structure and Dynamics of Networks (Princeton Studies in Complexity). Princeton: Princeton University Press; 2006.
17. Andersson H. Limit theorems for a random graph epidemic model. The Annals of Applied Probability. 1998;8(4):1331–1349.
18. Kuulasmaa K, Zachary S. On spatial general epidemics and bond percolation processes. Journal of Applied Probability. 1984;21(4):911–914.
19. Miller J. Predicting the size and probability of epidemics in populations with heterogeneous infectiousness and susceptibility. Physical Review E. 2007;76:010101(R). doi: 10.1103/PhysRevE.76.010101.
20. Lipsitch M, Cohen T, Cooper B, Robins JM, Ma S, James L, Gopalakrishna G, Chew SK, Tan CC, Samore MH, Fishman D, Murray M. Transmission dynamics and control of Severe Acute Respiratory Syndrome. Science. 2003;300:1966–1970. doi: 10.1126/science.1086616.
21. Riley S, Fraser C, Donnelly CA, Ghani AC, Abu-Raddad LJ, Hedley AJ, Leung GM, Ho L-M, Lam T-H, Thach TQ, Chau P, Chan K-P, Lo S-V, Leung P-Y, Tsang T, Ho W, Lee K-H, Lau EMC, Ferguson NM, Anderson RM. Transmission dynamics of the etiological agent of SARS in Hong Kong: Impact of public health interventions. Science. 2003;300:1961–1966. doi: 10.1126/science.1086478.
22. Dye C, Gay N. Modeling the SARS epidemic. Science. 2003;300:1884–1885. doi: 10.1126/science.1086925.
23. Wallinga J, Teunis P. Different epidemic curves for Severe Acute Respiratory Syndrome Reveal Similar Impacts of Control Measures. American Journal of Epidemiology. 2004;160(6):509–516. doi: 10.1093/aje/kwh255.

[R1] 1.Newman MEJ. Spread of epidemic disease on networks. Physical Review E. 2002;66:016128. doi: 10.1103/PhysRevE.66.016128. [DOI] [PubMed] [Google Scholar]

[R2] 2.Newman MEJ, Strogatz SH, Watts DJ. Random graphs with arbitrary degree distributions and their applications. Physical Review E. 2001;64:026118. doi: 10.1103/PhysRevE.64.026118. [DOI] [PubMed] [Google Scholar]

[R3] 3.Boguñá M, Serrano MA. Generalized percolation in random directed networks. Physical Review E. 2005;72:016106. doi: 10.1103/PhysRevE.72.016106. [DOI] [PubMed] [Google Scholar]

[R4] 4.Meyers LA, Newman MEJ, Pourbohloul B. Predicting epidemics on directed contact networks. Journal of Theoretical Biology. 2006;240(3):400–418. doi: 10.1016/j.jtbi.2005.10.004. [DOI] [PubMed] [Google Scholar]

5. Kuulasmaa K. The spatial general epidemic and locally dependent random graphs. Journal of Applied Probability. 1982;19(4):745–758.

6. Kenah E, Robins J. Network-based analysis of stochastic SIR epidemic models with random and proportionate mixing. 2007. doi: 10.1016/j.jtbi.2007.09.011. arXiv:q-bio.QM/0702027.

7. Broder A, Kumar R, Maghoul F, Raghavan P, Rajagopalan S, Stata R, Tomkins A, Wiener J. Graph structure in the Web. Computer Networks. 2000;33:309–320.

8. Dorogovtsev SN, Mendes JFF, Samukhin AN. Giant strongly connected component of directed networks. Physical Review E. 2001;64:025101(R). doi: 10.1103/PhysRevE.64.025101.

9. Schwartz N, Cohen R, ben-Avraham D, Barabási A-L, Havlin S. Percolation in directed scale-free networks. Physical Review E. 2002;66:015104(R). doi: 10.1103/PhysRevE.66.015104.

10. Andersson H, Britton T. Stochastic Epidemic Models and Their Statistical Analysis. Lecture Notes in Statistics, Vol. 151. New York: Springer-Verlag; 2000.

11. Diekmann O, Heesterbeek JAP. Mathematical Epidemiology of Infectious Diseases: Model Building, Analysis and Interpretation. Chichester (UK): John Wiley & Sons; 2000.

12. Sander LM, Warren CP, Sokolov IM, Simon C, Koopman J. Percolation on heterogeneous networks as a model for epidemics. Mathematical Biosciences. 2002;180:293–305. doi: 10.1016/S0025-5564(02)00117-7.

13. Albert R, Barabási A-L. Statistical mechanics of complex networks. Reviews of Modern Physics. 2002;74:47–97.

14. Newman MEJ. The structure and function of complex networks. SIAM Review. 2003;45(2):167–256.

15. Newman MEJ. Random graphs as models of networks. In: Bornholdt S, Schuster HG, editors. Handbook of Graphs and Networks. Berlin: Wiley-VCH; 2003. pp. 35–68.

16. Newman MEJ, Barabási A-L, Watts DJ. The Structure and Dynamics of Networks. Princeton Studies in Complexity. Princeton: Princeton University Press; 2006.

17. Andersson H. Limit theorems for a random graph epidemic model. The Annals of Applied Probability. 1998;8(4):1331–1349.

18. Kuulasmaa K, Zachary S. On spatial general epidemics and bond percolation processes. Journal of Applied Probability. 1984;21(4):911–914.

19. Miller J. Predicting the size and probability of epidemics in populations with heterogeneous infectiousness and susceptibility. Physical Review E. 2007;76:010101(R). doi: 10.1103/PhysRevE.76.010101.

20. Lipsitch M, Cohen T, Cooper B, Robins JM, Ma S, James L, Gopalakrishna G, Chew SK, Tan CC, Samore MH, Fishman D, Murray M. Transmission dynamics and control of Severe Acute Respiratory Syndrome. Science. 2003;300:1966–1970. doi: 10.1126/science.1086616.

21. Riley S, Fraser C, Donnelly CA, Ghani AC, Abu-Raddad LJ, Hedley AJ, Leung GM, Ho L-M, Lam T-H, Thach TQ, Chau P, Chan K-P, Lo S-V, Leung P-Y, Tsang T, Ho W, Lee K-H, Lau EMC, Ferguson NM, Anderson RM. Transmission dynamics of the etiological agent of SARS in Hong Kong: Impact of public health interventions. Science. 2003;300:1961–1966. doi: 10.1126/science.1086478.

22. Dye C, Gay N. Modeling the SARS epidemic. Science. 2003;300:1884–1885. doi: 10.1126/science.1086925.

23. Wallinga J, Teunis P. Different epidemic curves for Severe Acute Respiratory Syndrome reveal similar impacts of control measures. American Journal of Epidemiology. 2004;160(6):509–516. doi: 10.1093/aje/kwh255.

